  • Need verification of my Differential Expression pipeline

    This is all going to be pseudocode and explanations, but can someone verify my pipeline, or let me know if there's a better way to do any of the steps?

    The project has 10 samples (5 male: 1 control plus 2 experimental conditions with a replicate each; same for female) from an organism without a reference genome. We are using de novo assembly (Trinity, Oases, Bridger, etc.) to assemble all 10 samples.

    After we assemble the samples, we want to build a reference to use for differential expression. We will merge the 10 assemblies, run CD-HIT-EST to remove redundancy, and then annotate the resulting FASTA. We plan on running blastx and saving the output as XML, importing the XML into Blast2GO, removing all the non-annotated transcripts, and exporting the annotated FASTA. We will use this FASTA as our reference for mapping.
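
The "remove all the non-annotated transcripts" step can also be sketched outside Blast2GO. This is a minimal illustration only, assuming you already have the set of annotated transcript IDs (for example exported as a text list) and that the ID is the first whitespace-delimited token of each FASTA header. The names `read_fasta`, `record_id`, and `keep_annotated`, and the example records, are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header: str) -> str:
    """Assumption: the transcript ID is the first token of the header."""
    parts = header.split()
    return parts[0] if parts else ""


def keep_annotated(records, annotated_ids: Set[str]):
    """Keep only the records whose ID appears in annotated_ids."""
    for header, seq in records:
        if record_id(header) in annotated_ids:
            yield header, seq


# Made-up example data: three transcripts, two of them annotated.
fasta_lines = """>tr1 len=8
ACGTACGT
>tr2 len=4
GGCC
>tr3 len=6
TTAACC
""".splitlines()
annotated = {"tr1", "tr3"}

kept = list(keep_annotated(read_fasta(fasta_lines), annotated))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Note that whatever is dropped here can never receive counts later, so reads from unannotated transcripts are simply lost from the analysis.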

    We take the annotated FASTA above and map the raw reads back to it using bowtie2 or BWA, generating SAM files. Then we use samtools to convert them to sorted BAMs.
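
As a rough sketch of the mapping step with bowtie2, the commands for one sample could be assembled like this. It assumes paired-end reads and a samtools version whose `sort` accepts SAM input together with `-o`; the function name `mapping_commands` and all file names are hypothetical. The commands are only built and printed here, not executed.

```python
from typing import List


def mapping_commands(reference_fasta: str, index_prefix: str,
                     reads_1: str, reads_2: str,
                     sample: str, threads: int = 4) -> List[List[str]]:
    """Build index, align, and sort commands for one paired-end sample."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Build the bowtie2 index (only needs to be run once per reference).
        ["bowtie2-build", reference_fasta, index_prefix],
        # Align the raw reads to the annotated transcript reference.
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # Convert the SAM to a coordinate-sorted BAM.
        ["samtools", "sort", "-o", bam, sam],
    ]


cmds = mapping_commands("annotated.fasta", "annotated_idx",
                        "male_ctrl_R1.fastq.gz", "male_ctrl_R2.fastq.gz",
                        "male_ctrl")
for cmd in cmds:
    print(" ".join(cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)`.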

    Convert our annotated reference to GFF3, and use htseq-count to get the read counts. Then run DESeq to get our DE genes.
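
One simple way to picture the FASTA-to-GFF3 conversion is a single feature per transcript that spans its full length. The sketch below assumes exactly that, plus transcript IDs without characters that need escaping in GFF3 (such as `;` or `=`). The function name `transcripts_to_gff3`, the `denovo` source label, the `exon` feature type, and the `gene_id` attribute are illustrative choices, not anything the pipeline above prescribes; whichever type and attribute are used have to match what htseq-count is told via its `--type` and `--idattr` options.

```python
from typing import Dict


def transcripts_to_gff3(lengths: Dict[str, int],
                        source: str = "denovo",
                        feature_type: str = "exon") -> str:
    """Return GFF3 text with one full-length feature per transcript."""
    lines = ["##gff-version 3"]
    for tid, length in lengths.items():
        attributes = f"ID={tid};gene_id={tid}"
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        # Strand "+" is a placeholder; de novo transcripts may be in either orientation.
        lines.append("\t".join([tid, source, feature_type, "1", str(length),
                                ".", "+", ".", attributes]))
    return "\n".join(lines) + "\n"


# Made-up transcript lengths for illustration.
gff = transcripts_to_gff3({"tr1": 1200, "tr3": 900})
print(gff, end="")
```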

    Does this sound like a good plan?

    We're currently at the "reference transcript" stage, and we will be submitting the reference to our local blast cluster in the next few days. I just want to verify that what I'm thinking is correct, or if there's something else I should be doing.

    Thank you!

  • #2
    Are you assembling the genome or transcriptome?



    • #3
      It's Illumina RNA-seq data, so it'll be a transcriptome.

      My organism has no reference genome at all, so we're doing de novo assembly via Bridger.
      Last edited by Blaze9; 06-30-2015, 10:20 PM.



      • #4
        I'd evaluate how many non-annotated transcripts get removed after Blast2GO.
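
A quick way to run that check, sketched under the assumption that you have each transcript's length and the set of IDs that survived annotation. The function name `annotation_summary` and the numbers are made up for illustration.

```python
from statistics import median
from typing import Dict, Set


def annotation_summary(lengths: Dict[str, int], annotated_ids: Set[str]) -> dict:
    """Summarize how many transcripts the annotation filter keeps and removes."""
    kept = [n for tid, n in lengths.items() if tid in annotated_ids]
    removed = [n for tid, n in lengths.items() if tid not in annotated_ids]
    total = len(lengths)
    return {
        "total": total,
        "kept": len(kept),
        "removed": len(removed),
        "fraction_removed": len(removed) / total if total else 0.0,
        # Median lengths hint at whether mostly short fragments are being dropped.
        "median_len_kept": median(kept) if kept else None,
        "median_len_removed": median(removed) if removed else None,
    }


# Hypothetical transcript lengths and annotation result.
lengths = {"tr1": 1200, "tr2": 250, "tr3": 900, "tr4": 310}
summary = annotation_summary(lengths, {"tr1", "tr3"})
print(summary)
```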



        • #5
          I suggest merging all of the reads prior to assembling, so you just get one assembly. That should give a better and less-redundant assembly compared to assembling 10x and deduplicating the results.
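
A minimal sketch of pooling the reads before assembly, assuming gzipped FASTQ input. Concatenated gzip files form a valid multi-member gzip file, so a plain byte-for-byte copy is enough. The function name `pool_reads` and the file names are hypothetical; for paired-end data, the mate 1 and mate 2 files would need to be pooled separately with the samples in the same order.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def pool_reads(inputs, output):
    """Concatenate read files byte-for-byte, in the order given, into one file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Demo with two tiny, made-up gzipped FASTQ files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp_dir = Path(tmp_name)
    sample_files = []
    for name in ("sampleA", "sampleB"):
        path = tmp_dir / f"{name}_R1.fastq.gz"
        with gzip.open(path, "wt") as fh:
            fh.write(f"@{name}_read1\nACGT\n+\nIIII\n")
        sample_files.append(path)

    pooled = tmp_dir / "pooled_R1.fastq.gz"
    pool_reads(sample_files, pooled)

    # Read the pooled file back to confirm both samples are present.
    with gzip.open(pooled, "rt") as fh:
        pooled_lines = fh.read().splitlines()

print(len(pooled_lines) // 4, "reads pooled")
```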



          • #6
            Would it be better to assemble male/female data separately?



            • #7
              I wouldn't... as long as they're all the same organism, it's easiest to assemble them all together both in terms of recovering low-expression genes and avoiding redundancy, which really is hard to remove without loss of real information.

              The ideal method of assembly (combining first or not combining first) may vary, though, depending on the ploidy and SNP rate. High-SNP-rate haploids, for example, might be better assembled individually.
