  • TopHat2 run on individual or multiple samples

    Hello,

    I am trying to decide on the best approach for running TopHat2. We have multiple RNA-Seq samples for two different conditions. Is it preferable to run TopHat2 on each sample individually, on all samples in each condition (two runs only), or on all samples together? Does it make any difference to the results? Running the software per sample would be much faster and seems preferable, unless there is a rationale for pooling the samples (e.g. some junctions might be missed when samples are mapped individually).

    Thank you!
    Alexandra

  • #2
    I usually pool samples only when the sequencing depth is relatively low. At around 30-40 million reads per sample, I don't see much of a difference between individual mappings and merged mappings. I did get slightly more reads mapped when I pooled the condition 1 samples together and the condition 2 samples together and mapped these two pools separately, in a case where I had about 6-9 million reads per sample.


    • #3
      Thank you, cedance! The number of sequences for each of our samples is at least 25 million, so I will start mapping each sample individually as a first step. I might go back to mapping all samples per condition, just to see what the differences are.

      Also, in your analyses, have you tried using the junction files that you get out of tophat for the mapping process? More specifically, have you ever merged these files and rerun tophat, while supplying the obtained set of merged junctions, to see how different the results might be? I would be very interested in learning more about how much of an improvement this additional information might bring.

      Alexandra
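The merge step asked about above could be sketched as follows. This is only an illustration under stated assumptions: each sample's junctions.bed is assumed to have already been converted to TopHat's raw junction format (tab-separated chromosome, left position, right position, strand), for example with the bed_to_juncs utility that ships with TopHat, and the merged list would then be passed to a second run via the -j/--raw-juncs option. The function names and the sample data are hypothetical.

```python
def merge_juncs(lines_per_sample):
    """Return the union of junction records from several samples.

    Each element of lines_per_sample is an iterable of raw-junction lines:
    chrom <TAB> left <TAB> right <TAB> strand.
    """
    merged = set()
    for lines in lines_per_sample:
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            chrom, left, right, strand = line.split("\t")[:4]
            merged.add((chrom, int(left), int(right), strand))
    # Sort by chromosome name, then coordinates, so the output is stable
    return sorted(merged)


def format_juncs(junctions):
    """Render junction tuples back into tab-separated lines."""
    return "".join(f"{c}\t{l}\t{r}\t{s}\n" for c, l, r, s in junctions)


# Hypothetical junction lists from two samples
sample_a = ["chr1\t100\t200\t+\n", "chr2\t50\t90\t-\n"]
sample_b = ["chr1\t100\t200\t+\n", "chr1\t300\t450\t+\n"]

merged = merge_juncs([sample_a, sample_b])
print(format_juncs(merged), end="")
```

The junction shared by both samples appears once in the merged list, so every sample in the second run sees the same set of candidate junctions.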


      • #4
        Sure, no problem. 25 million reads is plenty. I don't expect much of a difference.

        I have not done it by supplying junctions from the merged run. Certainly that seems to be a nice idea.

        What I have done is supply the junctions from the paired-end run to the single-end files. When I preprocess a paired-end library by clipping adapters and trimming for quality, some reads are removed from only one of the two paired-end FASTQ files. I save the surviving mates as single-end reads and re-run TopHat on them with the junctions obtained from the paired-end run.
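Separating intact pairs from orphaned mates after trimming might look roughly like the sketch below. It assumes that mate names end in /1 and /2 (as in older Illumina headers) and that both files keep their original read order; the function names and the toy reads are hypothetical, and this is not necessarily how the poster did it.

```python
def parse_fastq(lines):
    """Group an iterable of FASTQ lines into (header, seq, plus, qual) records."""
    it = (line.rstrip("\n") for line in lines)
    return list(zip(it, it, it, it))


def pair_key(header):
    """'@name/1' and '@name/2' share the key '@name'."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def split_pairs_and_orphans(mate1, mate2):
    """Return (paired mate-1 records, paired mate-2 records, orphan records)."""
    both = {pair_key(r[0]) for r in mate1} & {pair_key(r[0]) for r in mate2}
    # Filtering preserves order, so the two paired lists stay in sync
    paired1 = [r for r in mate1 if pair_key(r[0]) in both]
    paired2 = [r for r in mate2 if pair_key(r[0]) in both]
    orphans = [r for r in mate1 + mate2 if pair_key(r[0]) not in both]
    return paired1, paired2, orphans


# Toy trimmed files: r2 lost its first mate, r3 lost its second mate
mate1 = parse_fastq(["@r1#0/1", "ACGT", "+", "IIII",
                     "@r3#0/1", "GGCC", "+", "IIII"])
mate2 = parse_fastq(["@r1#0/2", "TTAA", "+", "IIII",
                     "@r2#0/2", "CCGG", "+", "IIII"])

paired1, paired2, orphans = split_pairs_and_orphans(mate1, mate2)
```

The paired lists would go to the paired-end run, and the orphans would be written to a single-end FASTQ for the second run that reuses the junctions.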

        In my opinion, TopHat2 is also much better than the older versions. I have been using TopHat since 1.2.0 with Bowtie 1. Bowtie 2 also seems much better. More reads are mapped, even though some errors that are particular to RNA-seq are still there. For the same samples, TopHat2 gives more mapped reads and better mappings. Bowtie 2's internal seed-and-extend approach (suited to the relatively longer reads available now) seems nice and fast.


        • #5
          For differential expression analysis, if the multiple samples in each condition are biological replicates, you might want to keep them separate for the analysis.


          • #6
            Yes, the idea is to retrieve back the replicates after mapping by merging them.


            • #7
              Originally posted by cedance
              Yes, the idea is to retrieve back the replicates after mapping by merging them.

              Hi cedance, did you mean splitting (instead of merging) the biological replicates in your post? Is there a standard way of doing this, or did you have to write your own script to get the replicates out of tophat's accepted_hits file?


              • #8
                I meant: first merge all the files, then map them, then split them back apart.
                Yes, I just changed the header of each FASTQ read (the first line of every record) to an ID from which I can recognize the sample/replicate it came from. For example: from @SOLEXA1_0420:7:1:1024:4338#0/1 to @01_S1_Rep1_0420:7:1:1024:4338#0/1.
                Of course this does not strictly follow the FASTQ format standard, but it served my purpose.
                Last edited by cedance; 05-17-2012, 01:26 PM.
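A minimal sketch of the tag-then-split idea described above. It mirrors the header example in the post by replacing the instrument name before the first underscore with a sample tag, and it assumes the rest of the read name contains no further underscores. The function names are hypothetical, and the alignments are assumed to be available as SAM text (for example from `samtools view` on accepted_hits.bam).

```python
from collections import defaultdict


def tag_header(header, tag):
    """Replace the instrument name in a FASTQ header with a sample tag."""
    _instrument, rest = header[1:].split("_", 1)
    return f"@{tag}_{rest}"


def sample_of(read_name):
    """Recover the tag; assumes no underscore after the tag's separator."""
    return read_name.rsplit("_", 1)[0]


def split_by_sample(sam_lines):
    """Group SAM alignment lines by the sample tag stored in the read name."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line, not an alignment
        read_name = line.split("\t", 1)[0]
        groups[sample_of(read_name)].append(line)
    return dict(groups)


tagged = tag_header("@SOLEXA1_0420:7:1:1024:4338#0/1", "01_S1_Rep1")

# Hypothetical, truncated SAM lines (only the first few columns shown)
sam = [
    "@HD\tVN:1.0",
    "01_S1_Rep1_0420:7:1:1024:4338#0\t0\tchr1\t100",
    "02_S1_Rep2_0420:7:1:2048:1111#0\t16\tchr2\t500",
]
by_sample = split_by_sample(sam)
```

Each group could then be written to its own file together with the original SAM header, which gives per-replicate alignments for counting.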


                • #9
                  This was very useful! Thank you again.


                  • #10
                    For differential expression I generally run everything separately and then combine the BAM files afterwards as needed.

                    This gives me more flexibility as I can analyze individual lanes as well as the combined data.


                    • #11
                      chadn737, if you have biological replicates, it's better to keep them separate rather than merging them and then finding genes that are differentially expressed.


                      • #12
                        Originally posted by cedance
                        chadn737, if you have biological replicates, it's better to keep them rather than merging and then finding genes that are differentially expressed.

                        I'm not sure what you are getting at. I never said I merge my bio reps; these are always kept separate. But we generally multiplex all of our samples and spread them across multiple lanes. So when I say I run separately and merge later, I mean I align the individual FASTQ files from each lane and sample, and then merge the resulting BAM files for each bio rep.
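The per-lane alignment followed by a per-replicate merge could be organized along these lines. The sketch only builds `samtools merge` command lines (output file first, then the inputs) and does not run them; the replicate IDs, directory layout, and output names are hypothetical.

```python
from collections import defaultdict


def merge_commands(lane_bams):
    """Build one 'samtools merge' command per biological replicate.

    lane_bams is an iterable of (replicate_id, bam_path) pairs, one pair
    per lane-level alignment.
    """
    by_rep = defaultdict(list)
    for rep, bam in lane_bams:
        by_rep[rep].append(bam)
    # Lanes of the same replicate are merged; replicates stay separate
    return [
        ["samtools", "merge", f"{rep}.merged.bam", *sorted(bams)]
        for rep, bams in sorted(by_rep.items())
    ]


# Hypothetical layout: two replicates, each sequenced on two lanes
lanes = [
    ("cond1_rep1", "lane1/cond1_rep1/accepted_hits.bam"),
    ("cond1_rep1", "lane2/cond1_rep1/accepted_hits.bam"),
    ("cond1_rep2", "lane1/cond1_rep2/accepted_hits.bam"),
    ("cond1_rep2", "lane2/cond1_rep2/accepted_hits.bam"),
]

commands = merge_commands(lanes)
for cmd in commands:
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once the lane-level BAM files exist. Keeping the lane-level files also preserves the option of analyzing individual lanes.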


                        • #13
                          It wasn't clear from your earlier post what you were combining afterwards. Now it is.
