  • How long to wait for Tophat2 to finish?

    Hi all,

    I'm trying to run tophat2 on three paired-end datasets against hg19, but it has taken three days so far! How long do typical runs take for other people?

    I've run it with 8 threads on an 8-core machine with 32 GB of RAM. The bowtie2 alignment took about a day using all eight cores, but since then it has been stuck using 100% of only one core at this step:
    [2012-05-19 11:45:52] Searching for junctions via segment mapping
    There are about 136M read pairs in total.

    Oh, and the parameters used are:
    -p 8 --coverage-search --microexon-search --b2-very-sensitive --mate-inner-dist 200

  • #2
    The last time I ran tophat2 I noticed similar things. It seems that the "Searching for junctions" step is single-threaded. I think it took longer (probably about twice as long) to go through the "Searching for junctions..." step than it did to do the earlier alignment steps. You might let it go for another day or so, and hopefully it'll complete by then. I agree that it's disconcerting, particularly because you don't get any feedback that it's not just stuck on something.

    • #3
      Thanks for the reply.

      You're right, it is very disconcerting. There's nothing in the output or log files, or even in the tmp/ directory, to indicate anything is happening. The only indicator is the CPU running at 100% load on a single thread.

      I'll just leave it running and see when it ends.

      • #4
        With that many reads, it will likely take a long time. I have 132M reads mapping right now and it's been going for four days and will likely take another day.

        You could try STAR (http://gingeraslab.cshl.edu/STAR/).
        That same data set took only 1.5 days to map, and the results will likely be better than Tophat anyway if my other data sets are any indication.

        • #5
          Originally posted by chris
          Hi all,

          I'm trying to run tophat2 on three paired-end datasets against hg19, but it has taken three days so far! How long do typical runs take for other people?

          I've run it with 8 threads on an 8-core machine with 32 GB of RAM. The bowtie2 alignment took about a day using all eight cores, but since then it has been stuck using 100% of only one core at this step:
          [2012-05-19 11:45:52] Searching for junctions via segment mapping
          There are about 136M read pairs in total.

          Oh, and the parameters used are:
          -p 8 --coverage-search --microexon-search --b2-very-sensitive --mate-inner-dist 200

          I managed to get 2 samples through at 100m reads & 130m reads in 2.5 days with 8 threads.

          • #6
            Originally posted by pbluescript
            With that many reads, it will likely take a long time. I have 132M reads mapping right now and it's been going for four days and will likely take another day.

            It really isn't that many reads. It's equivalent to just one lane of HiSeq.

            Originally posted by pbluescript
            You could try STAR (http://gingeraslab.cshl.edu/STAR/).
            That same data set took only 1.5 days to map, and the results will likely be better than Tophat anyway if my other data sets are any indication.

            Thanks. I'll give it a try.

            Originally posted by Bukowski
            I managed to get 2 samples through at 100m reads & 130m reads in 2.5 days with 8 threads.

            Was that paired-end data? What parameters were you using?
            Last edited by chris; 05-21-2012, 05:41 AM.

            • #7
              Just noticed that the file tophat_out/logs/segment_juncs.log is being updated every few hours. 17 chromosomes done so far, only ~60 to go...
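Since the only sign of life at this step is that log file being touched every few hours, a small script can check whether it has been modified recently. This is a minimal Python sketch; the function name and the 6-hour threshold are hypothetical choices, and a recent modification time only suggests (it does not prove) that the run is still progressing.

```python
import os
import time


def log_recently_updated(path: str, max_age_hours: float = 6.0) -> bool:
    """Return True if `path` exists and was modified within `max_age_hours`.

    A fresh modification time suggests the step is still making progress;
    a stale one is only a hint that something may be wrong.
    """
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False  # log not written yet (or wrong path)
    age_hours = (time.time() - mtime) / 3600.0
    return age_hours <= max_age_hours


if __name__ == "__main__":
    log = "tophat_out/logs/segment_juncs.log"
    if log_recently_updated(log):
        print(f"{log} changed recently: the run looks alive")
    else:
        print(f"{log} is stale or missing: worth a closer look")
```

Run it from cron or a shell loop every hour or so; pick the threshold based on how often the log has been changing so far.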

              • #8
                Originally posted by chris
                Just noticed that the file tophat_out/logs/segment_juncs.log is being updated every few hours. 17 chromosomes done so far, only ~60 to go...

                On the bright side, at least you know it's still running.

                Perhaps a vacation (or a bender) is in order.

                • #9
                  Originally posted by dpryan
                  On the bright side, at least you know it's still running

                  Perhaps a vacation (or a bender) is in order.

                  If only...

                  • #10
                    As an update, the job finished after 9 days, with an 'out of memory' error.

                    I've started a new thread regarding the possibility of restarting the job at the point of failure:
                    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                    I did try running STAR on a third of the data and it finished in 16 minutes, while tophat2 is still going after 24 hrs on the same data...

                    I'll compare what differences (if any) there are between the two programs, but so far the STAR output looks decent and is giving useful results.

                    • #11
                      Looking forward to your assessment of the differences, chris.

                      Does anyone know if STAR outputs genes resulting from fusions? The main website says it will do "'Ab initio' splice junctions --un-annotated, non-canonical, distal exons, chimeric ..." But I don't see in the manual what output would give me results similar to tophat-fusion.

                      • #12
                        Originally posted by NKAkers
                        Looking forward to your assessment of the differences, chris.

                        Does anyone know if STAR outputs genes resulting from fusions? The main website says it will do "'Ab initio' splice junctions --un-annotated, non-canonical, distal exons, chimeric ..." But I don't see in the manual what output would give me results similar to tophat-fusion.

                        Same question.

                        • #13
                          The tophat run finished after 64 hrs, and a quick look in IGV seems to show very similar results to STAR.

                          The most obvious difference seems to be that STAR reports pairs even when the pair distance is several kb. Other than that, the alignments look very similar.

                          Chimeric transcripts are not my focus, so I can't comment, but I'd imagine that these 'spurious' large-insert pairs could make looking for fusions difficult.
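One way to see how many of these long-distance pairs are present is to scan the SAM output for large template lengths (TLEN, column 9). This is a minimal Python sketch; the 5,000 bp cutoff and the function name are hypothetical. Note that TLEN runs from the leftmost to the rightmost mapped base of the pair, so it includes introns skipped by spliced reads: a large value marks a pair worth inspecting, not proof that the pair is spurious.

```python
from typing import Iterable, Iterator


def long_insert_alignments(sam_lines: Iterable[str], max_tlen: int = 5000) -> Iterator[str]:
    """Yield SAM alignment lines whose |TLEN| (column 9) exceeds max_tlen."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        tlen = int(fields[8])
        if abs(tlen) > max_tlen:
            yield line
```

Counting the yielded lines per sample gives a rough idea of how common these pairs are before deciding on a filter.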

                          • #14
                            On average, my STAR mappings finish in about 20% of the time of the Tophat2 mappings. STAR mappings yield more alignments, more unique alignments, and more alignments with both mates of the pair mapped. The only issue is a larger number of split reads that seem to be spurious splicing junctions. But if I filter out the multi-mapped reads and focus on junctions with several supporting reads, the junk is reduced quite a bit, and the results are still better than Tophat2.
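Filtering out multi-mapped reads can be done on the SAM records themselves via the standard NH tag (the number of loci a read maps to). This is a minimal Python sketch, assuming the aligner writes NH on every record (recent STAR versions do by default); the function names are hypothetical, and records without an NH tag are conservatively treated as not unique.

```python
from typing import Iterable, Iterator


def is_unique_mapper(sam_line: str) -> bool:
    """True if the alignment carries NH:i:1 (read maps to exactly one locus)."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False  # no NH tag: cannot tell, so do not keep it


def unique_alignments(sam_lines: Iterable[str]) -> Iterator[str]:
    """Keep header lines plus alignments of uniquely mapped reads."""
    for line in sam_lines:
        if line.startswith("@") or is_unique_mapper(line):
            yield line
```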

                            I have never seen anything that looks like interchromosomal fusions with STAR, but there are potential intrachromosomal fusions. My research doesn't focus on fusions at all, so I haven't followed up on any of those.

                            • #15
                              Update on STAR

                              Dear All,

                              I wanted to give a brief update on STAR and answer some questions posted above.
                              I am gearing up for a formal public release in 1-2 weeks; in the meantime you can find the latest version here:
                              ftp://ftp2.cshl.edu/gingeraslab/trac...release/2.1.2/
                              I am also happy to answer questions and help with problems at [email protected]

                              Chimeric alignments: at the moment STAR outputs chimeric (fusion) alignments as a separate .sam file. Each chimeric read alignment takes 2 or more .sam lines. I am planning to make a more user-friendly output of chimeric alignments and junctions. As far as I know there is no standard format for chimeric alignments, and I am open to user suggestions.
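Until a friendlier format exists, the chimeric .sam file can be summarized by grouping its lines by read name, since each chimeric read occupies 2 or more lines. This is a minimal Python sketch, assuming all segments of one chimeric read share the same QNAME; the function names are hypothetical. It reports which reads have segments on two or more chromosomes, i.e. candidate interchromosomal events.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def chroms_per_read(sam_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each read name (QNAME) to the reference sequences its segments hit."""
    chroms: Dict[str, Set[str]] = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed record
        qname, rname = fields[0], fields[2]
        if rname != "*":  # '*' means the segment is unmapped
            chroms[qname].add(rname)
    return dict(chroms)


def interchromosomal_reads(sam_lines: Iterable[str]) -> List[str]:
    """Sorted names of reads whose segments fall on two or more chromosomes."""
    per_read = chroms_per_read(sam_lines)
    return sorted(q for q, cs in per_read.items() if len(cs) > 1)
```

Reads whose segments stay on one chromosome (possible intrachromosomal events) can be found the same way by keeping sets of size 1.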

                              Long intron "spurious" alignments: as @pbluescript wrote, it's all about filtering. If you are interested in high-confidence junctions only, I would recommend keeping only junctions with canonical GT/AG introns that are supported by 2 or more uniquely mapped reads. Annotated junctions can also be considered trustworthy. Another useful filtering option is to remove junctions that are supported only by a very short overhang; STAR uses this filtering to create the collapsed junction file. If your species does not have long introns, you can make STAR (newest version) filter them out with the --alignIntronMax and --alignMatesGapMax options.
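The filtering recommendations above can be applied after the fact to STAR's tab-separated junction output. This is a minimal Python sketch, assuming the SJ.out.tab column layout described in later STAR manuals (the 2.1.2 release may differ); the function name and the 12-base minimum overhang are hypothetical choices, not values from STAR.

```python
from typing import Iterable, Iterator, List

# Assumed SJ.out.tab columns (1-based): 1 chrom, 2 intron start, 3 intron end,
# 4 strand, 5 intron motif (0 = non-canonical, 1 = GT/AG, 2 = CT/AC, ...),
# 6 annotated (0/1), 7 unique reads, 8 multi-mapping reads, 9 max overhang.
CANONICAL_GT_AG = {1, 2}  # GT/AG and its reverse complement CT/AC


def high_confidence(rows: Iterable[str], min_unique: int = 2,
                    min_overhang: int = 12) -> Iterator[List[str]]:
    """Yield split rows for annotated junctions, or for canonical junctions
    with enough unique-read support and a long enough maximum overhang."""
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        f = row.rstrip("\n").split("\t")
        motif, annotated = int(f[4]), int(f[5])
        unique, overhang = int(f[6]), int(f[8])
        if annotated == 1:
            yield f  # annotated junctions are treated as trustworthy
        elif (motif in CANONICAL_GT_AG and unique >= min_unique
              and overhang >= min_overhang):
            yield f
```

Only unique-read counts (column 7) are used for support, which matches the "unique mappers" criterion; multi-mapping counts (column 8) are ignored.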

                              I have made a quick calculation on one of my standard samples: 40M 2x76 reads from a K562 (human cell line) poly-A+ sample. STAR detects 489 annotated and 2,605 unannotated junctions with introns longer than 100 kb. The unannotated junctions are supported by 6,291 spliced reads, which is just ~0.06% of all spliced reads (10.5M). 1,984 of these junctions are supported by just one read, and those junctions can be filtered out.
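A quick check of the numbers in this post, using only the figures quoted there (10.5M is itself rounded, so the percentage is approximate):

```python
spliced_reads_total = 10_500_000    # "10.5M" spliced reads in the sample
unannotated_long_junctions = 2_605  # unannotated junctions with introns > 100 kb
reads_supporting_them = 6_291       # spliced reads supporting those junctions
single_read_junctions = 1_984       # of those junctions, supported by one read

fraction = reads_supporting_them / spliced_reads_total
print(f"{fraction:.4%}")  # prints 0.0599%, i.e. the ~0.06% quoted

remaining = unannotated_long_junctions - single_read_junctions
print(remaining)  # prints 621 junctions left after dropping single-read ones
```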
