  • shurjo
    Senior Member
    • Jan 2009
    • 132

    Mixed read lengths in TopHat input file

    Hi everyone,

    After quite a bit of searching the threads on this forum, I'm still not exactly sure how TopHat deals with an input fastq file that contains reads of different lengths (e.g. 51bp and 76bp). I assume that equal read lengths throughout the file would be ideal, but I have four 76bp lanes and one 51bp lane from the same library and would like to use all the data if possible. Any advice from the TopHat/Bowtie power users out there?

    Thanks in advance,

    Shurjo
  • john_mu
    Member
    • May 2010
    • 88

    #2
    This is a shameless plug. But as far as I know TopHat does not yet support that.
    SpliceMap can handle reads of this kind natively.
    SpliceMap: De novo detection of splice junctions from RNA-seq
    Download SpliceMap Comment here

    Comment

    • marcowanger
      Senior Member
      • Dec 2008
      • 273

      #3
      I am not sure, because the TopHat manual previously stated explicitly that "reads have to be in equal length". However, that sentence is now nowhere to be found on the website.

      On the other hand, if that constraint has been removed, it would be a big improvement.

      Yet there is no such announcement in the TopHat change log.

      I am also confused.

      Originally posted by john_mu View Post
      This is a shameless plug. But as far as I know TopHat does not yet support that.
      SpliceMap can handle reads of this kind natively.
      Last edited by marcowanger; 10-21-2010, 12:23 AM. Reason: typo
      Marco

      Comment

      • marcowanger
        Senior Member
        • Dec 2008
        • 273

        #4
        Originally posted by john_mu View Post
        This is a shameless plug. But as far as I know TopHat does not yet support that.
        SpliceMap can handle reads of this kind natively.
        The new TopHat 1.1 printed

        min read length: xxbp, max read length: xxbp
        during the run.

        So I think the reads do not need to be of equal length?
        Marco
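For anyone who wants to check what a mixed file actually contains before running the aligner, here is a minimal sketch that reports the shortest and longest read in a FASTQ stream. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names and the tiny in-memory example are made up for illustration and are not part of TopHat.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Count read lengths in a FASTQ stream with exactly four lines per record."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        # The sequence is the second line of every four-line record.
        if line_number % 4 == 1:
            counts[len(line.strip())] += 1
    return counts


def length_range(counts):
    """Return (shortest, longest) read length seen in the counts."""
    return min(counts), max(counts)


# Hypothetical in-memory data standing in for a real FASTQ file.
example = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGTA\n+\nIIIII\n"
    "@r3\nACGTACGTAC\n+\nIIIIIIIIII\n"
)

counts = read_length_counts(example)
shortest, longest = length_range(counts)
print(f"min read length: {shortest}bp, max read length: {longest}bp")
```

For a real file, pass an open file handle instead of the `io.StringIO` object.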

        Comment

        • john_mu
          Member
          • May 2010
          • 88

          #5
          Originally posted by marcowanger View Post
          The new TopHat 1.1 printed

          min read length: xxbp, max read length: xxbp
          during the run.

          So I think the reads do not need to be of equal length?
          Ah I see, sorry I had not played with the new version much yet. You are probably right.
          SpliceMap: De novo detection of splice junctions from RNA-seq
          Download SpliceMap Comment here

          Comment

          • marcowanger
            Senior Member
            • Dec 2008
            • 273

            #6
            As you seem to be affiliated with SpliceMap, may I ask you one question?

            What is the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces?

            Originally posted by john_mu View Post
            Ah I see, sorry I had not played with the new version much yet. You are probably right.
            Marco

            Comment

            • ersenkavak
              Junior Member
              • Feb 2010
              • 2

              #7
              I ran TopHat 1.1.1 with read lengths varying between 20 and 100.
              As far as I can tell, it is working just fine. Although I have not compared it systematically, it looks much more powerful than dividing the reads by length and running it separately for each length. This is probably because splice junctions are easier to map when more reads are available...

              cheers

              Comment

              • shurjo
                Senior Member
                • Jan 2009
                • 132

                #8
                Thanks for the input, everyone. I was at a talk by Steve Salzberg earlier this morning where he specifically mentioned that the latest TopHat can handle mixed read lengths in the input file, so I guess that answers my question.

                Comment

                • john_mu
                  Member
                  • May 2010
                  • 88

                  #9
                  Originally posted by marcowanger View Post
                  As you seem to be affiliated with SpliceMap, may I ask you one question?

                  What is the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces?
                  Hi marcowanger,

                  The Cufflinks-compatible file doesn't include the clipped part of the alignments. Since some alignments might not be able to find the other end of the split read, we still keep the partial alignment.

                  Also, if you are interested in trying SpliceMap I suggest you wait until after this weekend. There was a small bug I just found regarding counting the number of multiply mapped reads.

                  John Mu
                  SpliceMap: De novo detection of splice junctions from RNA-seq
                  Download SpliceMap Comment here

                  Comment

                  • Steven Salzberg
                    Junior Member
                    • Aug 2009
                    • 3

                    #10
                    TopHat does support variable read lengths

                    The previous poster is correct: I announced this feature at a recent talk. TopHat now supports variable read lengths, meaning you can mix multiple Illumina (or SOLiD) runs that use different lengths and run TopHat just once on them. Make sure you get the latest release, version 1.1.2 (or newer).

                    ALSO: this release of TopHat adds support for strand-specific RNA-Seq alignment for reads produced by a number of strand-specific protocols. Please see the manual for details.

                    Comment

                    • Steven Salzberg
                      Junior Member
                      • Aug 2009
                      • 3

                      #11
                      Originally posted by marcowanger View Post
                      The new TopHat 1.1 printed

                      min read length: xxbp, max read length: xxbp
                      during the run.

                      So I think the reads do not need to be of equal length?
                      This was always true, but TopHat previously handled all reads (of varying lengths) with the same algorithm. Now it dynamically adjusts the mapping strategy based on read length: longer reads are broken up into more pieces that are mapped separately.
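To make the idea of "longer reads are broken up into more pieces" concrete, here is a rough sketch. It is not TopHat's actual code: the splitting rule (as many roughly equal pieces as fit a target segment length) and the 25-base target are assumptions chosen only for illustration.

```python
def split_into_segments(read, segment_length=25):
    """Split a read into roughly equal pieces of about `segment_length` bases.

    Illustrative only: the splitting rule and the default of 25 are
    assumptions, not TopHat's real implementation.
    """
    # Longer reads yield more segments; very short reads stay in one piece.
    n_segments = max(1, len(read) // segment_length)
    base, extra = divmod(len(read), n_segments)
    segments = []
    start = 0
    for i in range(n_segments):
        # Spread any leftover bases over the first `extra` segments.
        size = base + (1 if i < extra else 0)
        segments.append(read[start:start + size])
        start += size
    return segments


short_read = "A" * 51  # e.g. a read from the 51bp lane
long_read = "A" * 76   # e.g. a read from a 76bp lane

short_sizes = [len(s) for s in split_into_segments(short_read)]
long_sizes = [len(s) for s in split_into_segments(long_read)]
print(short_sizes)  # two pieces
print(long_sizes)   # three pieces
```

Under these assumptions the 51bp read is split into two pieces and the 76bp read into three, so both lengths can go through the same pipeline in one run.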

                      Comment

                      • jkozubek
                        Member
                        • Mar 2011
                        • 18

                        #12
                        Does anyone know if Tophat is therefore ignoring reads less than the min read length? For instance, if it sets min read length at 20 bp and max read length at 26 bp, would it ignore mapping of a read that is 18 bp?

                        Comment

                        • jkozubek
                          Member
                          • Mar 2011
                          • 18

                          #13
                          Never mind. I see from my output that it is mapping reads under the min read length.

                          Comment

                          • telos
                            Member
                            • Jan 2010
                            • 11

                            #14
                            If you have mixed read lengths (e.g. due to adaptor trimming), how do you then set --mate-inner-dist? (It would have been better to ask for the expected insert size rather than the inner distance.)
                            Last edited by telos; 11-01-2012, 06:15 AM.
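One possible way to approach this, sketched below: the inner distance is the fragment length minus the lengths of the two mates, so with trimmed reads one could plug in the mean mate lengths. Whether the mean is the best choice for a mixed-length run is an assumption, and the fragment length and read lengths here are hypothetical numbers, not values from this thread.

```python
def mate_inner_distance(fragment_length, read1_length, read2_length):
    """Inner distance = fragment length minus the lengths of both mates."""
    return round(fragment_length - read1_length - read2_length)


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical trimmed read lengths for each mate, and a hypothetical
# 300bp mean fragment length taken from the library prep.
read1_lengths = [76, 70, 51, 76]
read2_lengths = [76, 64, 51, 76]
fragment_length = 300

inner = mate_inner_distance(
    fragment_length, mean(read1_lengths), mean(read2_lengths)
)
print(inner)

# Untrimmed sanity check: a 300bp fragment with two 50bp mates.
untrimmed = mate_inner_distance(300, 50, 50)
print(untrimmed)
```

The same arithmetic shows why a single inner-distance value can only be approximate once reads are trimmed: the fragment length stays fixed while the mate lengths vary from pair to pair.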

                            Comment
