
  • Mixed read lengths in TopHat input file

    Hi everyone,

    After quite a bit of searching the threads on this forum, I'm still not exactly sure how TopHat deals with an input fastq file that contains reads of different lengths (e.g. 51bp and 76bp). I assume that equal read lengths throughout the file would be ideal, but I have four 76bp lanes and one 51bp lane from the same library and would like to use all the data if possible. Any advice from the TopHat/Bowtie power users out there?

    Thanks in advance,

    Shurjo
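Before pooling lanes, it can help to confirm which read lengths a FASTQ file actually contains. The following is a minimal sketch, assuming a plain four-line-per-record FASTQ; the function name `read_length_counts` and the tiny in-memory example are made up for illustration and are not part of TopHat or Bowtie.

```python
from collections import Counter
from io import StringIO


def read_length_counts(handle):
    """Count reads per length in a FASTQ stream with 4 lines per record."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            counts[len(line.strip())] += 1
    return counts


# Hypothetical three-read FASTQ used only to demonstrate the function.
example = StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTA\n+\nIIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
counts = read_length_counts(example)
for length, n in sorted(counts.items()):
    print(f"{length} bp: {n} reads")
```

For a real file, pass an open file handle (for example `open("lane1.fastq")`, a hypothetical path) instead of the `StringIO` object.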

  • #2
    This is a shameless plug. But as far as I know TopHat does not yet support that.
    SpliceMap is able to deal with such kind of reads natively.
    SpliceMap: De novo detection of splice junctions from RNA-seq
    Download SpliceMap Comment here



    • #3
      I am not sure, because the TopHat manual previously stated explicitly that "reads have to be in equal length". However, that sentence is now nowhere to be found on the website.

      On the other hand, if that constraint has been removed, it would be a big improvement.

      Yet there is no such announcement in the TopHat change log.

      I am also confused.

      Originally posted by john_mu View Post
      This is a shameless plug. But as far as I know TopHat does not yet support that.
      SpliceMap is able to deal with such kind of reads natively.
      Last edited by marcowanger; 10-21-2010, 12:23 AM. Reason: typo
      Marco



      • #4
        Originally posted by john_mu View Post
        This is a shameless plug. But as far as I know TopHat does not yet support that.
        SpliceMap is able to deal with such kind of reads natively.
        The new TopHat 1.1 showed

        min read length: xxbp, max read length: xxbp
        during the run.

        So I think the reads do not need to be of equal length?
        Marco



        • #5
          Originally posted by marcowanger View Post
          The new TopHat 1.1 showed

          min read length: xxbp, max read length: xxbp
          during the run.

          So I think the reads do not need to be of equal length?
          Ah I see, sorry I had not played with the new version much yet. You are probably right.


          • #6
            As you seem to be affiliated with SpliceMap, may I ask you a question?

            What is the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces?

            Originally posted by john_mu View Post
            Ah I see, sorry I had not played with the new version much yet. You are probably right.
            Marco



            • #7
              I ran TopHat 1.1.1 on reads with lengths varying from 20 to 100 bp.
              As far as I can tell, it works just fine. Although I have not compared the results systematically, it looks much more powerful than dividing the reads by length and running TopHat separately for each length. This is probably because splice junction mapping benefits from having more reads...

              Cheers



              • #8
                Thanks for the input, everyone. I was at a talk by Steve Salzberg earlier this morning where he specifically mentioned that the latest TopHat can handle mixed read lengths in the input file, so I guess that answers my question.



                • #9
                  Originally posted by marcowanger View Post
                  As you seem to be affiliated with SpliceMap, may I ask you a question?

                  What is the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces?
                  Hi marcowanger,

                  The Cufflinks-compatible file doesn't include the clipped part of the alignments. Since some alignments might not be able to find the other end of the split read, we still keep the partial alignment.

                  Also, if you are interested in trying SpliceMap I suggest you wait until after this weekend. There was a small bug I just found regarding counting the number of multiply mapped reads.

                  John Mu


                  • #10
                    TopHat does support variable read lengths

                    The previous poster is correct: I announced this feature at a recent talk. TopHat now supports variable read lengths, meaning you can mix multiple Illumina (or SOLiD) runs that use different lengths and run TopHat just once on them. Make sure you get the latest release, version 1.1.2 (or newer).

                    ALSO: this release of TopHat adds support for strand-specific RNA-Seq alignment for reads produced by a number of strand-specific protocols. Please see the manual for details.



                    • #11
                      Originally posted by marcowanger View Post
                      The new TopHat 1.1 showed

                      min read length: xxbp, max read length: xxbp
                      during the run.

                      So I think the reads do not need to be of equal length?
                      This was always true - but TopHat handled all reads (of varying lengths) with the same algorithm. Now it dynamically adjusts the mapping strategy based on read length - longer reads are broken up into more pieces that are mapped separately.
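To picture what "broken up into more pieces" means, here is a rough sketch of length-dependent segmentation. It is only an illustration, not TopHat's actual code: the function name `split_into_segments`, the 25 bp segment size used as the default here, and the choice to append leftover bases to the last segment are all assumptions made for this example.

```python
def split_into_segments(read, segment_length=25):
    """Split a read into consecutive, non-overlapping segments.

    Leftover bases shorter than segment_length are appended to the last
    full segment (an assumption of this sketch). Reads shorter than one
    segment are returned whole.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    n_full = len(read) // segment_length
    if n_full == 0:
        return [read]
    segments = [
        read[i * segment_length:(i + 1) * segment_length]
        for i in range(n_full)
    ]
    segments[-1] += read[n_full * segment_length:]
    return segments


# Longer reads yield more pieces, each of which could be mapped separately.
for length in (20, 51, 76):
    pieces = split_into_segments("A" * length)
    print(length, [len(p) for p in pieces])
```

With these assumptions a 51 bp read gives two pieces and a 76 bp read gives three, so reads of different lengths from the same file simply produce different numbers of segments.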



                      • #12
                        Does anyone know whether TopHat therefore ignores reads shorter than the min read length? For instance, if it reports a min read length of 20 bp and a max read length of 26 bp, would it skip an 18 bp read?



                        • #13
                          Never mind. I see from my output that it is mapping reads under the min read length.



                          • #14
                            If you have mixed read lengths (e.g. due to adaptor trimming), how do you then set --mate-inner-dist? (It would have been better to ask for the expected insert size rather than the inner distance.)
                            Last edited by telos; 11-01-2012, 06:15 AM.
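One way to think about this: the inner distance of a pair is the fragment size minus the lengths of the two mates, so trimming changes it pair by pair. The sketch below computes that value per pair and averages it. Passing the mean as a single --mate-inner-dist value is an assumption of this example, not advice confirmed in this thread, and the function names and numbers are hypothetical.

```python
def mate_inner_distance(fragment_size, read1_len, read2_len):
    """Inner distance = fragment size minus the two mate lengths."""
    return fragment_size - read1_len - read2_len


def mean_inner_distance(fragment_size, mate_lengths):
    """Average inner distance over (read1_len, read2_len) pairs."""
    distances = [
        mate_inner_distance(fragment_size, r1, r2) for r1, r2 in mate_lengths
    ]
    return sum(distances) / len(distances)


# Hypothetical library: 300 bp fragments, mates trimmed to varying lengths.
pairs = [(76, 76), (51, 51), (76, 60)]
mean_dist = mean_inner_distance(300, pairs)
print(f"mean inner distance: {mean_dist:.1f} bp")
```

If the trimmed lengths vary a lot, the spread of the per-pair values may matter as much as the mean, so it is worth looking at the whole list of distances rather than only the average.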
