  • Mixed read lengths in TopHat input file

    Hi everyone,

    After quite a bit of searching the threads on this forum, I'm still not exactly sure how TopHat deals with an input fastq file that contains reads of different lengths (e.g. 51bp and 76bp). I assume that equal read lengths throughout the file would be ideal, but I have four 76bp lanes and one 51bp lane from the same library and would like to use all the data if possible. Any advice from the TopHat/Bowtie power users out there?

    Thanks in advance,

    Shurjo

  • #2
    This is a shameless plug, but as far as I know TopHat does not yet support that.
    SpliceMap can handle reads of mixed lengths natively.
    SpliceMap: De novo detection of splice junctions from RNA-seq
    Download SpliceMap Comment here



    • #3
      I am not sure, because the TopHat manual previously stated explicitly that "reads have to be in equal length". However, that sentence is now nowhere to be found on the website.

      On the other hand, if that constraint has been removed, it would be a big improvement.

      Yet there is no such announcement in the TopHat change log.

      I am also confused.

      Originally posted by john_mu
      This is a shameless plug, but as far as I know TopHat does not yet support that.
      SpliceMap can handle reads of mixed lengths natively.
      Last edited by marcowanger; 10-21-2010, 12:23 AM. Reason: typo
      Marco



      • #4
        Originally posted by john_mu
        This is a shameless plug, but as far as I know TopHat does not yet support that.
        SpliceMap can handle reads of mixed lengths natively.
        The new TopHat 1.1 showed

        min read length: xxbp, max read length: xxbp
        during the run.

        So I think the reads do not need to be of equal length?
        Marco
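
To check what range of read lengths an input file actually contains before a run, a summary like the one TopHat prints can be computed directly from the FASTQ file. This is a minimal sketch, not TopHat code: the function name `read_length_summary` is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
from collections import Counter
from typing import Iterable, Tuple


def read_length_summary(fastq_lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (min_len, max_len, length_counts) for 4-line FASTQ records.

    Hypothetical helper for illustration; assumes each record is exactly
    four lines: header, sequence, '+', qualities.
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts[len(line.strip())] += 1
    if not counts:
        raise ValueError("no reads found")
    return min(counts), max(counts), counts


# Tiny in-memory example with one 10 bp read and one 6 bp read.
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACGTAC", "+", "IIIIII",
]
lo, hi, counts = read_length_summary(example)
print(f"min read length: {lo}bp, max read length: {hi}bp")
```

For a real file, an open file handle can be passed instead of the list, e.g. `read_length_summary(open("reads.fq"))`, where `reads.fq` is a placeholder name.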



        • #5
          Originally posted by marcowanger
          The new TopHat 1.1 showed

          min read length: xxbp, max read length: xxbp
          during the run.

          So I think the reads do not need to be of equal length?
          Ah, I see. Sorry, I have not played with the new version much yet. You are probably right.



          • #6
            As you seem to be affiliated with SpliceMap, may I ask you one question?

            What is the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces?

            Originally posted by john_mu
            Ah, I see. Sorry, I have not played with the new version much yet. You are probably right.
            Marco



            • #7
              I ran TopHat 1.1.1 with read lengths varying from 20 to 100 bp.
              As far as I can tell, it is working just fine. Although I have not compared it systematically, it looks much more powerful than splitting the reads by length and running each length separately. This is probably because splice junctions can be mapped more effectively when more reads are available...

              cheers



              • #8
                Thanks for the input, everyone. I was at a talk by Steve Salzberg earlier this morning where he specifically mentioned that the latest TopHat can handle mixed read lengths in the input file, so I guess that answers my question.



                • #9
                  Originally posted by marcowanger
                  As you seem to be affiliated with SpliceMap, may I ask you one question?

                  What is the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces?
                  Hi marcowanger,

                  The Cufflinks-compatible file doesn't include the clipped part of the alignments. Since some alignments might not be able to find the other end of the split read, we still keep the partial alignment.

                  Also, if you are interested in trying SpliceMap, I suggest you wait until after this weekend. I just found a small bug in how the number of multiply mapped reads is counted.

                  John Mu



                  • #10
                    TopHat does support variable read lengths

                    The previous poster is correct; I announced this feature at a recent talk. TopHat now supports variable read lengths, meaning you can mix multiple Illumina (or SOLiD) runs that use different read lengths and run TopHat just once on them. Make sure you get the latest release, version 1.1.2 (or newer).

                    ALSO: this release of TopHat adds support for strand-specific RNA-Seq alignment for reads produced by a number of strand-specific protocols. Please see the manual for details.
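
For the "run TopHat just once" case above, here is a rough sketch of assembling a single command for several lanes. It relies on TopHat accepting a comma-separated list of read files after the index name and on the `-o` output directory option; the helper name `tophat_command`, the index name, and all file names are placeholders, so check the manual for your version before using it.

```python
from typing import List, Sequence


def tophat_command(index_base: str,
                   read_files: Sequence[str],
                   output_dir: str = "tophat_out") -> List[str]:
    """Build an argument list for one TopHat run over several FASTQ files.

    Hypothetical helper for illustration: the lanes are joined with commas
    so that reads of different lengths go into the same run.
    """
    if not read_files:
        raise ValueError("at least one read file is required")
    return ["tophat", "-o", output_dir, index_base, ",".join(read_files)]


# Placeholder names: four 76 bp lanes and one 51 bp lane from one library.
lanes = ["lane1_76bp.fq", "lane2_76bp.fq", "lane3_76bp.fq",
         "lane4_76bp.fq", "lane5_51bp.fq"]
cmd = tophat_command("genome_index", lanes)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once TopHat and a Bowtie index are actually installed.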



                    • #11
                      Originally posted by marcowanger
                      The new TopHat 1.1 showed

                      min read length: xxbp, max read length: xxbp
                      during the run.

                      So I think the reads do not need to be of equal length?
                      This was always true - but TopHat handled all reads (of varying lengths) with the same algorithm. Now it dynamically adjusts the mapping strategy based on read length - longer reads are broken up into more pieces that are mapped separately.
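
To illustrate the idea of breaking longer reads into more pieces, here is a toy sketch. It is not TopHat's actual implementation: the function name `split_into_segments` and the way leftover bases are spread across segments are assumptions, and the default of 25 only mirrors the documented default of TopHat's `--segment-length` option.

```python
from typing import List


def split_into_segments(read: str, segment_length: int = 25) -> List[str]:
    """Split a read into roughly equal segments of about segment_length bases.

    Illustrative only: reads shorter than segment_length stay in one piece,
    and leftover bases are spread over the first segments (an assumption,
    not necessarily what TopHat does).
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    n_segments = max(1, len(read) // segment_length)
    base, extra = divmod(len(read), n_segments)
    segments = []
    start = 0
    for i in range(n_segments):
        end = start + base + (1 if i < extra else 0)
        segments.append(read[start:end])
        start = end
    return segments


# A 51 bp read gives two segments, a 76 bp read gives three.
for length in (20, 51, 76):
    pieces = split_into_segments("A" * length)
    print(length, [len(p) for p in pieces])
```

The point is only that the number of independently mapped pieces grows with read length, so a 76 bp read gets more chances to span a junction than a 51 bp read.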



                      • #12
                        Does anyone know whether TopHat therefore ignores reads shorter than the min read length? For instance, if it sets the min read length at 20 bp and the max read length at 26 bp, would it skip mapping a read that is 18 bp?



                        • #13
                          Never mind. I see from my output that it is mapping reads under the min read length.



                          • #14
                            If you have mixed read lengths (e.g. due to adaptor trimming), how do you then set --mate-inner-dist? (It would have been better to ask for the expected insert size rather than the inner distance.)
                            Last edited by telos; 11-01-2012, 06:15 AM.
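
One possible way to pick a value, offered as a sketch and not confirmed anywhere in this thread: the inner distance is the expected fragment size minus the lengths of the two mates, so with trimmed reads one could plug in the mean mate lengths. Using the mean is an assumption, and the function name `mate_inner_dist` is made up for illustration.

```python
from statistics import mean
from typing import Sequence


def mate_inner_dist(fragment_size: int,
                    mate1_lengths: Sequence[int],
                    mate2_lengths: Sequence[int]) -> int:
    """Estimate a --mate-inner-dist value from an expected fragment size.

    inner distance = fragment size - len(mate 1) - len(mate 2).
    For mixed read lengths the mean length of each mate is used,
    which is an assumption rather than documented TopHat behaviour.
    """
    if not mate1_lengths or not mate2_lengths:
        raise ValueError("read length lists must not be empty")
    return round(fragment_size - mean(mate1_lengths) - mean(mate2_lengths))


# Uniform 50 bp mates from 300 bp fragments.
print(mate_inner_dist(300, [50], [50]))

# Mixed lengths, e.g. two 76 bp pairs and one 51 bp pair.
print(mate_inner_dist(300, [76, 76, 51], [76, 76, 51]))
```

Since the true inner distance then varies from pair to pair, a more generous standard deviation setting may also be worth considering.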
