  • amitm
    Member
    • Feb 2011
    • 52

    bowtie2 vs. TopHat

    hello everyone,
    I have been using Bowtie2 for Illumina 2x100 RNA-Seq datasets. I understand TopHat was built because the older version of Bowtie couldn't do gapped alignment. Now that Bowtie2 does that, what is the status of TopHat usage?
    • Would it be right to align using Bowtie2 and reach Cufflinks directly?
    • What would I be missing if I don't use TopHat but the new Bowtie?


    Kindly advise. I have followed this strategy:
    Bowtie2 --> SAM/BAM --> Cufflinks (with GTF file) --> transcripts with FPKM

    So far, for the 8 datasets processed, I have obtained ~99% alignment, with ~85% properly paired.
    What am I missing by not using TopHat? Any suggestions or ideas, please.

    ---Bowtie2 STDOUT for one of the datasets ---
    Time loading reference: 00:00:08
    Time loading forward index: 00:00:19
    Time loading mirror index: 00:00:11
    Multiseed full-index search: 15:44:57
    70363764 reads; of these:
      70363764 (100.00%) were paired; of these:
        8612578 (12.24%) aligned concordantly 0 times
        34458617 (48.97%) aligned concordantly exactly 1 time
        27292569 (38.79%) aligned concordantly >1 times
        ----
        8612578 pairs aligned concordantly 0 times; of these:
          5036749 (58.48%) aligned discordantly 1 time
        ----
        3575829 pairs aligned 0 times concordantly or discordantly; of these:
          7151658 mates make up the pairs; of these:
            1238386 (17.32%) aligned 0 times
            2331211 (32.60%) aligned exactly 1 time
            3582061 (50.09%) aligned >1 times
    99.12% overall alignment rate
    Time searching: 15:45:35
    Overall time: 15:45:35
    Last edited by amitm; 03-27-2012, 01:18 AM. Reason: EDIT
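
For readers checking the summary above, the headline percentages can be re-derived from the raw counts. This is only an arithmetic sketch in Python; the variable names are made up for illustration, and the counts are copied from the posted output. It assumes the overall rate is computed per mate (two mates per pair), which is consistent with the reported 99.12%.

```python
# Counts copied from the Bowtie2 summary in the post above.
total_pairs = 70_363_764
concordant_once = 34_458_617
concordant_multi = 27_292_569
unaligned_mates = 1_238_386  # mates that aligned 0 times after all attempts

# Each pair contributes two mates; only the mates that never aligned
# count against the overall alignment rate.
total_mates = 2 * total_pairs
overall_rate = 100.0 * (total_mates - unaligned_mates) / total_mates

# Share of pairs aligned concordantly (once or more), and the share of
# pairs aligned concordantly to more than one location.
concordant_rate = 100.0 * (concordant_once + concordant_multi) / total_pairs
multi_rate = 100.0 * concordant_multi / total_pairs

print(f"overall alignment rate:      {overall_rate:.2f}%")    # 99.12%
print(f"concordant pairs:            {concordant_rate:.2f}%")  # 87.76%
print(f"concordant, >1 location:     {multi_rate:.2f}%")       # 38.79%
```

Note that a high overall rate says nothing about whether reads crossing splice junctions were aligned across the junction or merely placed somewhere.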
  • kopi-o
    Senior Member
    • Feb 2008
    • 319

    #2
    With 99.12% alignment rate, there is hardly any room for improvement! Is it a prokaryote?

    In theory, you should use TopHat for RNA-seq because it considers splicing. Bowtie2 does not do gapped alignment in that sense (spliced alignment), although it allows for short gaps. Of course, for simpler organisms with no introns, there is not much point in using TopHat.

    • amitm
      Member
      • Feb 2011
      • 52

      #3
      Originally posted by kopi-o
      With 99.12% alignment rate, there is hardly any room for improvement! Is it a prokaryote?

      In theory, you should use TopHat for RNA-seq because it considers splicing. Bowtie2 does not do gapped alignment in that sense (spliced alignment), although it allows for short gaps. Of course, for simpler organisms with no introns, there is not much point in using TopHat.
      Hi,
      No, it's human cell line RNA. Yes, that's what I have been thinking, but since I am interested in transcript isoform quantification, I want to be sure the pipeline is sound. I have also visualized the BAM file in IGV, and it looks fine.

      But what I may be missing by not using TopHat has been nagging me. I have started an alignment with TopHat and will compare the two results. I will post an update if I find any differences in the BAM files.

      thanks

      • NateP
        Junior Member
        • Sep 2011
        • 9

        #4
        I think the main difference between TopHat and Bowtie2 is this:

        Say you have a read that spans two exons.

        With TopHat, that read will be mapped to both exons in the splice-junction mapping phase.

        With Bowtie2 (with the --local setting, I believe?), that read will be soft-clipped until it maps to only one of the two exons, whichever gives the higher alignment score.

        Someone please correct me if I'm mistaken there.
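
The difference described above shows up directly in the CIGAR field of a SAM record: a spliced alignment contains an `N` operation (a skipped reference region, i.e. an intron), while a local alignment that gives up on one exon contains an `S` operation (soft clipping). Below is a minimal Python sketch that tells the two apart. The CIGAR strings are hypothetical examples for a 100 bp read, not taken from the poster's data, and `describe` is an illustrative helper name.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def describe(cigar):
    """Summarize whether an alignment is spliced and how much is soft-clipped."""
    ops = parse_cigar(cigar)
    return {
        # 'N' marks a skipped region on the reference, i.e. an intron.
        "spliced": any(op == "N" for _, op in ops),
        # 'S' bases are present in the read but not part of the alignment.
        "soft_clipped_bases": sum(n for n, op in ops if op == "S"),
        # Operations that consume reference positions.
        "reference_span": sum(n for n, op in ops if op in "MDN=X"),
    }

# Hypothetical 100 bp read spanning two exons separated by a 2000 bp intron.
print(describe("60M2000N40M"))  # spliced alignment across the junction
print(describe("60M40S"))       # local alignment: second exon is clipped off
```

In the second case the 40 clipped bases contribute nothing to coverage of the downstream exon and carry no junction evidence.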

        • Jon_Keats
          Senior Member
          • Mar 2010
          • 279

          #5
          Something is fishy. There is no way you should get that high of alignment with 100x100 human RNA sequencing using bowtie2 unless the library is messed up. The IGV plot you show is highly biased to the 3' exon and in the top sample the exonic regions are not easily distinguished from the introns.

          • peromhc
            Senior Member
            • Sep 2009
            • 108

            #6
            Originally posted by Jon_Keats
            Something is fishy. There is no way you should get that high of alignment with 100x100 human RNA sequencing using bowtie2 unless the library is messed up. The IGV plot you show is highly biased to the 3' exon and in the top sample the exonic regions are not easily distinguished from the introns.
            Along these lines, >40% multiply mapped reads is likely one of the problems. Have you looked at read quality (k-mer frequency, etc.)?

            • sdriscoll
              I like code
              • Sep 2009
              • 436

              #7
              If you run bowtie2 in local mode it will absolutely align over 90% of your data.

              As others have mentioned, TopHat was not made because Bowtie could not do gapped alignments; it was made because there was no aligner that could align reads to the genome across splice junctions. TopHat does this, which is separate from gapped alignment. TopHat will now also report gapped alignments, thanks to Bowtie2.

              If you do not use TopHat in your Cufflinks pipeline, Cufflinks will be missing valuable information about how the aligned reads join exons (in fact, join transcripts) together. Cufflinks was designed to make use of that information. The only way to get Bowtie2 to generate that type of alignment would be to align to a transcriptome and then convert the alignments back to genomic coordinates (something that TopHat does as part of its alignment pipeline). Then you'd be missing out on novel alignment information, though.

              If you want to get the best results out of your pipeline, it's not 100% alignment you should be going for but alignments to the genome that include spliced alignments. Those alignments are the most powerful thing for assembling transcripts and for providing evidence of new exons and alternative splicing. For example, you might see coverage that looks like a new exon from Bowtie2, but only with TopHat would you also be able to see whether reads aligning to that new exon also have junctions with annotated exons from a nearby gene.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */
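
The "align to a transcriptome, then convert back to genomic coordinates" step mentioned above can be illustrated with a small Python sketch. This is not TopHat's actual code; it is a simplified illustration that assumes a forward-strand transcript, an ungapped match, 0-based half-open exon intervals, and a hypothetical function name `transcript_to_genome`. The exon coordinates in the example are invented.

```python
def transcript_to_genome(exons, t_start, length):
    """Convert an ungapped transcript-space alignment to genomic coordinates.

    exons   : sorted list of (genomic_start, genomic_end), 0-based half-open,
              forward strand only (an assumption of this sketch)
    t_start : 0-based offset of the alignment within the spliced transcript
    length  : number of aligned bases
    Returns (genomic_start, cigar), with introns written as 'N' operations.
    """
    cigar = []
    genomic_pos = None
    remaining = length
    offset = t_start
    prev_end = None
    for g_start, g_end in exons:
        exon_len = g_end - g_start
        if offset >= exon_len:
            # The alignment starts in a later exon; skip this one.
            offset -= exon_len
            continue
        if genomic_pos is None:
            genomic_pos = g_start + offset
        else:
            # Crossing into the next exon: record the intron as a skip.
            cigar.append(f"{g_start - prev_end}N")
        take = min(remaining, exon_len - offset)
        cigar.append(f"{take}M")
        remaining -= take
        prev_end = g_end
        offset = 0
        if remaining == 0:
            break
    if remaining:
        raise ValueError("alignment runs past the end of the transcript")
    return genomic_pos, "".join(cigar)

# Hypothetical two-exon transcript: exon 1 = [1000, 1100), exon 2 = [3000, 3200).
# A 100 bp read starting 60 bases into the transcript crosses the junction.
print(transcript_to_genome([(1000, 1100), (3000, 3200)], 60, 100))
# (1060, '40M1900N60M')
```

The limitation noted in the post follows directly: only junctions already present in the supplied transcript models can ever appear in the converted alignments.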

              • carmeyeii
                Senior Member
                • Mar 2011
                • 137

                #8
                So you can supply TopHat with a GTF file of annotated transcripts using the --GTF option. Reads are then mapped to those transcripts first, followed by the whole genome, with or without novel junction discovery in this second stage. As I understand it, this is the behavior as of TopHat 1.4.
                I'm curious to know how it was before 1.4. I think you could already give TopHat a GTF file, but it used it second. Am I right? If so, what is the difference between using it [the GTF file] first and using it second after the genome?

                Carmen
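
As a rough illustration of the options discussed above, here is a small Python sketch that assembles a TopHat command line with an annotation file and optional suppression of novel junction discovery. The helper `tophat_command` and all file names are hypothetical; the flags used (`--GTF`, `--no-novel-juncs`, `-o`, `-p`) are TopHat options, but check the manual for the version you run before relying on them. The sketch only builds and prints the command; it does not execute TopHat.

```python
import shlex

def tophat_command(index_base, reads_1, reads_2, gtf=None,
                   novel_junctions=True, out_dir="tophat_out", threads=1):
    """Build a TopHat argument list for a paired-end run (illustrative only)."""
    cmd = ["tophat", "-o", out_dir, "-p", str(threads)]
    if gtf is not None:
        # Supply known transcript models.
        cmd += ["--GTF", gtf]
        if not novel_junctions:
            # Restrict spliced alignments to junctions in the annotation.
            cmd.append("--no-novel-juncs")
    cmd += [index_base, reads_1, reads_2]
    return cmd

# Hypothetical file names, for illustration only.
cmd = tophat_command("hg19", "sample_1.fastq", "sample_2.fastq",
                     gtf="genes.gtf", novel_junctions=False, threads=8)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the alignment, assuming TopHat and the index are installed.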

                • sdriscoll
                  I like code
                  • Sep 2009
                  • 436

                  #9
                  I don't think it ever did a transcriptome alignment stage back then. I was never entirely sure what including the GTF was doing back then because of that. I think they looked at it as a guide to help resolve messy/unclear junction conditions.
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */

                  • carmeyeii
                    Senior Member
                    • Mar 2011
                    • 137

                    #10
                    Hmmmm.... So I'm guessing it used it to validate the potential junctions it had found in its initial mapping to the genome?

                    But then it would never find new stuff :/

                    Or maybe it looked for junctions close enough to what it had found to correct those to "perfection"... ?
