SEQanswers

Old 02-03-2014, 10:49 AM   #1
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130
TopHat with and without GTF

From what I understand, running TopHat with the GTF will assist with the mappings and make them a little cleaner, but shouldn't make a huge difference. I tried running TopHat with and without GTF on some mouse data.

Alignment summary with GTF:
Code:
Left reads:
               Input:  76922617
              Mapped:  37116408 (48.3% of input)
            of these:   1976558 ( 5.3%) have multiple alignments (515208 have >20)
Right reads:
               Input:  76672086
              Mapped:  35646606 (46.5% of input)
            of these:   1395593 ( 3.9%) have multiple alignments (748805 have >20)
47.4% overall read alignment rate.

Aligned pairs:  32699739
     of these:    745858 ( 2.3%) have multiple alignments
          and:    353721 ( 1.1%) are discordant alignments
42.2% concordant pair alignment rate.

Alignment summary without GTF:
Code:
Left reads:
               Input:  76922617
              Mapped:   4455809 ( 5.8% of input)
            of these:    212949 ( 4.8%) have multiple alignments (481436 have >20)
Right reads:
               Input:  76672086
              Mapped:   3360721 ( 4.4% of input)
            of these:    128254 ( 3.8%) have multiple alignments (732174 have >20)
 5.1% overall read alignment rate.

Aligned pairs:    817083
     of these:      5615 ( 0.7%) have multiple alignments
          and:       720 ( 0.1%) are discordant alignments
 1.1% concordant pair alignment rate.

The difference between alignment with and without GTF is huge. Is this normal? What would explain such a big discrepancy?

If this is normal, then the conclusion is that providing a GTF is important. By that logic, can TopHat really be trusted to detect novel transcripts if it has so much trouble working with transcripts not described by a GTF file?
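
As a sanity check on the two summaries above, the overall rate can be recomputed from the Input and Mapped counts. This is a minimal Python sketch, assuming the overall read alignment rate is simply total mapped reads divided by total input reads; the function names are made up for illustration and are not part of TopHat.

```python
import re

def parse_align_summary(text):
    """Pull the per-mate Input and Mapped read counts out of a summary."""
    inputs = [int(n) for n in re.findall(r"Input:\s*(\d+)", text)]
    mapped = [int(n) for n in re.findall(r"Mapped:\s*(\d+)", text)]
    return inputs, mapped

def overall_rate(text):
    """Overall read alignment rate in percent (assumed: mapped / input)."""
    inputs, mapped = parse_align_summary(text)
    return 100.0 * sum(mapped) / sum(inputs)

# Counts copied from the two summaries posted above.
with_gtf = """
Left reads:
               Input:  76922617
              Mapped:  37116408 (48.3% of input)
Right reads:
               Input:  76672086
              Mapped:  35646606 (46.5% of input)
"""

without_gtf = """
Left reads:
               Input:  76922617
              Mapped:   4455809 ( 5.8% of input)
Right reads:
               Input:  76672086
              Mapped:   3360721 ( 4.4% of input)
"""

print(f"with GTF:    {overall_rate(with_gtf):.1f}%")     # 47.4%
print(f"without GTF: {overall_rate(without_gtf):.1f}%")  # 5.1%
```

Both values agree with the rates TopHat reported, so the gap is in the mapped counts themselves rather than in how the summary is computed.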
Old 02-03-2014, 11:00 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Including a GTF file can make a large difference (see "tophat2" vs. "tophat2 ann" at the bottom):



I recommend reading the whole paper; it's quite useful.
Old 02-03-2014, 11:05 AM   #3
dpryan

I should add that in either case your alignment rate is exceedingly low. What sort of organism is this? Also, did you do any adapter trimming?
Old 02-03-2014, 01:45 PM   #4
id0

To answer your question, this is mouse without adapter trimming.

Thanks for that informative paper. However, the difference between annotated and non-annotated TopHat there is a few percentage points. For me it's ~5% versus ~50%.

For comparison, I am getting over 80% with just regular genomic alignment with Bowtie, so the reads themselves are of reasonable quality.
Old 02-04-2014, 01:05 AM   #5
dpryan

80% with mouse RNAseq is more what one would expect (I get >95% alignment with mouse RNAseq, though only ~85-90% map uniquely).

Are you using local alignment with bowtie? Also, keep in mind that tophat is less tolerant (by default) of mismatches than bowtie, so if you have a number of those (due to using a quite divergent strain, for example), then that might also cause these sorts of problems.

Maybe give STAR a try and see if that produces better results for you. I've been quite happy with it.
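
For anyone wanting to test the mismatch explanation, here is a minimal Python sketch that assembles a tophat2 command line with an optional GTF and a relaxed mismatch limit. It assumes TopHat 2's `-o`, `-G`, `--read-mismatches` and `--read-edit-dist` options; the helper name, index name and file names are hypothetical, and the defaults should be checked against the manual for the installed version.

```python
def tophat_command(index, reads1, reads2, out_dir, gtf=None, mismatches=None):
    """Build an argument list for tophat2 (hypothetical helper, not part of TopHat)."""
    cmd = ["tophat2", "-o", out_dir]
    if gtf is not None:
        # Supply known transcripts so reads are first aligned against them.
        cmd += ["-G", gtf]
    if mismatches is not None:
        # Both limits are set to the same value so that the edit-distance
        # limit is never lower than the mismatch limit.
        cmd += ["--read-mismatches", str(mismatches),
                "--read-edit-dist", str(mismatches)]
    # Positional arguments: Bowtie index base name, then the two mate files.
    cmd += [index, reads1, reads2]
    return cmd

# Placeholder names; substitute the real index and FASTQ files.
cmd = tophat_command("mm10", "sample_1.fastq", "sample_2.fastq",
                     "tophat_gtf_n4", gtf="genes.gtf", mismatches=4)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. Running it with and without `gtf`, at the same mismatch setting, would show whether mismatch tolerance accounts for any of the gap.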
Old 02-04-2014, 08:33 AM   #6
id0

Based on what I've heard from other people, STAR will be much faster, but only marginally more accurate (if at all).

Regarding mismatches, tolerance for them should not be affected by adding or removing a GTF, yet that one variable takes me from a ~5% to a ~50% alignment rate. I don't see how I can find any novel genes based on a TopHat alignment if it is having so much difficulty finding known ones.
Old 02-05-2014, 01:13 AM   #7
dpryan

True, though if the low alignment rate is due in part to the ends of many reads not mapping, then using an aligner that can do soft-clipping (e.g., STAR) might produce better results. Aside from that, I'd have to actually see and play around with your data a bit to be of any more help. I've never had these sorts of issues with mouse RNA.
Old 02-06-2014, 05:27 AM   #8
id0

I ran the same sample with STAR. I generated two genomes, one with GTF and one without. I ran the sample against both. I got more than twice the number of splices with GTF, which makes sense to me. For uniquely mapped reads, I got 64% alignment rate with GTF and 63% without. Essentially identical, which is what I would expect from a good aligner.

I will have to evaluate STAR more thoroughly. Based on the literature and this forum, its main advantage is speed, which is not a concern for me, so I never bothered to test it for myself. At least for this one example, it seems to be far superior to TopHat in terms of alignment. I would also be far more confident in any novel genes detected from this alignment.
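
For reference, the two-genome comparison described above could be set up roughly as follows. This is a sketch, not the exact commands used in this thread: the helper names, paths, overhang and thread count are placeholders, and it assumes only the standard STAR options `--runMode genomeGenerate`, `--genomeDir`, `--genomeFastaFiles`, `--sjdbGTFfile`, `--sjdbOverhang`, `--readFilesIn`, `--outFileNamePrefix` and `--runThreadN`.

```python
def star_genome_generate(genome_dir, fasta, gtf=None, overhang=100, threads=1):
    """Argument list for building a STAR index, with or without annotation."""
    cmd = ["STAR", "--runMode", "genomeGenerate",
           "--genomeDir", genome_dir,
           "--genomeFastaFiles", fasta,
           "--runThreadN", str(threads)]
    if gtf is not None:
        # Annotated junctions are added to the index; the overhang is
        # commonly set to read length minus 1.
        cmd += ["--sjdbGTFfile", gtf, "--sjdbOverhang", str(overhang)]
    return cmd

def star_align(genome_dir, reads1, reads2, out_prefix, threads=1):
    """Argument list for aligning one paired-end sample against an index."""
    return ["STAR", "--genomeDir", genome_dir,
            "--readFilesIn", reads1, reads2,
            "--outFileNamePrefix", out_prefix,
            "--runThreadN", str(threads)]

# Placeholder paths: one index built with the GTF, one without,
# and the same sample aligned against each.
jobs = [
    star_genome_generate("star_gtf", "genome.fa", gtf="genes.gtf", threads=8),
    star_genome_generate("star_nogtf", "genome.fa", threads=8),
    star_align("star_gtf", "sample_1.fastq", "sample_2.fastq", "gtf_", threads=8),
    star_align("star_nogtf", "sample_1.fastq", "sample_2.fastq", "nogtf_", threads=8),
]
for cmd in jobs:
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)`. The uniquely mapped percentage for each run is then read from the `Log.final.out` file that STAR writes under the chosen output prefix.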