SEQanswers

06-15-2012, 11:15 AM   #1
IceWater
Junior Member
 
Location: USA CA

Join Date: Apr 2012
Posts: 7
why low mapping rates for RNA-seq with tophat2

Hi everybody,

As a beginner in RNA-seq analysis, I badly need your help and would appreciate it very much.
I did single-end sequencing of the Arabidopsis thaliana transcriptome on a HiSeq 2000. The read length is 51 bp. The sequencing quality looked quite good when checked with FastQC. When I ran TopHat2, the resulting accepted_hits.bam file was about 38 MB while unmapped.bam was about 280 MB. Although I haven't worked out the exact mapping rate, judging from the sizes of the mapped and unmapped files it seems that the majority of the reads are not mapped to the genome. When I picked some reads at random from the unmapped file and BLASTed them against the Arabidopsis genome (-intron, +UTR), almost all the reads I checked matched a particular mRNA perfectly. I used the genes.gtf and genome from the TAIR10 build downloaded from iGenomes. The low mapping rate occurred whether I used script 1 or script 2 below. Does anyone have any clue what the reason could be? Thanks for your suggestions.

script1:
tophat2 -p 8 -i 30 -g 5 --min-coverage-intron 30 --min-segment-intron 30 --b2-sensitive -G genes.gtf -o ./ genome 4_GCCAAT_L001_R1_001.fastq.gz 4_GCCAAT_L001_R1_002.fastq.gz 4_GCCAAT_L001_R1_003.fastq.gz 4_GCCAAT_L001_R1_004.fastq.gz 4_GCCAAT_L001_R1_005.fastq.gz 4_GCCAAT_L001_R1_006.fastq.gz 4_GCCAAT_L001_R1_007.fastq.gz 4_GCCAAT_L001_R1_008.fastq.gz 4_GCCAAT_L001_R1_009.fastq.gz 4_GCCAAT_L001_R1_010.fastq.gz 4_GCCAAT_L001_R1_011.fastq.gz 4_GCCAAT_L001_R1_012.fastq.gz

script2:
tophat2 -p 8 -G genes.gtf -o ./ genome 4_GCCAAT_L001_R1_001.fastq.gz 4_GCCAAT_L001_R1_002.fastq.gz 4_GCCAAT_L001_R1_003.fastq.gz 4_GCCAAT_L001_R1_004.fastq.gz 4_GCCAAT_L001_R1_005.fastq.gz 4_GCCAAT_L001_R1_006.fastq.gz 4_GCCAAT_L001_R1_007.fastq.gz 4_GCCAAT_L001_R1_008.fastq.gz 4_GCCAAT_L001_R1_009.fastq.gz 4_GCCAAT_L001_R1_010.fastq.gz 4_GCCAAT_L001_R1_011.fastq.gz 4_GCCAAT_L001_R1_012.fastq.gz
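
File sizes are only a rough proxy for the mapping rate, because the two BAM files hold different kinds of records and compress differently. A minimal sketch for counting records instead, assuming samtools is installed and the TopHat output files are called accepted_hits.bam and unmapped.bam in the current directory. `mapping_rate` is a hypothetical helper name, not part of TopHat or samtools. Note that accepted_hits.bam can hold several alignment records for one multi-mapping read, so the result is an approximation.

```shell
# Percentage of mapped records, given mapped and unmapped counts.
# mapping_rate is a hypothetical helper, not part of TopHat or samtools.
mapping_rate() {
    awk -v m="$1" -v u="$2" 'BEGIN {
        total = m + u
        if (total == 0) { print "NA"; exit }
        printf "%.1f\n", 100 * m / total
    }'
}

# Only count records if samtools and both TopHat output files are present.
if command -v samtools >/dev/null 2>&1 \
        && [ -f accepted_hits.bam ] && [ -f unmapped.bam ]; then
    mapped=$(samtools view -c accepted_hits.bam)    # alignment records
    unmapped=$(samtools view -c unmapped.bam)       # unmapped reads
    echo "approx. mapping rate: $(mapping_rate "$mapped" "$unmapped")%"
fi
```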

Last edited by IceWater; 06-18-2012 at 02:41 PM.
06-15-2012, 11:54 AM   #2
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224

There are a lot of potential reasons: poor-quality sequence, contamination, ribosomal RNA, etc. I've had all of these affect my mapping at one time or another.
With your short reads, have you tried just using Bowtie? You should be able to get a significant number of them to map. If not, it might indicate a sample problem rather than an alignment problem.
You might also want to give STAR a try. I always get better mapping with it than with TopHat.
http://gingeraslab.cshl.edu/STAR/
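
A quick way to run the Bowtie check is to align a small subsample without any splicing. The sketch below assumes Bowtie 2 (the aligner TopHat2 uses by default) and a Bowtie 2 index with the prefix `genome`. The helper name `first_reads` and the 100,000-read sample size are illustrative choices, not something from this thread. Reads that span splice junctions will mostly fail to align this way, so the reported rate should be read as a lower bound.

```shell
# Print the first N reads (4 lines each) of a gzipped FASTQ file.
# first_reads is a hypothetical helper name.
first_reads() {
    gzip -dc "$1" | head -n "$(( $2 * 4 ))"
}

fq=4_GCCAAT_L001_R1_001.fastq.gz
if command -v bowtie2 >/dev/null 2>&1 && [ -f "$fq" ]; then
    first_reads "$fq" 100000 > subsample.fastq
    # -x: index prefix, -U: unpaired reads, -S: SAM output.
    # Bowtie 2 prints the overall alignment rate to stderr.
    bowtie2 -p 8 -x genome -U subsample.fastq -S subsample.sam
fi
```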
06-15-2012, 12:47 PM   #3
JeremyDay
Registered Vendor
 
Location: San Diego

Join Date: Feb 2012
Posts: 25
Tophat 1

I am not a TopHat user, but I have heard from others that TopHat 2.0 changed from TopHat 1 in that it maps only to annotated references, which reduces mappability. Maybe try TopHat 1? Some of our bioinformaticians have switched back. Don't quote me on any of this; it's just hearsay.
06-15-2012, 12:53 PM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Quote:
Originally Posted by JeremyDay
I am not a TopHat user, but I have heard from others that TopHat 2.0 changed from TopHat 1 in that it maps only to annotated references, which reduces mappability. Maybe try TopHat 1? Some of our bioinformaticians have switched back. Don't quote me on any of this; it's just hearsay.
TopHat2 does an initial mapping to annotated transcript sequences, but then it should map back to the genome regardless of annotation. This actually speeds up the mapping drastically.
06-15-2012, 03:15 PM   #5
hpimentel
Junior Member
 
Location: Berkeley

Join Date: Nov 2010
Posts: 6

Just trickling back from our mailing list: this user was answered there by Daehwan. The main problem was that the read files were not passed in with a ',' between them; they were separated by spaces. TopHat interprets such a command entirely differently.

Quote:
Originally Posted by JeremyDay
I am not a TopHat user, but I have heard from others that TopHat 2.0 changed from TopHat 1 in that it maps only to annotated references, which reduces mappability. Maybe try TopHat 1? Some of our bioinformaticians have switched back. Don't quote me on any of this; it's just hearsay.
I believe the change you are referring to came in TopHat 1.4, where we changed how transcriptome mapping works when a GTF is given with the '-G' argument. Internally this program is called 'map2gtf'. The new method maps directly to the transcriptome before anything else and then converts the coordinates back to genomic coordinates. This typically results in better alignments. One reason you might get fewer alignments in newer versions of TopHat (>1.3 or 1.4) is that the internal bowtie parameters have become more stringent (allowing fewer mismatches with -N, I believe).



HTH,

Harold
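
To illustrate the '-G' transcriptome mapping Harold describes: the TopHat manual documents a `--transcriptome-index` option that stores the transcriptome index built from the GTF so that later runs can reuse it. A sketch, assuming a TopHat version that supports this option. The path `transcriptome_data/known`, the output directories, and the sample file names are placeholders. The commands are only printed here, not executed.

```shell
idx=transcriptome_data/known   # placeholder directory/prefix

# First run: build the transcriptome index from the GTF and keep it.
run1="tophat2 -p 8 -G genes.gtf --transcriptome-index=$idx -o out_sample1 genome sample1.fastq.gz"

# Later runs: reuse the stored index; -G is no longer needed.
run2="tophat2 -p 8 --transcriptome-index=$idx -o out_sample2 genome sample2.fastq.gz"

printf '%s\n' "$run1" "$run2"
```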
06-18-2012, 12:41 PM   #6
IceWater
Junior Member
 
Location: USA CA

Join Date: Apr 2012
Posts: 7
Thanks.

Hi Everyone,

I really appreciate your replies and suggestions. I have now found out why this happened. It is just as Harold said: I used spaces instead of "," to separate the read files passed in.
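
For reference, a sketch of the corrected call. TopHat expects all read files of one sample as a single comma-separated argument; as far as its usage line shows, a second whitespace-separated argument is read as the list of mate (right-end) files, which is why the space-separated version behaves so differently. The loop below only builds and prints the command, using the file names and options from script2 in the first post.

```shell
# Join the twelve lane files with commas so TopHat sees one read argument.
reads=""
i=1
while [ "$i" -le 12 ]; do
    f=$(printf '4_GCCAAT_L001_R1_%03d.fastq.gz' "$i")
    reads="${reads:+$reads,}$f"    # add a comma only between entries
    i=$((i + 1))
done

# Print the command instead of running it, so it can be inspected first.
cmd="tophat2 -p 8 -G genes.gtf -o ./ genome $reads"
echo "$cmd"
```

Once the printed command looks right, it can be pasted into the terminal to start the alignment.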

Last edited by IceWater; 06-18-2012 at 02:42 PM.