SEQanswers

Old 12-10-2010, 11:11 AM   #1
adarshjose
Junior Member
 
Location: ames IA

Join Date: Jul 2010
Posts: 6
Default TopHat -paired end vs single end reads

Hi,

I was trying to map paired-end Illumina GA IIe 85 bp reads to a reference genome using TopHat. When I mapped both mates together, only a small fraction (< 10%) of the reads mapped to the genome, but > 80% of the reads mapped when I mapped each mate file separately.

Mapping each mate file separately:
tophat -r 200 -o ./tophatr200 Ref/Zm.seq.uniq seqs__filtered_6_1.fastq
tophat -r 200 -o ./tophatr200 Ref/Zm.seq.uniq seqs__filtered_6_2.fastq

(> 80 % of reads mapped here.)

Mapping the paired data together:
tophat -r 200 -o ./tophatr200 Ref/Zm.seq.uniq seqs__filtered_6_1.fastq seqs__filtered_6_2.fastq

(< 10 % of reads mapped here.)

Has anyone seen this before? Could this have something to do with the -r value? Any suggestions would be greatly appreciated.

Thanks

Adarsh Jose
Iowa State University
Old 12-14-2010, 01:03 AM   #2
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192
Default

hey,

Probably the distance between your paired ends is too large for TopHat to map them accurately to the source sequence. This could result from a high standard deviation in the sample prep of the reads you use (i.e. clone libraries that are too large).
If you map the reads on their own, they can still be mapped because the mate-pair information doesn't matter in that case. Try increasing the allowed gap between mates in TopHat and review the results.

I don't know if it will really help, but I guess this could be a reason.


cheers

phil
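
For reference, TopHat's -r option (--mate-inner-dist) is the expected inner distance between the two mates, i.e. the fragment length minus both read lengths, not the fragment length itself. A minimal sketch of that arithmetic follows. The 85 bp read length is from the first post; the 370 bp fragment length is a made-up value for illustration only, since the actual library size is not given in this thread.

```python
def mate_inner_distance(fragment_len: int, read_len: int) -> int:
    """Expected gap between the two mates: fragment length minus both reads.

    A negative result means the two mates overlap.
    """
    return fragment_len - 2 * read_len


# 85 bp reads (from the first post); 370 bp fragments are a hypothetical value.
inner = mate_inner_distance(fragment_len=370, read_len=85)
print(f"tophat -r {inner} ...")
```

If the real fragments are much shorter or longer than assumed, the value passed to -r should change accordingly.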
Old 05-26-2011, 10:30 AM   #3
arrchi
Member
 
Location: ma

Join Date: Mar 2011
Posts: 46
Default

Hi adarshjose,

Did you solve your problem? I would be very interested in how you solved the discrepancy.

-a
Old 06-16-2011, 09:44 PM   #5
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20
Default

The reads may be trimmed....
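
If the two files were filtered or trimmed independently, reads can be dropped from one file but not the other, so the mates no longer line up record by record. A rough check for that is sketched below. It assumes plain 4-line FASTQ records and read names that may end in /1 and /2, which are assumptions about this data set; the function names are only illustrative.

```python
from itertools import zip_longest


def read_names(path):
    """Yield read names from a 4-line-per-record FASTQ file, minus any /1 or /2 suffix."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each record
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(left_path, right_path):
    """Return the 1-based index of the first record whose names differ, or None.

    A file that ends early also counts as a mismatch at that record.
    """
    pairs = zip_longest(read_names(left_path), read_names(right_path))
    for n, (left, right) in enumerate(pairs, start=1):
        if left != right:
            return n
    return None
```

Calling first_mismatch("seqs__filtered_6_1.fastq", "seqs__filtered_6_2.fastq") and getting anything other than None would suggest the mates are out of sync.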
Old 03-20-2012, 11:06 PM   #6
anurag.gautam
Member
 
Location: India

Join Date: Oct 2010
Posts: 15
Default

Hi,
I tried to map ~2 million Illumina reads to the Oryza sativa indica reference genome, with its reference GTF file, using different versions of TopHat: 1.1.4, 1.3.0, 1.3.1, 1.3.2, 1.3.3 and the current one, 1.4.1.
I used the default options just to check whether the mapping statistics are really affected. I got the following stats:

Version        Reads used   Reads mapped
TopHat 1.1.4   2,000,000    2,27,554
TopHat 1.3.0   2,000,000    2,30,817
TopHat 1.3.1   2,000,000    2,31,935
TopHat 1.3.2   2,000,000    4,517
TopHat 1.3.3   2,000,000    2,31,935
TopHat 1.4.1   2,000,000    1,37,724

I would like to know why the number of mapped reads varies between versions even though the data are the same. Secondly, why is there such a drastic change in the mapping stats with versions 1.3.2 and 1.4.1 compared with the other versions? Can anybody please shed some light on this?
Old 03-21-2012, 04:09 AM   #7
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Could you fix your comma placement? I don't know how many alignments Tophat gave you. Does 2,27,554 mean 227,554?
Old 03-21-2012, 04:21 AM   #8
anurag.gautam
Member
 
Location: India

Join Date: Oct 2010
Posts: 15
Default

Yes, both are the same:

Version        Reads used   Reads mapped
TopHat 1.1.4   2,000,000    227,554
TopHat 1.3.0   2,000,000    230,817
TopHat 1.3.1   2,000,000    231,935
TopHat 1.3.2   2,000,000    4,517
TopHat 1.3.3   2,000,000    231,935
TopHat 1.4.1   2,000,000    137,724
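
Expressed as a fraction of the 2,000,000 input reads, these counts can be compared more easily. A quick sketch using only the numbers listed above (the variable and function names are just illustrative):

```python
READS_USED = 2_000_000

# Mapped-read counts per TopHat version, copied from the list above.
reads_mapped = {
    "1.1.4": 227_554,
    "1.3.0": 230_817,
    "1.3.1": 231_935,
    "1.3.2": 4_517,
    "1.3.3": 231_935,
    "1.4.1": 137_724,
}


def mapped_percent(mapped: int, used: int) -> float:
    """Percentage of input reads that were mapped."""
    return 100.0 * mapped / used


for version, mapped in reads_mapped.items():
    print(f"TopHat {version}: {mapped_percent(mapped, READS_USED):.2f}% mapped")
```

Most versions come out at roughly 11 to 12%, with 1.4.1 under 7% and 1.3.2 well under 1%.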

Old 03-21-2012, 04:58 AM   #9
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

That's not a lot of mapped reads. Something went wrong with either the library prep, the sequencing, or the mapping method. How good is the reference genome for Oryza sativa indica?
Old 03-21-2012, 05:23 AM   #10
anurag.gautam
Member
 
Location: India

Join Date: Oct 2010
Posts: 15
Default

The reference genome of Oryza sativa indica is of good quality and has good coverage. The reads are also of high quality. But the question remains the same: why do the mapping stats differ?
Old 06-12-2012, 06:15 PM   #11
zun
Member
 
Location: Japan

Join Date: Oct 2010
Posts: 26
Default

Hello anurag.gautam,

I have also used the TopHat series with the same O. sativa reads since 2010, but I haven't encountered the same situation as you.
The number of mapped reads did vary a little, but not as drastically as in your case. I don't know why, sorry.

> adarshjose
I had the same problem before, and realized it was because TopHat discarded mate pairs that mapped to different chromosomes when combining the left and right reads mapped by Bowtie.
TopHat2, however, has an option called "--report-discordant-pair-alignment" that allows mate pairs to map to different chromosomes, so you should get a higher mapping rate with TopHat2.
I hope this helps.

zun
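
To get a feel for how many pairs are affected, one could count alignment records whose mate lies on a different reference sequence. The sketch below assumes SAM-format text lines from output that retains such pairs (for example, a BAM file converted with samtools view -h); the function name is only illustrative. In SAM, column 3 is RNAME and column 7 is RNEXT, where "=" means the same reference as RNAME and "*" means no information.

```python
def count_cross_chromosome(sam_lines):
    """Count alignment records whose mate maps to a different reference sequence.

    Each affected pair is counted once per reported mate, so a pair with both
    mates present contributes 2.
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        rname, rnext = fields[2], fields[6]
        if rname != "*" and rnext not in ("=", "*", rname):
            count += 1
    return count
```

Passing an open file handle works, since the function only iterates over lines.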