SEQanswers

Old 09-05-2012, 06:04 AM   #1
kreitinger
Member
 
Location: Madison, WI

Join Date: May 2011
Posts: 10
Question Tophat reads kept/discarded during initial conversion

I am using TopHat to analyze Illumina HiSeq 2000 paired-end read data. I have noticed that during the initial execution, TopHat 1 (and 2) "converts the reads" and then sorts the left reads into kept and discarded groups (e.g. 8,000,012 kept, 10,121 discarded) and does the same for the right reads (e.g. 7,804,000 kept, 206,133 discarded). Since the number of discarded reads differs between left and right, I'm assuming that "lone" mates are treated as single reads.

My question is: how does TopHat decide which reads to keep and which to discard, and why? Are there some underlying QC filters?
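
To put the example counts above in perspective, here is a small Python sketch that turns them into discard rates. The function name `discard_rate` is just an illustrative helper, not anything from TopHat; the numbers are the ones quoted in this post.

```python
def discard_rate(kept, discarded):
    """Fraction of reads discarded, given kept and discarded counts."""
    return discarded / (kept + discarded)

# Counts quoted in the post (left mates, right mates).
left_kept, left_discarded = 8_000_012, 10_121
right_kept, right_discarded = 7_804_000, 206_133

left = discard_rate(left_kept, left_discarded)
right = discard_rate(right_kept, right_discarded)

# Both files start from the same number of reads, as expected for paired data.
left_total = left_kept + left_discarded
right_total = right_kept + right_discarded

print(f"left: {left:.2%}, right: {right:.2%}")
print(f"totals match: {left_total == right_total}")
```

With these counts the right mates are discarded far more often than the left mates, even though both files contain the same total number of reads.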
Old 11-14-2012, 08:31 PM   #2
ROaj
Junior Member
 
Location: Vancouver

Join Date: Jan 2012
Posts: 1
Default

I am also VERY interested in this question/answer, as I do quite a bit of quality trimming prior to mapping my reads, and I've noticed that the discarded reads make up about 1-2% of my total read library.
Old 01-06-2013, 01:16 AM   #3
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Has anyone found the answer? The same thing happened to me.

My TopHat version is v2.0.6. I was previously using an older version, and that worked fine.
Old 01-23-2013, 02:22 PM   #4
asperjelly
Member
 
Location: Wisconsin

Join Date: Jan 2013
Posts: 11
Default

I'm also using TopHat v2.0.6 and I had this same question. I'm assuming it is removing reads that don't meet some quality threshold, but I can't seem to find any documentation of this in the manual.
Old 01-23-2013, 02:34 PM   #5
kreitinger
Member
 
Location: Madison, WI

Join Date: May 2011
Posts: 10
Default

I still haven't figured out why these reads are discarded. Since this step happens before alignment to the genome or GTF annotations, it has to be related to discarding low-quality reads. I emailed tophat.cufflinks@gmail.com with a link to this thread, so hopefully they will respond.
Old 01-24-2013, 04:17 AM   #6
Daehwan Kim
Junior Member
 
Location: Maryland, USA

Join Date: Oct 2012
Posts: 4
Default

TopHat filters out some reads if they are of low complexity or include too many Ns.
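
As a rough illustration of what a filter of this kind could look like, here is a minimal Python sketch. It is not TopHat's actual code: the function name `looks_filtered` and both thresholds (`max_n_fraction`, `max_single_base_fraction`) are assumptions chosen only for the example, and the real cutoffs are not stated in this thread.

```python
def looks_filtered(seq, max_n_fraction=0.1, max_single_base_fraction=0.9):
    """Hypothetical pre-alignment filter.

    Returns True if the read would be discarded under these
    illustrative rules. The thresholds are assumptions, not TopHat's.
    """
    seq = seq.upper()
    if not seq:
        return True  # an empty read carries no information

    # Rule 1: too many ambiguous bases (Ns).
    if seq.count("N") / len(seq) > max_n_fraction:
        return True

    # Rule 2: low complexity, here taken to mean that a single
    # nucleotide accounts for almost the whole read.
    most_common = max(seq.count(base) for base in "ACGT")
    return most_common / len(seq) > max_single_base_fraction


print(looks_filtered("A" * 101))             # poly-A read
print(looks_filtered("ACGT" * 25))           # mixed-base read
print(looks_filtered("N" * 20 + "ACGT" * 20))  # 20% Ns
```

Real low-complexity filters often use more refined measures (for example dinucleotide content or entropy); the single-base fraction is used here only because it is easy to follow.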
Old 04-09-2013, 01:54 PM   #7
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

About how many might "too many" be?
Old 05-14-2013, 01:53 PM   #8
NKAkers
Member
 
Location: New York, NY

Join Date: Sep 2011
Posts: 26
Default Not the answer to your question but...

I can tell you that the 'discarded' reads end up in unmapped.bam.

Hopefully future versions of TopHat will allow for more user control and better documentation of the quality filtering.
Old 10-05-2013, 12:27 AM   #9
harryzs
Member
 
Location: Germany

Join Date: Dec 2010
Posts: 29
Default

I checked unmapped.bam from TopHat 2.0.9

samtools view -f 0x200 unmapped.bam | head

I got:
Code:
HWI-7001436:48:C2ET1ACXX:5:1108:2968:28222	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDD@DDDDBBBDDDD	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1203:5292:62817	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCTCGTTACA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBB5&)0((+()+((++	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1312:13946:40878	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1203:5920:62936	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDD	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1312:14680:40864	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:2312:9415:35514	581	*	0	255	*	*	0	0	ATTAAAAAAAAAAAACTCCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHIII<FHCHIIIIIIHDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1312:14593:40904	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCTCTCTTATAAAC	CCCFFFDFHGHHHIJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDBDD<9>&&+((((4(+(((((	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1108:4206:28028	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBBDDD<BDDDDDDDDDDDDDDDDDDDDDDD9	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1203:7475:62973	581	*	0	255	*	*	0	0	AGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDDDDBDDDDDDDBB@DDDDDDB@DBDB95&	ZT:A:L
HWI-7001436:48:C2ET1ACXX:5:1108:4708:28068	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
I think it makes sense to remove these reads before alignment.

Right?
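
For reference, the value 581 in the second column of the records above is the SAM bitwise flag. A short Python sketch to decode it follows; the bit meanings come from the SAM format specification, while the helper name `decode_flag` is just an illustrative choice.

```python
# Flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all SAM flag bits set in `flag`."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for name in decode_flag(581):
    print(name)
```

So 581 = 0x200 + 0x40 + 0x4 + 0x1: a paired, unmapped, first-in-pair read marked as failing quality checks. The 0x200 bit is the one selected by `samtools view -f 0x200`.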

Another question:
What is the meaning of "ZT:A:L"?

Last edited by harryzs; 10-05-2013 at 12:44 AM.
Tags
filtering, paired end reads, quality illumina, tophat
