SEQanswers

12-02-2010, 07:54 AM   #1
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126
properly paired reads in TopHat output

Hi,

The samtools flagstat results for a TopHat BAM file from my data are reproduced below. I am a little concerned about the percentage of "properly paired" reads (~33%). I guess that in RNA-Seq data the two reads in a pair may map to different exons, which may cause the pair to fall outside whatever definition of proper pairing is being applied. Does anyone have any comments on whether these numbers are within reasonable limits for an RNA-Seq data set?

Code:
106172059 in total
0 QC failure
0 duplicates
106172059 mapped (100.00%)
106172059 paired in sequencing
53427448 read1
52744611 read2
34923852 properly paired (32.89%)
89669616 with itself and mate mapped
16502443 singletons (15.54%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
The reads were 76 bp paired-end, the insert size was 200 bp, and the command used to run TopHat was:
Code:
tophat -o 108971.testing.out -m 2 -p 4 -r 200 --mate-std-dev 50 -g 10 /home/sensh/bin/bowtie-0.12.7/indexes/hg18_inclusive 108971.read1.fastq 108971.read2.fastq
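A side note on the -r value in the command above: in TopHat, -r (--mate-inner-dist) is the expected inner distance between the two mates, not the full fragment length. If the 200 bp figure refers to the whole fragment (an assumption, since the post does not say which is meant), the inner distance could be worked out roughly as in this small sketch. The function name is only illustrative.

```python
def mate_inner_distance(fragment_length, read_length):
    """Gap between the two mates: fragment length minus both read lengths."""
    return fragment_length - 2 * read_length


# Values from the post: 76 bp paired-end reads and a 200 bp "insert size".
# ASSUMPTION: "insert size" means the full fragment length, adapters excluded.
inner = mate_inner_distance(fragment_length=200, read_length=76)
print(inner)  # 200 - 2 * 76
```

Under that assumption the -r value would be nearer 48 than 200. If 200 bp is already the gap between the mates, the command is fine as written.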
Thanks,

Shurjo
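
For reference, the percentages that flagstat prints can be reproduced from the raw counts, which makes it easier to see what each one is a fraction of. The sketch below pastes the counts from this post into a string and recomputes them. parse_flagstat and pct are made-up helper names, and the parsing assumes the one-count-per-line flagstat layout shown above, which newer samtools versions do not use.

```python
import re

# Counts copied from the flagstat output in this post.
FLAGSTAT = """\
106172059 in total
0 QC failure
0 duplicates
106172059 mapped (100.00%)
106172059 paired in sequencing
53427448 read1
52744611 read2
34923852 properly paired (32.89%)
89669616 with itself and mate mapped
16502443 singletons (15.54%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Map each flagstat label to its count, dropping any trailing '(NN.NN%)'."""
    counts = {}
    for line in text.strip().splitlines():
        number, label = line.strip().split(" ", 1)
        label = re.sub(r"\s*\(\d+(\.\d+)?%\)$", "", label)
        counts[label] = int(number)
    return counts


counts = parse_flagstat(FLAGSTAT)


def pct(label):
    """Percentage of all reads that fall into the given category."""
    return 100.0 * counts[label] / counts["in total"]


print(f"properly paired: {pct('properly paired'):.2f}%")
print(f"singletons:      {pct('singletons'):.2f}%")
```

Note that "with itself and mate mapped" plus "singletons" adds up to the total here, so both percentages are taken over all reads in the file.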
12-02-2010, 09:05 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I am curious about this issue as well. I have several human RNA-seq data sets where the percentage of "properly paired" reads was about 50% in TopHat output, but much higher in bwa output. Of course, this depends both on the actual mapping and on the criterion that each aligner uses to set the "properly paired" flag in the BAM file. In my case, about 20% of the alignments in the TopHat output were spliced, which would explain some of the reads that are not flagged as "properly paired".

Does anyone know what criterion TopHat uses for setting the "properly paired" flag?
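
One way to estimate the share of spliced alignments mentioned above is to look for the N operation (skipped region) in the CIGAR column, which is how spliced alignments are normally written in SAM. The sketch below does this over SAM text lines. The four example records are made up for illustration, and spliced_fraction is a hypothetical helper name.

```python
def spliced_fraction(sam_lines):
    """Fraction of mapped alignments whose CIGAR contains an N (skipped region)."""
    total = 0
    spliced = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped or no CIGAR available
            continue
        total += 1
        if "N" in cigar:
            spliced += 1
    return spliced / total if total else 0.0


# Made-up records, only to show the expected input shape.
EXAMPLE = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t76M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t200\t255\t30M500N46M\t*\t0\t0\t*\t*",
    "r3\t0\tchr2\t300\t255\t76M\t*\t0\t0\t*\t*",
    "r4\t0\tchr2\t400\t255\t10M1200N66M\t*\t0\t0\t*\t*",
]
print(spliced_fraction(EXAMPLE))
```

On real data the same function could be given the lines produced by samtools view, for example by iterating over sys.stdin.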
12-02-2010, 12:35 PM   #3
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126

Actually, I just looked at the FLAG field and I am even more puzzled. Issuing this command:
Code:
samtools view accepted_hits.bam | awk '{print $2}' | sort | uniq
returns only 0 and 16. So I am not sure how samtools flagstat is even deciding which reads are properly paired, given that neither of these flags indicates pairing at all, let alone proper pairing. At this point, I suspect that TopHat is not really doing much with this field in its BAM output, so a different approach may be needed to work out what percentage of the PE reads are properly paired.
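
To make the point about 0 and 16 concrete, the FLAG values can be decoded bit by bit. The bit values in this sketch follow the SAM specification. The short names and the decode_flag helper are labels chosen here for illustration, and 99 is included only as an example of what a properly paired read's flag typically looks like.

```python
# Bit values follow the SAM specification; the names are labels chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (0, 16, 99):
    print(flag, decode_flag(flag))
```

Flag 0 has no bits set and 16 only marks a reverse-strand alignment, so neither carries the paired (0x1) or proper-pair (0x2) bit.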