Hi,
The samtools flagstat results for a TopHat BAM file from my data are reproduced below. I am a little concerned about the percentage of "properly paired" reads (~33%). I suspect that in RNA-Seq data the two reads of a pair may map to different exons, which could put the pair outside whatever criteria define a proper pair. Does anyone have comments on whether these numbers are within reasonable limits for an RNA-Seq data set?
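For reference, samtools flagstat does not run its own pairing test. Its "properly paired" line counts records whose SAM FLAG has bit 0x2 set, and the aligner (TopHat here) decides whether to set that bit using its own criteria. The Python sketch below is a simplified model of how the flagstat categories follow from FLAG bits. The function name is hypothetical, and the FLAG values are made-up examples, not records from this BAM.

```python
# SAM FLAG bits (as defined in the SAM format specification)
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # aligner considers the pair properly aligned
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
READ1 = 0x40         # first read of the pair
READ2 = 0x80         # second read of the pair


def flagstat_like(flags):
    """Tally flagstat-style categories from a list of SAM FLAG integers.

    Hypothetical, simplified model: it looks only at FLAG bits and ignores
    mapping quality, duplicates, QC failures and mate chromosomes.
    """
    counts = {
        "total": 0, "mapped": 0, "paired": 0, "read1": 0, "read2": 0,
        "properly_paired": 0, "mate_mapped": 0, "singletons": 0,
    }
    for flag in flags:
        counts["total"] += 1
        mapped = not flag & UNMAPPED
        if mapped:
            counts["mapped"] += 1
        if not flag & PAIRED:
            continue
        counts["paired"] += 1
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
        # "Properly paired" is only bit 0x2; the aligner chose whether to set it
        if flag & PROPER_PAIR and mapped:
            counts["properly_paired"] += 1
        # Mapped read whose mate is unmapped -> "singleton"
        if mapped and flag & MATE_UNMAPPED:
            counts["singletons"] += 1
        # Mapped read whose mate is also mapped -> "with itself and mate mapped"
        if mapped and not flag & MATE_UNMAPPED:
            counts["mate_mapped"] += 1
    return counts


# Made-up FLAG values: 99/147 = proper pair, 97/145 = both mates mapped
# but not flagged proper, 73 = mapped read with an unmapped mate
example = [99, 147, 97, 145, 73]
print(flagstat_like(example))
```

With these five example records it reports 2 properly paired but 4 with mate mapped: the 97/145 pair has both mates aligned yet lacks bit 0x2, which is the kind of record that separates those two flagstat lines.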
The reads were 76bp paired-end (PE), the insert size was 200bp, and the TopHat command I used is reproduced below.
Thanks,
Shurjo
Code:
106172059 in total
0 QC failure
0 duplicates
106172059 mapped (100.00%)
106172059 paired in sequencing
53427448 read1
52744611 read2
34923852 properly paired (32.89%)
89669616 with itself and mate mapped
16502443 singletons (15.54%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
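The flagstat percentages are fractions of the total record count. A quick Python check with the counts copied from the output above reproduces them, and also gives the proper-pair rate among records whose mate is mapped, a denominator flagstat does not print. These are record counts: with -g 10, TopHat may write a multi-mapping read as several records, so they need not equal read or fragment counts.

```python
# Counts copied from the flagstat output above
total = 106172059
properly_paired = 34923852
mate_mapped = 89669616   # "with itself and mate mapped"
singletons = 16502443


def pct(part, whole):
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * part / whole, 2)


print(pct(properly_paired, total))        # proper pairs over all records
print(pct(singletons, total))             # records whose mate is unmapped
print(pct(mate_mapped, total))            # records with both mates mapped
print(pct(properly_paired, mate_mapped))  # proper pairs among records with both mates mapped
print(mate_mapped + singletons == total)  # every record is in exactly one of these two groups
```

The first two values match the 32.89% and 15.54% that flagstat printed. The last line shows that every mapped record is either a singleton or has its mate mapped, so the remaining gap comes from pairs that are both aligned but not flagged as proper.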
Code:
tophat -o 108971.testing.out -m 2 -p 4 -r 200 --mate-std-dev 50 -g 10 /home/sensh/bin/bowtie-0.12.7/indexes/hg18_inclusive 108971.read1.fastq 108971.read2.fastq
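TopHat's -r (--mate-inner-dist) option takes the expected inner distance between the mates, i.e. the fragment length minus the lengths of both reads, not the fragment length itself (the TopHat manual's example sets -r 200 for 300bp fragments with 50bp ends). Whether -r 200 above matches the data therefore depends on what the 200bp insert size refers to. A small sketch of the conversion, assuming 200bp is the full fragment length; the helper name is hypothetical:

```python
def mate_inner_distance(fragment_length, read1_length, read2_length):
    """Hypothetical helper: expected gap between mates = fragment minus both reads."""
    return fragment_length - read1_length - read2_length


# TopHat manual example: 300bp fragments with 50bp ends -> -r 200
print(mate_inner_distance(300, 50, 50))

# This run, ASSUMING "insert size 200bp" means the whole fragment
print(mate_inner_distance(200, 76, 76))
```

Under that assumption the inner distance would be about 48bp rather than 200bp. If the 200bp figure already excludes the reads, -r 200 is consistent with it.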