I used TopHat to map my RNA-seq reads in two ways:
1. Mapped the reads to the reference genome and examined only the exon regions with mapped reads, then calculated the percentage of each gene covered by those reads (breadth of coverage).
2. Extracted the exons to generate a reference geneset, then mapped the reads to this geneset and calculated the percentage of each gene covered by the reads.
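To be explicit about what I mean by breadth of coverage, here is a minimal Python sketch. It assumes exons and read alignments are given as half-open (start, end) intervals on a single sequence; the function names and the example coordinates are made up for illustration and are not the exact script I ran.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def breadth_coverage(exons, reads):
    """Fraction of exonic bases covered by at least one read."""
    exons = merge_intervals(exons)
    reads = merge_intervals(reads)  # merged, so no base is counted twice
    total = sum(e - s for s, e in exons)
    covered = 0
    for exon_start, exon_end in exons:
        for read_start, read_end in reads:
            overlap = min(exon_end, read_end) - max(exon_start, read_start)
            if overlap > 0:
                covered += overlap
    return covered / total if total else 0.0


# Hypothetical gene with two exons and three aligned reads.
exons = [(100, 200), (300, 400)]
reads = [(90, 150), (140, 180), (350, 420)]
print(breadth_coverage(exons, reads))  # 130 of 200 exonic bases covered
```

The same function applies to both approaches; only the coordinate system of the alignments differs (genome coordinates in 1, geneset coordinates in 2).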
I compared the results from 1 and 2, and they were quite different.
For my specific data set, 2 generally gave a higher coverage percentage than 1.
The geneset accounts for 5% of the total genome.
I wonder whether this is because the genome is so large that the mapped reads are spread more widely across it, especially in duplicated/repetitive regions. If the geneset is used as the reference, the reads are forced to map to those limited regions, and the breadth of coverage increases accordingly?
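A toy sketch of what I suspect is happening. It assumes that reads with more than one equally good placement are dropped before coverage is counted, which may or may not match the filtering in my pipeline; the locus names and the helper function are hypothetical and this is not how TopHat itself is implemented.

```python
def uniquely_placed(reads, reference_loci):
    """Return the locus hit by each read that has exactly one candidate
    placement within the given reference."""
    hits = []
    for candidates in reads:
        # Only placements that exist in this reference can be reported.
        available = [c for c in candidates if c in reference_loci]
        if len(available) == 1:
            hits.append(available[0])
    return hits


# Hypothetical loci: the geneset is a small subset of the genome.
genome = {"exonA", "exonB", "pseudogeneA", "repeat1"}
geneset = {"exonA", "exonB"}

# Each read lists every locus it matches equally well.
reads = [
    ["exonA"],                 # unique in both references
    ["exonA", "pseudogeneA"],  # ambiguous in the genome, unique in the geneset
    ["exonB", "repeat1"],      # ambiguous in the genome, unique in the geneset
    ["repeat1"],               # has no placement in the geneset at all
]

# Approach 1: map to the genome, then keep only hits inside exons.
genome_hits = [h for h in uniquely_placed(reads, genome) if h in geneset]
# Approach 2: map directly to the geneset.
geneset_hits = uniquely_placed(reads, geneset)

print(genome_hits)   # ['exonA']
print(geneset_hits)  # ['exonA', 'exonA', 'exonB']
```

In this toy case exonB receives no reads under approach 1 but is covered under approach 2, which is the direction of the difference I observed.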
Any thoughts on this?