Hi All-
I'm using an RNA-seq dataset produced by modENCODE. It is a mixture of single- and paired-end 75nt reads. Many of the paired-end mates overlap each other, which causes problems when generating per-base read counts across the genome.
I'm wondering how people are dealing with this. In principle, I think paired mates should be merged into a single read, since both derive from a single mRNA molecule. However, the software I've been using (samtools) treats each mate as a separate read, so bases in the overlapping region are counted twice.
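For illustration, here is a minimal sketch of the "count each fragment once" idea in plain Python. It is not a samtools feature. It assumes ungapped alignments given as hypothetical `(read_name, chrom, start, end)` tuples with 0-based, half-open coordinates, and it assumes both mates share the same read name. Spliced reads would need their CIGAR blocks expanded into intervals first. The function names `fragment_coverage` and `naive_coverage` are made up for this example.

```python
from collections import defaultdict


def naive_coverage(alignments):
    """Count every mate separately, so mate overlaps are double-counted."""
    depth = defaultdict(int)
    for _, chrom, start, end in alignments:
        for pos in range(start, end):
            depth[(chrom, pos)] += 1
    return dict(depth)


def fragment_coverage(alignments):
    """Count each fragment (read name) at most once per position.

    Assumes ungapped, 0-based half-open (start, end) intervals and that
    mates of one fragment share the same read name.
    """
    # Union of all positions covered by any mate of each fragment
    covered = defaultdict(set)
    for name, chrom, start, end in alignments:
        covered[name].update((chrom, pos) for pos in range(start, end))
    # Each fragment adds 1 to each position it covers, even where mates overlap
    depth = defaultdict(int)
    for positions in covered.values():
        for key in positions:
            depth[key] += 1
    return dict(depth)


reads = [
    ("frag1", "chr2L", 100, 175),  # mate 1
    ("frag1", "chr2L", 150, 225),  # mate 2, overlaps mate 1 over 150-174
    ("frag2", "chr2L", 160, 235),  # single-end read
]
naive = naive_coverage(reads)
cov = fragment_coverage(reads)
print(naive[("chr2L", 160)], cov[("chr2L", 160)])  # 3 2
```

At position 160 the naive count is 3 because both mates of `frag1` are counted, while the fragment-aware count is 2 (`frag1` once plus `frag2`). Using a set per fragment keeps the logic simple; for genome-scale data an interval-merging approach would use far less memory.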
Any thoughts or suggestions?
Thanks,
-Dan