I have worked with both ChIP-seq and RNA-seq analysis. In ChIP-seq, it is almost standard procedure to remove redundant reads that map to the same location in the same orientation. This is reasonable because it is very unlikely that sonication breaks the genomic sequence at the same location more than twice by chance during sample preparation. So if we see redundant reads, they are most likely PCR duplicates.
However, removing redundant reads does NOT seem to be standard for RNA-seq. My understanding is that the total coding sequence length is much shorter than the genomic sequence length, which significantly increases the chance that the same location is selected for sequencing more than once. But then how do you distinguish redundancy caused by amplification from redundancy caused by random selection?
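To make the random-selection argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes reads start uniformly at random over the available start sites, which is certainly not true for RNA-seq (highly expressed genes attract far more reads), so the transcriptome figure should be read as a lower bound. The function name, the read depth, and both target sizes are hypothetical round numbers chosen only for illustration.

```python
import math

def expected_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads landing on an already-occupied start site.

    Assumes every read picks one of n_positions start sites uniformly at
    random (an illustrative simplification, not a model of real libraries).
    Requires n_positions > 1.
    """
    # Expected number of distinct occupied sites: L * (1 - (1 - 1/L)^N),
    # computed with log1p/expm1 for numerical stability when L is large.
    expected_distinct = n_positions * -math.expm1(
        n_reads * math.log1p(-1.0 / n_positions)
    )
    return 1.0 - expected_distinct / n_reads

# Hypothetical numbers: 20 million reads in both experiments.
n_reads = 20_000_000
genome_sites = 2 * 3_000_000_000   # assumed ~3 Gb genome, two strands
transcriptome_sites = 60_000_000   # assumed total expressed sequence

genome_frac = expected_duplicate_fraction(n_reads, genome_sites)
transcriptome_frac = expected_duplicate_fraction(n_reads, transcriptome_sites)

print(f"genome-wide chance duplicates:   {genome_frac:.4%}")
print(f"transcriptome chance duplicates: {transcriptome_frac:.4%}")
```

Under these assumptions, chance duplicates are a negligible share of reads genome-wide but a substantial share in the much smaller transcriptome, which is why position-based removal that is safe for ChIP-seq could discard real signal in RNA-seq.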
I raise this concern because I have seen certain genes with a much higher read count in one biological replicate than in the other replicates, probably more than 100-fold! That is very unlikely to be caused by biological variation; it is more likely related to PCR bias.
Any thoughts?
- L