Simple question, but I want to make sure I get this right. I'm looking at pulldowns of regions enriched for methylated CpGs.
Let's say I have a read that has the sequence: ACGTCGTACGCGTGCAAGCGC
The "CG" sequence appears 5 times.
The "GC" sequence appears 4 times.
However, there are only 7 distinct occurrences of "CG" or "GC" among these 9, because two of them overlap a neighbor: one "CG" sits inside "GCGC" and one "GC" sits inside "CGCG".
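To double-check the counts above, here is a minimal Python sketch. The helper name `count_overlapping` is my own, not from any library; it just tests every start position in the read, so overlapping matches are all counted.

```python
def count_overlapping(seq, motif):
    """Count every start position at which motif occurs in seq."""
    last_start = len(seq) - len(motif)
    return sum(seq.startswith(motif, i) for i in range(last_start + 1))


read = "ACGTCGTACGCGTGCAAGCGC"

cg = count_overlapping(read, "CG")  # C followed by G
gc = count_overlapping(read, "GC")  # G followed by C
print(cg, gc)  # prints: 5 4
```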
A read may come from the forward or the reverse strand, but either way CG will appear as CG, because the reverse complement of CG is CG.
Thus, I should only consider sequences of C followed by G, and not G followed by C, correct? Hence, the above read covers 5 CpG sites?
Is this correct logic?
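For what it's worth, the strand argument can be checked with a short Python sketch. It assumes the read contains only A, C, G and T, and `reverse_complement` is a helper I wrote for illustration, not a library function.

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


read = "ACGTCGTACGCGTGCAAGCGC"
rc_read = reverse_complement(read)

# "CG" cannot overlap itself, so str.count finds every occurrence.
print(reverse_complement("CG"))                # prints: CG
print(read.count("CG"), rc_read.count("CG"))   # prints: 5 5
```

So the number of C-followed-by-G sites comes out the same whichever strand the read is reported on.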