Hi, I have bisulfite-converted reads (mapped, in a BAM file) and I would like to select only the reads that start and end at an MspI cut site (C^CGG) based on the reference genome.
The way I would go about this is to make a BED file covering the CCGG motifs in the reference genome FASTA file, then select reads that start at a CCGG site, based on the position they map to, and reads that end at a CCGG site, based on the mapped position plus the length of the read.
However, I'm not sure how to do either part of this or if there might be more efficient ways to get this set of sites. I'd appreciate any help or better ideas, thanks!
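
For reference, here is a minimal Python sketch of the approach described above, not a tested pipeline. The function names (`find_motif_starts`, `overlaps_motif`, `read_spans_cut_sites`) are made up for illustration. It assumes each read is reduced to a 0-based, half-open reference interval, and it only checks that the first and last aligned bases fall somewhere inside a CCGG motif; the exact offset expected within the motif depends on the library prep (for example end repair), so that would need tightening. Note that the end coordinate should come from the alignment's span on the reference rather than from the raw read length, since soft clipping and indels make the two differ.

```python
import re
from bisect import bisect_left

MOTIF = "CCGG"  # MspI recognition site (cut as C^CGG)


def find_motif_starts(seq, motif=MOTIF):
    """Return sorted 0-based start positions of every motif occurrence in seq."""
    # A lookahead is used so that overlapping matches would also be reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def overlaps_motif(pos, motif_starts, motif_len=len(MOTIF)):
    """Return True if 0-based position pos lies inside any motif occurrence."""
    # The earliest motif start that could still cover pos is pos - motif_len + 1.
    i = bisect_left(motif_starts, pos - motif_len + 1)
    return i < len(motif_starts) and motif_starts[i] <= pos


def read_spans_cut_sites(ref_start, ref_end, motif_starts):
    """Check both ends of an alignment against the motif positions.

    ref_start is 0-based inclusive and ref_end is 0-based exclusive.
    ASSUMPTION: "at a cut site" is taken loosely to mean that the first and
    last aligned bases each fall within a CCGG motif on the reference.
    """
    return overlaps_motif(ref_start, motif_starts) and overlaps_motif(
        ref_end - 1, motif_starts
    )


# Toy reference with CCGG motifs at positions 2 and 10.
reference = "AACCGGTTTTCCGGAA"
sites = find_motif_starts(reference)

# Hypothetical alignments given as (ref_start, ref_end) pairs.
alignments = [(3, 13), (3, 9), (0, 13)]
kept = [a for a in alignments if read_spans_cut_sites(a[0], a[1], sites)]
```

With a real BAM file, the same check could be applied per chromosome using the `reference_start` and `reference_end` attributes that pysam exposes on each aligned read, which follow the same 0-based, half-open convention.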