I'm working with PE RRBS data. The library fragment size is 40-220bp, and the reads are 50bp PE reads. Because of this, there are a significant amount of regions in which the PE reads overlap. I'm using BSseeker2 to align the PE data, but during the methylation calling there is a bias in the overlapping regions, as each read in an overlapping pair is used to call the methylation state. I need to avoid scoring overlapping methylation calls twice. However, this has proved to be harder than expected. I've tried two approaches:
1. Remove the overlapping portion of one of the PE reads from the aligned bam file before methylation calling using BSeQC. I've tried this, and the output suggests that I have no overlapping PE reads:
Total reads: 19725948
Not unique paired mapping reads: 0(0.00%)
Unique paired mapping reads: 19725948(100.00%)
Skip not paired unique mapping reads: 0(0.00%)
In unique paired mapping reads:
All unique paired mapping basepairs: 932699172
Filter end-repaired nucleotides in MspI site : 19714618(2.11% of u
nique mapping basepairs)
Filter overlapped basepairs: 0(0.00% of unique paired mapping base
pairs)
The results are the same for all of my samples. However I know that there are regions where PE reads do overlap. I'm not sure why BSeQC does not recognize this in my bam files produced by BSseeker2. So I decided to try another approach:
2. Merge overlapping PE reads before alignment using SeqPrep. This results in about 10M of the ~19M reads being merged. However when I try to align these as SE reads, BSseeker2 seems to run properly, then stops partway through without any warnings or errors. I've looked through the output reports, but can't figure out where the problem is.
Does anyone have any experience dealing with PE RRBS datasets? Bismark has an option to remove these overlapping portions, so I can switch to Bismark if I need to. But my % uniquely aligned drops about 10% or more using Bismark instead of BSseeker2 because Bismark does not allow the use of bowtie2 in the local alignment mode. Thanks.
1. Remove the overlapping portion of one of the PE reads from the aligned bam file before methylation calling using BSeQC. I've tried this, and the output suggests that I have no overlapping PE reads:
Total reads: 19725948
Not unique paired mapping reads: 0(0.00%)
Unique paired mapping reads: 19725948(100.00%)
Skip not paired unique mapping reads: 0(0.00%)
In unique paired mapping reads:
All unique paired mapping basepairs: 932699172
Filter end-repaired nucleotides in MspI site : 19714618(2.11% of u
nique mapping basepairs)
Filter overlapped basepairs: 0(0.00% of unique paired mapping base
pairs)
The results are the same for all of my samples. However I know that there are regions where PE reads do overlap. I'm not sure why BSeQC does not recognize this in my bam files produced by BSseeker2. So I decided to try another approach:
2. Merge overlapping PE reads before alignment using SeqPrep. This results in about 10M of the ~19M reads being merged. However when I try to align these as SE reads, BSseeker2 seems to run properly, then stops partway through without any warnings or errors. I've looked through the output reports, but can't figure out where the problem is.
Does anyone have any experience dealing with PE RRBS datasets? Bismark has an option to remove these overlapping portions, so I can switch to Bismark if I need to. But my % uniquely aligned drops about 10% or more using Bismark instead of BSseeker2 because Bismark does not allow the use of bowtie2 in the local alignment mode. Thanks.
Comment