Hi,
I'm flummoxed by a discrepancy between the Bismark SAM file and the output of bismark_methylation_extractor.
bismark_methylation_extractor counts 11 reads at position 20,665,340 on chromosome Y.
When I view the SAM file in IGV (after conversion to the BAM format and indexing), I can see that there are 14 reads at that location.
Four of those reads form two pairs of overlapping mates. Since, as I understand it, bismark_methylation_extractor only counts the first member of an overlapping pair, I would expect it to report 12 reads (14 minus the 2 overlapping second mates).
Why does it count 11 reads?
On what criteria does it exclude another read?
I've attached an IGV screenshot of the location of interest.
I converted the cov file to the bigWig format, but the results are identical if I go back to the original cov file.
[alexis@lg-1r17-n03 methylation_extractor]$ grep 20665340 AtTneo_Y_sorted_read_name.deduplicated.bismark.cov
Y 20665340 20665340 90.9090909090909 10 1
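To show where the 11 comes from, here is a minimal sketch that parses the line above. It assumes the standard Bismark coverage columns (chromosome, start, end, percent methylation, methylated count, unmethylated count); the function name `parse_cov_line` is just something I made up for illustration.

```python
def parse_cov_line(line):
    """Split one Bismark coverage line into its six whitespace-separated columns."""
    chrom, start, end, pct, meth, unmeth = line.split()
    return chrom, int(start), int(end), float(pct), int(meth), int(unmeth)


# The line reported by grep above.
line = "Y 20665340 20665340 90.9090909090909 10 1"
chrom, start, end, pct, meth, unmeth = parse_cov_line(line)

# Total calls at this position = methylated + unmethylated.
total = meth + unmeth

# Recompute the percentage from the counts to check the line is self-consistent.
recomputed = 100.0 * meth / total

print(total, round(recomputed, 4))  # prints: 11 90.9091
```

So the cov file is internally consistent (10 methylated + 1 unmethylated = 11 calls); the open question is only why one more read than expected is missing.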
Here is also the original command.
bismark_methylation_extractor \
--paired-end \
--bedGraph \
--buffer_size 8100M \
--CX_context \
--zero_based \
--merge_non_CpG \
--comprehensive \
--output ../../results/bismark/AtTneo/Y/methylation_extractor \
../../results/bismark/AtTneo/Y/AtTneo_Y_sorted_read_name.deduplicated.bam
Thank you for your help,
Alexis