**blancha** · 07-01-2015, 02:56 PM

I've simplified the problem, to make the troubleshooting easier.
I've run bismark_methylation_extractor a second time, including overlaps.
I get 13 bases aligning at position 20665340, whereas I count 14 when I view the SAM generated with Bismark in IGV.
So, why the discrepancy? Is the bottom read, which appears to be truncated, the one that is not counted? I don't see any mention in the documentation of reads being skipped. Or is it a bug?

I've attached the IGV screenshot, this time not showing paired reads together, to make the counting easier.

[blancha@lg-1r17-n04 methylation_extractor_with_overlap]$ grep 20665340 *bismark.cov
Y 20665340 20665340 92.3076923076923 12 1
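For readers checking the arithmetic, here is a minimal Python sketch that splits the coverage line above into its fields. It assumes the usual Bismark coverage layout (chromosome, start, end, percent methylation, methylated count, unmethylated count); the function name `parse_cov_line` is only illustrative.

```python
def parse_cov_line(line):
    """Split one Bismark .cov line into named fields (assumed column order)."""
    chrom, start, end, pct, meth, unmeth = line.split()
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "percent": float(pct),
        "methylated": int(meth),
        "unmethylated": int(unmeth),
    }


rec = parse_cov_line("Y 20665340 20665340 92.3076923076923 12 1")

# Total calls at the position: methylated + unmethylated.
total = rec["methylated"] + rec["unmethylated"]
print(total)

# The reported percentage should equal methylated / total.
print(100 * rec["methylated"] / total)
```

So the 13 in the question is 12 methylated calls plus 1 unmethylated call, and 12/13 gives the reported 92.3%.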

bismark_methylation_extractor \
--include_overlap \
--paired-end \
--bedGraph \
--buffer_size 8100M \
--CX_context \
--zero_based \
--merge_non_CpG \
--comprehensive \
--output ../../results/bismark/AtTneo/Y/methylation_extractor_with_overlap \
../../results/bismark/AtTneo/Y/AtTneo_Y_sorted_read_name.deduplicated.bam

Attached Files

IGV screenshot.png (84.8 KB, 38 views)

**dpryan** · 07-01-2015, 11:19 PM

We'd need to see the lines from the SAM file to give you a real answer. My guess is that one of the alignments is to the original top strand while the others are to the original bottom.

**blancha** · 07-02-2015, 06:04 AM

Thank you.
You are absolutely correct.

I've attached the IGV screenshot with the reads colored by the XG tag. 13 reads have the XG tag set to GA, while one read has the tag set to CT.
So, I suppose that the one read with the tag set to CT is excluded from the bismark_methylation_extractor count.
I've put at the bottom the details for the one read that appears not to be counted by bismark_methylation_extractor.
I'm still not absolutely clear on why the read is excluded from the methylation counts based on the tag, but I'll think about it a bit more. I suppose that bismark_methylation_extractor has concluded that the original sequenced base was a G and not a C, and therefore could not be methylated?

It's a bit frustrating to get different results from the Bismark SAM file, the bismark_methylation_extractor, and methylKit, but I've made progress in understanding why. bismark_methylation_extractor excludes overlapping reads by default. methylKit's read.bismark function excludes bases with a Phred quality score below 20 by default. bismark_methylation_extractor appears to exclude certain reads based on their XG and XR tags, according to criteria that I do not fully understand yet (not counted if sequenced base was a G?).

Read name = HWI-ST915:46:C6ANNACXX:5:2112:8591:53033_1:N:0:
----------------------
Location = chrY:20,665,340
Alignment start = 20,665,340 (+)
Cigar = 53M
Mapped = yes
Mapping quality = 15
Secondary = no
Supplementary = no
Duplicate = no
Failed QC = no
----------------------
Base = G
Base phred quality = 33
----------------------
Mate is mapped = yes
Mate start = chrY:20665426 (-)
Insert size = 186
First in pair
Pair orientation = F1R2
----------------------
MD = 5C0C1C0C1C4C7C14C0C0C2C0C2C1C1G0
XG = CT
NM = 15
XM = .....hh.hh.x....h.......x..............hhh..hh..h.x..
XR = CT
-------------------
Alignment start position = chrY:20665340
GTTTTTTTTTTTTGATTTATATAATTGGAAAAATAATAATTTTTTTTTTTTTT
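
The XM string in the read details above already shows which positions carry a methylation call. Below is a small Python sketch that tallies it, assuming the standard Bismark encoding (z/Z for CpG, x/X for CHG, h/H for CHH, lowercase for unmethylated, uppercase for methylated, and `.` for a position with no call). The variable names are illustrative only.

```python
from collections import Counter

# XM tag copied from the read details above.
xm = ".....hh.hh.x....h.......x..............hhh..hh..h.x.."

# The alignment starts at chrY:20665340, so index 0 of the XM string
# corresponds to that position.
call_at_position = xm[0]
print(repr(call_at_position))

# Tally every call type in the read, ignoring positions without a call.
calls = Counter(c for c in xm if c != ".")
print(dict(calls))
```

The first character is `.`, meaning this read makes no methylation call at 20665340. Its only calls are unmethylated CHH and CHG cytosines elsewhere in the read.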

Attached Files

Screen Shot 2015-07-02 at 10.01.55 AM.png (93.8 KB, 37 views)

**dpryan** · 07-02-2015, 06:08 AM

That alignment is to the top strand, so a G will always just be a G (if the read had an A there then it'd be either a SNP or a sequencing error). Alignments to the top strand give information on C->T transitions. Alignments to the bottom strand give information on G->A transitions. No alignment (or pair) will give information on both.
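
As a rough illustration of that rule, the Python sketch below counts which reads can report on a given reference base from their XG tag. It is not how bismark_methylation_extractor is implemented. The helper `is_informative` is hypothetical, and the reference base G at 20665340 is inferred from the read details posted earlier in the thread.

```python
def is_informative(xg_tag, ref_base):
    """Return True if a read with this XG tag can report methylation at ref_base."""
    if xg_tag == "CT":
        # Aligned to the C->T converted genome (top strand): reports on reference Cs.
        return ref_base == "C"
    if xg_tag == "GA":
        # Aligned to the G->A converted genome (bottom strand): reports on reference Gs.
        return ref_base == "G"
    raise ValueError(f"unexpected XG tag: {xg_tag!r}")


# The 14 reads seen in IGV at chrY:20665340: 13 tagged GA and 1 tagged CT.
xg_tags = ["GA"] * 13 + ["CT"]
ref_base = "G"  # assumption: inferred from the posted read, not read from a FASTA

counted = sum(is_informative(tag, ref_base) for tag in xg_tags)
print(counted)
```

Under these assumptions 13 of the 14 reads are informative at this position, which matches the 12 + 1 calls in the coverage file.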

Discrepancy between Bismark SAM file and bismark methylation extractor
