SEQanswers

Old 03-01-2017, 12:08 AM   #1
zubr
Junior Member
 
Location: Moscow, Russia

Join Date: Feb 2017
Posts: 2
Strong CpG methylation bias between R1 and R2

Hello,
I'm new to bioinformatics, and as a first training exercise I've been given a set of WGBS 100 bp PE reads from a few human cancer tissues.
I filtered the reads with prinseq, sorted them, and aligned them with bismark in PE mode to hg38 from UCSC (genome prepared with bismark).
Mapping efficiency is ~20%, with ~80% of C's methylated in CpG context.
OK, the low mappability of reads from BS-treated DNA has been mentioned many times.
Then I tried mapping read 1 and read 2 separately in SE mode.
Read 1: mapping efficiency ~60%, with ~80% of C's methylated in CpG context.
Read 2: mapping efficiency ~50%, with ~40% of C's methylated in CpG context.
Additional trimming of 10-20 nt from either end of read 2 slightly increases mappability but doesn't affect the methylation rate.
This result seems extremely odd to me.
If the DNA was treated with BS, how can it be that only read 2 of the pair shows 2x less methylation in CpG context?
Could anybody take a fresh look?
Thank you in advance.
Old 03-01-2017, 03:11 PM   #2
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,083

Could you provide the following information:
1- Kit or method used for library prep
2- Read length
3- Library peak size
4- FastQC output for the reads
Old 03-02-2017, 05:05 AM   #3
zubr
Junior Member
 
Location: Moscow, Russia

Join Date: Feb 2017
Posts: 2

This is what I could get from the core lab personnel:

1- Kit or method used for library prep

Genomic DNA was extracted from tissue, BS treated, sonicated, end repaired, and dA-tailed. Then standard Illumina adaptors were used for PE sequencing.

2- Read length

100 bases (adaptors already trimmed)

3- Library peak size

~200 nt

4- FastQC output for reads

Sorry, I can't attach a picture right now, but the FastQC report is good for all reads: median quality is 30 at the 5' end and ~15 at the 3' end. I also performed quality trimming with a threshold of 15.
Old 03-02-2017, 05:24 PM   #4
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,083

Generally, there are three WGBS library prep methods:
1- Post-ligation bisulfite conversion: DNA fragmentation and standard library preparation with methylated adapters, followed by bisulfite conversion and amplification.
2- Post-bisulfite conversion library preparation: second-strand synthesis on converted ssDNA, followed by standard end repair, A-tailing, adapter ligation, and PCR amplification of the double-stranded DNA.
3- Post-bisulfite conversion library preparation: synthesis of the second strand with random primers carrying one partial Illumina adapter sequence, then tagging of the 3' end of the new strand with a Terminal Tagging Oligo carrying the other partial Illumina adapter, followed by PCR amplification.

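I assume your library was prepared with method 1. A library peak size of 200 nt corresponds to an average insert size of about 75 nt, the remainder being adapter sequence. With 100-base reads, I would therefore expect a large number of reads to have run into the adapter and been trimmed at the 3' end.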
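The insert-size arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the total adapter length of 125 nt is an assumption inferred from the 200 nt peak and 75 nt insert quoted in the post, not a value taken from any kit documentation.

```python
def adapter_readthrough(peak_size_nt, adapter_total_nt, read_length_nt):
    """Estimate the average insert size and how many bases of a read
    would extend past the insert into adapter sequence."""
    insert = peak_size_nt - adapter_total_nt
    # A read longer than the insert runs into the adapter on its 3' side.
    readthrough = max(0, read_length_nt - insert)
    return insert, readthrough


# Assumed values: 200 nt library peak, 125 nt of adapters in total
# (inferred from the post), 100-base reads.
insert, readthrough = adapter_readthrough(200, 125, 100)
print(f"average insert: {insert} nt, adapter read-through: {readthrough} nt")
```

Under these assumptions a typical read would contain roughly a quarter of its length as adapter before trimming, which is why short trimmed reads would be expected.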

It would be interesting to see the FastQC "per base sequence content" plots for the reads; they should show a similar proportion of converted Cs in both reads. As an example, see the attached plots for a low-diversity RRBS library, which show low %C in R1 and correspondingly low %G in R2. If your plots show a similar C and G pattern, then the issue could be in the analysis step.

RRBS.pdf
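As a rough illustration of the pattern described above (low %C in R1, low %G in R2), here is a minimal Python sketch that bisulfite-converts a toy fragment in silico and compares base composition of the two reads. The fragment sequence, the methylated position, and all function names are made up for the example, and it reports overall rather than per-position composition, so it is not a substitute for the FastQC plot.

```python
from collections import Counter


def base_fractions(reads):
    """Return the overall fraction of A, C, G and T across read sequences."""
    counts = Counter()
    for seq in reads:
        counts.update(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert(seq, methylated=frozenset()):
    """Turn every C into T except at the given (methylated) 0-based positions."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


# Toy fragment; only the C at index 1 is treated as methylated (assumption).
fragment = "ACGTCCGATCGGCTACCGTA"
converted = bisulfite_convert(fragment, methylated={1})

read1 = converted                      # R1 reads the converted strand
read2 = reverse_complement(converted)  # R2 reads its reverse complement

r1 = base_fractions([read1])
r2 = base_fractions([read2])
print(f"R1: %C={r1['C']:.2f} %G={r1['G']:.2f}")
print(f"R2: %C={r2['C']:.2f} %G={r2['G']:.2f}")
```

With real data, the same `base_fractions` helper could be fed sequences read from the R1 and R2 FASTQ files to check whether C is depleted in R1 and G is depleted in R2 to a similar degree.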
Old 03-02-2017, 05:45 PM   #5
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 308

Something in this description seems wrong. After bisulfite conversion the DNA should be (mostly) single-stranded, since bisulfite conversion requires single-stranded DNA. Thus the standard end repair, A-tailing, and Illumina adapter ligation with Y-adapters will not work.

Quote:
Originally Posted by zubr
This is what I could get from the core lab personnel:

1- Kit or method used for library prep

Genomic DNA was extracted from tissue, BS treated, sonicated, end repaired, and dA-tailed. Then standard Illumina adaptors were used for PE sequencing.
All times are GMT -8.