SEQanswers

08-13-2018, 11:21 PM   #1
Hedi86
Member
Location: Norway
Join Date: Oct 2017
Posts: 13

CpG sites in Bismark

Hello

I'm trying to calculate the percentage of covered CpG sites in my RRBS library and compare it with the total number of CpG sites in the reference genome. I got the splitting report from Bismark (see below).

q1- Could I say that the number of CpG sites in my RRBS library equals the total methylated C's in CpG context plus the total C to T conversions in CpG context (around 19 million)? If not, how can I find the total number of CpG sites in the RRBS library?

q2- I downloaded the pig CGI annotation and counted all CpG sites, but the total was only around 2 million, which sounds very low to me. How can I find the actual number of CpG sites in the reference genome?

q3- Is there a way to determine the number of covered CpG sites per chromosome and compare it with the CpG sites on each chromosome of the reference genome?


Final Cytosine Methylation Report
=================================
Total number of C's analysed: 141645338

Total methylated C's in CpG context: 7904886
Total methylated C's in CHG context: 50683
Total methylated C's in CHH context: 107717

Total C to T conversions in CpG context: 12298571
Total C to T conversions in CHG context: 35912924
Total C to T conversions in CHH context: 85370557
08-14-2018, 03:03 AM   #2
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 609

Hi Hedi,

Quote:
q1- Could I say that the number of CpG sites in my RRBS library equals the total methylated C's in CpG context plus the total C to T conversions in CpG context (around 19 million)? If not, how can I find the total number of CpG sites in the RRBS library?
No, I'm afraid you can't say that. The numbers reported are the overall numbers of methylation calls performed for the entire run, and have nothing to do with the number of genomic positions covered: a single position covered by 20 reads contributes 20 calls but is still only one position. If you want to find out how many Cs were covered in your experiment, you can generate a coverage file in which each line corresponds to one covered C position. The number of lines in that file (zcat file.cov.gz | wc -l) is then the number of positions covered in your experiment.
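
For anyone who would rather do this in a script than with zcat and wc, here is a minimal Python sketch of the same count. It assumes the usual tab-separated Bismark coverage layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated). The function name `count_covered_positions` and the `min_depth` parameter are made up for this illustration and are not part of Bismark. Whether each line is a single-strand cytosine or a merged CpG depends on how the coverage file was generated.

```python
import gzip
from collections import Counter


def count_covered_positions(cov_path, min_depth=1):
    """Count covered positions per chromosome in a Bismark-style coverage file.

    Assumed columns (tab-separated): chromosome, start, end,
    methylation percentage, count methylated, count unmethylated.
    """
    per_chrom = Counter()
    # Accept both plain-text and gzip-compressed files.
    opener = gzip.open if cov_path.endswith(".gz") else open
    with opener(cov_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            # Read depth at this position = methylated + unmethylated calls.
            depth = int(fields[4]) + int(fields[5])
            if depth >= min_depth:
                per_chrom[fields[0]] += 1
    return per_chrom
```

With the default `min_depth=1`, `sum(count_covered_positions("file.cov.gz").values())` should give the same number as the line count above, and the per-chromosome breakdown comes for free.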

Quote:
q2- I downloaded the pig CGI annotation and counted all CpG sites, but the total was only around 2 million, which sounds very low to me. How can I find the actual number of CpG sites in the reference genome?
You could use bam2nuc (part of Bismark) to find out the number of Cs, or CpGs, in the genome. Here is the output for the Sscrofa11.1 build (genome-wide).

Code:
A       717891230
AA      237125812
AC      124343360
AG      171421615
AT      185000140
C       517402066
CA      178358877
CC      136906913
CG      30619972
CT      171516061
G       517706165
GA      147162051
GC      108922386
GG      136983938
GT      124637555
T       719048243
TA      155244114
TC      147229152
TG      178680414
TT      237894187
CGIs are only a small, albeit CG-rich, fraction of the genome, so 2M doesn't sound too bad next to the roughly 30.6M CG dinucleotides genome-wide (the CG row above).
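
If per-chromosome reference counts are needed, a rough alternative is to count CG dinucleotides directly in the genome FASTA. The sketch below is only an illustration under a few assumptions: the helper name `count_cg_per_sequence` is made up, it counts the forward strand only, and it treats soft-masked (lowercase) bases like uppercase ones. Since a CG dinucleotide carries a cytosine on each strand, a per-strand coverage file would have to be compared against twice these counts.

```python
import gzip
from collections import Counter


def count_cg_per_sequence(fasta_path):
    """Count CG dinucleotides on the forward strand of each FASTA record."""
    counts = Counter()
    name = None
    prev_base = ""  # last base of the previous line, to catch a CG split across lines
    opener = gzip.open if fasta_path.endswith(".gz") else open
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                name = (line[1:].split() or ["unnamed"])[0]
                counts[name] = 0
                prev_base = ""
                continue
            if name is None:
                continue  # ignore sequence lines that precede any header
            seq = line.upper()
            counts[name] += (prev_base + seq).count("CG")
            prev_base = seq[-1]
    return counts
```

Summing the values over all records gives a genome-wide total that can be checked against the CG row of the bam2nuc output.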


Quote:
q3- Is there a way to determine the number of covered CpG sites per chromosome and compare it with the CpG sites on each chromosome of the reference genome?
I would suggest you use SeqMonk for this kind of work. You need to keep in mind though that RRBS is only expected to cover ~1-2% of the genome at very specific positions, so the number of CpGs covered per chromosome is almost certainly not something you should be interested in.
08-15-2018, 02:27 AM   #3
Hedi86
Member
Location: Norway
Join Date: Oct 2017
Posts: 13

Thank you for your advice and help. In methylKit you can also get coverage using the following command, but I'm wondering whether it reports CpG coverage or read coverage. Both terms are used in their tutorial (https://www.bioconductor.org/package...ics_on_samples). Is this different from the way you suggested calculating CpG coverage?

Code:
getCoverageStats(my.methRaw[[1]], plot = FALSE, both.strands = FALSE)
read coverage statistics per base
summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.      Max.
  10.00   12.00   15.00   28.25   20.00 131376.00
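
The heading "read coverage statistics per base" in this output suggests that the numbers are read depths at each covered base, which is a different quantity from the number of positions covered. To make the distinction concrete, here is a small Python sketch that computes both from one file. It assumes a Bismark-style coverage file whose last two columns are the methylated and unmethylated counts. The function name `depth_summary` is made up for this illustration and is not part of methylKit or Bismark.

```python
import gzip
import statistics


def depth_summary(cov_path):
    """Summarise read depth per covered position in a Bismark-style coverage file.

    Returns the number of covered positions together with the min, median,
    mean and max read depth across those positions.
    """
    depths = []
    opener = gzip.open if cov_path.endswith(".gz") else open
    with opener(cov_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            # Depth at this position = methylated + unmethylated calls.
            depths.append(int(fields[4]) + int(fields[5]))
    if not depths:
        raise ValueError("no covered positions found in " + cov_path)
    return {
        "positions": len(depths),  # how many positions are covered
        "min": min(depths),        # the rest describe how deeply they are covered
        "median": statistics.median(depths),
        "mean": statistics.fmean(depths),
        "max": max(depths),
    }
```

Here `positions` corresponds to the line count of the coverage file, while the other values describe read depth in the same sense as the summary table above.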

Thanks again