Seqanswers Leaderboard Ad

**Jon17** · 09-21-2016, 11:43 AM

Seems like there are a lot of pipelines out there that get big differences in methylation results. Is there a paper out that that shows a complete workflow and has control data and output results I can compare?

I can get bismark to work but I'm doing epigenetic sequence capture and want to know my results are in line with industry standard methods.

**Jon17** · 09-30-2016, 07:13 AM

How do you handle control DNA?

I have Bisulfite Sequencing data with lambda control (NC_001416) spike in.

I'm looking at the alignment files (SAM) and I see there are reads aligned to the lambda (NC_001416) reference.

So how do I separate methyl stats of these two samples? bismark_methylation_extractor seems to give me only one set of statistics and doesn't differentiate the control group from the sample group.

I don't see any guidance in the manual or tutorials.

This command is not ideal: --split_by_chromosome

**fkrueger** · 09-30-2016, 07:39 AM

You could either align the to the lambda sequence by itself which should be really quick (probably several hundred million sequences/hour), or if you have the output already I would probably just filter for reads containing the Lambda sequence (NC_001416) and then run the bismark_methylation_extractor on that new SAM file. A command like this should get you the alignments:

cat current_alignment_file.sam | grep NC_001416 > lambda_alignments.sam

bismark_methylation_extracor --bedGraph --gzip lambda_alignments.sam

If you are starting from BAM files the commands need to include Samtools but the idea is the same. I hope this helps. Cheers, Felix

**Jon17** · 10-26-2016, 06:58 PM

How does Bismark handle multiple read groups?

I know you can set the read group flag in Bismark but that's only for 1 read group. What if you have multiple? Does Bismark align those read-groups separately against the reference? Do any re-calibrations?

GATK seems to imply recalibration (BQSR) is necessary for SNP calling. What about methylation calls?

I'm wondering if I should:

1) Run Bismark on each read group separately
2) Combine the multiple bam files
3) Re-calibrate base scores (BQSR)
4) Run bismark_methylation_extractor on the final re-calibrated bam file

**fkrueger** · 10-26-2016, 11:53 PM

Hi Jon,

Bismark allows you to specify a read group for each sample, so that if you were to combine several different samples afterwards you could still tell which read came from which sequencing run. Other than that I am afraid the methylation extractor doesn't do anything sophisticated such as the BQSR step, it will simply extract the methylation information from the BAM files as is.

In contrast to tools like GATK which specifically ask you not to perform any quality trimming up-front but rather use soft clipping during the alignment step, Bismark is designed to to be used only with properly adapter and base call quality trimmed data (Trim Galore uses a Phred score cut-off of 20 by default). Because of this additional step we do not check the basecall quality again during the extraction process. I hope this helps, Best, Felix

**Jon17** · 10-31-2016, 04:51 PM

Thank you for the response. I'm currently using GATK and Picard for alignment metrics (depth of coverage, insert size, on target bases, FOLD_80_BASE_PENALTY, etc). Given I'm using trim_galore and bismark, would you consider that use of GATK & Picard ok? Or should I use a different tool? What do you recommend?

If Bismark is "designed to to be used only with properly adapter and base call quality trimmed data", what are the pitfalls?

**fkrueger** · 11-01-2016, 01:42 AM

Hi Jon,

I would assume that GATK and Picard will probably run but I don't know exactly what they need to look at the metrics you mentioned, you might have to ask their developers to be honest. We personally use SeqMonk to get all of the stats we are interested in. It is a genome browser that does all sorts of other relevant quantitations as well. Cheers, Felix

**Jon17** · 12-07-2016, 11:36 AM

Is there a way to use your tools to get CpG coverage? For instance

55% of CpG sites are at 30x
45% of CpG sites are at 20x
25% of CpG sites are at 10x

As well as methylation rates given certain depths at CpG sites?

55% of CpG sites are methylated at 30x
45% of CpG sites are methylated at 20x
25% of CpG sites are methylated at 10x

**fkrueger** · 12-07-2016, 12:56 PM

Not directly I am afraid, but one could probably write a quick script (or even use some clever awk/sed constructs) to extract this kind of information from either the coverage files or genome-wide methylation reports.

**chxu02** · 01-30-2017, 06:41 AM

Hi Felix,
When I tried to run the recently added script:

Code:

filter_non_conversion

I got the following error message:

Code:

syntax error at ~/bismark-0.17.0/filter_non_conversion line 280, near "last nless ("
Execution of ~/bismark-0.17.0/filter_non_conversion aborted due to compilation errors.

**fkrueger** · 01-30-2017, 07:02 AM

Sorry for this, I have just fixed this typo. Please clone the latest development version here:

GitHub - FelixKrueger/Bismark: A tool to map bisulfite converted sequence reads and determine cytosine methylation states

https://github.com/FelixKrueger/Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states - FelixKrueger/Bismark

Cheers, Felix

**seqfast** · 02-07-2017, 12:51 PM

Directional/Non-directional amplicons

Hi (hopefully) Felix,

I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

Final Alignment report
======================
Sequences analysed in total: 57545
Number of alignments with a unique best hit from the different alignments: 36008
Mapping efficiency: 62.6%
Sequences with no alignments under any condition: 21427
Sequences did not map uniquely: 110
Sequences which were discarded because genomic sequence could not be extracted: 0

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT: 1667 ((converted) top strand)
CT/GA: 13421 ((converted) bottom strand)
GA/CT: 1088 (complementary to (converted) top strand)
GA/GA: 19832 (complementary to (converted) bottom strand)

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 448283

Total methylated C's in CpG context: 22378
Total methylated C's in CHG context: 6624
Total methylated C's in CHH context: 5983

Total unmethylated C's in CpG context: 9707
Total unmethylated C's in CHG context: 221431
Total unmethylated C's in CHH context: 182160

C methylated in CpG context: 69.7%
C methylated in CHG context: 2.9%
C methylated in CHH context: 3.2%

**fkrueger** · 02-08-2017, 02:33 AM

Originally posted by seqfast View Post

Hi (hopefully) Felix,

I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

Final Alignment report
======================
Sequences analysed in total: 57545
Number of alignments with a unique best hit from the different alignments: 36008
Mapping efficiency: 62.6%
Sequences with no alignments under any condition: 21427
Sequences did not map uniquely: 110
Sequences which were discarded because genomic sequence could not be extracted: 0

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT: 1667 ((converted) top strand)
CT/GA: 13421 ((converted) bottom strand)
GA/CT: 1088 (complementary to (converted) top strand)
GA/GA: 19832 (complementary to (converted) bottom strand)

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 448283

Total methylated C's in CpG context: 22378
Total methylated C's in CHG context: 6624
Total methylated C's in CHH context: 5983

Total unmethylated C's in CpG context: 9707
Total unmethylated C's in CHG context: 221431
Total unmethylated C's in CHH context: 182160

C methylated in CpG context: 69.7%
C methylated in CHG context: 2.9%
C methylated in CHH context: 3.2%

Hi seqfast,

When you are generating and sequencing PCR amplified regions it is quite common that you see only the top strand (with both the OT and CTOT strands) or the bottom strand (with both OB and CTOB as in your case), depending on which strand you targeted when designing the primers.

To help getting your head around this it really helps to draw the sequence of an amplicon of interest out on a sheet of paper. You should see that the bisulfite treatment will change one of the two strands so much that the oligos will only amplify one of the two strands, and this PCR product is then usually sequenced from both sides, e.g. the CTOB and OB strands, but this may be different depending on how the library preparation was done.

So in short: PCR amplicons are not normally directional libraries but you only sequence both versions of either the top or the bottom strand, depending on how the primers were designed. I hope this helps, let me know if I can be of any further assistance. Cheers, Felix

**seqfast** · 02-09-2017, 09:21 AM

Thank you Felix,

I've been told to PCR primers are 'agnostic' to bisulfite conversion, but I believe there will still be some effects and a likelihood of some strand-favoring in the amplification.

Appreciate your quick reply, and thanks for a very valuable tool!

-sf

**Kuckunniwi** · 04-10-2017, 10:04 AM

Hi,

I am trying to launch methylation_extractor and it does not like my files =( It says:

Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1654, <IN> line 27.
Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1657, <IN> line 27.
Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1660, <IN> line 27.
Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1663, <IN> line 27.
Use of uninitialized value $read_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
Use of uninitialized value $genome_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
Unexpected combination of read and genome conversion: '' / ''

The line 27 is the first read of sam file:

Code:

27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG    16      chr10   71747   255     51M     *       0       0       CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC     CG     3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG     NM:i:9  MD:Z:2AAA6CA3A2A2A24A3  XM:Z:Z..x........................x..z..h...h.......hhx..

The methylation extractor worked well for me before (even for the same files), what can cause the problem? (I've tried to play with cmd modes, did not help...)

The command line: ./bismark_methylation_extractor -s test.sam

Thanks!

Topics	Statistics	Last Post
New Software Simplifies 3D Gene Expression Mapping by seqadmin Started by seqadmin, Yesterday, 10:17 AM	0 responses 7 views 0 reactions	Last Post by seqadmin Yesterday, 10:17 AM
AI Tool Creates High-Resolution 3D Maps of the Mouse Brain by seqadmin Started by seqadmin, 03-20-2025, 05:03 AM	0 responses 49 views 0 reactions	Last Post by seqadmin 03-20-2025, 05:03 AM
Studying Microbial Gene Transfer with RNA Barcoding by seqadmin Started by seqadmin, 03-19-2025, 07:27 AM	0 responses 59 views 0 reactions	Last Post by seqadmin 03-19-2025, 07:27 AM
Mapping the snoRNAome in Zebrafish to Advance Disease Research by seqadmin Started by seqadmin, 03-18-2025, 12:50 PM	0 responses 50 views 0 reactions	Last Post by seqadmin 03-18-2025, 12:50 PM

Seqanswers Leaderboard Ad

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News