**fkrueger** · 04-10-2017, 01:20 PM

Originally posted by Kuckunniwi:

Hi,
27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG 16 chr10 71747 255 51M * 0 0 CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC CG 3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG NM:i:9 MD:Z:2AAA6CA3A2A2A24A3 XM:Z:Z..x........................x..z..h...h.......hhx..

The alignment line is truncated: the methylation call string is not quite long enough, and the line does not contain the information about the read or genome conversion (the XR and XG tags) that the methylation extractor requires.
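The checks described above can be sketched in a few lines of Python. This is only an illustration, not part of Bismark: the function name `check_bismark_sam_line` is hypothetical, and it assumes a tab-delimited SAM record in which the XM methylation call string has one character per read base and the conversion information is stored in the XR and XG tags.

```python
def check_bismark_sam_line(line):
    """Return a list of problems found in one tab-delimited Bismark SAM record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than the 11 mandatory SAM fields"]

    seq = fields[9]  # column 10 of a SAM record is the read sequence

    # Optional fields look like TAG:TYPE:VALUE, e.g. XM:Z:..z..h
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]

    problems = []
    for tag in ("XM", "XR", "XG"):
        if tag not in tags:
            problems.append(f"missing {tag} tag")

    # Assumption: one methylation call character per base of the read
    if "XM" in tags and len(tags["XM"]) != len(seq):
        problems.append(
            f"XM length {len(tags['XM'])} != read length {len(seq)}"
        )
    return problems


if __name__ == "__main__":
    # Synthetic records for illustration only (not taken from the thread)
    good = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\tXM:Z:..Z.\tXR:Z:CT\tXG:Z:CT"
    bad = "r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\tXM:Z:..Z"
    print(check_bismark_sam_line(good))
    print(check_bismark_sam_line(bad))
```

Running a check like this over the first and last few records of a file (for example on the output of `samtools view`) is one way to see whether only the final line is cut off or whether the extra columns are missing throughout.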

Files can be truncated if a copy or transfer corrupted them, or if the alignment process was terminated abruptly. This may happen when you press Ctrl+C or something similar, or when the Bismark alignment process runs out of memory or gets killed by your OS.
Normally it is sufficient to run the alignments again (making sure that you have enough system resources available). If you can see the final Bismark alignment report, the file should be ready for the methylation extraction. Best, Felix

**Kuckunniwi** · 04-10-2017, 07:34 PM

Originally posted by fkrueger:

The alignment line is truncated, the methylation call is not quite long enough and the read does not contain information about the read or genome conversion which is required by the methylation extractor.

Thanks for the answer! Actually, the file is not truncated (it has many more lines, whereas a truncated file would end exactly at this line). Most probably I was not able to copy the full line from the output of this command:

Code:

less -S file.sam

Also, these files were processed once before. I will check whether I accidentally truncated the line when copying it and will edit this answer...

UPD: yes, this .bam file does not contain any additional columns. I really wonder how they were processed before (I have bismark report for these files!) and how columns, not rows, may be truncated... ='(

**pig_raffles** · 04-24-2017, 06:39 AM

Hi,

I have been using Bismark to align RRBS read data to a reference genome. Up until now I have been using the default alignment settings with Bowtie2. However, I want to find out whether these are optimal.

One way of doing this would be to vary the -L and -N parameters, but I find it quite confusing to understand what these parameters actually do. The default for -L is L,0,-0.2, but I am not sure what varying the two numeric values would actually achieve (despite having read the Bowtie2 page). For instance, what would a more relaxed alignment parameter allowing more mismatches look like?

Finally, does anyone have any tips on choosing the best alignment results from repeated alignment runs using different parameters?

Many thanks

**fkrueger** · 04-24-2017, 07:20 AM

In the --score_min function, L means the function is linear. The first number is the offset, a constant term of the minimum score (the alignment score itself is determined by the number of mismatches, Ns, or insertions and deletions in the read), and the last number is the penalty that is allowed for each base pair in the read. As an example, for a 100 bp long read:

L,0,-0.2 would allow a minimum alignment score of 0 + (100 x -0.2) = -20

A mismatch counts as -6, so this setting would allow up to 3 mismatches (AS = -18). A read with 4 mismatches (-24) or more would be rejected.

An insertion of 3 bp somewhere in the read would cost -5 for opening the gap and -3 for each bp of the insertion (Bowtie2's default gap penalties, --rdg and --rfg, are both 5,3), so 3 * -3 = -9 here. Thus, a 3 bp insertion would count as -14, so you could still add one more mismatch somewhere in the read (-20 in total) without dropping below the allowed limit of -20.

If you were to relax --score_min to L,0,-0.6, this would allow a minimum alignment score of 0 + (100 x -0.6) = -60

This would now allow up to 10 (non-bisulfite) mismatches in the 100 bp read, or a combination of several mismatches and indels.
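The arithmetic above can be reproduced with a small Python sketch. The function names `min_score` and `max_mismatches` are made up for illustration, and the mismatch penalty of 6 is the full-penalty value used in the explanation above; treat it as an assumption if you changed Bowtie2's scoring options.

```python
import math


def min_score(read_length, offset=0.0, slope=-0.2):
    """Minimum allowed alignment score for a linear function L,offset,slope."""
    return offset + slope * read_length


def max_mismatches(read_length, offset=0.0, slope=-0.2, mismatch_penalty=6):
    """Largest number of full-penalty mismatches that still passes the threshold."""
    budget = -min_score(read_length, offset, slope)  # total penalty allowed
    if budget <= 0:
        return 0
    # Round before flooring so floating point noise cannot drop a mismatch
    return math.floor(round(budget / mismatch_penalty, 9))


if __name__ == "__main__":
    for slope in (-0.2, -0.6):
        print(
            f"L,0,{slope}: min score {min_score(100, 0, slope):.1f}, "
            f"up to {max_mismatches(100, 0, slope)} mismatches in a 100 bp read"
        )
```

Because the threshold scales with read length, the same setting allows fewer mismatches in shorter (for example, heavily trimmed) reads, which is worth keeping in mind when comparing runs.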

If your RRBS reads were trimmed well (e.g. with Trim Galore in --rrbs mode) then you should normally see that the mapping efficiency doesn't change a great deal for score min scores of -0.2 to -0.6, but then it might start to increase quite a bit because you allow reads to map to locations where the read most likely doesn't belong. If I had to choose between pretty much the same mapping efficiency for different mapping scenarios I would go for the more stringent one because this reduces the chances of introducing mis-alignments to the results.

Best wishes, Felix

**pig_raffles** · 04-25-2017, 01:46 AM

Excellent, thank you for the very clear explanation

**twotwo** · 05-11-2017, 01:39 PM

Originally posted by Ahra:

Hello. I am a newbie using Bismark. I'm not a bioinformatician and I don't know much about NGS alignment, especially how to run alignment programs. But I was able to use Bismark thanks to the well-made Bismark user guide.

I ran Bismark and got the report file and the SAM file. To get the positions of methylated Cs, I ran bismark_methylation_extractor with the --bedGraph option to see the methylation %.

According to the user guide,

A typical command including the optional bedGraph --counts output could look like this:
bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam

My data is paired-end, so I edited that command a little, like this:

bismark_methylation_extractor -p --no_overlap --comprehensive --CX --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam

But I couldn't get the additional information about <chromosome> <start position> <end position> <methylation percentage>.

My result looked like this:

Bismark methylation extractor version v0.7.11
HWI-D00111:99

26ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
HWI-D00111:99

26ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
HWI-D00111:99

26ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
HWI-D00111:99

26ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

How can I get the methylation percentage information and run the further bedGraph2Cytosines process?

I am looking forward to anyone's reply.

I have the same question as you. Did you solve your problem? If so, which command should I use to generate the methylation percentage?

**fkrueger** · 05-11-2017, 01:51 PM

A typical command to extract methylation calls and get a coverage file is:

Code:

bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam

Do you have any problems running this command, and if so can you post details?

**twotwo** · 05-12-2017, 07:04 AM

Hi, thanks fkrueger! I ran your command. This is the head of the bedGraph file I got:

track type=bedGraph
chr11 110189 110190 100
chr11 110211 110212 100
chr11 113464 113465 100
chr11 113508 113509 100
chr11 113509 113510 100
chr11 113524 113525 100
chr11 113525 113526 100
chr11 123420 123421 100
chr11 123449 123450 100
So my question is: do all of the probes really have a methylation percentage of 100%? That seems a little weird...
By the way, I have another question: if I have paired samples, can I compare the methylation percentages with Bismark? Thanks!

**fkrueger** · 05-12-2017, 07:32 AM

You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

Best, Felix

**twotwo** · 05-12-2017, 10:03 AM

Hi, Felix,
Thanks for your quick answer. Here is the head of the coverage file. Does that mean that I should merge the data (get all the information for one SNP) and get the methylation percentage?

chr11 110190 110190 100 1 0
chr11 110212 110212 100 2 0
chr11 113465 113465 100 1 0
chr11 113509 113509 100 4 0
chr11 113510 113510 100 1 0
chr11 113525 113525 100 2 0
chr11 113526 113526 100 1 0
chr11 123421 123421 100 1 0
chr11 123450 123450 100 1 0
chr11 123849 123849 100 5 0

**fkrueger** · 05-12-2017, 12:33 PM

I am not quite sure if I understand your question here to be honest.

Code:

chr11 113509 113509 100 4 0

This example line means that for position 113509 on chromosome 11 you had 4 methylation calls in total that were methylated (in the entire dataset), and 0 calls that were unmethylated. This translates into a 100% methylation percentage at this position (column 4). Also, the positions here are simply cytosines in the genome, not SNPs.

Just to remind you, this is the format:

Code:

The coverage output looks like this (tab-delimited, 1-based genomic coords):
============================================================================================================================================

<chromosome>  <start position>  <end position>  <methylation percentage>  <count methylated>  <count non-methylated>
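As a rough illustration of how these six columns relate, the following Python sketch parses one coverage line and recomputes the percentage from the two count columns. The function names and the `min_coverage` cut-off are hypothetical and not part of Bismark; the example lines are the ones posted above.

```python
def parse_coverage_line(line):
    """Parse one line of a Bismark coverage file into a dict."""
    chrom, start, end, pct, meth, unmeth = line.split()
    meth, unmeth = int(meth), int(unmeth)
    total = meth + unmeth
    return {
        "chrom": chrom,
        "pos": int(start),            # 1-based position of the cytosine
        "reported_pct": float(pct),   # column 4 as written by Bismark
        "methylated": meth,
        "unmethylated": unmeth,
        "coverage": total,
        # Percentage recomputed from the counts (None if there are no calls)
        "pct": 100.0 * meth / total if total else None,
    }


def filter_by_coverage(lines, min_coverage=4):
    """Keep only positions supported by at least min_coverage calls."""
    records = (parse_coverage_line(l) for l in lines if l.strip())
    return [r for r in records if r["coverage"] >= min_coverage]


if __name__ == "__main__":
    example = [
        "chr11 110190 110190 100 1 0",
        "chr11 113509 113509 100 4 0",
        "chr11 123849 123849 100 5 0",
    ]
    for rec in filter_by_coverage(example, min_coverage=4):
        print(rec["chrom"], rec["pos"], rec["coverage"], rec["pct"])
```

Filtering on coverage like this makes it obvious which 100% values rest on a single call and which are supported by several reads; the threshold of 4 is arbitrary and should be chosen for your own data.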

I hope this helps.

**twotwo** · 05-12-2017, 12:43 PM

Thank you very much!

**Juulluu21** · 06-28-2017, 08:50 PM

Data Analysis with Bismark

We have sequenced a genome using Illumina's TruSeq bisulfite sequencing kit. After getting the sequences back, we are analyzing the methylation rate using Bismark. I need help with the interpretation of the results and the proper way to normalize them.

Before sequencing, the sample DNA was divided into 2 groups: 1. Bisulfite treatment was carried out and the DNA was subsequently sequenced (group 1, methylated group). 2. DNA was sequenced without bisulfite treatment (group 2, control group).

Both groups were sequenced in paired-end fashion.

I am using Bismark to analyze the sequences and trying to get the methylation rate in this particular genome. After running Bismark on the methylated files I got these final percentages:

C methylated in CpG context: 0.6%

C methylated in CHG context: 0.5%

C methylated in CHH context: 0.7%

Whereas after running Bismark on my Control files I got these percentages:

C methylated in CpG context: 99.6%

C methylated in CHG context: 99.3%

C methylated in CHH context: 99.9%

So, how would I interpret my data?

a. Is 0.6% (CpG) the actual methylation percentage in my genome?

b. I have read in the literature that if the CpG, CHG, and CHH percentages are very close, the genome does not actually have methylation. Is that true?

c. What was the purpose of using the control group (group 2)? Do I still need a spike-in control to normalize the data? If so, what could that be?

Thank you very much for reading this long post!!

Bests!!!
