Seqanswers Leaderboard Ad

**fkrueger** · 04-29-2014, 03:21 AM

We just released a new version of Bismark (v0.12.1) which adds the following functionality/fixes:

o Bismark: Added calculation of MAPQ values for SAM/BAM output generated with Bowtie 2 for both single-end and paired-end mode. The calculation is implemented like in Bowtie 2 itself. Mapping quality values are still unavailable for alignments performed with Bowtie and retain a value of 255 throughout
o Bismark: Fixed an uninitialised value warning for PE alignments with Bowtie 2 that occurred whenever Read 2 aligned to the very start of a chromosome (this only affected the warning itself and had no impact on any results)
o coverage2cytosine: all chromosomes or scaffolds are now processed irrespective of whether they were covered in the sequencing experiment or not. Previously, CpG/cytosine reports for genomes with lots of small scaffolds that were not covered by any reads might have had a variable number of lines between experiments

Thanks to all here on this forum and S. Jhanwar for their contributions.

Bismark can be downloaded here: https://www.bioinformatics.babraham....jects/bismark/

**Thias** · 05-12-2014, 10:19 AM

Hello Bismark-Users,

I am a new user of Bismark v 0.12.1 and currently familarizing myself with the typical output generated along the pipeline.

Upon a closer look at the .cov and .bedGraph files generated by bismark_methylation_extractor (1) I noticed the presence of a few thousand lines beginning with an asterisk (*) instead of the usual chromosome name. What do those entries stand for ?

Code:

awk < SAMPLE.cov ' !x[$1]++ {print $1}' | sort

*
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrY

I notice that chrX is missing in the list, which made the think whether those are actually the readouts of chrX? The reference genome used for mapping is the standard mm9 from Illumina iGenomes by the way.

Thanks a lot for your help!

--------------------------------------
1) invoked with the options: -s --comprehensive --bedGraph --counts --cutoff 10 --report

**fkrueger** · 05-12-2014, 11:37 AM

Hi Thias,

I have previously seen a * as the chromosome in a SAM/BAM file when Samtools encounters a read alignment to a chromosome that is not present in the SAM header (the @SQ SN: lines). From the order you posted it seems that the X should have been the second last chromosome in the list, and incidentally I had a report on Friday last week that the second last chromosome name was missing from the SAM header. I have by now tracked and fixed this bug which was caused by missing an increment counter when processing the very last chromosome of a multi-FastA reference file. So the question is, did you use a single multi-FastA reference file as your genome? If so I can send you the latest version of Bismark which fixes this bug.

If you could just do a samtools view -H on your file you could see whether this was the problem. You don't have to redo the alignments, it would suffice if you could simply add the @SQ header line to your Bismark results file (and then maybe run the methylation extraction once more). Can you keep me updated on this via email? Sorry for the inconvenience, Felix

**Thias** · 05-13-2014, 02:53 AM

Originally posted by fkrueger View Post

[...] The second last chromosome name was missing from the SAM header. I have by now tracked and fixed this bug which was caused by missing an increment counter when processing the very last chromosome of a multi-FastA reference file. So the question is, did you use a single multi-FastA reference file as your genome?

Yes, you have nailed it. The header line for chrX is indeed missing in the SAM and it was indeed a multi-FastA which I used as a reference.

**fkrueger** · 05-13-2014, 02:58 AM

Wohoo! Please find the latest version attached. As added bonus it also supports the new large indexes for Bowtie 2 which you might or might not need. Please let me know if there are any problems.

Attached Files

bismark.pl (371.1 KB, 26 views)

**fwessely** · 05-13-2014, 05:28 AM

Hi Felix,

I have just seen a tiny bug, when the option --basename is used, the file name for ambiguous reads is missing an underscore.
Also, if this option is used the basename is not used for the file name of the alignment pie chart (...alignment_overview.png). I'd prefer to use the basename for this file as well.

Thanks
Frank

**fkrueger** · 05-13-2014, 05:41 AM

Hi Frank, I have tried to fix them both now (attached). Cheers, Felix

Attached Files

bismark.pl (371.3 KB, 23 views)

**Thias** · 05-13-2014, 10:10 AM

Originally posted by fkrueger View Post

You don't have to redo the alignments, it would suffice if you could simply add the @SQ header line to your Bismark results file (and then maybe run the methylation extraction once more).

In case someone else would like to do that on a batch of files, this script should help:

Code:

#!/bin/bash
   find  $PWD  -type f -name '*.bam' | grep -v  "out.bam" | while read SAMPLE
    do
     samtools view -H $SAMPLE | sed 's=LN:16299=LN:16299\n@SQ\tSN:chrX\ LN:166650296=' > mynewheader.txt
     samtools reheader mynewheader.txt $SAMPLE > out.bam
      if [ -s out.bam ]
        then
                rm mynewheader.txt $SAMPLE
                mv out.bam $SAMPLE
        fi
    done
exit 0

Of course, you have to adapt the sed command to your specific needs. In my case I took part of the header line above the desired position (LN:16299) and replaced it with the same string followed by a \n (newline) and the desired missing header line. Ths script avoids doing a BAM -> SAM -> BAM conversion, but unfortunately still needs a bit of time because a temporary copy of the BAM is written.

**adeirossi** · 05-14-2014, 01:14 AM

Hi Felix,

A few of our libraries have M-bias plots that show pronounced bias far in from the 5' end. When I first tried to avoid the bias by setting high values for the --ignore and --ignore_r2 options, bismark_methylation_extractor failed:

$ perl bismark_methylation_extractor --paired-end --no_overlap --ignore 35 --ignore_r2 35 --output out --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --gzip --bedGraph --buffer_size 4G 99_26_1_GATCAG_003.bam

*** Bismark methylation extractor version v0.12.1 ***

Summarising Bismark methylation extractor parameters:
===============================================================
Bismark paired-end SAM format specified (default)
First 35 bp will be disregarded when processing the methylation call string of Read 1
First 35 bp will be disregarded when processing the methylation call string of Read 2
Output path specified as: out/

...

substr outside of string at bismark_methylation_extractor line 2199, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $last_op_2 in concatenation (.) or string at bismark_methylation_extractor line 2287, <IN> line 179.
Use of uninitialized value $meth_call in split at bismark_methylation_extractor line 2502, <IN> line 179.
CIGAR string contained a non-matching number of lengths and operations

It seems that bismark_methylation_extractor has issues when --ignore or --ignore_r2 equals or exceeds the alignment length; this was the case in the example above, where a small subset of my reads were trimmed to <=35 bp.

I came up with a little patch that at least prevents failure when I run on my paired-end data (also see attached):

HTML Code:

--- a/bismark_methylation_extractor-0.12.1
+++ b/bismark_methylation_extractor-0.12.1_mod
@@ -2152,7 +2152,12 @@ sub process_Bismark_results_file{
          if ($ignore) {
            ### Clipping off the first <int> number of bases from the methylation call strings as specified with '--ignore <int>' for read 1
            ### the methylation calls have already been reversed where necessary
-           $meth_call_1 = substr($meth_call_1,$ignore,length($meth_call_1)-$ignore);
+           my $len = length($meth_call_1)-$ignore;
+           if ($len <= 0) {
+             next;
+           } else {
+             $meth_call_1 = substr($meth_call_1,$ignore,$len);
+           }

            if ($strand eq '+') {

@@ -2196,7 +2201,12 @@ sub process_Bismark_results_file{
          if ($ignore_r2) {
            ### Clipping off the first <int> number of bases from the methylation call string as specified with '--ignore_r2 <int>' for read 2
            ### the methylation calls have already been reversed where necessary
-           $meth_call_2 = substr($meth_call_2,$ignore_r2,length($meth_call_2)-$ignore_r2);
+           my $len = length($meth_call_2)-$ignore_r2;
+           if ($len <= 0) {
+             next;
+           } else {
+             $meth_call_2 = substr($meth_call_2,$ignore_r2,$len);
+           }

            ### If we are ignoring a part of the sequence we also need to adjust the cigar string accordingly

If you have time, it would be great if you could take a quick look at the patch and let me know if anything looks suspect.

Many thanks,
Andrew

Attached Files

bismark_methylation_extractor-0.12.1_mod.pl (181.9 KB, 29 views)

**fkrueger** · 05-14-2014, 02:48 AM

We just released a new version of Bismark (v0.12.2) which fixes a couple if issues:

o Bismark: Added support for the new 64-bit index files for very large genomes in Bowtie 2 mode. The large genome indexes (ending in .bt2l instead of .bt2 for small genomes) are generated automatically by bismark_genome_preparation and work just as well in the Bismark alignment step
o Bismark: Fixed a bug that would omit the name of the second last chromomome from the SAM header if the genome had been supplied as Multi-FastA file. Everything else, including the alignments, would have been unaffected by this glitch
o Bismark: When the option '--basename' is specified, SE amibiguous file names now feature an underscore in their file name. Also, the pie chart file names are now derived from the the basename
o Methylation Extractor: Introduced a length check when the options --ignore or --ignore_r2 were set to ensure that only reads that are long enough are being processed

Bismark can be downloaded here: http://www.bioinformatics.babraham.a...jects/bismark/

@Andrew: Thanks, this seems to make perfect sense so I have included it in the current release

**blancha** · 05-30-2014, 10:13 AM

Hi,

Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

That way, I could do a downstream analysis with bedtools directly.
After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

Thank you.

**fkrueger** · 05-30-2014, 11:38 AM

Hi blancha, that sounds like a sensible suggestion, I'll look into this when I am back from my holidays.

**fkrueger** · 06-02-2014, 05:01 AM

Originally posted by blancha View Post

Hi,

Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

That way, I could do a downstream analysis with bedtools directly.
After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

Thank you.

I have just changed the '--zero_based' option of the methylation extractor to write out an additional coverage file (ending in .zero.cov) which uses the UCSC zero-based, half-open standard. The cytosine report only writes out a single position (as it a single cytosine), which uses 1-based coords by default but can be changed to 0-based if desired. Please find the files attached, I hope this is what you were after.

Attached Files

zero-based-half-open.zip (40.0 KB, 25 views)

**blancha** · 06-02-2014, 06:52 AM

Hi Felix,

Looks great. Thanks.
I'll try it out.

Thanks.

Alexis

**adeirossi** · 06-04-2014, 05:01 PM

Hi Felix,

On the topic of bismark_methylation_extractor, I have run into an error when I try to run it on BAM files with very few reads:

$ perl bismark_methylation_extractor --paired-end --no_overlap --ignore 2 --ignore_r2 33 --output $outDir --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --gzip --bedGraph --buffer_size 4G 2_267_5_CTTGTA_021.bam
*** Bismark methylation extractor version v0.12.2 ***

Summarising Bismark methylation extractor parameters:
===============================================================
Bismark paired-end SAM format specified (default)
First 2 bp will be disregarded when processing the methylation call string of Read 1
First 33 bp will be disregarded when processing the methylation call string of Read 2
Output path specified as: meth/ctl_results/A73L06H9O/2_267_5_CTTGTA_021/

...

Processed 8 lines from meth/ctl_results/A73L06H9O/2_267_5_CTTGTA_021/2_267_5_CTTGTA_021.bam in total
Total number of methylation call strings processed: 16

Determining maximum read lengths for M-Bias plots
Maximum read length of Read 1: 73
Maximum read length of Read 2: 0

No data sets or points at /mnt/ngs/analysis/software/bismark/bismark_v0.12.2/bismark_methylation_extractor line 649, <IN> line 20.

The script dies when plot() fails to generate a read 2 M-bias plot. Since the absence of M-bias plots does not particularly bother me, I came up with this patch (also see attached) so that plot() failure causes a warning instead of a fatal error:

Code:

--- a/bismark_v0.12.2/bismark_methylation_extractor
+++ b/bismark_methylation_extractor-0.12.2-mod
@@ -558,11 +558,14 @@ sub produce_mbias_plots{

     $graph1->set_legend('CpG methylation','CHG methylation','CHH methylation','CpG total calls','CHG total calls','CHH total calls');

-    my $gd1 = $graph1->plot(\@mbias_read1) or die $graph1->error;
+    if (my $gd1 = $graph1->plot(\@mbias_read1)) {
+      open (MBIAS_G1,'>',$mbias_graph_1) or die "Failed to write to file for M-bias plot 1: $!\n\n";
+      binmode MBIAS_G1;
+      print MBIAS_G1 $gd1->png;
+    } else {
+      warn "WARNING: Cannot generate read 1 M-bias plot: " . $graph1->error;
+    }

-    open (MBIAS_G1,'>',$mbias_graph_1) or die "Failed to write to file for M-bias plot 1: $!\n\n";
-    binmode MBIAS_G1;
-    print MBIAS_G1 $gd1->png;
   }

   if ($paired){
@@ -646,11 +649,13 @@ sub produce_mbias_plots{
                  ) or die $graph2->error;

       $graph2->set_legend('CpG methylation','CHG methylation','CHH methylation','CpG total calls','CHG total calls','CHH total calls');
-      my $gd2 = $graph2->plot(\@mbias_read2) or die $graph2->error;
-
-      open (MBIAS_G2,'>',$mbias_graph_2) or die "Failed to write to file for M-bias plot 2: $!\n\n";
-      binmode MBIAS_G2;
-      print MBIAS_G2 $gd2->png;
+      if (my $gd2 = $graph2->plot(\@mbias_read2)) {
+        open (MBIAS_G2,'>',$mbias_graph_2) or die "Failed to write to file for M-bias plot 2: $!\n\n";
+        binmode MBIAS_G2;
+        print MBIAS_G2 $gd2->png;
+      } else {
+        warn "WARNING: Cannot generate read 2 M-bias plot: " . $graph2->error;
+      }

     }
   }

There may be a more elegant way to avoid the error, but this patch at least allowed the extractor to get past the plotting step and generate bedGraph output, so I thought I'd share. I'd be curious to hear your thoughts.

Thanks,
Andrew

Attached Files

bismark_methylation_extractor-0.12.2-mod.pl (182.2 KB, 24 views)

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 13 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News