SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
CisGenome -- an integrated tool for ChIP-seq data analysis hji Bioinformatics 66 12-30-2014 01:55 PM
Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context fkrueger Bioinformatics 27 10-11-2013 05:40 AM
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments fkrueger Bioinformatics 6 03-19-2012 05:06 AM
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data gcarbajosa Bioinformatics 2 12-13-2011 08:43 AM

Reply
 
Thread Tools
Old 03-26-2014, 05:22 AM   #341
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Hi Pete,
I have just fixed this, it should never actually attempt to look for files in the --mbias_only mode. Cheers, Felix
fkrueger is offline   Reply With Quote
Old 04-08-2014, 05:46 AM   #342
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

We just released a new version of Bismark (v0.11.1) which adds the following functionality/fixes:

o Bismark: The option --pbat now also works for use with Bowtie 2, in both single-end and paired-end mode. The only limitation to this is that it only works with FastQ files and uncompressed temporary files
o Bismark: Changed the order in which the @SQ lines are written out to the SAM/BAM header from random to the same order they are being read in from the genomes folder (or the order of the files in which they occur within a multi-FastA file)
o Bismark: Included a new option '-B/--basename basename' for output files instead of deriving these names from the input file. --basename takes precedence over the option --prefix.
o Bismark: Unmapped or ambiguous files now end in .fq or.fa for FastA or FastQ files, respectively (instead of .txt files)
o Methylation extractor: willl no longer attempt to delete unused files if --mbias_only was speficied
o Methylation extractor: Added a test to see if a file that does not end in .bam is in fact a BAM file, and if this succeeds open the file using Samtools view

Bismark can be downloaded here: www.bioinformatics.babraham.ac.uk/projects/
fkrueger is offline   Reply With Quote
Old 04-25-2014, 02:37 PM   #343
adeirossi
Member
 
Location: California

Join Date: Jan 2012
Posts: 11
Default

Hi Felix,

I took a crack at adding MAPQ values to Bismark. Following Devon's suggestion, I tried to apply the same MAPQ calculation used in bowtie2. I only made changes to paired-end mode.

The attached Perl script was modified from bismark-0.11.1. Let me know what you think if you have time and interest.

Cheers,
Andrew
Attached Files
File Type: pl bismark-0.11.1-pe_mapq.pl (366.8 KB, 6 views)
adeirossi is offline   Reply With Quote
Old 04-26-2014, 04:39 AM   #344
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Quote:
Originally Posted by adeirossi View Post
Hi Felix,

I took a crack at adding MAPQ values to Bismark. Following Devon's suggestion, I tried to apply the same MAPQ calculation used in bowtie2. I only made changes to paired-end mode.

The attached Perl script was modified from bismark-0.11.1. Let me know what you think if you have time and interest.

Cheers,
Andrew
Hi Andrew,

Thanks a lot for having a go at this, I'll try to have a look next week and implement it for the next release. Let's hope this serves as an incentive for me to extend it to single-end mode as well.

Best, Felix
fkrueger is offline   Reply With Quote
Old 04-28-2014, 08:00 AM   #345
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Hi Andrew,

Thanks very much for your initiative with the MAPQ values! I have reviewed your suggestions and extended their implementation to single-end mode (for use with Bowtie 2 only). I have uploaded a tentative version 0.12.1 (attached), would you mind testing it to see if it works as expected?

Cheers, Felix
Attached Files
File Type: pl bismark.pl (368.2 KB, 1 views)
fkrueger is offline   Reply With Quote
Old 04-28-2014, 04:09 PM   #346
adeirossi
Member
 
Location: California

Join Date: Jan 2012
Posts: 11
Default

Hi Felix,

Thank you for taking a look and extending the MAPQ value reporting to single-end mode so quickly. I have tested your bismark.pl script in both paired- and single-end modes and it works as expected.

There is, however, one minor issue that does not affect the SAM output. (It was not caused by the recent changes; it happens in v0.11.1 as well.) Occasionally I see warnings like this:
Use of uninitialized value in numeric ne (!=) at ./bismark line 3201, <$__ANONIO__> line 578.
Chromosomal sequence could not be extracted for SEQUENCER12:47:H7FFHADXX:1:1206:6069:83277_1:N:0:AGATGT J02459.1 107
The issue occurs in paired-end mode when read 2 maps to the first base of a chromosome (in this case, lambda DNA). When this happens, extract_corresponding_genomic_sequence_paired_ends_bowtie2() returns without defining $methylation_call_params->{$identifier}->{unmodified_genomic_sequence_1}, which is then tested on line 3201, resulting in the "uninitialized value" warning and the printing of position_1 instead of position_2. I added one line to your bismark.pl script (see attached) to fix this behavior; now only (what I assume to be) the intended warning is printed, showing that position_2 is equal to 1:
Chromosomal sequence could not be extracted for SEQUENCER12:47:H7FFHADXX:1:1206:6069:83277_1:N:0:AGATGT J02459.1 1
Can you confirm that this change is an improvement?

Thanks,
Andrew
Attached Files
File Type: pl bismark_fix_uninit.pl (368.3 KB, 4 views)
adeirossi is offline   Reply With Quote
Old 04-29-2014, 03:21 AM   #347
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

We just released a new version of Bismark (v0.12.1) which adds the following functionality/fixes:

o Bismark: Added calculation of MAPQ values for SAM/BAM output generated with Bowtie 2 for both single-end and paired-end mode. The calculation is implemented like in Bowtie 2 itself. Mapping quality values are still unavailable for alignments performed with Bowtie and retain a value of 255 throughout
o Bismark: Fixed an uninitialised value warning for PE alignments with Bowtie 2 that occurred whenever Read 2 aligned to the very start of a chromosome (this only affected the warning itself and had no impact on any results)
o coverage2cytosine: all chromosomes or scaffolds are now processed irrespective of whether they were covered in the sequencing experiment or not. Previously, CpG/cytosine reports for genomes with lots of small scaffolds that were not covered by any reads might have had a variable number of lines between experiments

Thanks to all here on this forum and S. Jhanwar for their contributions.

Bismark can be downloaded here: http://www.bioinformatics.babraham.a...jects/bismark/
fkrueger is offline   Reply With Quote
Old 05-12-2014, 10:19 AM   #348
Thias
Member
 
Location: Münster, Germany

Join Date: Mar 2013
Posts: 42
Default

Hello Bismark-Users,

I am a new user of Bismark v 0.12.1 and currently familarizing myself with the typical output generated along the pipeline.

Upon a closer look at the .cov and .bedGraph files generated by bismark_methylation_extractor (1) I noticed the presence of a few thousand lines beginning with an asterisk (*) instead of the usual chromosome name. What do those entries stand for ?

Code:
awk < SAMPLE.cov ' !x[$1]++ {print $1}' | sort

*
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrY
I notice that chrX is missing in the list, which made the think whether those are actually the readouts of chrX? The reference genome used for mapping is the standard mm9 from Illumina iGenomes by the way.

Thanks a lot for your help!

--------------------------------------
1) invoked with the options: -s --comprehensive --bedGraph --counts --cutoff 10 --report
Thias is offline   Reply With Quote
Old 05-12-2014, 11:37 AM   #349
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Hi Thias,

I have previously seen a * as the chromosome in a SAM/BAM file when Samtools encounters a read alignment to a chromosome that is not present in the SAM header (the @SQ SN: lines). From the order you posted it seems that the X should have been the second last chromosome in the list, and incidentally I had a report on Friday last week that the second last chromosome name was missing from the SAM header. I have by now tracked and fixed this bug which was caused by missing an increment counter when processing the very last chromosome of a multi-FastA reference file. So the question is, did you use a single multi-FastA reference file as your genome? If so I can send you the latest version of Bismark which fixes this bug.

If you could just do a samtools view -H on your file you could see whether this was the problem. You don't have to redo the alignments, it would suffice if you could simply add the @SQ header line to your Bismark results file (and then maybe run the methylation extraction once more). Can you keep me updated on this via email? Sorry for the inconvenience, Felix
fkrueger is offline   Reply With Quote
Old 05-13-2014, 02:53 AM   #350
Thias
Member
 
Location: Münster, Germany

Join Date: Mar 2013
Posts: 42
Default

Quote:
Originally Posted by fkrueger View Post
[...] The second last chromosome name was missing from the SAM header. I have by now tracked and fixed this bug which was caused by missing an increment counter when processing the very last chromosome of a multi-FastA reference file. So the question is, did you use a single multi-FastA reference file as your genome?
Yes, you have nailed it. The header line for chrX is indeed missing in the SAM and it was indeed a multi-FastA which I used as a reference.
Thias is offline   Reply With Quote
Old 05-13-2014, 02:58 AM   #351
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Wohoo! Please find the latest version attached. As added bonus it also supports the new large indexes for Bowtie 2 which you might or might not need. Please let me know if there are any problems.
Attached Files
File Type: pl bismark.pl (371.1 KB, 5 views)
fkrueger is offline   Reply With Quote
Old 05-13-2014, 05:28 AM   #352
fwessely
Junior Member
 
Location: UK

Join Date: Oct 2011
Posts: 3
Default

Hi Felix,

I have just seen a tiny bug, when the option --basename is used, the file name for ambiguous reads is missing an underscore.
Also, if this option is used the basename is not used for the file name of the alignment pie chart (...alignment_overview.png). I'd prefer to use the basename for this file as well.

Thanks
Frank
fwessely is offline   Reply With Quote
Old 05-13-2014, 05:41 AM   #353
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Hi Frank, I have tried to fix them both now (attached). Cheers, Felix
Attached Files
File Type: pl bismark.pl (371.3 KB, 3 views)
fkrueger is offline   Reply With Quote
Old 05-13-2014, 10:10 AM   #354
Thias
Member
 
Location: Münster, Germany

Join Date: Mar 2013
Posts: 42
Default

Quote:
Originally Posted by fkrueger View Post
You don't have to redo the alignments, it would suffice if you could simply add the @SQ header line to your Bismark results file (and then maybe run the methylation extraction once more).
In case someone else would like to do that on a batch of files, this script should help:

Code:
#!/bin/bash
   find  $PWD  -type f -name '*.bam' | grep -v  "out.bam" | while read SAMPLE
    do
     samtools view -H $SAMPLE | sed 's=LN:16299=LN:16299\n@SQ\tSN:chrX\ LN:166650296=' > mynewheader.txt
     samtools reheader mynewheader.txt $SAMPLE > out.bam
      if [ -s out.bam ]
        then
                rm mynewheader.txt $SAMPLE
                mv out.bam $SAMPLE
        fi
    done
exit 0
Of course, you have to adapt the sed command to your specific needs. In my case I took part of the header line above the desired position (LN:16299) and replaced it with the same string followed by a \n (newline) and the desired missing header line. Ths script avoids doing a BAM -> SAM -> BAM conversion, but unfortunately still needs a bit of time because a temporary copy of the BAM is written.

Last edited by Thias; 05-14-2014 at 02:47 AM. Reason: Added a grep -v in the script to avoid find picking up the out.bam itself
Thias is offline   Reply With Quote
Old 05-14-2014, 01:14 AM   #355
adeirossi
Member
 
Location: California

Join Date: Jan 2012
Posts: 11
Default

Hi Felix,

A few of our libraries have M-bias plots that show pronounced bias far in from the 5' end. When I first tried to avoid the bias by setting high values for the --ignore and --ignore_r2 options, bismark_methylation_extractor failed:
$ perl bismark_methylation_extractor --paired-end --no_overlap --ignore 35 --ignore_r2 35 --output out --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --gzip --bedGraph --buffer_size 4G 99_26_1_GATCAG_003.bam

*** Bismark methylation extractor version v0.12.1 ***

Summarising Bismark methylation extractor parameters:
===============================================================
Bismark paired-end SAM format specified (default)
First 35 bp will be disregarded when processing the methylation call string of Read 1
First 35 bp will be disregarded when processing the methylation call string of Read 2
Output path specified as: out/

...

substr outside of string at bismark_methylation_extractor line 2199, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2225, <IN> line 179.
Use of uninitialized value $op in string eq at bismark_methylation_extractor line 2230, <IN> line 179.
Use of uninitialized value $last_op_2 in concatenation (.) or string at bismark_methylation_extractor line 2287, <IN> line 179.
Use of uninitialized value $meth_call in split at bismark_methylation_extractor line 2502, <IN> line 179.
CIGAR string contained a non-matching number of lengths and operations
It seems that bismark_methylation_extractor has issues when --ignore or --ignore_r2 equals or exceeds the alignment length; this was the case in the example above, where a small subset of my reads were trimmed to <=35 bp.

I came up with a little patch that at least prevents failure when I run on my paired-end data (also see attached):

HTML Code:
--- a/bismark_methylation_extractor-0.12.1
+++ b/bismark_methylation_extractor-0.12.1_mod
@@ -2152,7 +2152,12 @@ sub process_Bismark_results_file{
          if ($ignore) {
            ### Clipping off the first <int> number of bases from the methylation call strings as specified with '--ignore <int>' for read 1
            ### the methylation calls have already been reversed where necessary
-           $meth_call_1 = substr($meth_call_1,$ignore,length($meth_call_1)-$ignore);
+           my $len = length($meth_call_1)-$ignore;
+           if ($len <= 0) {
+             next;
+           } else {
+             $meth_call_1 = substr($meth_call_1,$ignore,$len);
+           }

            if ($strand eq '+') {

@@ -2196,7 +2201,12 @@ sub process_Bismark_results_file{
          if ($ignore_r2) {
            ### Clipping off the first <int> number of bases from the methylation call string as specified with '--ignore_r2 <int>' for read 2
            ### the methylation calls have already been reversed where necessary
-           $meth_call_2 = substr($meth_call_2,$ignore_r2,length($meth_call_2)-$ignore_r2);
+           my $len = length($meth_call_2)-$ignore_r2;
+           if ($len <= 0) {
+             next;
+           } else {
+             $meth_call_2 = substr($meth_call_2,$ignore_r2,$len);
+           }

            ### If we are ignoring a part of the sequence we also need to adjust the cigar string accordingly
If you have time, it would be great if you could take a quick look at the patch and let me know if anything looks suspect.

Many thanks,
Andrew
Attached Files
File Type: pl bismark_methylation_extractor-0.12.1_mod.pl (181.9 KB, 4 views)
adeirossi is offline   Reply With Quote
Old 05-14-2014, 02:48 AM   #356
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

We just released a new version of Bismark (v0.12.2) which fixes a couple if issues:

o Bismark: Added support for the new 64-bit index files for very large genomes in Bowtie 2 mode. The large genome indexes (ending in .bt2l instead of .bt2 for small genomes) are generated automatically by bismark_genome_preparation and work just as well in the Bismark alignment step
o Bismark: Fixed a bug that would omit the name of the second last chromomome from the SAM header if the genome had been supplied as Multi-FastA file. Everything else, including the alignments, would have been unaffected by this glitch
o Bismark: When the option '--basename' is specified, SE amibiguous file names now feature an underscore in their file name. Also, the pie chart file names are now derived from the the basename
o Methylation Extractor: Introduced a length check when the options --ignore or --ignore_r2 were set to ensure that only reads that are long enough are being processed

Bismark can be downloaded here: http://www.bioinformatics.babraham.a...jects/bismark/

@Andrew: Thanks, this seems to make perfect sense so I have included it in the current release
fkrueger is offline   Reply With Quote
Old 05-30-2014, 10:13 AM   #357
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367
Default

Hi,

Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

That way, I could do a downstream analysis with bedtools directly.
After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

Thank you.
blancha is offline   Reply With Quote
Old 05-30-2014, 11:38 AM   #358
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Hi blancha, that sounds like a sensible suggestion, I'll look into this when I am back from my holidays.
fkrueger is offline   Reply With Quote
Old 06-02-2014, 05:01 AM   #359
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 619
Default

Quote:
Originally Posted by blancha View Post
Hi,

Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

That way, I could do a downstream analysis with bedtools directly.
After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

Thank you.
I have just changed the '--zero_based' option of the methylation extractor to write out an additional coverage file (ending in .zero.cov) which uses the UCSC zero-based, half-open standard. The cytosine report only writes out a single position (as it a single cytosine), which uses 1-based coords by default but can be changed to 0-based if desired. Please find the files attached, I hope this is what you were after.
Attached Files
File Type: zip zero-based-half-open.zip (40.0 KB, 2 views)
fkrueger is offline   Reply With Quote
Old 06-02-2014, 06:52 AM   #360
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367
Default

Hi Felix,

Looks great. Thanks.
I'll try it out.

Thanks.

Alexis
blancha is offline   Reply With Quote
Reply

Tags
alignment, bisulfite, bisulphite, methylation, sequencing

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:29 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO