
**bmartinez** · 12-08-2015, 12:34 PM

Problems with bismark2bedGraph and coverage2cytosine when extracting methylation calls

Hi everyone,

I am analysing WGBS data with Bismark v0.14.5. I have trimmed the data, aligned it with Bowtie and deduplicated it with no issues. However, I am having problems extracting the methylation calls. My genome assembly consists of 47,100 scaffolds. The command I am using is:

bismark_methylation_extractor -p --comprehensive --merge_non_CpG --samtools_path /opt/samtools-0.1.19 --genome_folder /path_to_genome/Bisulfite_Genome_BowtieOne --buffer_size 10G --report --bedGraph --cytosine_report --scaffolds --gzip --multicore 3 -o /path/file.fastq.gz_bismark_pe.deduplicated.sam

I get proper CpG_context_file.fastq.gz_bismark_pe.deduplicated.txt.gz and Non_CpG_context_file.fastq.gz_bismark_pe.deduplicated.txt.gz files. Here are the first lines as an example:

Bismark methylation extractor version v0.14.4
HWI-ST539:249:C7BDRACXX:6:1101:1460:1903_1:N:0:GCCAAT + scaffold.s31344 999331 Z
HWI-ST539:249:C7BDRACXX:6:1101:1460:1903_1:N:0:GCCAAT - scaffold.s31344 999181 z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT - scaffold.s10570 115561 z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115578 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115590 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115616 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115624 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115639 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115632 Z

However, it does not convert properly into bedGraph files (I get an empty file), and the cytosine report I get looks like this, with no methylation at all (columns 4 and 5, the methylated and unmethylated counts, are all 0s):

scaffold.s00001 45 + 0 0 CG CGC
scaffold.s00001 46 - 0 0 CG CGT
scaffold.s00001 49 + 0 0 CG CGG
scaffold.s00001 50 - 0 0 CG CGT
scaffold.s00001 1095 + 0 0 CG CGT
scaffold.s00001 1096 - 0 0 CG CGG
scaffold.s00001 1481 + 0 0 CG CGC
scaffold.s00001 1482 - 0 0 CG CGG
scaffold.s00001 1560 + 0 0 CG CGC
scaffold.s00001 1561 - 0 0 CG CGG

I have tried running the steps one by one, but I get the same result. I have been working on this for more than a week now and I cannot find where the error is. Could someone help me with this?

Thanks a lot in advance!

Begoña

**fkrueger** · 12-08-2015, 12:58 PM

Hi Begona,

It looks like the bismark2bedGraph step is somehow failing. Can you check the error logs (or what appears on screen) to see what exactly is going wrong?

The CpG report simply puts the coverage file into genomic context, so if the coverage file is empty, the CpG report will also show only 0 0 counts.

I'm happy to look at this in more detail; you can also email me the error logs. Best, Felix
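To illustrate why an empty coverage file gives an all-zero report, here is a minimal Python sketch (not Bismark's coverage2cytosine code) that joins coverage counts onto a list of genomic CpG positions. It assumes the coverage records have already been parsed into a dict keyed by (chromosome, position); the function and type names are hypothetical. Every CpG without an entry is written out as 0 0.

```python
from typing import Dict, Iterable, List, Tuple

# (chromosome, 1-based position) -> (count methylated, count unmethylated)
Coverage = Dict[Tuple[str, int], Tuple[int, int]]


def cytosine_report(
    cpg_sites: Iterable[Tuple[str, int, str, str]],
    coverage: Coverage,
) -> List[str]:
    """Put coverage counts into genomic context (simplified sketch).

    cpg_sites yields (chromosome, position, strand, trinucleotide context)
    for every CpG cytosine in the genome. Sites absent from the coverage
    data are reported with 0 methylated and 0 unmethylated calls.
    """
    lines = []
    for chrom, pos, strand, tri in cpg_sites:
        meth, unmeth = coverage.get((chrom, pos), (0, 0))
        lines.append(f"{chrom}\t{pos}\t{strand}\t{meth}\t{unmeth}\tCG\t{tri}")
    return lines
```

With an empty coverage dict, every output line ends in `0 0 CG <context>`, exactly the pattern in the report above.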

**fkrueger** · 12-09-2015, 04:00 AM

It appears that GZIP-compressed input files were streamed directly into the Unix sort command (e.g. when using the option --scaffolds/--gazillion), but sort cannot read compressed files and thus produced empty output. I have opened an issue on GitHub for this (https://github.com/FelixKrueger/Bismark/issues/9) and fixed the way GZIP-compressed files are streamed to sort; it seems to work fine on my end. The latest version can be cloned straight from GitHub.
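A minimal Python sketch of the general idea behind such a fix (not the actual Perl patch): decompress the gzipped extractor output first and stream the plain text into sort's standard input. The sort keys mirror the bismark2bedGraph call quoted in this thread (-k3,3V -k4,4n); the explicit tab separator, LC_ALL=C and the function name are assumptions, and GNU sort must be on the PATH.

```python
import gzip
import os
import shutil
import subprocess
from typing import List

HEADER = b"Bismark methylation extractor version"


def sort_gzipped_calls(path: str) -> List[str]:
    """Sort a gzipped extractor output by chromosome (version sort), then position.

    Handing the .gz file straight to `sort` makes it sort compressed bytes, so
    the file is decompressed here and streamed into sort's standard input.
    """
    with subprocess.Popen(
        ["sort", "-t", "\t", "-k3,3V", "-k4,4n"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env={**os.environ, "LC_ALL": "C"},  # locale-independent ordering
    ) as proc:
        with gzip.open(path, "rb") as fh:
            first = fh.readline()
            if not first.startswith(HEADER):  # drop the version header if present
                proc.stdin.write(first)
            shutil.copyfileobj(fh, proc.stdin)
        proc.stdin.close()  # sort only writes its output after end of input
        out = proc.stdout.read()
    if proc.returncode != 0:
        raise RuntimeError("sort failed")
    return out.decode().splitlines()
```

The shell equivalent is simply to decompress before piping, e.g. with `gunzip -c` in front of sort.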

**bmartinez** · 12-14-2015, 04:08 PM


Hi Felix and everyone,

Thanks a lot for your help, Felix; the problem is solved.
However, I have an additional issue related to sorting the CpG_context and Non_CpG_context files on my operating system. I am reporting it in case someone else is affected, and to ask for your opinion.

I get an error when sorting these files (this is done within the bismark2bedGraph script). The line that does the sorting in the script is the following:

open $ifh, "sort -S $sort_size -T $sort_dir -k3,3V -k4,4n $in |" or die "Input file could not be sorted. $!\n";

I worked around the issue by sorting with just "3,3". However, I am still trying to confirm that this does not affect the final results. If anyone can give me a hint on this, I would greatly appreciate it.

Begoña
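One way to check whether dropping the position key matters is to test whether, within each chromosome, positions still come out in ascending order; if not, calls for the same cytosine may no longer sit next to each other in the stream, which could matter if the bedGraph step collapses consecutive records (an assumption about bismark2bedGraph's internals). The Python sketch below is hypothetical: natural_key approximates `sort -V`, sorted_like_bismark mimics `-k3,3V -k4,4n`, sorted_chrom_only approximates `-k3,3` with GNU sort's whole-line tie-break, and positions_in_order performs the check.

```python
import re
from typing import List, Tuple, Union


def natural_key(name: str) -> Tuple[Union[int, str], ...]:
    """Approximate `sort -V`: digit runs compare numerically, the rest as text."""
    parts = re.split(r"(\d+)", name)  # alternates text, digits, text, ...
    return tuple(int(p) if p.isdigit() else p for p in parts)


def sorted_like_bismark(lines: List[str]) -> List[str]:
    """Mimic `sort -k3,3V -k4,4n` on tab-separated extractor lines."""
    def key(line: str):
        fields = line.split("\t")
        return (natural_key(fields[2]), int(fields[3]))
    return sorted(lines, key=key)


def sorted_chrom_only(lines: List[str]) -> List[str]:
    """Approximate `sort -k3,3`: chromosome as text, then the whole line."""
    return sorted(lines, key=lambda line: (line.split("\t")[2], line))


def positions_in_order(lines: List[str]) -> bool:
    """True if, for every chromosome, positions never decrease."""
    last = {}
    for line in lines:
        fields = line.split("\t")
        chrom, pos = fields[2], int(fields[3])
        if chrom in last and pos < last[chrom]:
            return False
        last[chrom] = pos
    return True
```

For example, with reads r1 at chr2:200 and r2 at chr2:100, the chromosome-only sort orders them by read ID (r1 first), so positions_in_order returns False, whereas the two-key sort keeps them in positional order.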

**Tlexander** · 01-08-2016, 09:49 AM

Bismark v0.14.4 bug: --remove_spaces in bismark_methylation_extractor

I think there is a bug in the --remove_spaces option of bismark_methylation_extractor.

The error is as follows:

Changed directory to /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/
Now replacing whitespaces in the sequence ID field of the Bismark methylation extractor output /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt prior to bedGraph conversion

Couldn't write to file /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt.spaces_removed.txt: No such file or directory
Finished BedGraph conversion ...
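For context, a sequence ID containing whitespace presumably needs cleaning before the sort step because sort splits fields on blanks by default, so a space inside the read ID shifts the chromosome and position columns (this rationale is an assumption, not taken from the Bismark documentation). A minimal Python sketch of such a clean-up, replacing whitespace in the first tab-separated field only; the function names and the underscore replacement are hypothetical.

```python
import re
from typing import List


def remove_id_spaces(line: str) -> str:
    """Replace whitespace inside the read ID (first tab-separated field) with '_'."""
    read_id, sep, rest = line.partition("\t")
    return re.sub(r"\s+", "_", read_id) + sep + rest


def default_sort_fields(line: str) -> List[str]:
    """Fields roughly as `sort` sees them without -t: split on runs of blanks."""
    return line.split()
```

With a raw ID such as `...:1903 1:N:0:GCCAAT`, the third blank-separated field is the `+` call column rather than the scaffold name; after clean-up it is the scaffold name again.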

**fkrueger** · 01-09-2016, 05:55 AM

Originally posted by Tlexander

I think there is a bug in the --remove_spaces option of bismark_methylation_extractor.

The error is as follows:

Changed directory to /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/
Now replacing whitespaces in the sequence ID field of the Bismark methylation extractor output /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt prior to bedGraph conversion

Couldn't write to file /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt.spaces_removed.txt: No such file or directory
Finished BedGraph conversion ...

Thanks for reporting this, could you please also post the exact command you used when you called the methylation extractor so I can reproduce it more easily?

**biocomputer** · 01-31-2016, 06:51 PM

When I try to run bismark2bedGraph from a directory that doesn't directly contain the input file it fails to find the input file, even though I've specified the path to the input file. For example, if I'm in the directory that contains the input file and use this command it works properly (note that I've taken the output from the methylation extractor and split it by chromosome for batch processing but the same thing happens if I use the original unsplit file):

bismark2bedGraph -o ./output/chr10.bg ./CpG_chr10.txt

If I move up a directory and run this command it doesn't work:

bismark2bedGraph -o ./directory/output/chr10.bg ./directory/CpG_chr10.txt

The program ends with:

Using the following files as Input:
CpG_chr10.txt

Writing bedGraph to file: ./directory/output/chr10.bg.gz
Also writing out a coverage file including counts methylated and unmethylated residues to file: ./directory/output/chr10.bg.gz.bismark.cov.gz

Couldn't find file 'CpG_chr10.txt': No such file or directory
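The log shows the directory being stripped from the input path ("CpG_chr10.txt"). A hedged Python sketch of path handling that avoids this failure mode (not the actual Bismark fix): resolve every input path against the directory the command was started from, before anything changes the working directory, instead of keeping only the base name. The function name is hypothetical.

```python
import os
from typing import List


def resolve_inputs(inputs: List[str], start_dir: str) -> List[str]:
    """Turn input paths into absolute paths relative to the launch directory.

    Keeping only os.path.basename() (e.g. 'CpG_chr10.txt') breaks as soon as
    the program looks for the file from a directory that does not contain it.
    Absolute inputs are returned unchanged (os.path.join keeps them as-is).
    """
    return [os.path.abspath(os.path.join(start_dir, p)) for p in inputs]
```

In practice start_dir would be captured once with os.getcwd() at start-up.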

**fkrueger** · 02-01-2016, 01:32 AM

Thanks for reporting this, I have filed an issue on the Bismark GitHub page and will address it as soon as I find some time. Cheers, Felix

**MagdalenaZ** · 02-01-2016, 03:05 AM

Mapping SE and PE reads

Hi,

I have PE reads, some 20% of which overlap. I usually merge the overlapping pairs before mapping, so that I have the following files:

Non-overlappling_reads_1.fastq
Non-overlappling_reads_2.fastq
Merged_overlappling_reads.fastq

I would submit those reads to bismark as:

bismark -1 Non-overlappling_reads_1.fastq -2 Non-overlappling_reads_2.fastq Merged_overlappling_reads.fastq

...but then my resulting BAM file contains both SE and PE reads, so do I use the -p or the -s flag with bismark_methylation_extractor?

Also, judging from the results, it doesn't really look like Merged_overlappling_reads.fastq actually gets read.

Or I could run bismark AND bismark_methylation_extractor twice, once for SE and once for PE, but then at what point do I merge the results?
The last option is to skip the merging, feed in all PE reads as they are, use bismark_methylation_extractor --include_overlap and hope that all will be well. But then I lose the SE reads.

So many options, but hopefully only one optimal solution!
Very grateful for your advice!

Cheers,
Magdalena

**fkrueger** · 02-01-2016, 03:47 AM

Hi Magdalena,

Code:

bismark -1 Non-overlappling_reads_1.fastq -2 Non-overlappling_reads_2.fastq Merged_overlappling_reads.fastq

does not work the way you think it will, as it would really only do PE alignments of the non-overlapping reads. So if you wanted to split this up manually (but why?), you could run the PE alignment on the non-overlapping reads first and the SE alignment on the merged overlapping reads, and then run the PE and SE methylation extractions separately. If you wanted to, you could then merge the data again for the bismark2bedGraph step: just feed in all the CpG* files from both the PE and SE mapping.
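Conceptually, feeding the CpG files from both runs into bismark2bedGraph pools the calls per position. The Python sketch below is a simplified stand-in for that step, not Bismark's implementation: it reads extractor lines (read ID, methylation state, chromosome, position, call), treats `+` as a methylated and `-` as an unmethylated call, and returns coverage-style records (chromosome, start, end, methylation percentage, count methylated, count unmethylated). The function name is hypothetical, file handling is omitted, and the percentage formatting may differ from Bismark's.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pool_calls(*sources: Iterable[str]) -> List[str]:
    """Pool methylation calls from several extractor outputs into coverage lines."""
    counts: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for source in sources:
        for line in source:
            if line.startswith("Bismark methylation extractor version"):
                continue  # skip the header line
            _read_id, state, chrom, pos, _call = line.rstrip("\n").split("\t")
            idx = 0 if state == "+" else 1  # '+' methylated, '-' unmethylated
            counts[(chrom, int(pos))][idx] += 1
    out = []
    for (chrom, pos), (meth, unmeth) in sorted(counts.items()):
        pct = 100 * meth / (meth + unmeth)
        out.append(f"{chrom}\t{pos}\t{pos}\t{pct:g}\t{meth}\t{unmeth}")
    return out
```

Calls at the same position coming from the PE and the SE files simply add up, which is why merging at this stage is equivalent to having extracted them together.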

I am not quite sure, however, whether merging and making things more complicated isn't doing exactly what the methylation extractor does anyway: for non-overlapping pairs it gets the information from both reads, and for overlapping pairs it only takes the information once because of the --no_overlap option (isn't this the same as your ‘merging’ step?)
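The --no_overlap behaviour can be pictured with a small Python sketch: for an overlapping pair, all calls from read 1 are kept, and calls from read 2 are kept only outside read 1's aligned span (the Bismark documentation describes using read 1's calls for the overlapping part). This is a simplification for illustration; the function name and the dict-based representation of calls are hypothetical.

```python
from typing import Dict, Tuple

Calls = Dict[int, str]  # genomic position -> call ('Z' methylated, 'z' unmethylated)


def pair_calls_no_overlap(
    read1_calls: Calls, read1_span: Tuple[int, int], read2_calls: Calls
) -> Calls:
    """Combine the calls of one read pair, scoring the overlapping region once.

    read1_span is the inclusive (start, end) of read 1's alignment. Calls from
    read 2 that fall inside that span are dropped in favour of read 1.
    """
    start, end = read1_span
    merged = dict(read1_calls)
    for pos, call in read2_calls.items():
        if not start <= pos <= end:
            merged[pos] = call
    return merged
```

Each cytosine in the overlap is therefore counted once per fragment, which is what merging overlapping mates before alignment also achieves.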

**MagdalenaZ** · 02-01-2016, 05:29 AM

Hi,

Yep, when I benchmarked that command I got the same results whether or not I included the SE reads.

I'm keen to also use the SE reads: adapter trimming improves my mapping results, but afterwards some 3% of the reads are unpaired. I guess I could throw them away, but I'll have a go with your suggestion of merging them in the bismark2bedGraph step.

Every little helps ..... :-)

Many thanks for your rapid reply!

**fkrueger** · 02-08-2016, 08:13 AM

Originally posted by biocomputer

When I try to run bismark2bedGraph from a directory that doesn't directly contain the input file it fails to find the input file, even though I've specified the path to the input file. For example, if I'm in the directory that contains the input file and use this command it works properly (note that I've taken the output from the methylation extractor and split it by chromosome for batch processing but the same thing happens if I use the original unsplit file):

If I move up a directory and run this command it doesn't work:

The program ends with:

I have just been looking into this and can't reproduce the error. I then went back to the Release Notes and found this entry for version 0.14.0:

bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well

Did you by any chance run an older version than that?

**biocomputer** · 02-08-2016, 10:01 AM

I'm using 0.14.4:

$ bismark2bedGraph --version

Bismark Methylation Extractor Module -
bismark2bedGraph

Bismark Extractor Version: v0.14.4
Copyright 2010-15 Felix Krueger, Babraham Bioinformatics

Babraham Bioinformatics - Bismark Bisulfite Read Mapper and Methylation Caller

https://www.bioinformatics.babraham.ac.uk/projects/bismark/

**fkrueger** · 02-10-2016, 07:24 AM

Right, I believe it is now fixed. The latest version can be downloaded here: https://github.com/FelixKrueger/Bismark/issues/18

**VC87** · 03-02-2016, 08:35 AM

Hi! Can anyone explain to me the differences between bisulfite-seq, WGBS, RRBS and COBRA library reads? Thanks in advance!
