**fkrueger** · 08-18-2015, 02:25 AM

Originally posted by lancelothk

Hi fkrueger,

I am currently trying to use Bismark to analyse a huge BS-seq dataset in an HPC environment. I am thinking of splitting the big FastQ file into smaller pieces, running Bismark on each piece on its own node, and then merging the resulting BAM files. Do you have any suggestions for how I can merge the Bismark reports? Do you have an existing script to do this?

Thanks.

Hi Lanceloth,

As it stands there is no stand-alone script to merge the mapping reports, but the code should pretty much all be there, since it is used for --multicore runs anyway. The two subroutines

Code:

'merge_individual_mapping_reports' and
'read_alignment_report'

should contain everything. Let me know if you need help merging these into a stand-alone script.
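For readers who want a starting point outside the Bismark source, here is a minimal sketch in Python of the general idea of summing count lines across several reports. It is not the code of the two subroutines named above. It assumes each count line has the form `label:<whitespace>integer`, and the function names `read_counts` and `merge_counts` are made up for this illustration. Percentage lines (for example a mapping efficiency) are skipped on purpose and would have to be recomputed from the summed counts.

```python
import re
from collections import OrderedDict

# Assumption: a count line looks like "<label>:<tab or spaces><integer>".
COUNT_LINE = re.compile(r"^(?P<label>[^:]+):\s+(?P<count>\d+)\s*$")


def read_counts(report_text):
    """Return an ordered mapping of label -> integer count for one report."""
    counts = OrderedDict()
    for line in report_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:  # percentages, headers and free text do not match
            counts[match.group("label").strip()] = int(match.group("count"))
    return counts


def merge_counts(report_texts):
    """Sum the integer counts of several reports, label by label."""
    merged = OrderedDict()
    for text in report_texts:
        for label, count in read_counts(text).items():
            merged[label] = merged.get(label, 0) + count
    return merged
```

Calling `merge_counts` on the text of each per-chunk report gives one dictionary of totals, from which ratios can be recalculated.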

Just a short word of warning: when you merge paired-end BAM files with samtools merge, you need to make sure that the files are subsequently sorted by read name; otherwise the two reads of a pair are not guaranteed to follow each other line by line. Would the --multicore option perhaps be a little more feasible?

Best, Felix
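The pairing requirement in the warning above can be checked with a small sketch like the following. It works on a plain list of read names taken from a merged file in file order. It assumes that both mates carry the same read name, optionally followed by a `/1` or `/2` suffix, which may not match every file. The helper names are hypothetical.

```python
def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 (an assumed mate-suffix convention)."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def mates_are_adjacent(read_names):
    """Return True if records come in consecutive pairs sharing one name."""
    names = [strip_mate_suffix(name) for name in read_names]
    if len(names) % 2 != 0:
        return False  # an odd number of records cannot be all pairs
    return all(names[i] == names[i + 1] for i in range(0, len(names), 2))
```

If the check fails after merging, a name sort (for example with `samtools sort -n`) is one way to bring mates back together.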

**lancelothk** · 08-18-2015, 05:17 AM

Thank you so much Felix. I will take a look at the source code.

**lancelothk** · 08-18-2015, 12:24 PM

Hi fkrueger,

I found two minor issues in Bismark v0.14.3.
deduplicate_bismark gives errors when run with the --representative option:
Failed to close output filehandle: Bad file descriptor
Failed to close report filehandle: Bad file descriptor

I found out that this is caused by a bug at line 548. The } should come after the two close lines, since OUT and REPORT have already been closed in deduplicate_representative().

The -B/--basename <basename> option can be found in the script, but not in the PDF version of the manual.

BTW, I finished extracting the report-merging code into a stand-alone script. The most painful part was the global variables...

Thanks.

**fkrueger** · 08-19-2015, 01:07 AM

Thanks for pointing out these issues. I have updated the manual and removed the superfluous closing statements. They will find their way into a new release which we'll be releasing later today.

Edit: Just as a quick word of warning: the --representative mode is almost certainly not what you want to use, because it will pick the most highly amplified and thus biased sequence for a given position instead of a random one. I will probably hide it from use in the next release...
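To make the difference concrete, here is a toy illustration in Python. It is not Bismark's deduplication code. The strings simply stand for the alignments that share one position, and the function names are invented for this example.

```python
import random
from collections import Counter


def pick_representative(alignments):
    """Return the most frequent entry, i.e. the most highly amplified one."""
    return Counter(alignments).most_common(1)[0][0]


def pick_random(alignments, rng):
    """Return one entry chosen uniformly at random from the list."""
    return rng.choice(alignments)
```

With three copies of one sequence and one copy of another at the same position, `pick_representative` always returns the over-amplified sequence, which is the bias described above.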

**fkrueger** · 08-19-2015, 05:28 AM

Bismark v0.14.4. New functionality and allele-specific alignment support

We have just released a new Bismark version (v0.14.4). It brings a few convenience features, adds some options and fixes some bugs; further details are outlined below.

It is also worth mentioning that it should now be possible to use Bismark in conjunction with SNPsplit to align Bisulfite-Seq data in an allele-specific fashion against an N-masked genome if both genotypes are known. More information about this may be found on the SNPsplit project page.

o Bismark: Changed the FLAG values of paired-end alignments to the CTOT or CTOB strands so that reads can be properly displayed in SeqMonk when imported as BAM files. This change affects only paired-end alignments in --pbat or --non_directional mode. In detail, we simply swapped the Read 1 and Read 2 FLAG values round so that reads now exactly resemble concordant read pairs to the OT or OB strands. Note that results produced by the methylation extractor, or further downstream of that, are not affected by this change.
o Bismark: Input files specified with filepath information for FastA files are now handled properly in --multicore runs (this was fixed only for FastQ files in the previous patch)
o Bismark: Unmapped and ambiguous files (options --unmapped and --ambiguous) are now written out as gzip compressed files by default
o Bismark: Changed the default mode of operation to --bowtie2. Bowtie (1) alignments may still be chosen using the option --bowtie1
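For anyone checking the FLAG change from the first item above in their own files, a small decoder for the SAM FLAG bit field can help. The bit values follow the SAM specification. The short names and the function `decode_flag` are chosen for this sketch only, and the example values below are common values for a concordant pair, not a statement about what Bismark writes.

```python
# Bit values as defined in the SAM specification; names are shorthand.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]
```

For example, `decode_flag(99)` lists the paired, proper_pair, mate_reverse and read1 bits, while `decode_flag(147)` lists paired, proper_pair, reverse and read2.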

o Bismark Genome Preparation: Changed the execution of the genome indexing of the parent process to system() rather than an exec() call since this seemed to lead to interesting faults when run in a pipeline setting
o Bismark Genome Preparation: Changed the default indexing mode to --bowtie2. Bowtie (1) indexing is still available via the option --bowtie1
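As background on the system() versus exec() item above: an exec-style call replaces the calling process, so nothing after it runs, whereas a system-style call starts a child, waits for it and then carries on. The sketch below shows the second behaviour using Python's `subprocess` module as an analogy. It is not the Perl code used in Bismark, and `run_and_return` is a hypothetical helper.

```python
import subprocess
import sys


def run_and_return(code):
    """Run a child Python process, wait for it, and return its exit status.

    This mirrors a system()-style call: control comes back to the caller.
    An exec-style call (os.execv in Python) would instead replace the
    current process and never return on success.
    """
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode
```

Because control returns to the caller, a surrounding pipeline can inspect the exit status and continue with its next step.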

o bismark2bedGraph: The coverage (.cov) and bedGraph (.bedGraph) files are now written out as gzip compressed files by default
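Since the coverage files are now gzip compressed, downstream scripts need to open them accordingly. A minimal sketch, assuming the coverage file is tab-separated with six columns (chromosome, start, end, methylation percentage, count methylated, count unmethylated); please check this layout against your own files. The function name `read_coverage` is made up here.

```python
import gzip


def read_coverage(path):
    """Yield (chrom, start, end, pct_meth, n_meth, n_unmeth) per line.

    Assumes a gzip-compressed, tab-separated file with six columns.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            chrom, start, end, pct, meth, unmeth = line.split("\t")
            yield chrom, int(start), int(end), float(pct), int(meth), int(unmeth)
```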

o coverage2cytosine: Added new option '--gc/--gc_context' to reprocess the genome to find methylation in GpC context. This might be useful for specialist applications where GpC methylases had been employed. The output format is exactly the same as for the normal cytosine report, and only positions covered by at least one read are reported (output file ends in .GpC_report.txt). In addition this will write out a Bismark coverage file (ending in GpC.cov)
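To illustrate what "GpC context" means, here is a small sketch that lists cytosines in GpC context in a sequence. It is not the coverage2cytosine implementation. It assumes 0-based positions on the top strand and relies on the fact that a GC dinucleotide reads as GC on both strands, so each occurrence yields one top-strand and one bottom-strand cytosine. The function name is hypothetical.

```python
def gpc_cytosines(sequence):
    """Return (position, strand) for every cytosine in GpC context.

    Positions are 0-based on the top strand. For a "GC" at i, i+1 the
    top-strand C sits at i+1; the G at i pairs with a bottom-strand C
    whose 5' neighbour on the bottom strand is also a G.
    """
    seq = sequence.upper()
    hits = []
    start = seq.find("GC")
    while start != -1:
        hits.append((start + 1, "+"))
        hits.append((start, "-"))
        start = seq.find("GC", start + 1)
    return hits
```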

o deduplicate_bismark: Removed redundant closing statements to get rid of warning messages
o deduplicate_bismark: The option --representative is no longer displayed in the help text. The option was once useful to determine the PCR bias that had been introduced by over digestion with bisulfite and is nearly always not what should be used for deduplication (it will be left in and is still functional for the time being though)

Bismark is available from the Babraham Bioinformatics project page.

**lancelothk** · 08-19-2015, 07:25 AM

I found one more bug in deduplicate_bismark. It is also in v0.14.4.
Several calls to samtools use "samtools" directly instead of $samtools_path, e.g. at lines 207 and 269.

**fkrueger** · 08-20-2015, 04:31 AM

Originally posted by lancelothk

I found one more bug in deduplicate_bismark. It is also in v0.14.4.
Several calls to samtools use "samtools" directly instead of $samtools_path, e.g. at lines 207 and 269.

Thanks for spotting that, I've fixed all these calls.
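The general pattern behind this kind of fix is to route every external call through one configurable path. The sketch below shows that pattern in Python rather than Perl. It is not the patched Bismark code, and `samtools_command` and its parameter are names invented for this example.

```python
import shutil


def samtools_command(args, samtools_path=None):
    """Build an argument list that honours a user-supplied samtools path.

    Falls back to whatever "samtools" resolves to on PATH, and finally
    to the bare name if nothing is found.
    """
    executable = samtools_path or shutil.which("samtools") or "samtools"
    return [executable] + list(args)
```

For instance, `samtools_command(["view", "-h", "in.bam"], "/opt/samtools/samtools")` always starts with the supplied path, so no call site can bypass it.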

**marcusmchale** · 09-01-2015, 10:08 AM

error with seedlen > 32

I get an error from bowtie2 when I try to define a seed length of 50 in Bismark. I haven't found any mention of this problem elsewhere, nor any mention of seedlen limits in the bowtie2 manual. It seems particularly strange given that the recommended "typical" settings in the Bismark manual use a seed length of 50. Can anyone help me to trace the source of this error?

Using Bowtie 2 index: /home/tair10/Bisulfite_Genome/CT_conversion/BS_CT

Error: -L argument must be <= 32; was50
Error: Encountered internal Bowtie 2 exception (#1)
Command: /cm/shared/apps/bowtie/2-2.1.0/bowtie2-align --wrapper basic-0 -q -N 1 -L 50 --score-min L,0,-0.2 --ignore-quals --norc -x /home/tair10/Bisulfite_Genome/CT_conversion/BS_CT -U 4AB_trimmed_r1.fastq_C_to_T.fastq
bowtie2-align exited with value 1

The alignment does seem to work when no seedlen is defined. Here is a sample read from the relevant FastQ file. You will note that the read length is 50 bp, but I don't think this is relevant since the error says -L must be <= 32.

@HWI-D00458:73:C6EBDANXX:1:1101:1728:1972 1:N:0:GCTCTA_A
AGCGTGGTTTATTGATTTTTTAGATTTTCGGAATTTGAAGTTAGAGGTGT
+
CG>EFF<EFECE>D1<111@/FG>CFGGGG///0=1:FGGD1FE1FGEBG

P.S. This is my first comment in the forum (though I have been stalking this place for years) so I apologise if it is out of place.

**fkrueger** · 09-01-2015, 10:41 AM

If you type bowtie2 --help you can find the following text:

Code:

-L <int>           length of seed substrings; must be >3, <32 (22)

This does not seem to be mentioned in the manual, but you need to keep the seed substrings in the range of 3 to 32. The default is 22. I hope this helps, Felix
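A wrapper script could catch an out-of-range seed length before the aligner is launched. The following is only a sketch. The bounds are taken from the messages quoted in this thread (the help text says >3 and <32, while the error message says <= 32), so the sketch uses the more permissive upper bound and keeps both limits as parameters. `check_seed_length` is a hypothetical name.

```python
def check_seed_length(seed_length, lower=3, upper=32):
    """Return seed_length if lower < seed_length <= upper, else raise.

    The default bounds come from the quoted bowtie2 messages and may
    differ between bowtie2 versions.
    """
    if not (lower < seed_length <= upper):
        raise ValueError(
            "seed length must be > %d and <= %d; was %d"
            % (lower, upper, seed_length)
        )
    return seed_length
```

With these defaults a value of 50 is rejected up front, while the default of 22 passes.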

**marcusmchale** · 09-01-2015, 10:49 AM

Thanks for the prompt reply, the manual for bismark suggests the following command:

bismark -n 1 -l 50 /data/genomes/homo_sapiens/GRCh37/ test_dataset.fastq

Which would call bowtie2 to use "-L 50".

Is there something I'm missing?

Oh, it's because of the differences in alignment strategy between bowtie1/bowtie2. Thanks for the lead!

**fkrueger** · 09-01-2015, 11:16 AM

Oh it seems I need to update the manual because we very recently changed the default aligner to Bowtie 2, and the command in the manual still refers to bowtie1 (if you use --bowtie1 you can use the command as in the manual). I'll have this changed soon, thanks for spotting this.

If you want to run the test dataset just leave out all options and try using the defaults. Best, Felix

**daanum** · 11-07-2015, 01:12 PM

genome preparation

hi,
I am trying to run the Bismark genome preparation but have been unable to do so.
I have the unzipped bismark v 14.5 folder on the server, along with the unzipped bowtie-2.2.2.6 folder and the genome files for human GRCh38, all three in one folder. Do I need to run any installation step for Bismark/Bowtie before I run the genome preparation?

I am new to methylation analysis, so it would be great if you could help.

thanks in advance.

**fkrueger** · 11-07-2015, 01:27 PM

Bismark just needs to be extracted as is outlined step by step in the manual (http://www.bioinformatics.babraham.a...User_Guide.pdf). I believe Bowtie 2 also only needs to be unzipped, then either you place the bowtie2 executable in the PATH (just google how to do this), or you specify the path with --path_to_bowtie in Bismark.

All other steps, including the genome preparation

Code:

bismark_genome_preparation [options] <path_to_genome_folder>

are explained in detail in the manual, this protocol, or this methylation analysis course. Good luck, Felix

**daanum** · 11-09-2015, 12:04 AM

Hi,

I am still unable to run the bismark_genome_preparation step.
I get the error "Command not found".
Any idea? I have been trying since yesterday and am not sure what I am doing wrong.

**fkrueger** · 11-09-2015, 12:25 AM

Originally posted by daanum

Hi,

I am still unable to run the bismark_genome_preparation step.
I get the error "Command not found".
Any idea? I have been trying since yesterday and am not sure what I am doing wrong.

I admire your perseverance but you might want to consider doing a basic Linux operation tutorial, I think you might benefit.

Here you've got a couple of options:
1) either you move to the folder containing the Bismark installation and then run ./bismark_genome_preparation (./ prepends the path to the current directory)
2) you can type /path/to/Bismark/bismark_genome_preparation which should work from anywhere.
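The reason behind "Command not found" is that a bare command name is only looked up in the directories listed in PATH, not in the current folder. This can be checked with a short sketch that uses Python's standard `shutil.which`. The wrapper name `find_command` and its parameter are made up for this example.

```python
import shutil


def find_command(name, search_path=None):
    """Return the full path of an executable, or None if it is not found.

    search_path is a string of directories in PATH format; when it is
    None, the PATH environment variable is used.
    """
    return shutil.which(name, path=search_path)
```

If `find_command("bismark_genome_preparation")` returns None, the shell will not find the command by name either, and one of the two options above (or adding the Bismark folder to PATH) is needed.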
