**fkrueger** · 12-27-2014, 06:27 AM

Hopefully the last release of the year: Bismark v0.13.1

We have just released a new version of Bismark (v0.13.1), which is available from the Babraham Bioinformatics website. This version adds several useful new options and bug fixes. Perhaps most notable are the new option --merge_CpG for the coverage2cytosine module, which pools CpG methylation evidence from the top and bottom strands, and a fix for the methylation extractor option --multicore, which now reports the correct number of events in the M-bias plots.

Here is a more detailed list of all changes:
• Bismark Genome Preparation: Added a check for unique chromosome names to the Bismark indexer to avoid disappointments later
• Methylation Extractor: Fixed a bug in the M-bias reports when the option --multicore was used. In that case, only the numbers from one core were used to construct the report. Now each thread writes out an individual M-bias table, and once methylation extraction has completed all these individual files are merged into a single, cumulative table, as it should be
• Methylation Extractor: Added a new option --mbias_off, which processes the files as normal but does not write out any M-bias files. This option is meant for users who run the methylation extractor twice: the first time to find out whether there is a bias that needs to be removed, and the second time using the --ignore options without overwriting the existing M-bias reports
• bismark2bedGraph: Deferred the removal of path information from the input file names a little, so that specifying file paths no longer prevents bismark2bedGraph from finding the input files
• bismark2bedGraph: If the specified output directory doesn't exist it will be created
• bismark2bedGraph: Changed the way scaffolds are sorted (with --gazillion/--scaffold specified) to -k3,3V (this was done following a suggestion by Volker Brendel, Indiana University: "The -k3,3V sort option is critical when the sequence names are numbered scaffolds (without left-buffering of zeros)")
• coverage2cytosine: Added a new option --merge_CpG that will post-process the genome-wide report to write out an additional coverage file which has the top and bottom strand methylation evidence pooled into a single CpG dinucleotide entity. This may be the desirable input format for some downstream processing tools such as the R-package bsseq (by K.D. Hansen). An example would be:

Code:

  genome-wide CpG report (old)
         gi|9626372|ref|NC_001422.1|     157     +       313     156     CG
         gi|9626372|ref|NC_001422.1|     158     -       335     156     CG

  merged CpG evidence coverage file (new)
         gi|9626372|ref|NC_001422.1|     157     158     67.500000       648     312

This option is currently experimental and only works if CpG context only and a single genome-wide report were specified (i.e. it doesn't work with the options --CX or --split_by_chromosome)
• coverage2cytosine: Changed the processing of chromosomes without coverage so that they are handled in sorted rather than random order. This should make runs more reproducible
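The pooling shown in the --merge_CpG example can be sketched in a few lines. This is a minimal illustration, not the coverage2cytosine code. It assumes the report is sorted so that each CpG's "+" line is immediately followed by its "-" line one position downstream. Only the first six columns are read. Skipping CpGs with zero coverage is an assumption of this sketch. The merged percentage is methylated / (methylated + unmethylated), which for the example gives 648 / 960 = 67.5%.

```python
def merge_cpg(report_lines):
    """Pool top (+) and bottom (-) strand counts of each CpG dinucleotide.

    Input columns used: chromosome, position, strand, count methylated,
    count unmethylated, context. Extra columns are ignored.
    Yields: chromosome, start, end, % methylated, methylated, unmethylated.
    """
    pending = None  # top-strand C of a CpG, waiting for its bottom-strand partner
    for line in report_lines:
        fields = line.split()
        if len(fields) < 6 or fields[5] != "CG":
            continue  # only CpG context is handled
        chrom, pos, strand = fields[0], int(fields[1]), fields[2]
        meth, unmeth = int(fields[3]), int(fields[4])
        if strand == "+":
            pending = (chrom, pos, meth, unmeth)
        elif strand == "-" and pending and pending[:2] == (chrom, pos - 1):
            m, u = pending[2] + meth, pending[3] + unmeth
            pending = None
            if m + u == 0:
                continue  # assumption: uncovered CpGs are not written out
            yield f"{chrom}\t{pos - 1}\t{pos}\t{100.0 * m / (m + u):.6f}\t{m}\t{u}"


report = [
    "gi|9626372|ref|NC_001422.1|\t157\t+\t313\t156\tCG",
    "gi|9626372|ref|NC_001422.1|\t158\t-\t335\t156\tCG",
]
for merged in merge_cpg(report):
    print(merged)
```

Running it on the two example lines reproduces the merged entry from the release notes (157, 158, 67.500000, 648, 312).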

Comments or suggestions welcome. Happy New Year! Felix

**chxu02** · 01-10-2015, 07:28 PM

much smaller bedgraph & cov files generated...

Hi all,

I got some results I cannot explain. I have 2 human BS-seq libraries. Before sequencing in depth, I sequenced a little to get 1M raw reads for each library and analyzed the data with Bismark. It all looked good: the methylation extractor files were of similar size, and so were the bedGraph and cov files. Now I have sequenced each library to 250M PE raw reads. The methylation extractor files are still of similar size, but the bedGraph and cov files from one library are only 1/4 the size of those from the other library. When imported into IGV, it looks like I have much less coverage for the library with the smaller files. Given the biology of the 2 samples, I expect to see the same coverage but different methylation levels. What's going on here?

Youyou

**fkrueger** · 01-11-2015, 02:04 PM

Hi Youyou, I don't think file size is the best way to judge the outcome of your experiment. It would be more helpful to know whether all steps finished successfully, the sequencing mode and read length, whether the reads were trimmed, the mapping efficiency, the duplication rate etc. Ideally, could you attach the reports here or send them to me via email (FastQC, trimming, mapping, deduplication and methylation extraction reports)? Cheers, Felix

**chxu02** · 01-11-2015, 07:31 PM

Hi Felix,

Thanks for your prompt reply. I have figured it out now. The reads from the library with the smaller bedGraph are more strongly enriched at certain genomic regions than those of the other library. That's why the two libraries ended up with a similar number of Cs mapped but very different numbers of lines in the bedGraph.
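bedGraph and coverage files contain one line per covered cytosine position. Their size therefore tracks how many distinct positions are covered, not how many methylation calls were made. The toy example below shows two libraries with the same number of calls. The position lists are purely hypothetical.

```python
def call_stats(calls):
    """Return (total methylation calls, distinct covered positions) for (chrom, pos) calls."""
    calls = list(calls)
    return len(calls), len(set(calls))


# Hypothetical libraries with the same number of calls (8 each)
broad = [("chr1", p) for p in range(100, 108)]           # 8 positions, 1 call each
enriched = [("chr1", 100)] * 4 + [("chr1", 101)] * 4     # 2 positions, 4 calls each

print(call_stats(broad), call_stats(enriched))
```

Both libraries yield 8 calls, but the enriched one would produce only 2 bedGraph lines instead of 8.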

I have one more concern about the deduplication step. Each of my BS-seq libraries was made from 1 ng of ChIP DNA, so I don't expect a genome-wide distribution of reads. With >200M reads I can tolerate some level of PCR duplication, which should be determined by genome size and library depth. However, the Bismark deduplication script seems to reduce each set of duplicated reads arbitrarily to 1, which vastly reduces the depth of my libraries. I just wonder if there is an alternative way for me to do the deduplication step.

I appreciate your contribution.

Youyou

**fkrueger** · 01-12-2015, 03:37 AM

Glad that you got it sorted. If your libraries were derived from a ChIP-ped sample, I would agree that you probably don't want to deduplicate the data. Unless you used a unique molecular identifier (UMI) for each read, there is no way to discriminate between genuine ChIP-ped fragments and PCR-amplified reads. Deduplication would then completely take the depth out of your experiment.
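To illustrate why a UMI changes the picture, here is a simplified sketch of position-based deduplication in general. It is not the deduplicate_bismark implementation. Reads are hypothetical dicts keyed by chromosome, start and strand. The "umi" field is an assumed extra tag. Without it, every read at the same position collapses to one. With it, only reads that also share a UMI are treated as PCR copies.

```python
def deduplicate(reads, use_umi=False):
    """Keep the first read for each (chrom, start, strand[, umi]) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"])
        if use_umi:
            key += (read["umi"],)  # hypothetical UMI tag distinguishes original molecules
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [  # hypothetical reads
    {"chrom": "chr1", "start": 100, "strand": "+", "umi": "AAC"},
    {"chrom": "chr1", "start": 100, "strand": "+", "umi": "AAC"},  # PCR copy
    {"chrom": "chr1", "start": 100, "strand": "+", "umi": "GTT"},  # distinct molecule
    {"chrom": "chr1", "start": 200, "strand": "-", "umi": "AAC"},
]
print(len(deduplicate(reads)), len(deduplicate(reads, use_umi=True)))
```

Position-only deduplication keeps 2 of the 4 reads. The UMI-aware variant keeps 3, because the read with a different UMI is recognised as a separate fragment.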

**chxu02** · 01-19-2015, 09:54 PM

Hi Felix,
I'm stuck on the statement that "If a library is directional, only reads which are (bisulfite converted) versions of the original top strand (OT) or the original bottom strand (OB) will be sequenced."
I'm quite sure my library is directional: the mapping efficiency is 80% when I use --directional. But I did paired-end sequencing, which means that the CTOT and CTOB strands were sequenced from the other end. So what actually happened when I ran bismark --directional with my paired-end reads?

Youyou

**dpryan** · 01-20-2015, 12:54 AM

Hi Youyou, I'll save Felix from needing to reply.

Internally, Bismark calls bowtie2 (or bowtie) with the --norc flag when it's aligning to the C->T converted genome (this would be the OT strand). Even though bowtie is told not to align to the reverse complement (CTOT), it will still reverse complement read #2 of a pair prior to alignment. This is because the library type flag (--fr) defaults to what's appropriate for Illumina.

**chxu02** · 01-20-2015, 06:23 AM

Hi Devon,
Thanks for your answer. I'm still not quite sure what happens at the methylation calling step. How does Bismark process the methylation information from the #2 reads, which uniformly reflect the sequence of the CTOT or CTOB strands?

**fkrueger** · 01-20-2015, 06:29 AM

The orientation of a paired-end alignment is given solely by the alignment of Read 1. Since you ran your data in default = directional mode (and rightly so given that you've got a mapping efficiency of 80%!), R1 can only align to the OT and OB strands. Technically R2 aligns to the CTOT and CTOB strands, but since they are part of a paired-end alignment Bismark factors this in automatically and does a G to A lookup for methylation calls of these reads. In short: don't worry, it is all being handled for you!
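The G to A lookup for Read 2 can be made concrete with a toy simulation. A hypothetical top-strand fragment is bisulfite-converted in silico. Read 1 is taken as the converted OT sequence, and Read 2 as its reverse complement, i.e. the CTOT strand read from the other end. This assumes full-length reads with no errors and skips alignment entirely. It illustrates the principle only and is not Bismark's code. At each genomic C, Read 2 shows a G if the C was methylated and an A if it was converted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq, methylated):
    """Turn every unmethylated C into T; methylated C positions (0-based) stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_from_read2(read2, top):
    """Methylation calls at top-strand Cs from Read 2 (G = methylated, A = unmethylated)."""
    n = len(top)
    calls = {}
    for j, base in enumerate(read2):
        i = n - 1 - j  # Read 2 runs antiparallel to the top strand
        if top[i] == "C":
            calls[i] = {"G": "methylated", "A": "unmethylated"}.get(base, "no call")
    return calls


top = "ACGTTCGACCA"   # hypothetical original top strand (OT) fragment
methylated = {1}      # hypothetical: only the CpG cytosine at index 1 is methylated
read1 = bisulfite_convert(top, methylated)  # R1 reads the converted OT strand
read2 = revcomp(read1)                      # R2 reads the CTOT strand from the other end

assert revcomp(read2) == read1  # reverse-complementing R2 (as with --fr) recovers OT
print(read1, read2)
print(call_from_read2(read2, top))
```

Running it reports index 1 as methylated and indices 5, 8 and 9 as unmethylated. Both mates therefore describe the same top-strand cytosines, just read in opposite orientations.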

**jeni** · 01-29-2015, 07:49 AM

Hello,

My bismark report says:

C methylated in CpG context: 52.0%
C methylated in CHG context: 1.0%
C methylated in CHH context: 0.8%
C methylated in unknown context (CN or CHN): 20.6%

Does this mean the bisulfite conversion rate is 99%, because the non-CpG methylation rate is around 1%?

Thanks.
Jeni

**fkrueger** · 01-29-2015, 09:35 AM

I would certainly interpret it that way, yes. You could potentially also look at spike-in controls, but since 0.8% is your lowest value, the efficiency must have been >= 99.2%.
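The reasoning here is simple arithmetic. Apparent non-CpG methylation is genuine non-CpG methylation plus unconverted Cs. The lowest non-CpG value is therefore an upper bound on the non-conversion rate, and 100 minus that value is a lower bound on conversion efficiency. The sketch below parses the report lines quoted above. The regular expression is an assumption about the line format, and the "unknown context" line is deliberately not matched.

```python
import re

REPORT = """C methylated in CpG context: 52.0%
C methylated in CHG context: 1.0%
C methylated in CHH context: 0.8%
C methylated in unknown context (CN or CHN): 20.6%"""


def conversion_lower_bound(report):
    """Lower bound (%) on bisulfite conversion from the lowest non-CpG methylation level."""
    pct = {
        m.group(1): float(m.group(2))
        for m in re.finditer(r"C methylated in (\S+) context: ([\d.]+)%", report)
    }
    return 100.0 - min(pct["CHG"], pct["CHH"])


print(round(conversion_lower_bound(REPORT), 1))
```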

**jeni** · 01-29-2015, 11:48 PM

Thanks....

**kentawan** · 02-09-2015, 07:43 PM

Hi All,

I have generated my final methylation extractor output files from Bismark alignments with a mapping efficiency of 56.6% (is this normal?).

Right now I am stuck on the research relevance of OB and OT CpG methylation. If I just want to search for DMRs, can I merge OB and OT together to make a total CpG file?

Many thanks in advance.

**fkrueger** · 02-10-2015, 01:36 AM

Hi kentawan,
unless you are interested in looking at events for top and bottom strand separately you can indeed just merge the two outputs. You can then use this output to find DMRs.
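Merging the OT and OB outputs amounts to pooling the calls per cytosine position. Here is a minimal sketch. It assumes tab-separated extractor lines of the form read ID, methylation state (+/-), chromosome, position, methylation call. Any line without exactly five tab-separated fields, such as a version header, is skipped. The file contents below are hypothetical. Note that an OB call at a CpG sits one position downstream of the corresponding OT call. This per-cytosine merge therefore still keeps the two Cs of a CpG dinucleotide separate.

```python
from collections import defaultdict


def merge_strands(*files):
    """Pool methylation extractor calls from several files into (chrom, pos) -> [meth, unmeth]."""
    counts = defaultdict(lambda: [0, 0])
    for lines in files:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 5:
                continue  # skip header lines or anything that isn't a call
            _read_id, state, chrom, pos, _call = fields
            counts[(chrom, int(pos))][0 if state == "+" else 1] += 1
    return dict(counts)


ot_lines = [  # hypothetical CpG_OT content
    "Bismark methylation extractor version v0.13.1",
    "read1\t+\tchr1\t100\tZ",
    "read2\t-\tchr1\t100\tz",
]
ob_lines = ["read3\t+\tchr1\t101\tZ"]  # hypothetical CpG_OB content
print(merge_strands(ot_lines, ob_lines))
```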

Some tools even go one step further and recommend that you merge the top and bottom strand information of a CpG dinucleotide. If you wanted to do this, you could use the option --merge_CpG of the coverage2cytosine script (version 0.13.1 required; see the latest release notes).

Regarding whether a mapping efficiency of 56.6% is 'normal', this is difficult to answer as it depends very much on what you have done. Factors that play a role include the genome used (repeat content), read length, rigorous adapter/quality trimming, single-end vs. paired-end sequencing, library strategy (directional, PBAT, enrichment, shotgun), contamination etc. As a guideline, for 100 bp single-end shotgun reads you would probably expect efficiencies of ~70-75% against the mouse genome and ~80-85% against the human genome (in Bowtie2 mode).

**chxu02** · 02-25-2015, 01:00 PM

Hi Felix,
Is there a way to segregate cytosines into those on the + strand and those on the - strand? The "-/+" in the Bismark methylation extractor output files specifies the methylation state rather than the strand. I think generating the optional genome-wide cytosine report and then grepping it could do the job, but that would generate tons of data... Any idea?

PS: Does OT necessarily mean + (Watson) strand?
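If the genome-wide cytosine report route is taken, one option is to split it by its strand column in a single streaming pass rather than grepping it twice. The strand is the third column, as in the report example from the v0.13.1 release notes. Here is a minimal sketch using in-memory streams. In practice the handles could be real or gzip-compressed files, e.g. opened with gzip.open(path, "rt"). This reduces passes over the data but not the size of the report itself.

```python
import io


def split_by_strand(report, plus_out, minus_out):
    """Stream a cytosine report and route each line by its strand column (column 3)."""
    for line in report:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2] == "+":
            plus_out.write(line)
        elif fields[2] == "-":
            minus_out.write(line)


report = io.StringIO(
    "gi|9626372|ref|NC_001422.1|\t157\t+\t313\t156\tCG\n"
    "gi|9626372|ref|NC_001422.1|\t158\t-\t335\t156\tCG\n"
)
plus, minus = io.StringIO(), io.StringIO()
split_by_strand(report, plus, minus)
print(plus.getvalue(), minus.getvalue(), sep="")
```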

