SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
CisGenome -- an integrated tool for ChIP-seq data analysis hji Bioinformatics 66 12-30-2014 02:55 PM
Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context fkrueger Bioinformatics 27 10-11-2013 06:40 AM
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments fkrueger Bioinformatics 6 03-19-2012 06:06 AM
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data gcarbajosa Bioinformatics 2 12-13-2011 09:43 AM

Reply
 
Thread Tools
Old 03-27-2011, 08:55 PM   #81
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Yep it works fine now. It would be good if you rather created a BAM file with a pipe to "samtools view -t <chr.sizes> -bS -" for space saving. Just a suggestion because I like working with BAM files that take less disk space.
zee is offline   Reply With Quote
Old 03-27-2011, 09:02 PM   #82
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

Hi Z,

That's not a bad idea at all. I'll look into that.

Thanks heaps.

Cheers,
Oliver
olivertam is offline   Reply With Quote
Old 03-27-2011, 09:09 PM   #83
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Hi Oliver,

Attached is a version we modified for our work with novoalign-novomethyl and bismark. We output an extra SAM tag ZB:Z:GA or ZB:Z:CT so that we're able to split these out later in our pipeline.
Attached Files
File Type: pl bismark2bam.pl (12.8 KB, 53 views)
zee is offline   Reply With Quote
Old 04-07-2011, 03:53 AM   #84
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Hi Zee,

Would it be OK if I stuck your bismark2bam.pl script up on our website?

Felix
fkrueger is offline   Reply With Quote
Old 04-07-2011, 04:05 AM   #85
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Sure thing it's open to anybody who wants to use it.
zee is offline   Reply With Quote
Old 04-15-2011, 01:18 AM   #86
zeam
Member
 
Location: USA

Join Date: Oct 2010
Posts: 38
Default

Hey guys,
Recently ,I was using bismark to process my methyC-seq data, but the efficiency of mapping is not so good.And I know the species I work on is transposon rich which is greter than 60%.In most papers,their mapping efficiency is greater than 70%.But for my methylome data,it's about 40% for single end reads, and 68% for paired ends reads.

Does anyone have encounter the similar mapping problem?Or someone can give me some suggestions about the mapping strategy.Half of my reads are single end.
zeam is offline   Reply With Quote
Old 04-15-2011, 01:38 AM   #87
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Hi Zeam,

could you give us a few more details about your actual experiment? A paired-end mapping efficiency of 68% for BS-Seq data sounds quite good to me, but 40% for SE is indeed a bit low.

What was the read length you used for the single end files, what were the mapping parameters and what did the QC of the FastQ files look like?

Also, Bismark should produce the stats:

Sequences with no alignments under any condition: 123
Sequences did not map uniquely: 73591

which should give you a feel whether sequences just fail to align (too many errors, residual adapter sequence or the like) or if they get rejected because they align in too many places (this could be indicative of a high repetitive element content). Another possibility could be that your genome of interest contains something like properly sequenced but unplaced scaffolds which could share a high sequence similarity to other chromosomes. These might also result in a high number of sequences being rejected due to non-unique mapping.

If you like you could send me the Bismark mapping report and the FastQC report (the zipped file) to take a look, maybe it tells us more.

Best,
Felix
fkrueger is offline   Reply With Quote
Old 04-15-2011, 02:44 AM   #88
zeam
Member
 
Location: USA

Join Date: Oct 2010
Posts: 38
Default Report from bismark mapping

Hi Felix,
The two attachments are bismark mapping report for a PE and SE lane respectively,and the fastqC report will be emailed to you because of its file size.

The raw reads' length is 100bp base pair.And after trimmed by q 13,small propotion was less than 100 bp.

Thanks for replying!

Best wishes,
Zeam
zeam is offline   Reply With Quote
Old 04-15-2011, 03:06 AM   #89
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Hi Zeam,

thanks for the attachments. By just briefly looking at the mapping report you seem to have 45 million alignments which got rejected because of ambiguous mappings. These mismappings do not only mean that the reads map somewhere else, but they map at least twice with the same number of lowest mismatches.

To me this looks like you are using a newly assembled plant genome that contains either a lot of smaller scaffolds that could not be placed into the main genome or something like an unmapped_chromosome. The only solution there is for such a problem is re-indexing your genome but removing very small scaffold first. We once had a similar problem with Chlamydomonas, and if I recall it correctly removing unplacable scaffolds worked like a charm.

A quick word about your mapping parameters. 100bp reads are really quite long for BS-Seq, using -n 2 -l 28 (which are the default settings) is tolerating quite a lot of errors. If I were you I would be much more stringent about the parameters, maybe even use something like -n 2 -l 70 or so, as sequencing errors can not only allow mismappings but will also lead to false methylation calls. Also with reads this long you are likely to read into the adapter on the other side, so you might want to use an adapter trimmer on the reads as well.

Please let me know if I can be of more help,

Best,
Felix
fkrueger is offline   Reply With Quote
Old 04-21-2011, 07:07 AM   #90
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

I would like to announce that Bismark v0.5.0 has been released today.


The 3 main modifications are:

- paired-end alignments should now be performed correctly irrespective of the sequence ID format in the FastQ file. This hopefully means that the new format which will be output by the Illumina Casava version 1.8 will no longer cause Bismark to stop.

- the alignment output will now also include extra column(s) for sequence basecall quality scores (both for single and paired-end data). This should facilitate filtering on qualities later on if desired.

- fixed a bug with paired-end alignments where alignments to the CTOT strand were accidentially assigned to the CTOB strand and vice versa.

All associated files can be obtained from:
http://www.bioinformatics.bbsrc.ac.u...cts/index.html


I hope the modifications do not break too many downstream analysis scripts ... If you spot any flaws please let me know.

Best,
Felix
fkrueger is offline   Reply With Quote
Old 05-04-2011, 08:07 AM   #91
jamal
Member
 
Location: Denmark

Join Date: Jan 2010
Posts: 10
Default

Hi Felix

thank you for your software.
I want to use Bismark to align paired-end bisulfite sequences from mamalian genome. which option can i use to run it in parallel threads, I checked -p ,but it didn't work.

best regards
jamal
jamal is offline   Reply With Quote
Old 05-04-2011, 08:18 AM   #92
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Hi Jamal,

I have tried to implement a -p option, but unfortunately it is technically not so straight forward to get this to work. The easiest way to parallelize Bismark right now is to split the input file(s) into smaller subfiles and merge the ouput again afterwards. (Just as a side note, with appropriate alignment parameters 1 thread of Bismark should align roughly 10-20 M sequences per hour)

I am sorry that the parallelization is currently not easier, maybe I should work on implementing a similar strategy (splitting files internally and merging the output again) for a future release. If you have any questions regarding parameters just send me an email.

Best wishes,
Felix
fkrueger is offline   Reply With Quote
Old 05-04-2011, 07:19 PM   #93
zeam
Member
 
Location: USA

Join Date: Oct 2010
Posts: 38
Default Why not -p option for Bismark

I got the same query,and Felix,Bismark developer,gave a explanation about this,the details are as follows
++++++++++++++++++++++++++++++++++++++++++++++
Let's say you give each of the bowtie threads 4 CPUs instead of 1. This could practically mean that
CPU1 gets reads 1-5,
CPU2 gets reads 6-10,
CPU3 gets reads 11-15, and
CPU4 gets reads 16-20.

Now if for whatever reason read 16 aligns first (maybe a perfect match) it would get reported as the first alignment of the 'thread 1' fbowtie instance, which practically means that all other reads (1-15) would be scored as having no alignment. It is the randomness of Bowtie's reporting that does not allow specifying a higher -p option, I have tried it already.

If you have spare capacity on your computer you can split up your input files though, and run multiple instances of Bismark at the same time and merge them again afterwards. Just keep in mind that running Bismark once will use 5 CPU threads and for e.g. the human genome 8 GB of RAM. Make sure that you do not run more instances of Bismark as you have got RAM available as this might cause some threads of Bowtie to die due to insufficient memory which will ultimately produce wrong alignment results.

I hope I explained it better this time,
zeam is offline   Reply With Quote
Old 06-16-2011, 03:05 AM   #94
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Just a quick note that Bismark v0.5.1 has just been released, which fixes a few minor issues:

- The genome folder for the bismark_genome_preparation can now be specified either as absolute or relative path.
- Fixed a bug where a newline character was missing after the quality values in the unmapped reads FastQ output.
- Fixed a bug which prevented paired-end alignments in FastA format.
- Input files for the methylation extractor can now also have a relative path.
- The bismark2bedGraph script received an update to fix a 1-off error.

All associated files can be found at http://www.bioinformatics.bbsrc.ac.uk/projects/bismark/ (Ctrl+Refresh might be needed to force a cache update).

Felix
fkrueger is offline   Reply With Quote
Old 07-12-2011, 11:54 AM   #95
wilhelml
Junior Member
 
Location: Aloha, Oregon

Join Date: Jun 2010
Posts: 7
Default Question about calculating %methylated from bismark data...

Hi Felix,

I want to calculate the ratio of methylated CpGs to total CpGs for a given stretch of DNA.
My Bismark data is non-directional, thus it is my understanding from the manual that
Cs methylated on the bottom strand will appear as methylated Gs on the top strand.

So if a given stretch of DNA has 10 CpGs on the top strand, and Bismark reports
let's say 5 positions as methylated, is the ratio of methylated CpGs 5/10, or 5/20?

This example has 10 CpGs on the top strand, which means it has 10 on the bottom
strand as well, for a total of 20. Again, since Bismark is reporting data from
both strands, should I include both strands in my 'total CpGs' calculation?

Thank you!

Larry Wilhelm
wilhelml is offline   Reply With Quote
Old 07-12-2011, 12:16 PM   #96
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Hi Larry,

Thanks for this confusing example . Just to quickly summarise for your example with 10 CpGs on the top strand:

Cs on the original top strand (OT) and also Gs at the same position on the complementary strand (CTOT) will report the methylation status of the original top strand.

Likewise, Cs on the original bottom strand (OB) and Gs on its complementary strand (CTOB) will report the methylation of the original bottom strand.

It is not granted that you will see reads from both OT or CTOT and OB or CTOB in the same sequencing experiment, so you can end up having methylation information for one strand only. Thus, I think you can't simply use the total number Cs present on both strands and calculate the methylation ratio as, in your example, 5 out of 20 CpGs found methylated, thus 25% methylation. Rather, you will have to take the number of Cs found methylated - per strand - and divide this number by the total number of Cs you actually observed in this region. Then you can compare the methylation on both strands to see if you have the same level of methylation on both strands or whether they differ from each other, and average over both of them.

In it's internal calculation, Bismark just divides the number of methylated Cs by (methylated+unmethylated) Cs observed to get a percentage methylation.

I hope this doesn't add to the confusion

Felix
fkrueger is offline   Reply With Quote
Old 07-12-2011, 02:54 PM   #97
wilhelml
Junior Member
 
Location: Aloha, Oregon

Join Date: Jun 2010
Posts: 7
Default Follow up: Question about calculating %methylated from Bismark data...

Thanks Felix, very helpful.

You have reduced confusion. I have added to it because I was mistaken, my library is directional.

Here's some real data. I've listed the lines from the CpG_context output of the
Methylation Extractor that correspond to a small region of interest. I've appended the output with
two fields, the base on the top strand at that position, and my interpretation of the
bismark output:

read_id:2:82:3541:4552#0 - 1 1343509 z : C : un-methylated C on top strand
read_id:2:82:3541:4552#0 + 1 1343515 Z : C : methylated C on top strand
read_id:2:82:3541:4552#0 + 1 1343524 Z : C : methylated C on top strand
read_id:2:82:3541:4552#0 + 1 1343526 Z : C : methylated C on top strand
read_id:2:82:3541:4552#0 + 1 1343528 Z : C : methylated C on top strand
read_id:2:82:3541:4552#0 - 1 1343570 z : C : un-methylated C on top strand
read_id:2:82:3541:4552#0 - 1 1343561 z : C : un-methylated C on top strand
read_id:2:90:16977:1371#0 - 1 1343619 z : G : un-methylated C on bottom strand
read_id:2:90:16977:1371#0 + 1 1343562 Z : G : methylated C on bottom strand
read_id:2:90:16977:1371#0 + 1 1343571 Z : G : methylated C on bottom strand

CpGs on top strand = 11 (counted from the sequence itself)
CpGs for which we have alignment data = 10

mCs_top_strand = 4 (methylated CpGs on top strand)
un-mCs_top_strand = 3 (un-methylated CpGs on top strand)

mCs_bottom_strand = 2
un-mCs_bottom_strand = 1

%mCs_top_strand = 4/7 = 57%
%mCs_bottom_strand = 2/3 = 67%

Is this all a reasonable interpretation?

We do not have anything near saturating coverage in these experiments, so just
reporting the number of Cs methylated as a fraction of the total for which we
have alignment data seems appropriate.

Thanks again.

Larry
wilhelml is offline   Reply With Quote
Old 07-13-2011, 05:02 AM   #98
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Hi Larry,

This looks reasonable to me for a strand specific methylation rate. To get the total methylation state it would then just be (methylated Cs top strand
+ methylated Cs bottom strand) / (all Cs top strand + all Cs bottom strand) * 100 = % methylation.

Best,
Felix
fkrueger is offline   Reply With Quote
Old 07-26-2011, 03:48 AM   #99
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Dear all,

I wanted to let you know that there is now a new version of the bismark2SAM conversion script (v4), which is available for download here (force a browser refresh if needed (Ctrl+refresh)).
Previous versions reported slightly wrong alignments for paired-end or non-directional libraries.

The new version should now report correct alignments for all kinds of sequencing libraries (SE and PE, directional and non-directional). Also, sorting of SAM files by chromosome and start position is now optional, as files can be sorted by Samtools sort later on.

I advise users also to no longer use the bismark2BAM script anymore as it contains the same bugs as the older bismark2SAM script(s), but rather use the following workflow for data analysis via the SAM/BAM and/or pileup route:

- generate bismark alignments
- bismark2SAM_v4 to convert to SAM format
- samtools view to convert SAM to BAM format
- samtools sort to sort the BAM file
- use samtools pileup for pileup files

Please let me know if there are any problems with the new conversion script,

Many thanks,
Felix
fkrueger is offline   Reply With Quote
Old 08-16-2011, 07:04 AM   #100
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625
Default

Bismark version 0.5.2 was just released. It fixes a bug with the methylation extractor when the option --ignore was specified and adds some convenience-related features. Also, it has been brought to my attention that tab characters in the read ID field cause Bowtie to truncate the read ID (this does not happen with spaces), which causes internal checks to fail. Bismark will now produce a warning if there are tab characters in the read ID, but since the dataset this was tried on was >4 years old and I have since not seen any read IDs containing tabs, we'll leave it up to the user to remove tabs in the input files before running Bismark.

The changes include:

- Increased the 'chunkmbs' default value to 256 MB (up from 64 MB)
- Bismark will now accept input files in both comma and space separated format
- Fixed a bug in the methylation extractor which resulted in offset positions for reverse reads when the option '--ignore' was used (single-end only)
- Included a check (and warning) whether the read IDs in the input files contain tab characters, as this will cause Bowtie to truncate the reads and result in no alignments


The Bismark download is available from the Bismark project page.
fkrueger is offline   Reply With Quote
Reply

Tags
alignment, bisulfite, bisulphite, methylation, sequencing

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:24 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO