SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
CisGenome -- an integrated tool for ChIP-seq data analysis hji Bioinformatics 66 12-30-2014 01:55 PM
Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context fkrueger Bioinformatics 27 10-11-2013 05:40 AM
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments fkrueger Bioinformatics 6 03-19-2012 05:06 AM
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data gcarbajosa Bioinformatics 2 12-13-2011 08:43 AM

Reply
 
Thread Tools
Old 10-08-2010, 12:59 AM   #21
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83
Default

I've loaded an example BAM file for paired end data into GBrowse and I have a question about miss-match information in relation to methylation in the SAM file.

If you look at this example couple of base pairs in GBrowse there is first a C on the plus strand that I assume was methylated because it remains as a C in the aligned reads. The next base as a C for the - strand and it seems that this was not methylated because the aligned - strand reads are all A.

I was thinking that the A for the - strand reads would show up as a miss match but it doesn't seem to (it should be shaded in red if it is a MM and I've checked that this works for other BAM files from standard genomic sequence data).

Am I thinking about this in the wrong way and do you have a method to colour methylated bases?
natstreet is offline   Reply With Quote
Old 10-08-2010, 07:07 AM   #22
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

Hi natstreet,

I apologize for the errors. The genome_methylation_bismark2bedGraph.pl problem was a typo on my part. The attached version should run on CHH now.
As for the BAM/SAM error, this is due to an update of bismark that I haven't implemented in my single-end script. I have now updated the script (bismart_to_SAM_v2.pl, attached), and if you run it in the single-end mode, it should work (let me know if it doesn't).

Unfortunately, I have very little experience with pair-ends SAM/BAM, so I'll need some time to look through it. In the meantime, see if the single-end pipeline works well.

Thanks.

Cheers,
Oliver
Attached Files
File Type: pl bismark_to_SAM_v2.pl (12.4 KB, 21 views)
File Type: pl genome_methylation_bismark2bedGraph.pl (4.4 KB, 18 views)
olivertam is offline   Reply With Quote
Old 10-08-2010, 10:20 AM   #23
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83
Default

No apologies needed - you're helping me out massively!

I'll give the scripts a try over the weekend and let you know if they're all working.

As soon as I started looking at some example methylated or non-methylated (so bisulfite converted) sites in GBrowse I realised that the consensus calling isn't so simple because things are not in agreement (so to say) between the two strands. I haven't yet got my head round how to deal with that.
natstreet is offline   Reply With Quote
Old 10-08-2010, 10:23 AM   #24
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

Are you using the single-end or paired-end option?
I think there could be a slight bug in the paired-end bismark_to_SAM option (not in the single-end), and I'm still trying to debug it. The single-end should work, and might be useful in giving you the pileup that you need.

Good luck.

Cheers,
Oliver
olivertam is offline   Reply With Quote
Old 10-08-2010, 06:55 PM   #25
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

So I guess there is a way to represent the Bismark methylation alignment in SAM format. Are there some standard public datasets we could use to benchmark some methylation alignment programs? I think it would be a useful exercise.
Another thought I had about this was how to output the methylation context graphs in BigWig for genome browser display.
zee is offline   Reply With Quote
Old 10-08-2010, 07:04 PM   #26
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

Hi Zee,

I have just re-posted this from an earlier discussion:

Quote:
To follow up on the BEDGraph/BigWig idea, we have developed a workaround for this. Again, this is based on single-end analysis, so I haven't tested paired-ends

1) Use methylation-extractor on the Bismark output, with '--comprehensive' option ('--merge_non_CpG' is optional)
2) Run the following script (genome_methylation_bismark2bedGraph.pl). This script sorts the methylation extractor output, then parses the results to generate an "overall methylation level" as a BEDGraph file, with one sampled cytosine site per line.
3) Use the bedGraphToBigWig program (available online, I believe) to convert the BEDGraph to BigWig.

Here's the usage for the genome_methylation_bismark2bedGraph.pl

Usage: genome_methylation_count.pl (--cutoff [threshold] ) [Bismark methylation caller output] > [output]

--cutoff [threshold] - The minimum number of times a methylation state was
seen for that nucleotide before its methylation
percentage is reported.
Default is no threshold

The output file is a tab-delimited BedGraph file with the following information:

<Chromosome> <Start Position> <End Position> <Methylation Percentage>

Bismark methylation caller (v0.2.0 or later) should produce three output files
(CpG, CHG and CHH) when using the "--comprehensive" option
(Two files if using the "--merge_non_CpG" option).
To count both CpG and Non-CpG, combine the output files.

Bismark methylation caller (v0.1.5 or earlier) should produce two output files
(CpG and Non-CpG) when using the "--comprehensive" option.
To count both CpG and Non-CpG, combine the two output files.
The script (genome_methylation_bismark2bedGraph.pl) should be in an earlier thread, but I have attached the script again (hopefully the most "bug-free" version)

Let me know if you have any questions, issues or bugs (since this is a workaround)

Cheers,
Oliver
Attached Files
File Type: pl genome_methylation_bismark2bedGraph.pl (4.4 KB, 13 views)
olivertam is offline   Reply With Quote
Old 10-08-2010, 07:05 PM   #27
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

I like the idea of using public data to benchmark the various programs. You should contact fkrueger, who made this tool. Maybe he has done something similar
olivertam is offline   Reply With Quote
Old 10-08-2010, 07:11 PM   #28
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

We wrote a methylation aligner about 2 years ago before Bismark was out and at that time it was only MAQ and BSMapper that could do methylation alignment. Another thing absent 2 years ago was SAM/BAM so now some standard is available the challenge is to make things easier on the end user who can plug-and-play their analysis work with the sorted alignments.
So, perhaps a specific SAM tag could be added to represent methylation-specific information in the record so the bedGraph, BigWigs,etc programs could be written to read these directly from the BAM file.
I would be happy to look at taking some test data and doing some methylation aligner benchmarking once again.
zee is offline   Reply With Quote
Old 10-08-2010, 10:43 PM   #29
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83
Default

Quote:
Originally Posted by olivertam View Post
Are you using the single-end or paired-end option?
I think there could be a slight bug in the paired-end bismark_to_SAM option (not in the single-end), and I'm still trying to debug it. The single-end should work, and might be useful in giving you the pileup that you need.

Good luck.

Cheers,
Oliver
Hi Oliver

I was using the paired end script. Today I'll load in BAM files made using the updated single end script you posted and see how that looks.
natstreet is offline   Reply With Quote
Old 10-08-2010, 10:46 PM   #30
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83
Default

Zee

I'd be happy to share our data as a test set for working on representing methylation in SAM and bedgraph/bigwig files as well as a dataset for alignment benchmarking. Let me know if that would be useful and I can make it available for download.

I'm also happy to run all of the different aligners and post the results. I'm not qualified for actually benchmarking them etc but I'm more than willing to spend the time providing what people need for this.
natstreet is offline   Reply With Quote
Old 10-09-2010, 02:12 AM   #31
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83
Default

Oliver (and interested others)

I've now added BAM files made from mapping my paired end data as if it were single end data to GBrowse. A useful example can be seen here. The bottom track shows the paired end mapping BAM file and the two tracks above are the F and R reads mapped as single end data. In all cases forward strand alignments are coloured green and reverse strand alignments in the salmony-orange colour.

Some things to notice are that in the paired end mapping track, the reverse reads shows the reverse strand base and not the forward strand one (which is what seems to be the standard). There are also a mix of + and - strand alignment in the F and R tracks, which I didn't expect (maybe I'm missing something obvious though?).

This actually highlights an interesting point: If the - strand alignments shows the + strand base, it will be impossible to see C bases that were converted by the bisulfite treatment unless there is a way to specifically colour either the methylated bases or the bases that were converted (more interesting would be the methylated bases). As for the paired-end BAM files, even in the single end files C bases that were converted still aren't shown as miss matches. I need to learn about the SAM file format to understand why this is - so far I've been rather blindly making things work rather than getting into the proper details.
natstreet is offline   Reply With Quote
Old 10-09-2010, 11:42 AM   #32
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

Hi natstreet,

I have to admit that I never used the pair-end BAM on Gbrowse (I use UCSC, but I guess they don't have Arabidopsis).

As for the F and R, when you made paired-end libraries, did you specify which genomic strand was ligated to each adapter. As you can imagine, the DNA is double-stranded upon fragmentation, so unless you specify the F adapter to only ligate to the "+" genomic strand, it can potentially ligate to the "-" genomic strand, therefore giving you a "-" read on F. Likewise with R, it depends on whether the original ligation (when you made the library) was ligated ONLY to the "-" genomic DNA, or randomly ligated. Let me know if this wasn't clear.

I must confess that my scripts were built to work on UCSC genome browser, which might have greater tolerance for badly made SAM/BAM. I could try to look more into the SAM format, but thus far, I'm not sure how to handle the problem. Sorry for that.

Cheers,
Oliver
olivertam is offline   Reply With Quote
Old 11-01-2010, 09:12 AM   #33
sunsnow86
Member
 
Location: organge CA

Join Date: Jul 2010
Posts: 17
Default a few more features to genome_methylation_bismark2bedGraph.pl

To Olivertam

Thank you for your effort in writing the script "genome_methylation_bismark2bedGradph.pl". it is very useful and works great! But I would like to have a few more features in the BED file if possible. Could you add a few lines in your program which can show the methylation level by color, such as methylation by blue and unmethylation by red. I think it will be more straightforward fior us to do a pairwise comparison to identify the differential methylation region. The color parameter I would like to use is as follows. I just dont know how to integrate that into your code since I am on the wet bench side. Also for the meth_percentage, could you only show the value in 3 digits. Thanks

$meth_value = $meth_percentage * 10;
if ($meth_value > 900){
$itemRgb = (57,39,140);
}
elsif (( 800 < $meth_value) && ($meth_value <= 900) ) {
$itemRgb = (49,48,206);
}
elsif ((700 < $meth_value )&& ($meth_value <= 800) ) {
$itemRgb = (57,48,198);
}
elsif ((600 < $meth_value ) && ($meth_value <= 700 )) {
$itemRgb = (80,80,80);
}
elsif ( (500 < $meth_value) && ($meth_value <= 600 )) {
$itemRgb = (112,112,112);
}
elsif ( (400 < $meth_value) && ($meth_value <= 500 )) {
$itemRgb = (144,144,144);
}
elsif ( (300 < $meth_value) && ($meth_value <= 400) ) {
$itemRgb = (192,192,192);
}
elsif ( (200 < $meth_value) && ($meth_value <= 300) ) {
$itemRgb = (148,12,274);
}
elsif ( (100 < $meth_value) && ($meth_value <= 200 )) {
$itemRgb = (165,0,41);
}
else {
$itemRgb = (173, 0, 33);
}
}
sunsnow86 is offline   Reply With Quote
Old 11-01-2010, 12:41 PM   #34
olivertam
Member
 
Location: Cold Spring Harbor

Join Date: Jun 2010
Posts: 22
Default

Hi sunsnow86,

Here's a modified version of the script as requested. I haven't tested the script extensively yet (since I don't have any data to test with), so please let me know if there are issues.

Cheers,
Oliver
Attached Files
File Type: pl genome_methylation_bismark2bedGraph_withcolor.pl (5.7 KB, 20 views)
olivertam is offline   Reply With Quote
Old 11-01-2010, 01:21 PM   #35
sunsnow86
Member
 
Location: organge CA

Join Date: Jul 2010
Posts: 17
Default

Thanks a lot, Olive! I will give a try to see how well it works
sunsnow86 is offline   Reply With Quote
Old 11-04-2010, 08:27 AM   #36
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 594
Default New options in Bismark

Hi everyone,

After analysing some BS-seq data recently we decided to implement a few changes to Bismark to improve performance. Bismark can be downloaded from the Babraham Bioinformatics homepage.


Major changes:

- Added the new option --directional to Bismark. If the BS-Seq library was constructed in a strand-specific way one would expect to see only sequences corresponding to the (C -> T converted) original top or bottom strands. The two strands which are complementary to the original strands are - in this case -merely theoretical and should not actually be observable in the sequencing experiment. Specifying --directional will reject alignments to these only-in-silico-existing strands and will generate a small report about rejected sequences after the Bismark run has been completed.

- Changed the default alignment option of Bismark to --best to ensure the most credible alignment results. This can be turned off by specifying --no_best. Disabling --best can speed up the alignment process considerably (good for testing purposes), but this will increase the risk of mismappings at the same time.

- Added the option -e/--maqerr so that one can play around with the maximum number of tolerated mismatches if desired (even though this can make Bowtie and thus Bismark very slow...).

Minor changes:

- The output files generated by Bismark will now end in '_bismark.txt' for single-end files or '_bismark_pe.txt' for paired-end files. The mapping and splitting reports will also end in .txt.

- The alignment and methylation summary reports have been slightly modified to allow better readability.


In summary, for the most trustworthy results with as few mismapped reads as possible we recommend that Bismark is run with the following options:

- '--best' (will now be on by default)
- decreasing the number of allowed (non-bisulfite) mismatches (adjustable with the -n and -l parameters) to as low as possible
- selecting '--directional' if the library is known to be strand-specific


If you have any questions please get in touch.

Best wishes,
Felix


PS: If our homepage does not show the most up to date version of Bismark (v0.2.3) hit Strg +refresh to force the BBSRC cache to update.
fkrueger is offline   Reply With Quote
Old 11-04-2010, 08:50 AM   #37
sunsnow86
Member
 
Location: organge CA

Join Date: Jul 2010
Posts: 17
Default

could somebody write a small script to make the format of the output alignment fit to the downstream statistical analysis pipeline which developed by the MIT Broad Institute for RRBS method.
(http://www.broadinstitute.org/~cbock...dev/index.html). I think it will be very useful for wet scientist. Thanks
sunsnow86 is offline   Reply With Quote
Old 11-17-2010, 10:07 AM   #38
sgrimm
Junior Member
 
Location: North Carolina

Join Date: Sep 2010
Posts: 6
Default

Does Bismark allow you to specify a different insert size range like bowtie (--minins and --maxins)?
sgrimm is offline   Reply With Quote
Old 11-17-2010, 11:20 AM   #39
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 594
Default

It doesn't take the options --minins and --maxins right now, but I don't see why this shouldn't be working.

If you think it would be a useful feature I can implement it first thing tomorrow morning and make a new release/send it to you via email.
fkrueger is offline   Reply With Quote
Old 11-17-2010, 11:24 AM   #40
sgrimm
Junior Member
 
Location: North Carolina

Join Date: Sep 2010
Posts: 6
Default

That would be fabulous. Thanks!
sgrimm is offline   Reply With Quote
Reply

Tags
alignment, bisulfite, bisulphite, methylation, sequencing

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:48 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO