SEQanswers

Old 12-08-2011, 07:59 AM   #1
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments

We would like to announce that Bismark has received a major overhaul. While the default alignment behaviour of Bismark (using Bowtie 1) has not changed very much (see below), Bismark now also supports gapped alignments using Bowtie 2. In all tests we have performed so far (single-end or paired-end, directional or non-directional, with various simulated methylation levels or real-life datasets), the Bismark results obtained with Bowtie 1 and Bowtie 2 were very concordant.

However, as Bowtie 2 is still in beta and subject to change, the current release of Bismark also has to be considered a beta version (0.6.beta1).
Here is an overview of the most prominent changes:

Running Bismark with Bowtie 1 (default)

- Default output changed to SAM format

- The ‘old’ output format is still available via the option ‘--vanilla’

- Alignment processes were slightly modified to run in --norc/--nofw mode where appropriate, which may result in a slightly increased mapping efficiency

- The former option ‘--directional’ is now the new default mode (‘--non_directional’ will report alignments to all four strands)

- The default paired-end maximum insert size ('-X') was increased to 500bp (up from 250bp)


Running Bismark with Bowtie 2 (optional)

- Alignments are performed in end-to-end mode (similar to Bowtie 1), but do allow gapped alignments with insertions and/or deletions

- Output format is SAM

- Since Bowtie 2 requires different indexes for alignments, the Bismark genome preparation now also supports Bowtie 2 bisulfite indexing of a reference genome
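
For readers new to bisulfite indexing: the idea, as generally described for Bismark, is that fully C->T and G->A converted copies of the reference are written out and then indexed with the chosen aligner. The Python sketch below only illustrates the conversion step; the function names are made up for this example and are not part of Bismark.

```python
def c_to_t(seq: str) -> str:
    """Return the sequence with every C replaced by T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """Return the sequence with every G replaced by A."""
    return seq.upper().replace("G", "A")


# Toy reference fragment (hypothetical, for illustration only).
reference = "ACGTTCGGA"
print(c_to_t(reference))  # ATGTTTGGA
print(g_to_a(reference))  # ACATTCAAA
```

Because the Bowtie 1 and Bowtie 2 index formats differ, the converted references would have to be indexed separately for each aligner.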


I would like to stress that we do not see Bowtie 2 as a simple replacement for Bowtie 1 in Bismark alignments. Rather, as is also stated on its project page, Bowtie 2 is supposed to work more efficiently for longer reads and allows gapped alignments. For shorter and/or indel-free reads Bowtie 1 may well be faster and more accurate, which is why Bowtie 1 will remain the default alignment mode for Bismark. Indeed, in some of the tests I have run so far Bowtie 1 seemed to have a speed advantage.


While Bismark seems to work fine in all alignment modes, its methylation_extractor currently works only on the old Bowtie 1 (‘--vanilla’) output and not yet on SAM output files (I am going to work on this in the next couple of days/weeks). This is another reason for calling the current Bismark version 0.6.beta1.

Compared to Bowtie 1, Bowtie 2 has many ‘new’ parameters, of which the following are currently adjustable:

-M <int> (reporting the best out of N valid alignments)
-N <int> (multi-seed mismatches)
-L <int> (seed length)
-D <int> (maximum number of seed extension fail tries)
-R <int> (reseeding of repetitive alignments)
--score-min <func> (setting minimum alignment score for valid alignments)
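
To give a feel for what a --score-min function means: Bowtie 2 writes such functions as a comma-separated triple, e.g. L,0,-0.6 for f(x) = 0 + -0.6 * x, where x is the read length and the type letter is C (constant), L (linear), S (square root) or G (natural log). The Python sketch below just evaluates such a string; it assumes Bismark hands the value to Bowtie 2 unchanged, and the helper name is made up for this example.

```python
import math


def parse_score_min(spec: str):
    """Turn a Bowtie 2 style function string into a callable f(read_length)."""
    parts = spec.split(",")
    kind = parts[0].upper()
    constant = float(parts[1])
    coefficient = float(parts[2]) if len(parts) > 2 else 0.0
    transforms = {
        "C": lambda x: 0.0,  # constant: the coefficient term is ignored
        "L": lambda x: x,    # linear in the read length
        "S": math.sqrt,      # square root of the read length
        "G": math.log,       # natural log of the read length
    }
    if kind not in transforms:
        raise ValueError(f"unknown function type: {kind!r}")
    transform = transforms[kind]
    return lambda read_length: constant + coefficient * transform(read_length)


min_score = parse_score_min("L,0,-0.6")
print(min_score(100))  # about -60 for a 100 bp read
```

In end-to-end mode the best possible alignment score is 0 and every mismatch or gap subtracts from it, so a more negative minimum score accepts more divergent alignments, at the cost of more work per read.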

We are still in the process of determining the most sensible set of parameters to generate unique 'best' alignments in a reasonable time (increasing some of the parameters above might make Bismark run dog slow...). I would very much appreciate any comments or input in this regard (and of course also bug reports...).

All files are available from the Bismark project page.

Thanks,
Felix
Old 12-15-2011, 03:39 AM   #2
fkrueger

We have just added a parallelization option for Bowtie 2 alignments (-p NTHREADS). This option became feasible because the latest Bowtie 2 release (Version 2.0.0-beta5 - December 15, 2011) added the option --reorder, which reports alignments in the same order as the reads were read in, even if multiple threads are used for alignment.

This option should be useful for speeding up Bismark alignments; however, as a word of caution, it also requires considerably more system resources. E.g. specifying -p 3 will use 4*3 = 12 threads/cores for the alignments plus 1 thread for Bismark itself, and will use > 15GB of memory for a human genome.
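
The arithmetic above can be written down as a small Python helper for sizing a job before submitting it. This is only a sketch of the numbers quoted in this post (4 aligner instances, 1 extra thread for Bismark); the function and parameter names are made up for illustration.

```python
def total_threads(p: int, aligner_instances: int = 4, bismark_threads: int = 1) -> int:
    """Estimate the threads in use when Bismark is run with -p <p>.

    Assumes each parallel aligner instance gets p threads, as described
    in the post, plus the thread(s) used by Bismark itself.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return aligner_instances * p + bismark_threads


print(total_threads(3))  # 13, i.e. 4*3 aligner threads + 1 for Bismark
```

Memory is not modelled here; the > 15GB figure for a human genome is the only data point given.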

The use of Bowtie 2 for Bismark alignments is still experimental and I would appreciate any input or feedback!

Bismark v0.6.beta2 is available from the Bismark project page.
Old 03-19-2012, 05:34 AM   #3
shawpa
Member
 
Location: Pittsburgh

Join Date: Aug 2011
Posts: 72

Quote:
Originally Posted by fkrueger
While Bismark seems to work fine in all alignment modes, its methylation_extractor currently works only on the old Bowtie 1 (‘--vanilla’) output and not yet on SAM output files (I am going to work on this in the next couple of days/weeks). This is another reason for calling the current Bismark version 0.6.beta1.

I aligned my BS reads using v0.6.beta1 and generated an output SAM file. I am now trying to run the methylation extractor on that file and I am getting an error stating:

The methylation extractor and Bismark itself need to be of the same version!

Versions used: methylation extractor: ' v0.6.beta1 '
Bismark: ' @HD VN:1.0 SO:unsorted '

I am wondering whether the passage quoted above is relevant to my issue. Also, if I upgrade to the most recent version of Bismark, will I run into problems because the alignment was done with an older version?
Old 03-19-2012, 05:50 AM   #4
fkrueger

If you used Bismark to generate SAM output you need to run a more recent version of the methylation_extractor, which now uses SAM format as its default input (as of version 0.6.3).

In any case I would recommend downloading the latest version (v0.7.2) and rerunning your alignments, because several things have changed since version 0.6.1.
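
The error message in post #3 shows the SAM header line '@HD VN:1.0 SO:unsorted' where a Bismark version was expected, which suggests the old extractor read the first line of the file as a version line. A quick way to tell the two kinds of file apart is to look at that first line. The Python sketch below is not taken from Bismark; the helper name is made up and the non-SAM example line is hypothetical.

```python
import re

# SAM header lines start with '@' plus a two-letter record type,
# followed by whitespace (a tab in real files).
SAM_HEADER = re.compile(r"^@(HD|SQ|RG|PG|CO)\s")


def looks_like_sam_header(first_line: str) -> bool:
    """Return True if the line looks like a SAM header record."""
    return SAM_HEADER.match(first_line) is not None


print(looks_like_sam_header("@HD\tVN:1.0\tSO:unsorted"))     # True
# Hypothetical first line of a non-SAM output file:
print(looks_like_sam_header("Bismark version: v0.6.beta1"))  # False
```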

Best,
Felix
Old 03-19-2012, 05:53 AM   #5
shawpa

Is it really necessary to rerun my alignments? It took over a week last time because I have 7 lanes of 100bp HiSeq data.

Last edited by shawpa; 03-19-2012 at 06:00 AM.
Old 03-19-2012, 06:04 AM   #6
fkrueger

The alignment and methylation information should still be the same, but there were several changes that might positively affect the outcome of your alignments, such as:

- Changed Bismark's behavior for "--directional" mode (default) to run only 2 parallel instances of Bowtie 1/2, aligning to the original top (OT) and original bottom (OB) strands, instead of 4 instances aligning to all possible bisulfite strands. This change might result in somewhat faster alignments and a higher mapping efficiency. It is still possible to run the 4-alignment strand mode for any combination of input file(s) and choice of aligner by specifying --non_directional.
- Sequences in FastA format now receive Phred score qualities of 40 throughout (ASCII 'I') to prevent the SAM to BAM conversion in SAMtools from failing
- When using Bowtie 1, reads for which a genomic sequence could not be extracted are now also counted and reported
- Changed the XX:Z mismatch field in the SAM output to display the mismatching nucleotides of the reference sequence (instead of those of the read sequence)
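
On the FastA point: in the Phred+33 encoding used for the SAM quality field, a quality of 40 is the character with ASCII code 40 + 33 = 73, which is 'I'. A minimal Python sketch of building such a placeholder quality string (the function names are made up for this example):

```python
PHRED_OFFSET = 33  # Phred+33 encoding, as used in the SAM QUAL field


def phred_to_ascii(quality: int) -> str:
    """Encode a single Phred quality score as its Phred+33 character."""
    return chr(quality + PHRED_OFFSET)


def placeholder_qualities(sequence: str, quality: int = 40) -> str:
    """Build a uniform quality string matching the length of the read."""
    return phred_to_ascii(quality) * len(sequence)


print(phred_to_ascii(40))             # I
print(placeholder_qualities("ACGT"))  # IIII
```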

Since Bismark now runs only 2 alignment instances instead of 4 for directional alignments, you should not only see an increase in mapping efficiency, but the run should also be quite a bit quicker than with 4-strand mapping (I mapped several lanes of 100bp SE HiSeq data with ~240M sequences overnight on a single instance). You may want to check the change log on the Bismark page to see if there is anything else of relevance for you.
Old 03-19-2012, 06:06 AM   #7
shawpa

Thanks for your advice. I'll go ahead and download the new one and try again.
All times are GMT -8.