SEQanswers

07-06-2009, 01:37 AM   #1
fadista
Member

Location: Malmö

Join Date: Sep 2008
Posts: 37
alignment of bisulfite treated reads

Hi,

I would like to know if any of the available next-gen alignment algorithms like maq, bwa, bowtie or others are able to align bisulfite treated reads from a methylation-seq experiment.

This is a rather tricky alignment because it requires that C's in the reference sequence be allowed to align against T's in the bisulfite-treated reads, without a penalty.

Maybe one possibility is to use alignment algorithms with a custom scoring matrix?
07-06-2009, 07:14 AM   #2
xwu
Junior Member

Location: Los Angeles

Join Date: Dec 2007
Posts: 9

I am aware that novoalign has a bisulphite sequencing alignment function built in, but I am not sure about its performance.
07-06-2009, 07:44 AM   #3
nilshomer
Nils Homer

Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by fadista View Post
Hi,

I would like to know if any of the available next-gen alignment algorithms like maq, bwa, bowtie or others are able to align bisulfite treated reads from a methylation-seq experiment.

This is a rather tricky alignment because it requires that C's in the reference sequence be allowed to align against T's in the bisulfite-treated reads, without a penalty.

Maybe one possibility is to use alignment algorithms with a custom scoring matrix?
I have tried BFAST and MAQ (and BWA) to do this. For BFAST, there are details in the reference manual.
07-06-2009, 11:27 AM   #4
totalnew
Member

Location: Canada

Join Date: Apr 2009
Posts: 46

novoalign can do bisulfite sequencing alignment, but novoalign is not free of charge.
07-07-2009, 05:14 AM   #5
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

If you're using Illumina the easiest (bias-free) way is to preprocess your bisulphite reads to convert C's to T's (remembering where they are) and align them to a reference with all C's changed to T's. Then write a script to introduce the C's back in, or relate these as tables in a database.
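
A minimal sketch of that preprocessing idea in Python. The function names and toy sequences are made up for illustration, it only handles forward-strand reads, and `str.find` stands in for a real aligner:

```python
def convert_read(read):
    """Replace every C with T and remember where the C's were."""
    c_positions = [i for i, base in enumerate(read) if base == "C"]
    return read.replace("C", "T"), c_positions


def convert_reference(reference):
    """In silico fully converted reference (forward strand only)."""
    return reference.replace("C", "T")


def restore_read(converted, c_positions):
    """Put the remembered C's back in after alignment."""
    bases = list(converted)
    for i in c_positions:
        bases[i] = "C"
    return "".join(bases)


reference = "ACGTTCGACCGT"
read = "ATGTTCGATTGT"  # toy bisulfite read; only the C at index 5 survived

converted_read, c_positions = convert_read(read)
# Exact search on the converted reference stands in for the aligner here.
offset = convert_reference(reference).find(converted_read)
restored = restore_read(converted_read, c_positions)

print(offset, c_positions, restored)
```

Because both the read and the reference are in three-letter space when they are compared, a read aligns the same way whether its C's were converted or not.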

As for SOLiD, all this is horrible in colorspace. If you're trying to avoid alignment bias due to methylation differences SOLiD has some bioinformatic issues. You're required to permute the reference a great deal or slacken up the mismatches allowed, sorting out the noise later down the track. If you convert SOLiD reads back into basespace you'll pay a fairly reasonable price - any errors in the read will frameshift base calls 3' to the error <Grumble> <Grumble>

Nils, have you tried aligning SOLiD bisulphite reads?
07-07-2009, 06:35 AM   #6
nilshomer
Nils Homer

Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by sci_guy View Post
If you're using Illumina the easiest (bias-free) way is to preprocess your bisulphite reads to convert C's to T's (remembering where they are) and align them to a reference with all C's changed to T's. Then write a script to introduce the C's back in, or relate these as tables in a database.

As for SOLiD, all this is horrible in colorspace. If you're trying to avoid alignment bias due to methylation differences SOLiD has some bioinformatic issues. You're required to permute the reference a great deal or slacken up the mismatches allowed, sorting out the noise later down the track. If you convert SOLiD reads back into basespace you'll pay a fairly reasonable price - any errors in the read will frameshift base calls 3' to the error <Grumble> <Grumble>

Nils, have you tried aligning SOLiD bisulphite reads?
I don't share your disdain for colorspace since it is quite powerful. For example the false positive rate for SNPs is a lot lower since you need two specific errors next to each other to get a SNP.
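
A toy illustration of both sides of this argument, assuming the usual two-bit colour code (A=0, C=1, G=2, T=3, colour = XOR of adjacent bases). The sequences and helper names are made up for the example:

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Base space to colour space: one colour per adjacent pair of bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Colour space back to base space, starting from a known first base."""
    seq = [first_base]
    for color in colors:
        seq.append(BASE[CODE[seq[-1]] ^ color])
    return "".join(seq)


ref = "ACGTACGT"
snp = "ACGAACGT"  # single base change (T to A) at index 3

ref_colors = encode(ref)
snp_colors = encode(snp)
# A real SNP changes the two colours that flank the changed base.
changed = [i for i, (x, y) in enumerate(zip(ref_colors, snp_colors)) if x != y]

# A single colour error, decoded naively, corrupts every base downstream of it.
bad_colors = list(ref_colors)
bad_colors[2] = 0
garbled = decode("A", bad_colors)

print(changed, garbled)
```

So one isolated colour change can be rejected as a sequencing error, while the same isolated error ruins the rest of the read if you translate to basespace first.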

Anyhow, under the current chemistry, bisulphite sequencing would be difficult from a bioinformatic perspective for longer reads (>=50bp) on the SOLiD platform. You could consider a targeted pulldown on the Illumina platform to make up for the lower capacity (per dollar).
07-07-2009, 04:32 PM   #7
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

Bis-Seq relies upon counting the C's vs T's in aligned reads so for an unbiased statistic you want alignment potential of a bisulphite-treated DNA read to be equivalent regardless of C density.
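
As a rough sketch of that statistic (the function name and the toy pileup are hypothetical), the methylation level at one reference cytosine is just the fraction of aligned read bases that are still C:

```python
def methylation_level(read_bases):
    """Fraction of C among the C and T read bases covering one reference cytosine."""
    c = read_bases.count("C")
    t = read_bases.count("T")
    if c + t == 0:
        return None  # no informative coverage at this site
    return c / (c + t)


pileup = "CCTCTTCC"  # read bases stacked over a single reference C
level = methylation_level(pileup)
print(level)
```

If reads carrying more C's are less likely to align, the numerator is undercounted and the estimate is biased downwards, which is the problem described here.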

With SOLiD you really want to align to a hypomethylated genome (no C's) and a hypermethylated genome (C's remain at CpG sites), since preprocessing the reads to convert C's to T's in colorspace is not possible. Reads with intermediate levels of methylation will be regarded as having SNPs in the alignment pipeline (two colorspace changes in a row). So, if your read has a fair number of CpG sites (say a read at a CpG island) and it goes over your alignment mismatch threshold, it won't align even though it is a perfectly good read. This creates a confounder where there is lowered alignment potential to high density CpG regions within the genome and to CpG sites near high population frequency SNPs or INDELs. You can counter this by relaxing the number of mismatches allowed (and introducing false positive alignments) or by aligning to a number of permuted bisulphite references. Preprocessed reads with Illumina have none of these issues. If you have a plant genome with CNG and CNN methylation then SOLiD is not a wise choice at all.
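
A sketch of how the two in silico references could be built, in Python. The function names and the toy sequence are assumptions for illustration, and only the forward strand is handled; the reverse strand would need the analogous G-to-A treatment:

```python
import re


def hypomethylated(reference):
    """Every C converted to T: a genome with no methylation at all."""
    return reference.replace("C", "T")


def hypermethylated(reference):
    """Only C's that are not followed by G are converted; CpG C's are retained."""
    return re.sub(r"C(?!G)", "T", reference)


reference = "ACGTCCGACTG"
hypo = hypomethylated(reference)
hyper = hypermethylated(reference)
print(hypo)
print(hyper)
```

A read from a partially methylated region sits somewhere between these two references, which is where the extra mismatches come from.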

I'm not some sort of Illumina fan-boy. I originally chose SOLiD owing to error checking built into colorspace and the increased number of reads per dollar. However for a second experiment I've swapped to Illumina owing to the potential alignment bias issue and Illumina's increases in bandwidth later this year.
07-08-2009, 04:35 AM   #8
frozenlyse
Senior Member

Location: Australia

Join Date: Sep 2008
Posts: 136

Quote:
Originally Posted by nilshomer View Post
I don't share your disdain for colorspace since it is quite powerful. For example the false positive rate for SNPs is a lot lower since you need two specific errors next to each other to get a SNP.

Anyhow, under the current chemistry, bisulphite sequencing would be difficult from a bioinformatic perspective for longer reads (>=50bp) on the SOLiD platform. You could consider a targeted pulldown on the Illumina platform to make up for the lower capacity (per dollar).
have you compared the same sample sequenced by illumina vs solid? personally i am quite platform agnostic now that they have comparable levels of throughput and read length, however unless anybody has sequenced the same sample on both platforms i am still not decided as to which gives the best combination of cost vs read length vs throughput

however, i definitely agree with sci_guy - solid colorspace is currently quite useless for bisulfite sequencing... this can be overcome bioinformatically (computationally expensive) however no-one has attempted this as yet.
03-12-2010, 09:01 AM   #9
ondovb
Member

Location: Maryland

Join Date: Jan 2010
Posts: 20

We just released SOCS version 2, which has a mode that is fully bisulfite-tolerant for SOLiD data. It's available at:

http://solidsoftwaretools.com/gf/project/socs/

It will take longer than using a standard algorithm with converted genomes (due to the complexity of the problem), but there won't be any bias in the results.
03-12-2010, 09:05 AM   #10
bioinfosm
Senior Member

Location: USA

Join Date: Jan 2008
Posts: 482

bsmap is another tool. I have used it on bisulphite reads and it seems to work well
03-12-2010, 08:25 PM   #11
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

Quote:
Originally Posted by ondovb View Post
We just released SOCS version 2, which has a mode that is fully bisulfite-tolerant for SOLiD data.
Thanks! I'll take a look. I have more SOLiD data coming my way soon.
03-12-2010, 08:30 PM   #12
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

Quote:
Originally Posted by bioinfosm View Post
bsmap is another tool. I have used it on bisulphite reads and it seems to work well
I saw Wei Li talk about BSMAP at the AACR 2010 Cancer Epigenetics meeting. It was a nice talk. I like their use of what cytosines are present in the read to extract as much information as possible without creating bias.

It's probably the best Illumina bisulfite aligner out there at the moment.
03-15-2010, 08:20 AM   #13
bioinfosm
Senior Member

Location: USA

Join Date: Jan 2008
Posts: 482

Quote:
Originally Posted by sci_guy View Post
I saw Wei Li talk about BSMAP at the AACR 2010 Cancer Epigenetics meeting. It was a nice talk. I like their use of what cytosines are present in the read to extract as much information as possible without creating bias.

It's probably the best Illumina bisulfite aligner out there at the moment.
That's interesting to know. Is it possible for you to share that talk/slides?
03-15-2010, 08:50 AM   #14
lh3
Senior Member

Location: Boston

Join Date: Feb 2008
Posts: 693

novoalign and gsnap (http://www.gene.com/share/gmap/) also do bisulfite alignment. So far as I know, all existing programs for bisulfite alignment take a very similar strategy.
03-15-2010, 02:24 PM   #15
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

I don't have access to the slides, but the material is essentially covered in their BSMAP paper.

lh3 - Yes, I forgot about Novoalign. I should qualify my statement and suggest that BSMAP is perhaps the best free bisulfite aligner out there at present.
03-15-2010, 03:37 PM   #16
lh3
Senior Member

Location: Boston

Join Date: Feb 2008
Posts: 693

From the gsnap paper, it also seems to be a decent open-source tool. I have not tried it, though.
03-15-2010, 05:47 PM   #17
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

Quote:
Originally Posted by lh3 View Post
From the gsnap paper, it also seems to be a decent open-source tool. I have not tried it, though.

Thanks for the heads-up on GSNAP. I just had a look at the paper. It looks very nice, particularly if they release a colorspace version. I am stuck with SOLiD colorspace data at present, and I ended up using SHRiMP with a hypermethylated genome (so C's in CpG context are retained) to match on.

Re: GSNAP bisulfite seq
In bisulfite mode the program produces two new hash tables, one with C-to-T substitutions and the other having G-to-A substitutions. From the paper: "When GSNAP processes a bisulfite read, it performs a C-to-T substitution of each 12-mer in the read to check against the C-to-T hash table, and a G-to-A substitution of each 12-mer in the reverse complement of the read to check against the G-to-A hash table."
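
A toy version of the lookup described in that quote, in Python. This is only a sketch of the idea, not GSNAP's actual data structures: the function names, the plain dictionary index, and the 20 bp "genome" are all made up for illustration.

```python
from collections import defaultdict

K = 12
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_index(genome, old, new, k=K):
    """Hash table of k-mer to positions on a converted copy of the genome."""
    converted = genome.replace(old, new)
    index = defaultdict(list)
    for i in range(len(converted) - k + 1):
        index[converted[i:i + k]].append(i)
    return index


def seed_hits(read, ct_index, ga_index, k=K):
    """Candidate (table, genome offset) pairs supported by exact 12-mer matches."""
    hits = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k].replace("C", "T")
        hits += [("CT", pos - i) for pos in ct_index.get(kmer, [])]
    rc = revcomp(read)
    for i in range(len(rc) - k + 1):
        kmer = rc[i:i + k].replace("G", "A")
        hits += [("GA", pos - i) for pos in ga_index.get(kmer, [])]
    return hits


genome = "GATTACAGCCGTACGTTAGC"
ct_index = build_index(genome, "C", "T")
ga_index = build_index(genome, "G", "A")

# Toy bisulfite read from genome[2:16]; one CpG cytosine is left unconverted.
read = "TTATAGTCGTATGT"
candidates = set(seed_hits(read, ct_index, ga_index))
print(candidates)
```

The unconverted C in the read plays no part in the seed match, since every 12-mer is pushed into three-letter space before the lookup.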

So, essentially it creates a bisulfite hypomethylated genome and then looks for seed matches within in silico "hypomethylated reads". All seed matching is therefore in a three-base space with no C's present at all.

BSMAP is a little cannier. Reads don't have C's removed. Instead, C's in the read are matched only to C's in the reference, while T's in the read can be matched to either C's or T's in the reference. Another way of thinking about this is that Illumina reads have T's converted to Y's and are matched against a standard (not in silico bisulfite converted) reference genome. In this respect the C's present in the read help to eliminate more dubious alignment candidates, so it is a slightly more information-dense match than purely three-base matching. An interesting effect is that improperly bisulfite converted material (that containing many unconverted C's) will align equally as well as properly converted material. That means more work in downstream filtering perhaps, but a better estimate of bisulfite conversion than just adding up all the C's in reads mapped to mitochondrial DNA.
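
A small Python sketch of the difference between the two matching rules. It is my own illustration rather than BSMAP's code: the names are invented, only forward-strand reads are considered, and a brute-force scan replaces the real seeding.

```python
def bs_match(read_base, ref_base):
    """A read T may sit on a reference C or T; any other base must match exactly."""
    if read_base == "T":
        return ref_base in "CT"
    return read_base == ref_base


def asymmetric_hits(read, reference, max_mismatches=0):
    """Positions where the read fits the unconverted reference under bs_match."""
    n = len(read)
    hits = []
    for i in range(len(reference) - n + 1):
        window = reference[i:i + n]
        mismatches = sum(not bs_match(r, g) for r, g in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def three_letter_hits(read, reference):
    """Positions where read and reference agree after C-to-T conversion of both."""
    read_ct = read.replace("C", "T")
    ref_ct = reference.replace("C", "T")
    n = len(read_ct)
    return [i for i in range(len(ref_ct) - n + 1) if ref_ct[i:i + n] == read_ct]


# Two loci that differ only by C versus T: ATCGA at 2 and ATTGA at 9.
reference = "GGATCGAGGATTGAGG"
methylated_read = "ATCGA"  # the C survived bisulfite treatment

print(three_letter_hits(methylated_read, reference))
print(asymmetric_hits(methylated_read, reference))
```

In three-letter space the read is ambiguous between the two loci, while the asymmetric rule uses the surviving C to pick the first one. A fully converted read (`ATTGA`) stays ambiguous under both rules.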

Last edited by sci_guy; 03-22-2010 at 03:02 PM.
03-15-2010, 06:19 PM   #18
lh3
Senior Member

Location: Boston

Join Date: Feb 2008
Posts: 693

@sci_guy

Yes, BSMAP is better in mapping strategy, although I do not know how much practical improvement this may lead to. It would be good to see a head-to-head comparison. Thanks for the information.
03-15-2010, 07:17 PM   #19
sci_guy
Member

Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83

@lh3. I'm going to a workshop over the next couple of days. It seems somebody else in my organisation has been using BSMAP with Arabidopsis bisulphite-Seq data. Below is their talk abstract. BSMAP would be particularly good for plant genomes considering all the CNG and CNN methylation. I'll see if I can get any slides.

"Hua Ying (CSIRO)
Approaches to mapping high-throughput bisulfite sequencing reads: High-throughput bisulfite sequencing is an attractive approach for analyzing genome-wide methylation patterns at a single-base-pair resolution. Although combining bisulfite conversion and high-throughput sequencing is increasingly widespread, its analysis is still problematic and limited to a few publications. A major challenge is the alignment of bisulfite-converted short reads to the reference genome due to increased search space and reduced sequence complexity as a result of the bisulfite conversion. Here, we took advantage of a recently published mapping algorithm BSMAP and demonstrated that BSMAP is more effective than previously used methods. By applying a two-step mapping strategy, we successfully mapped more than 90% of bisulfite short reads to the Arabidopsis genome."
03-22-2010, 06:18 AM   #20
bioinfosm
Senior Member

Location: USA

Join Date: Jan 2008
Posts: 482

thanks sci_guy

Tags
alignment, bisulfite, methylation

All times are GMT -8.