SEQanswers

Old 04-05-2012, 10:25 AM   #1
mixter
Member
 
Location: Munich, Germany

Join Date: May 2010
Posts: 22
Methylation calling tools, what works well?

Hi all,

I'm looking for a stable standalone tool to call methylation from mapped reads, in my case from RRBS. I also need to be able to call non-CpG methylation in some way.

[Reason: I'm usually using bismark, which I can also recommend, but it is limited to bowtie. My current issue with some RRBS data sets is that they have very poor mapping efficiency in bowtie (after clipping+trimming) but mapping with other tools, especially RRBSMAP, works well. Perhaps this could be due to ambiguous reads which are less of an issue when mapping only to RRBS-relevant fragments of the genome.]

methratio.py from the rrbsmap package does call methylation, but it is unclear to me whether this is just CpG or also non-CpG methylation (there is no distinction and it's not documented).
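For reference, the usual distinction between CpG and non-CpG calls depends only on the two bases following the cytosine. The sketch below shows that conventional CpG/CHG/CHH classification on the forward strand; it is an illustration under that assumption, not a description of what methratio.py does, and the function name is hypothetical.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at 0-based index `pos` (forward strand only).

    Conventional definitions, where H is A, C or T:
      CpG: C followed by G
      CHG: C, then a non-G base, then G
      CHH: C followed by two non-G bases
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    following = seq[pos + 1 : pos + 3]
    if following[:1] == "G":
        return "CpG"
    # Too close to the sequence end, or an ambiguous base: context unknown.
    if len(following) < 2 or "N" in following:
        return "unknown"
    return "CHG" if following[1] == "G" else "CHH"


if __name__ == "__main__":
    for s in ("ACGT", "ACAGT", "ACATT"):
        print(s, cytosine_context(s, 1))
```

A caller that reports only one pooled ratio per cytosine can be post-filtered this way, provided the reference sequence around each reported position is available.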

Before long trial and error, I'd be very interested in your own experiences with standalone methylation calling after mapping, and what works.

Thanks!
Old 04-05-2012, 01:00 PM   #2
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622

Hi Mixter,

Just out of interest: were you doing paired-end sequencing with fairly long reads? If this is the case then I can understand that you see a fairly low mapping efficiency with Bowtie, and this is indeed caused by Bowtie's behavior not to regard completely overlapping reads as valid alignments.

E.g.: a read pair looking like this

---------------------------------------> read 1
<--------------------------------------- read 2

is regarded as invalid by Bowtie 1.

As RRBS size-selects for fragments between 40 and 220 bp, and even shorter fragments do pass the size-selection step, you can expect a sizeable proportion of read pairs to be completely overlapping after adapter trimming (e.g., trimming a 100 bp paired-end read that was merely sequencing a 40 bp fragment).
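The arithmetic behind this can be sketched as follows, assuming idealized adapter trimming that cuts each read back to exactly the fragment length when the fragment is shorter than the read. The function names are hypothetical.

```python
def trimmed_length(read_len, fragment_len):
    """Read length left after ideal adapter trimming."""
    return min(read_len, fragment_len)


def overlap_bases(read_len, fragment_len):
    """Number of fragment bases covered by both mates of a pair."""
    covered = trimmed_length(read_len, fragment_len)
    return max(0, 2 * covered - fragment_len)


def fully_overlapping(read_len, fragment_len):
    """True when both mates span the whole fragment after trimming."""
    return trimmed_length(read_len, fragment_len) == fragment_len


if __name__ == "__main__":
    # 100 bp reads on a 40 bp fragment: both mates are trimmed to 40 bp.
    print(fully_overlapping(100, 40), overlap_bases(100, 40))
    # 100 bp reads on a 120 bp fragment: large but incomplete overlap.
    print(fully_overlapping(100, 120), overlap_bases(100, 120))
```

Under this assumption, every fragment no longer than the read length yields a completely overlapping pair, which covers a large part of the 40 to 220 bp RRBS size range for 100 bp reads.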

To get the above shown reads to align with Bowtie 1 it is sufficient to trim the reads by 1 bp so that they are not completely contained within each other but only overlap almost entirely, like so:

-------------------------------------->. read 1
.<-------------------------------------- read 2

Alternatively, running Bismark with Bowtie 2 can also handle reads that are completely contained within each other.

I am only mentioning this because we just had a similar case at our institute where we sequenced a library with a ~120 bp mean fragment length with 2x100 bp paired-end reads. After adapter trimming, Bowtie 1 alignments mapped with ~50% efficiency because lots of fragments were sequenced in full by both reads. If these reads were trimmed by 1 bp on the 3' end, the mapping efficiency went up to nearly 80%. The effect is probably even more pronounced for RRBS libraries.
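A minimal sketch of such a 1 bp trim on the 3' end, assuming plain four-line FASTQ records (no wrapped sequence lines). The function name is hypothetical, and it would have to be applied to both mate files.

```python
def trim_fastq_3prime(lines, n=1):
    """Yield FASTQ lines with the last `n` bases and qualities removed.

    `lines` is any iterable of text lines holding four-line FASTQ records.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i : i + 4]
        yield header
        yield seq[:-n]    # drop bases from the 3' end
        yield plus
        yield qual[:-n]   # keep qualities in step with the bases


if __name__ == "__main__":
    record = ["@read1", "TAACGTTAGC", "+", "IIIIIIIIII"]
    print("\n".join(trim_fastq_3prime(record)))
```

Existing trimming tools can usually do the same thing; the sketch is only meant to make the operation explicit.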

If you didn't do paired-end alignments, please accept my apologies for this lengthy explanation :P

As a final remark, in theory it shouldn't be too difficult to adapt Bismark's methylation caller for your initial question. Nevertheless I am understandably interested in getting to the bottom of why you were dissatisfied with Bismark's mapping results, since one should easily get ~66% mapping efficiency from a good single-end 40 bp library, and this will only get better with longer (good quality) reads or paired-end reads (I recently had 2x40 bp RRBS libraries with 73% mapping efficiency).
Old 04-06-2012, 12:13 PM   #3
mixter
Member
 
Location: Munich, Germany

Join Date: May 2010
Posts: 22

Thanks. Even though I'm having these issues with single-end samples, the paired-end information is valuable.

I have a rather strange case where a medium-quality sequencing run is yielding almost 0.0% mapping efficiency and similar runs only about 10%. I've done proper trimming and adapter clipping beforehand (there are still some overrepresented sequences left, but only from spike-in DNA from phiX). The reads are already trimmed, but I'm also skipping 5 more of the start bases due to low quality. This is an RRBS digest of the mouse genome with MseI (as can be seen by most reads starting with TAA).
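The TAA observation can be quantified with a short check like the one below, run on the reads before any start bases are skipped. This is only a sketch; both function names are hypothetical.

```python
def fraction_with_prefix(reads, prefix="TAA"):
    """Fraction of reads that begin with `prefix` (case-insensitive)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(read.upper().startswith(prefix) for read in reads)
    return hits / len(reads)


def skip_start(read, n=5):
    """Drop the first `n` bases of a read, as when skipping low-quality starts."""
    return read[n:]


if __name__ == "__main__":
    reads = ["TAAGGCCTT", "TAACCGGAA", "CGGAATTCC", "TAATTGGCC"]
    print(fraction_with_prefix(reads))
    print([skip_start(r) for r in reads])
```

Note that skipping the first 5 bases also removes the TAA overhang itself, so the prefix check has to come first.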

I've tried bismark with the latest Bowtie and also Bowtie2 (beta 5) with all options possible for making it less strict (even -e 1000, since phred values are low) but the best I can get is about 4% mapping efficiency. I was quite surprised that rrbsmap 1.6 mapped almost all reads (but its methylation calling script is really sparse and I would need an external tool, perhaps adapt bismark to it if possible).

I also aligned directly to the GA/CT converted genomes with bowtie using more custom options and got the same low efficiency. So I think it's really bowtie's strictness that makes such a difference here. If you're interested and want to take a look at some sample reads to make a guess about the big discrepancy between bowtie and rrbsmap, feel free: http://pastebin.com/J5wPvBRY (I always mapped against mm9).
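For readers unfamiliar with the converted-genome approach, the idea is roughly the following: every C is turned into T (and, for the other strand, every G into A) in both reference and read, so bisulfite-converted and unconverted cytosines no longer cause mismatches. The sketch below illustrates only that principle with hypothetical helper names; it is not Bismark's actual code.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion for one strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """The corresponding conversion for the opposite strand: every G becomes A."""
    return seq.upper().replace("G", "A")


if __name__ == "__main__":
    reference = "ACGTCG"
    # A read where the first C was converted (unmethylated) and the
    # second C was protected (methylated).
    read = "ATGTCG"
    # After full C-to-T conversion of both, read and reference agree.
    print(c_to_t(read) == c_to_t(reference))
    print(g_to_a(reference))
```

The methylation state is then recovered afterwards by comparing the original read with the original reference at each aligned cytosine.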

Btw, apart from this issue, a future option in bismark that would be great for RRBS would be a conversion of only parts of the genome relevant for RRBS (e.g. near MspI or MseI sites) and limited mapping to those regions.
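A rough sketch of what selecting such regions could look like: an in-silico digest that cuts at each recognition site and keeps only fragments in a size window. It assumes MspI cuts C^CGG and MseI cuts T^TAA (one base into the site) and uses the 40 to 220 bp range mentioned earlier in the thread as the default; the function names are hypothetical and this is not an existing Bismark feature.

```python
def digest(seq, site, cut_offset=1):
    """Return (start, end) fragments after cutting `seq` at every `site`.

    `cut_offset` is the cut position within the site, e.g. 1 for
    MspI (C^CGG) or MseI (T^TAA). Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # The first and last fragments end at the sequence boundary, not at a site.
    return list(zip(bounds, bounds[1:]))


def size_selected(fragments, min_len=40, max_len=220):
    """Keep fragments whose length falls inside the size-selection window."""
    return [(a, b) for a, b in fragments if min_len <= b - a <= max_len]


if __name__ == "__main__":
    toy = "AACCGGTTTTCCGGAA"
    frags = digest(toy, "CCGG")
    print(frags)
    print(size_selected(frags, min_len=6, max_len=10))
```

The retained intervals could then be used to build a reduced reference, which is presumably close to what mapping only to RRBS-relevant fragments amounts to.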

Last edited by mixter; 04-06-2012 at 12:15 PM.
Old 04-18-2012, 09:53 PM   #4
comingme
Junior Member
 
Location: houston

Join Date: Feb 2010
Posts: 9

Hi mixter,

I came across your post here while doing a Google search.

Y. Xi and I, the authors of RRBSMAP, developed the mSuite tool, a methylation analysis pipeline. In brief, it calls methylation in the CG/CH/CGH/CHH contexts, reports some statistics, identifies differentially methylated cytosines (DMCs) and differentially methylated regions (DMRs), and associates genome features with methylation.

I haven't mentioned it anywhere because the modules are not completely wrapped together and I am still preparing the manuscripts for the methods description. But you may use the methylation calling module without too much reading. There are no methods involved.

http://code.google.com/p/msuite/

Installation:

1.
Install Boost, Samtools, and Rcpp (an R library) before compiling mSuite.

2.
Make sure $SAMTOOLS and $BOOST_ROOT are pointing to the correct location.

For example on my system,
export SAMTOOLS=/share/apps/samtools/0.1.16
export BOOST_ROOT=/share/apps/boost/boost-1.46.1
You may have to export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib if you installed Boost into /usr/local/lib as a superuser.

You can simply add these commands to your ~/.bashrc file.

3.
Untar msuite and type make.
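Before running make, the two variables from step 2 can be sanity-checked with something like the sketch below. The helper name is hypothetical and the check only verifies that each variable is set and points to an existing directory, not that the contents are a usable Samtools or Boost installation.

```python
import os


def missing_build_vars(env, names=("SAMTOOLS", "BOOST_ROOT")):
    """Return a list of problems with the build-related environment variables."""
    problems = []
    for name in names:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name} points to a missing directory: {path}")
    return problems


if __name__ == "__main__":
    for problem in missing_build_vars(os.environ):
        print(problem)
```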


====following are installation instructions for Boost and Samtools====
Building mSuite from source
In order to build mSuite, you must have the Boost C++ libraries (version 1.38 or higher) installed on your system. See below for instructions on installing Boost.

Installing Boost
./bootstrap.sh
./b2 install

You may have to export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib if you installed Boost into /usr/local/lib as a superuser.

Installing the SAM tools

Download the SAM tools
Unpack the SAM tools tarball and cd to the SAM tools source directory.
Build the SAM tools by typing make at the command line.
Choose a directory into which you wish to copy the SAM tools binary, the included library libbam.a, and the library headers. A common choice is /usr/local/.
Copy libbam.a to the lib/ directory in the folder you've chosen above (e.g. /usr/local/lib/)
Create a directory called "bam" in the include/ directory (e.g. /usr/local/include/bam)
Copy the headers (files ending in .h) to the include/bam directory you've created above (e.g. /usr/local/include/bam)
Copy the samtools binary to some directory in your PATH.

If you install SAM tools in your home dir, then just replace the above string '/usr/local/' by your desired installation directory.

./configure (--with-bam=/home/dsun/ if you installed bam/*.h to /home/dsun/include/bam/ and libbam.a to /home/dsun/lib/)
make
make install
Old 03-02-2015, 12:17 PM   #5
amdic2
Junior Member
 
Location: Quebec city

Join Date: Jul 2011
Posts: 6

Dear mixter,
I hope you read this post, even though you wrote about this almost 3 years ago.
I am currently having the same problem as you did, with a non-model species analyzed by RRBS.
Were you able to solve your low mapping efficiency with Bismark? If so, how did you do it?
Thanks in advance!
Anne-marie