SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Reference that matches vcf? brofallon Bioinformatics 5 02-24-2012 07:36 PM
counting RNA-seq matches shuang Bioinformatics 13 01-31-2012 07:50 AM
Finding Perfect matches and mismatches bhargavpbk88 Bioinformatics 2 01-19-2012 08:00 AM
Regarding Unique reads, Unique alignments sridharacharya RNA Sequencing 2 09-20-2010 06:39 AM
Unique VS Non-Unique read analysis samt Bioinformatics 2 09-29-2009 10:44 AM

Reply
 
Thread Tools
Old 11-28-2012, 06:06 AM   #1
mixter
Member
 
Location: Munich, Germany

Join Date: May 2010
Posts: 22
Default Bismark and unique matches

Hello,

I have a mostly theoretical question which my group would need to adress, though. Bismark (and possibly, other methylation mappers) have a hardcoded behavior of allowing only the unique best match to be returned as a result (Bowtie2 option -k 1, I believe).

My question would be whether this is just a recommended setting, or vitally necessary for Bismark-style methylation mapping to be correct? If so, why?

Our motivation is that we are thinking about mapping some specific data to a set of sequence subclasses (instead of the whole genome) and would like to get the 5 best hits in those sequence subclasses or so per sequence (in case more than one possible hit exists). And we're wondering whether this kind of result would be ok to interpret from Bismark methylation mapping, or why not, if not.

Many thanks
mixter is offline   Reply With Quote
Old 11-28-2012, 10:47 AM   #2
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

Hi Mixter,
We chose to only report unique matches in Bismark simply because one can't tell the exact origin of a read with certainty if a read or read pair maps to several distinct locations in the genome with the same quality (say perfect matches). In a extreme situation, imagine the following two sequences:

(1) CGCGCGTTTCCCCC
(2) TGTGTGTTTTTTTT


Due to the 3-letter alignment mode both sequences would look exactly the same during the mapping step. In a multi-mapping scenario you would call region (1) to be 100% methylated, whereas region (2) would be called 0% methylated, but would you be confident about these results?

We did not implement a multi-alignment step (yet) since I think it is very likely that many would use it as their standard mode (greedy as humans are they would like to have as many methylation calls per $/ spent as possible...). I would imagine that it is not entirely trivial to separate such potentially unreliable reads out from uniquely mapping ones later on, and I can see lots of requests heading my way :P. While I agree that multi-mapping might be useful in certain specialised scenarios, I would not want to offer it as a standard option.

Just for the record, the Bowtie 1 mode in Bismark uses -k 2 to determine unique best alignments. You are right that this mode is hard-coded in order to determine unique alignments; theoretically, if one circumvented the determination of unique alignments one could feed several best alignments straight into the methylation calling routine per read-in sequence. In Bowtie 2 mode, the alignment score (AS:i is used to determine whether the next best hit (if there is any) is equally good or worse. Similarly, with some adaptation this might also be changed to report more than one alignment. This sounds fairly trivial, but as these things go it might actually be somewhat time-consuming...
fkrueger is offline   Reply With Quote
Old 12-12-2012, 04:47 AM   #3
mixter
Member
 
Location: Munich, Germany

Join Date: May 2010
Posts: 22
Default

Hello Felix,

Thanks for the detailed explanation, so, that means that it's conceptually possible but with the caveats you mentioned.

Meanwhile I'm practically interested in doing this, i.e. returning non-unique / ambiguously mapped sequences. I noticed that the latest Bismark (0.7.7) already supports the Bowtie2 option --most_valid_alignments. However, how do you return ambiguous alignments? --ambiguous just outputs a FASTA file; What would be the easiest way to get a SAM with the possible multiple genomic coordinates of ambiguously mapped reads, if any?
mixter is offline   Reply With Quote
Old 12-12-2012, 05:06 AM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

Quote:
Originally Posted by mixter View Post
Hello Felix,

Thanks for the detailed explanation, so, that means that it's conceptually possible but with the caveats you mentioned.

Meanwhile I'm practically interested in doing this, i.e. returning non-unique / ambiguously mapped sequences. I noticed that the latest Bismark (0.7.7) already supports the Bowtie2 option --most_valid_alignments. However, how do you return ambiguous alignments? --ambiguous just outputs a FASTA file; What would be the easiest way to get a SAM with the possible multiple genomic coordinates of ambiguously mapped reads, if any?
Hi Mixter,

The Bowtie2 option --most_valid_alignments used to be the Bowtie2 option -M, however it is now deprecated (from the Bowtie2 side). I left the description in there for information only:

-most_valid_alignments <int>
This used to be the Bowtie 2 parameter -M. As of Bowtie 2 version 2.0.0 beta7 the option -M is
deprecated. It will be removed in subsequent versions. What used to be called -M mode is still the
default mode, but adjusting the -M setting is deprecated. Use the -D and -R options to adjust the
effort expended to find valid alignments.

This option just affected how much effort was spent by Bowtie2 to actually find the best alignment, but it does not mean that Bismark is actually reporting multiple ambiguous hits. The easiest option to get Bismark to do it would probably be to copy the entire subroutine "check_bowtie_results_single_end" (or paired end, with or w/o Bowtie2) and edit it so that it skips all checks that determine whether a read is unique or not. This sounds pretty straight forward, but there might be some twists...
fkrueger is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:03 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO