SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
aligned seq > 1 times and skipping reads because of seed mismatches carolW Bioinformatics 12 05-22-2014 11:48 PM
TopHat/Bowtie - number of reads aligned mgibson Bioinformatics 7 10-22-2011 08:04 PM
Number of mismatches in Bowtie Rachelly Bioinformatics 0 05-08-2011 12:55 AM
bowtie sam output, number of mismatches sridharacharya Bioinformatics 2 01-08-2011 05:22 PM
the number of aligned reads by maq and bwa totalnew Bioinformatics 3 12-14-2009 11:18 AM

Reply
 
Thread Tools
Old 05-25-2014, 10:14 AM   #1
carolW
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 103
Default bowtie number of mismatches and multiple aligned reads

Hi
Should the --no-1mm-upfront parameter be used with bowtie2 to allow exactly 1 vs 2 mismatches? If so how to use it?

Should 1 as cutoff for MAPQ be used to discriminate the exactly 1 time aligned vs >1 time aligned reads?

Look forward to your reply,

Carol
carolW is offline   Reply With Quote
Old 05-25-2014, 10:45 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

--no-1mm-upfront

Below is an excerpt from the Bowtie manual:

http://bowtie-bio.sourceforge.net/bo...gnment-options

"By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed."


I don't think you can tell Bowtie to find exactly 1 or 2 mismatches,
I think you can only tell it the maximum number of mismatches to allow.
mastal is offline   Reply With Quote
Old 05-25-2014, 10:55 AM   #3
carolW
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 103
Default

So are you confirming that --no-1mm-upfront should be used as --no-1mm-upfront 1 or --no-1mm-upfront 2? Or should N and L be used?

Once > 1 time aligned reads are reported by bowtie, how is it possible to separate reads that aligned exactly once from those that aligned > 1 times?

Thanks
carolW is offline   Reply With Quote
Old 05-25-2014, 11:09 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

It's just "--no-1mm-upfront" (it doesn't take an argument).

Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
dpryan is offline   Reply With Quote
Old 05-25-2014, 11:43 AM   #5
carolW
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 103
Default

but -no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?

I meant mapping to repetitive regions by > 1 times alignment because in stats report, I get > 50% of > 1 times alignments. So the value of MAPQ is heureustic. In a given interval, how to choose the best?
carolW is offline   Reply With Quote
Old 05-25-2014, 03:54 PM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Quote:
Originally Posted by carolW View Post
but -no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?
No, -no-1mm-upfront disables bowtie's default behaviour (which is to find alignments with 0 or 1 mismatches).
You can set -N 2 if you want to allow up to 2 mismatches in the seed region.
mastal is offline   Reply With Quote
Old 05-26-2014, 01:25 AM   #7
carolW
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 103
Default

When I set -N 2, I get error message:

Error: -N was set to 2, but cannot be set greater than 1
Error: Encountered internal Bowtie 2 exception (#1)

Is there any other parameter that should be set, too?
carolW is offline   Reply With Quote
Old 05-26-2014, 01:31 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
dpryan is offline   Reply With Quote
Old 05-28-2014, 12:59 AM   #9
carolW
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 103
Default

Quote:
Originally Posted by dpryan View Post
It's just "--no-1mm-upfront" (it doesn't take an argument).

Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
How could we judge a threshold as a reasonable? Does it depend of the data? All info is welcome.
carolW is offline   Reply With Quote
Old 05-28-2014, 01:31 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The MAPQ relates to the probability that the alignment is correct, so just pick a value that you're happy with depending on your downstream applications. For RNAseq, I usually use a theshold of 5, since there's enough coverage that a small amount of error won't have any considerable effect. For bisulfite sequencing data, on the other hand, I've found that a MAPQ threshold of 10 is usually the sweet spot, since there's less coverage per site, so one can't accept as much error. For variant calling, many of the callers utilize MAPQ and Phred scores in their call algorithms, so you may either not bother filtering or might just remove the highly unreliable alignments, which for bowtie2 are those with MAPQ of 0 or 1.

If you're looking for some objectively perfect filtering algorithm there is none, it's just a question of how much error your requirements can accept.
dpryan is offline   Reply With Quote
Old 05-28-2014, 06:08 AM   #11
carolW
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 103
Default

so it seems to be easy with my data as I have 0, 1, 42. 0 must corresponds to 0 time alignment as there is u in the strand column. 1 must be ambigous or aligned > 1 time and 42 unambigous, or aligned exactly once.
carolW is offline   Reply With Quote
Old 05-28-2014, 06:17 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Yeah, life is easy when you have just 3 values. A value of 42 is given when there's a perfect match and there's no valid next-best alignment. If you played with --score-min then you'd eventually get a larger variety of MAPQ scores, though that'd just overcomplicate your life
dpryan is offline   Reply With Quote
Old 05-28-2014, 06:22 AM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

BTW, there are actually 5 ways in which bowtie2 will yield a MAPQ of 0, only one of which is due to a read not being mapped (it's an unreliable alignment in any case). It's actually possible to have a "unique" alignment with a MAPQ of 0, assuming the definition of "unique" is having only one valid alignment given the --score-min and penalty settings.
dpryan is offline   Reply With Quote
Old 06-30-2014, 09:49 PM   #14
Lv Ray
Member
 
Location: GZ,China

Join Date: Jun 2014
Posts: 42
Default

agree with you
Lv Ray is offline   Reply With Quote
Old 12-22-2014, 11:16 PM   #15
gongjing
Junior Member
 
Location: China

Join Date: Dec 2014
Posts: 2
Default

Quote:
Originally Posted by dpryan View Post
Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
so, what is the right way to set the overall permitted mismatches while mapping to the reference genome index with bowtie2? looking forward to your answer!
gongjing is offline   Reply With Quote
Old 12-23-2014, 07:01 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You need to tell bowtie2 to ignore base qualities and then set the --score-min parameter accordingly (e.g., "--score-min L,-30,0 --ignore-quals" would allow a maximum of 5 mismatches).
dpryan is offline   Reply With Quote
Old 12-23-2014, 04:50 PM   #17
gongjing
Junior Member
 
Location: China

Join Date: Dec 2014
Posts: 2
Default

Quote:
Originally Posted by dpryan View Post
You need to tell bowtie2 to ignore base qualities and then set the --score-min parameter accordingly (e.g., "--score-min L,-30,0 --ignore-quals" would allow a maximum of 5 mismatches).
Hi dpryan,
thank you for your wonderful suggestion! But I am confused with the output while following your advise.
First,I use the command bowtie2 -x index/9402_ref_ASM32557v1_chrUn.fa -c ATGACAGGCTTATGTCGA -a --no-hd ,and the alignment result is

1 reads; of these:
1 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
1 (100.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
100.00% overall alignment rate
0 16 gi|584418266|ref|NW_006435781.1| 1491702 255 18M * 0 0 TCGACATAAGCCTGTCAT IIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:4T13 YT:Z:UU

then I tried the command: bowtie2 -x index/9402_ref_ASM32557v1_chrUn.fa -c ATGACAGGCTTATGTCGA --score-min L,-30,0 --ignore-quals -a --no-hd, but the aligning returns the same result, the --score-min L,-30,0 --ignore-quals seems not to help find more alignments with tolerated mismatchs. I also tried other sequence for input, but the field in XM:i:n returned is all XM:i:1 or XM:i:0(I think the field indicates the mismatch number). So is there any problem with my command ?

Sincerely.
gongjing is offline   Reply With Quote
Old 12-23-2014, 05:03 PM   #18
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Firstly, there may not be any other valid hits with the specified number of mismatches. Secondly, even if there were, bowtie2 can miss alignments on occasion.
dpryan is offline   Reply With Quote
Reply

Tags
alinged reads, bowtie2, mismatches

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:35 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO