SEQanswers

Old 02-18-2016, 06:59 AM   #1
mido1951
Senior Member
 
Location: Tunisia

Join Date: May 2014
Posts: 123
Error rate in BBMAP

Hello,
I would like to know the equation for the error rate in BBMap.
How do you calculate that rate?
Thank you for your reply.
@Brian?

Last edited by mido1951; 02-18-2016 at 12:35 PM.
Old 02-18-2016, 02:58 PM   #2
mido1951

Any response to this?
Old 02-18-2016, 03:35 PM   #3
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 503

What do you mean by "equation error rate"? Are you asking what fraction of reads are aligned to the incorrect loci? Or how errors affect the alignment accuracy? Or something else?
Old 02-18-2016, 03:41 PM   #4
mido1951

No, I want to know how you obtain the error rate.
What distance is used to compute the error rate?
Formally, how is the error rate expressed in BBMap?

Code:
Error Rate:              23.1932%          281318         3.9989%(this)           24419957
Old 02-18-2016, 03:55 PM   #5
HESmith

Again, to what error rate are you referring? There are multiple errors that can be measured, such as the two examples I provided.

What command(s) did you use to obtain the line of code in your previous post?

Last edited by HESmith; 02-18-2016 at 03:57 PM.
Old 02-18-2016, 07:16 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hmmm, I think mido is referring to the stderr output of BBMap after it finishes running. Those columns are:

(name), (% of reads with any errors), (number of reads with any errors), (% of bases with any errors), (number of bases with any errors).
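As a rough illustration of that column layout, here is a minimal Python sketch that splits such a line into its four numeric fields. The function name `parse_error_rate_line` and the dictionary keys are made up for this example, and the line format is assumed from the single sample posted earlier in this thread (including the "(this)" annotation added by the poster); other BBMap versions may print it differently.

```python
import re


def parse_error_rate_line(line):
    """Split an 'Error Rate:' summary line into its four numeric columns.

    Hypothetical helper: the layout is assumed from the one sample line
    posted in this thread, not from BBMap's documentation.
    """
    name, _, rest = line.partition(":")
    # Pull out the numbers only; this skips '%' signs and notes like "(this)".
    numbers = re.findall(r"\d+(?:\.\d+)?", rest)
    if len(numbers) != 4:
        raise ValueError(f"expected 4 numeric columns, found {len(numbers)}")
    pct_reads, n_reads, pct_bases, n_bases = numbers
    return {
        "name": name.strip(),
        "pct_reads_with_errors": float(pct_reads),
        "reads_with_errors": int(n_reads),
        "pct_bases_with_errors": float(pct_bases),
        "bases_with_errors": int(n_bases),
    }


sample = "Error Rate:              23.1932%          281318         3.9989%(this)           24419957"
parsed = parse_error_rate_line(sample)
print(parsed)
```

The third and fourth fields are the per-base values that the equation further down describes.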

Please note that the value can be a bit misleading if a lot of reads are mapped with long deletions. For Illumina reads, it is better to look at the substitution rate, unless you lower BBMap's "maxindel" flag from its default of 16000 to a much smaller value, perhaps 100.

The way it is calculated is based on the number of total alignment operations and number of matching alignment operations. Internally, when BBMap aligns a read to a reference, it supports 5 operations:

M: Match
S: Substitution (a base in the read differs from the reference)
I: Insertion (a base in the read not present in the reference)
D: Deletion (a base in the reference not present in the read)
N: No-call (undefined in the read or the reference)

These roughly correspond to cigar strings, but cigar strings do not have an equivalent of the "N" symbol, and they do have a lot of strange, poorly-defined symbols, rendering them not very useful in computation.

Simply, the sub rate is calculated as S/(M+S+I+D+N). The del rate is D/(M+S+I+D+N), and so forth. The error rate is (S+I+D+N)/(M+S+I+D+N).

For example, this match string:

mmmmSmmmmImmmmDDDDDDmmmmN

...has 16 matches, 1 sub, 1 insertion, 6 deletions, and 1 N, for 25 total operations. The cigar string would be 4=1X4=1I4=6D4=1X, or something like that (the specification is not fully defined). In this case the sub rate would be 1/25 = 4%, ins rate 1/25 = 4%, del rate 6/25 = 24%, and N rate 1/25 = 4%, giving an error rate of (1+1+6+1)/(16+1+1+6+1) = 9/25 = 36%.
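The arithmetic above can be checked with a short Python sketch. This is not BBMap's code; the function name `match_string_rates` is hypothetical, and it only assumes that a match string uses the five symbols listed above, in either case, with one character per operation.

```python
from collections import Counter

OPS = "MSIDN"  # match, substitution, insertion, deletion, no-call


def match_string_rates(match_string):
    """Return the rate of each operation plus the overall error rate.

    Hypothetical helper: every character counts as exactly one operation.
    """
    counts = Counter(match_string.upper())
    unknown = set(counts) - set(OPS)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    total = sum(counts[op] for op in OPS)
    if total == 0:
        raise ValueError("empty match string")
    rates = {op: counts[op] / total for op in OPS}
    # error rate = (S+I+D+N)/(M+S+I+D+N)
    rates["error"] = (total - counts["M"]) / total
    return rates


rates = match_string_rates("mmmmSmmmmImmmmDDDDDDmmmmN")
for name, value in rates.items():
    print(f"{name}: {value:.2%}")
```

For the example string this prints 64% for M, 4% for S, I and N, 24% for D, and 36% for the error rate.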

Last edited by Brian Bushnell; 02-18-2016 at 07:21 PM.
Old 02-19-2016, 04:19 AM   #7
mido1951

Quote:
Originally Posted by Brian Bushnell View Post
(name), (% of reads with any errors), (number of reads with any errors), (% of bases with any errors), (number of bases with any errors).
Thank you, Brian, for your explanation.
I ran a mapping with BBMap and looked at the error rate (% of bases with any errors), because you told me the other day that this error rate should be taken into account.
I would like to know how you calculate that error rate and what the equation is, because I need to include it in my research.
Thank you.
Old 02-19-2016, 04:28 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Since you mostly appear to be working with MinION data is this question for data aligned with mapPacBio.sh?

If you mapped your MinION data with regular bbmap.sh then the errors could be different.
Old 02-19-2016, 04:33 AM   #9
mido1951

Quote:
Originally Posted by GenoMax View Post
Since you mostly appear to be working with MinION data is this question for data aligned with mapPacBio.sh?

If you mapped your MinION data with regular bbmap.sh then the errors could be different.
I did not use mapPacBio.sh because I have MinION reads.
However, my reads are corrected; I mapped them to the reference genome and looked at the error rate.
I need to express the error rate formally in my research, because I cannot report an error rate without giving its equation.
Old 02-19-2016, 05:15 AM   #10
GenoMax

You should have used mapPacBio.sh with raw MinION reads but that is not the main point here.

@Brian did provide an explanation above of how the error rate is calculated (with the equation). The numbers we see in the final output must be an average across all mapped reads.
Old 02-19-2016, 05:18 AM   #11
mido1951

So the global error rate equation is:
error rate = (S+I+D+N)/(M+S+I+D+N)?
Old 02-19-2016, 05:44 AM   #12
GenoMax

Yes, I think so. Per @Brian:

Quote:
The way it is calculated is based on the number of total alignment operations and number of matching alignment operations.
Old 02-19-2016, 06:00 AM   #13
mido1951

What is the distance used for each of M, S, I, D, N?
Old 02-19-2016, 06:18 AM   #14
GenoMax

@Brian will have to chime in with a final word but I think the rate is an average across all alignment operations as he indicated above. Each read will have its own M,S,I,D,N values from its CIGAR strings.
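To illustrate what "an average across all alignment operations" could mean, here is a small Python sketch comparing two ways of combining reads. It assumes, without confirmation from this thread, that the counts are pooled over all reads before dividing; the function names and the two example match strings are hypothetical.

```python
from collections import Counter


def pooled_error_rate(match_strings):
    """Sum the operation counts of all reads, then apply the equation once."""
    counts = Counter()
    for match_string in match_strings:
        counts.update(match_string.upper())
    total = sum(counts.values())
    return (total - counts["M"]) / total


def mean_per_read_error_rate(match_strings):
    """Compute one error rate per read, then average those rates."""
    rates = [
        (len(ms) - ms.upper().count("M")) / len(ms) for ms in match_strings
    ]
    return sum(rates) / len(rates)


# Hypothetical reads: a long one with 1 error and a short one with 2 errors.
reads = ["mmmmmmmmmS", "mmSS"]
pooled = pooled_error_rate(reads)
per_read = mean_per_read_error_rate(reads)
print(f"pooled: {pooled:.4f}, mean of per-read rates: {per_read:.4f}")
```

The two numbers differ (3/14 versus 0.3 here) because the per-read mean gives short reads the same weight as long ones, so it matters which definition is reported.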
Old 02-19-2016, 12:33 PM   #15
mido1951

Is the distance M=N=I=D=S=1?
Any response, Brian?

Last edited by mido1951; 02-19-2016 at 02:22 PM.
Old 02-19-2016, 05:52 PM   #16
Brian Bushnell

Quote:
Originally Posted by mido1951 View Post
Is the distance M=N=I=D=S=1?
Any response, Brian?
The match strings are like expanded cigar strings, with one character per operation. So a read with 3 matches, then 2 substitutions, then 4 matches would have this match string:

mmmSSmmmm

In other words, every letter corresponds to 1 operation (basically 1 base). Does that answer your question?
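The relationship between a match string and a run-length form can be sketched in Python as follows. The helper names `run_lengths` and `expand` are hypothetical, and the list of (symbol, count) pairs is only an illustration, not an actual BBMap or SAM format.

```python
from itertools import groupby


def run_lengths(match_string):
    """Collapse consecutive identical symbols into (symbol, count) pairs."""
    return [(op, len(list(group))) for op, group in groupby(match_string)]


def expand(runs):
    """Rebuild the match string: one character per operation."""
    return "".join(op * count for op, count in runs)


runs = run_lengths("mmmSSmmmm")
print(runs)          # [('m', 3), ('S', 2), ('m', 4)]
print(expand(runs))  # mmmSSmmmm
```

Since each character is one operation, the length of the expanded string (9 here) is the denominator of the error rate.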
Old 02-20-2016, 03:47 AM   #17
mido1951

I want to express the error rate as an equation.
From what I understood, the error rate is: error rate = (S+I+D+N)/(M+S+I+D+N).
Am I right?
I also want to know the distance used for each operation M, S, I, D, N.
From what you told me, each operation counts as one unit (one operation = 1).
Am I right about this?
Thanks.
Old 02-20-2016, 09:01 AM   #18
Brian Bushnell

Yes, that is correct.
All times are GMT -8.