SEQanswers


skbrimer 11-16-2015 11:09 AM

Ion torrent error correction
 
I asked this question on the Ion Community a couple of months ago without an answer or reply so I thought I would try here.

Over the last several years Ion Torrent has improved its chemistry and base-calling algorithm, and I'm wondering whether error correction is still advisable for Ion data.

I'm afraid that if Ion has already "corrected" the data in the single processing step, I would be introducing errors by correcting it a second time.

Brian Bushnell 11-16-2015 12:01 PM

There's no problem with error-correcting data multiple times. But if you error-correct it, be sure to use a program that can tolerate indel-type errors.

skbrimer 11-16-2015 12:26 PM

Thanks Brian :)

Are there times when you would not want to error-correct?

Brian Bushnell 11-16-2015 12:31 PM

You shouldn't error-correct if you are looking for rare variants (much less than the 50% ratio of a normal heterozygous diploid variant), or are doing amplicon sequencing, or are looking at tumor samples, or you have low coverage. Also, error-correction won't help much with platform-specific errors (like being unable to correctly determine the length of a long homopolymer), just with random errors.

If you have a reference, you can map before and after error-correction, and look at the error rates, to make sure error-correction improved things.

skbrimer 11-16-2015 12:50 PM

Sooo... how does one evaluate an error rate with a reference? Is it just a comparison of the vcf files?

Also, why would it be bad to error-correct in those situations? I imagine it has to do with "correcting" away an actual variant, but a variant would still have to be present at a rate higher than the machine's error rate to be called with any confidence, right? I.e., if you have a 1% error rate and 1000x coverage, you could not call anything less than 10X, right?

Brian Bushnell 11-16-2015 12:56 PM

Map to the reference with BBMap, like this:

bbmap.sh ref=reference.fa in=reads.fq out=mapped.sam mhist=mhist.txt ehist=ehist.txt qhist=qhist.txt indelhist=indelhist.txt

BBMap will print useful statistics to the screen:
Code:

Read 1 data:          pct reads       num reads      pct bases       num bases

mapped:                99.6100%            9961       99.6100%         1494150
unambiguous:           97.8900%            9789       97.8900%         1468350
ambiguous:              1.7200%             172        1.7200%           25800
low-Q discards:         0.0000%               0        0.0000%               0

perfect best site:      1.7500%             175        1.7500%           26250
semiperfect site:       1.7500%             175        1.7500%           26250

Match Rate:                  NA              NA       61.1359%         1409105
Error Rate:            96.0596%            9605       38.5408%          888317
Sub Rate:              87.2787%            8727        2.2734%           52398
Del Rate:              43.4543%            4345       35.1743%          810722
Ins Rate:              48.9249%            4892        1.0932%           25197
N Rate:                50.2050%            5020        0.3232%            7450

...and you can also plot the mhist or other histograms for more details.
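To compare a run on the raw reads with a run on the error-corrected reads, the rate lines from the two outputs can be pulled out and subtracted. The following is only a minimal sketch, not part of BBMap: it assumes the screen output of each run was saved to a text file, that the data is single-ended (one "Read 1 data" section), and that the layout matches the example above. The function names and the file names `before.txt` and `after.txt` are made up for illustration.

```python
import re

# Labels of the rate lines in the statistics block shown above.
RATE_LABELS = ("Match Rate", "Error Rate", "Sub Rate",
               "Del Rate", "Ins Rate", "N Rate")


def parse_rates(stats_text):
    """Return {label: percent of bases} for each rate line found.

    The "pct bases" value is taken as the last percentage on the line,
    which also works for "Match Rate", where "pct reads" is NA.
    """
    rates = {}
    for line in stats_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or label not in RATE_LABELS:
            continue
        percents = re.findall(r"(\d+(?:\.\d+)?)%", rest)
        if percents:
            rates[label] = float(percents[-1])
    return rates


def compare_rates(before_text, after_text):
    """Return {label: after - before}, in percentage points of bases."""
    before = parse_rates(before_text)
    after = parse_rates(after_text)
    return {label: after[label] - before[label]
            for label in before if label in after}


# Hypothetical usage, with the two saved outputs:
#   with open("before.txt") as f1, open("after.txt") as f2:
#       for label, delta in compare_rates(f1.read(), f2.read()).items():
#           print(f"{label}: {delta:+.4f} percentage points")
```

A negative change in "Error Rate" together with a positive change in "Match Rate" would suggest that error-correction improved things for that dataset.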

Quote:

Originally Posted by skbrimer (Post 184798)
Also, why would it be bad to error-correct in those situations? I imagine it has to do with "correcting" away an actual variant, but a variant would still have to be present at a rate higher than the machine's error rate to be called with any confidence, right? I.e., if you have a 1% error rate and 1000x coverage, you could not call anything less than 10X, right?

Error correction relies on high depth. With low depth it just doesn't work, and low depth of a variant compared to the reference will lead to that variant getting corrected away.
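The arithmetic in the quoted question can be sketched as follows: at 1000x coverage with a 1% per-base error rate, about 1000 × 0.01 = 10 reads at a given position are expected to carry an error there. This is only a rough model, assuming errors are independent between reads and lumping all error types together; real platform errors (for example in homopolymers) are not independent. The function names are made up for illustration.

```python
def expected_error_reads(depth, error_rate):
    """Expected number of reads with an error at one position."""
    return depth * error_rate


def prob_at_least(k, depth, error_rate):
    """P(X >= k) for X ~ Binomial(depth, error_rate).

    Models the chance that errors alone put at least k erroneous reads
    on one position. Assumes 0 <= error_rate < 1 and a depth small
    enough that (1 - error_rate) ** depth does not underflow.
    """
    if k <= 0:
        return 1.0
    if k > depth:
        return 0.0
    q = 1.0 - error_rate
    term = q ** depth          # P(X = 0)
    below = 0.0                # running sum of P(X = 0) .. P(X = k - 1)
    for i in range(k):
        below += term
        # Step from P(X = i) to P(X = i + 1).
        term *= (depth - i) / (i + 1) * error_rate / q
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    depth, error_rate = 1000, 0.01
    print("expected error reads:", expected_error_reads(depth, error_rate))
    for k in (5, 10, 20, 30):
        print(f"P(at least {k} error reads) = "
              f"{prob_at_least(k, depth, error_rate):.3g}")
```

Under this model, a variant supported by a number of reads close to the expected error count is hard to tell apart from noise, which is the same situation in which error-correction would tend to remove it.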

