SEQanswers Metrics for usability of a SOLiD dataset


09-29-2008, 12:57 PM   #2
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

As far as I understand it...your script calculates single colorspace errors, right? Rather than "miscalls" in true basespace?

09-30-2008, 07:23 AM   #3
new300
Member

Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

I guess it's application dependent. What do you intend to do with SOLiD reads without a reference? I would have thought that with short color space reads there's little you can do but SNP calling against a reference, but I could be wrong. If you're aligning to a reference, any reference, I would have thought it would make sense to calculate the error rate against it.

10-02-2008, 03:29 PM   #4
snetmcom
Senior Member

Location: USA

Join Date: Oct 2008
Posts: 158

You may find this useful, but it's falling into many of the pitfalls of non Solid informatics people.

10-04-2008, 09:33 AM   #5
pmiguel
Senior Member

Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,318

Quote:
 Originally Posted by ECO As far as I understand it...your script calculates single colorspace errors, right? Rather than "miscalls" in true basespace?
Yes. Except it estimates the number of errors per read based on the quality values assigned by the SOLiD base(color) caller.

For example, if a read is 35 bases and each base has a quality value of 10, then that is a 10% chance of error per base, so the estimated number of miscalls would be 3.5 (= 0.1 * 35). But if each base had a quality value of 20 (a 1% chance of error per base), the estimated number of miscalls for that read would be 0.35 (= 0.01 * 35).

Of course normal reads will have different quality values for each base. To estimate the number of miscalls, the script just adds up the estimated chance of a miscall for each base.
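
A minimal sketch of that sum, assuming the quality values follow the usual Phred convention (QV = -10 * log10(p)), which matches the QV 10 = 10% and QV 20 = 1% figures above. The function names are made up for illustration and are not taken from the script under discussion.

```python
def qv_to_error_prob(qv):
    """Convert a Phred-style quality value to an error probability.

    Assumes QV = -10 * log10(p), so QV 10 -> 0.1 and QV 20 -> 0.01.
    """
    return 10 ** (-qv / 10)


def expected_miscalls(quality_values):
    """Estimate miscalls in one read by summing per-call error probabilities."""
    return sum(qv_to_error_prob(qv) for qv in quality_values)


# The two uniform-quality reads from the example above.
print(round(expected_miscalls([10] * 35), 2))  # about 3.5
print(round(expected_miscalls([20] * 35), 2))  # about 0.35

# A read with mixed quality values (hypothetical numbers).
print(expected_miscalls([30, 25, 20, 10, 5]))
```

Because the estimate is a plain sum of probabilities, a few low-quality calls dominate it: one QV 5 call contributes about as much as thirty QV 20 calls.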

The major pitfall here is that I have no idea whether the SOLiD base caller accurately predicts its own error rate. I gather that the SOLiD base caller is tuned on mappable reads (those with 3 or fewer errors). It should be possible to check how it does on reads mapped with up to 6 errors against a reference sequence without a lot of redundant/low-complexity segments, but I have not done this.
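
One way such a check could look, as a rough sketch only: group the calls from mapped reads by quality value and compare the error rate the quality value predicts with the mismatch rate actually observed against the reference. The input format here (one `(quality_value, is_mismatch)` pair per call) is a hypothetical stand-in for whatever a mapping tool reports, and the Phred convention is again assumed.

```python
from collections import defaultdict


def observed_error_by_quality(calls):
    """Compare predicted and observed error rates per quality value.

    calls: iterable of (quality_value, is_mismatch) pairs, one per call
           in a mapped read (hypothetical input format).
    Returns {quality_value: (predicted_rate, observed_rate)}.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for qv, is_mismatch in calls:
        totals[qv] += 1
        if is_mismatch:
            errors[qv] += 1
    return {
        qv: (10 ** (-qv / 10), errors[qv] / totals[qv])
        for qv in sorted(totals)
    }


# Toy data: 10 calls at QV 10 with one mismatch, 100 calls at QV 20 with none.
toy_calls = [(10, True)] + [(10, False)] * 9 + [(20, False)] * 100
for qv, (predicted, observed) in observed_error_by_quality(toy_calls).items():
    print(qv, predicted, observed)
```

If the observed rates track the predicted ones, the summed estimate is trustworthy; note that real SNPs would also count as mismatches here, so they would inflate the observed rate.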

--
Phillip

10-13-2008, 11:41 PM   #6
ECO

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Quote:
 Originally Posted by snetmcom You may find this useful, but it's falling into many of the pitfalls of non Solid informatics people.
I'd love to hear more on that line of thinking....

 Tags abi, quality values, solid