01-09-2009, 06:36 AM   #1
foolishbrat
Member

Location: South East Asia

Join Date: Nov 2008
Posts: 44

Interpreting Quality Score (Solexa)

Dear all,

We usually see quality scores like this for a Solexa tag:

Code: `-33 31 -40 -34 -40 -40 -40 40 27 -27 -40 -40`

Each group of four numbers corresponds to one base, so the quality above refers to a tag of length 3 (e.g. "tca"). My questions are as follows: What is a reasonable way to get a single number to represent each base? (e.g. should we average the four figures, or pick the highest of the four?) How should we interpret the figures? e.g. Is a base with a positive quality score better than one with a negative quality score? In general, how do people use this type of quality score information?
01-09-2009, 09:45 AM   #2
swbarnes2
Senior Member

Location: San Diego

Join Date: May 2008
Posts: 912

Averaging would be bad. Each number in a set of four is the score for A, C, G or T, in that order. So the sequence for your little bit there is CTA: in the first group the second number (31) is the highest, in the second group the fourth number (40) is the highest, and in the third group the first number (27) is the highest.

The scores are Solexa quality scores, which are not exactly the same as Sanger quality scores, though when the score is above 15 the two are virtually identical. There is a conversion equation for turning Solexa scores into Sanger scores, and another one that gives the error rate a given Sanger quality score is supposed to represent.

A lot of alignment programs don't use the quality scores at all during alignment, though they will output the quality scores of mismatches, which helps you judge how likely it is that a mismatch is a real polymorphism rather than an error. But read depth probably tells you more than quality scores when it comes to SNPs.
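A minimal Python sketch of the procedure described above, assuming the four scores in each group are in A, C, G, T order and that the winning score is kept as that base's quality (the function names are hypothetical, not part of any pipeline). The conversions use the standard definitions: Solexa Q = -10 log10(p/(1-p)) and Sanger/Phred Q = -10 log10(p), where p is the probability that the call is wrong.

```python
import math

BASES = "ACGT"  # assumed column order of the four scores in each group

def call_bases(prb_line: str) -> tuple[str, list[int]]:
    """Pick the highest-scoring base in each group of four; keep its score as the quality."""
    scores = [int(x) for x in prb_line.split()]
    if len(scores) % 4 != 0:
        raise ValueError("expected a multiple of four scores")
    seq, quals = [], []
    for i in range(0, len(scores), 4):
        group = scores[i:i + 4]
        best = max(range(4), key=lambda j: group[j])  # first index wins on ties
        seq.append(BASES[best])
        quals.append(group[best])
    return "".join(seq), quals

def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa score, -10*log10(p/(1-p)), to a Sanger/Phred score, -10*log10(p)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

def phred_error_prob(q_phred: float) -> float:
    """Probability that a base with this Sanger/Phred score is wrong."""
    return 10 ** (-q_phred / 10)
```

For the tag in the first post, `call_bases("-33 31 -40 -34 -40 -40 -40 40 27 -27 -40 -40")` returns `("CTA", [31, 40, 27])`. `solexa_to_phred(0)` is about 3.0 while `solexa_to_phred(15)` is about 15.1, which is why the two scales only diverge noticeably at low scores.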
01-09-2009, 01:21 PM   #3
new300
Member

Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Sorry, you probably know most of this already but...

In general, people use the fastq files generated by the Gerald step of the GAPipeline. These files contain the base calls and an associated quality score for each base (an estimate of how good the software thinks its guess is). Most short-read aligners use fastq files as their input, and many (for example Maq) use the quality information to help find the correct alignment position. Fastq files look like this:

```
@complete:333:89
CGCCTTCGTATGTTTATCCTGCTTATCACATACTA
+complete:333:89
132057787<:9133*9,.65177;54;8)3)37/
```

The line following the @ line contains the sequence, and the line following the + line contains the quality scores, encoded as one ASCII character per base. There's a table here: http://www.genographia.org/portal/to...sheet.pdf/view to convert these to a "probability of error".
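A minimal sketch of reading one four-line record like the one above and converting its quality string to error probabilities. It assumes an ASCII offset of 33 (the Sanger convention); Illumina pipelines of this period could also write qualities with an offset of 64, so check which one your files use. With offset 33 the example record decodes to scores between 8 and 27. The helper names are hypothetical.

```python
def parse_fastq_record(lines: list[str]) -> tuple[str, str, str]:
    """Split one four-line FASTQ record into (name, sequence, quality string)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a four-line FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return header[1:], seq, qual

def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """One character per base: score = ASCII code minus the offset (33 assumed)."""
    return [ord(c) - offset for c in qual]

def error_probabilities(scores: list[int]) -> list[float]:
    """Phred-scaled score Q -> probability of error 10**(-Q/10)."""
    return [10 ** (-q / 10) for q in scores]

record = [
    "@complete:333:89",
    "CGCCTTCGTATGTTTATCCTGCTTATCACATACTA",
    "+complete:333:89",
    "132057787<:9133*9,.65177;54;8)3)37/",
]
name, seq, qual = parse_fastq_record(record)
scores = decode_qualities(qual)
probs = error_probabilities(scores)
```

Applied to the record above, this gives the name `complete:333:89`, 35 bases, and scores starting 16, 18, 17.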

Quality scores are also useful in SNP calling: you need more low-quality bases than high-quality ones to call a SNP with confidence. You can also filter reads on quality score to discard junk reads. All in all they are quite handy, but you should make sure they are correctly calibrated (so that the assigned scores accurately reflect the real error rates).
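One common way to discard junk reads, sketched below, is to compute the expected number of errors in a read (the sum of its per-base error probabilities) and drop reads above a cutoff. This is a swapped-in example of a quality filter, not something the post prescribes; the cutoff of 1.0 and the ASCII offset of 33 are assumptions.

```python
def expected_errors(scores: list[int]) -> float:
    """Sum of per-base error probabilities = expected number of miscalled bases."""
    return sum(10 ** (-q / 10) for q in scores)

def keep_read(qual: str, max_expected_errors: float = 1.0, offset: int = 33) -> bool:
    """Keep the read only if its expected error count is within the (assumed) cutoff."""
    scores = [ord(c) - offset for c in qual]
    return expected_errors(scores) <= max_expected_errors
```

With the defaults, a 35-base read in which every base is Q40 (`"I" * 35`) is kept, while one in which every base is Q2 (`"#" * 35`) is discarded.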

The prb file you've shown contains four quality scores for each base. So rather than just getting the probability that the called base is correct, you also get probabilities for each of the other bases. For example, you would be able to say "it was probably an A or a C, but it's very unlikely it was a G or a T". That might be useful information, and some aligners are starting to take advantage of it, but it has not yet been fully exploited. However, don't get too attached to these prb files, as I believe they are set to disappear from the latest version of the GAPipeline.
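A sketch of how all four scores at one position could be turned into per-base probabilities, assuming each prb score is a Solexa-style log-odds score, Q = 10 log10(p/(1-p)), for that base being the true one. Rescaling the four values so they sum to 1 is an assumption of this sketch, and the 0.05 cutoff in the hypothetical `plausible_bases` helper is arbitrary.

```python
BASES = "ACGT"  # assumed column order of the four prb scores

def solexa_score_to_prob(q: float) -> float:
    """Invert Q = 10*log10(p/(1-p)) to get p, the probability this base is the true one."""
    return 1.0 / (1.0 + 10 ** (-q / 10))

def base_probabilities(group: list[float]) -> dict[str, float]:
    """Turn four prb scores into probabilities that sum to 1 (rescaling is an assumption)."""
    raw = [solexa_score_to_prob(q) for q in group]
    total = sum(raw)
    return {base: p / total for base, p in zip(BASES, raw)}

def plausible_bases(group: list[float], min_prob: float = 0.05) -> list[str]:
    """Bases whose probability is at least min_prob, most likely first."""
    probs = base_probabilities(group)
    ranked = sorted(probs, key=probs.get, reverse=True)
    return [base for base in ranked if probs[base] >= min_prob]
```

For the first position of the tag in the opening post, `plausible_bases([-33, 31, -40, -34])` returns `['C']`; a made-up ambiguous position such as `[5, 4, -30, -35]` gives `['A', 'C']`, i.e. "probably an A or a C".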

All times are GMT -8.