#1
Senior Member
Location: Canada Join Date: Jun 2008
Posts: 103
reads.fq file:
@4:1:518:715
GATACCATAAAAGCTGGATCCTTCTTCAAGCATAA
+4:1:518:715
hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdhhP
1. How do I convert a character (like 'e' or 'h') to a quality score?
2. What does this score mean? How is it computed (formula)?
#2
Member
Location: Pune, India Join Date: Apr 2008
Posts: 21
For a FASTQ file, if the quality character is $q, the corresponding Phred quality can be calculated with the following Perl code:
$Q = ord($q) - 33;
#3
Junior Member
Location: UK Join Date: Jun 2008
Posts: 1
This is correct if you are using quality scores encoded in "fastq" format. I believe the Illumina pipeline used a different ASCII offset (64), according to their pipeline documentation. A value of zero = ASCII 64 ('@'). The ASCII value for a qv is therefore qv+64. So "h" = 104 - 64 = 40.
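The two conventions in posts #2 and #3 differ only in the ASCII offset. A minimal Python sketch (not the Perl used above), with `decode_quality` as a hypothetical helper name, applied to the quality string from post #1:

```python
def decode_quality(qual, offset):
    """Convert each quality character to an integer score by subtracting the ASCII offset."""
    return [ord(ch) - offset for ch in qual]

# Quality string from the read in post #1.
qual = "hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdhhP"

solexa_scores = decode_quality(qual, 64)  # offset described for the Illumina pipeline
sanger_scores = decode_quality(qual, 33)  # offset used in post #2

print(solexa_scores)
print(min(solexa_scores), max(solexa_scores))
```

With offset 64 the scores for this read fall within the -5 to 40 range discussed later in the thread. With offset 33 the same characters would decode to values above 40, which suggests the file uses the 64 offset.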
#4
Member
Location: Pune, India Join Date: Apr 2008
Posts: 21
Dupe. Deleted.
Last edited by Farhat; 06-17-2008 at 08:20 AM.
#5
Member
Location: Pune, India Join Date: Apr 2008
Posts: 21
Quote:
#6
Senior Member
Location: Canada Join Date: Jun 2008
Posts: 103
Thanks.
What is the range of this score (0 to 40)? What does the score mean?
#7
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
The range is from -5 to 40.
If P is the probability that the called base is correct, then the Solexa quality is 10 log10(P/(1-P)). A quality of -5 corresponds to P = 0.25 (more precisely, P = 0.25 gives about -4.77, which rounds to -5).
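The formula can be checked numerically. A minimal Python sketch, taking P as the probability that the called base is correct (`solexa_quality` is a hypothetical name):

```python
import math

def solexa_quality(p_correct):
    """Solexa quality for a called base that is correct with probability p_correct."""
    return 10 * math.log10(p_correct / (1 - p_correct))

print(round(solexa_quality(0.25), 2))    # lowest possible call, about -4.77
print(round(solexa_quality(0.5), 2))     # even odds give a quality of 0
print(round(solexa_quality(0.9999), 2))  # close to the top of the range
```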
#8
Member
Location: Pune, India Join Date: Apr 2008
Posts: 21
#9
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
Farhat's right for the Solexa prb file format from the base caller, but for the FASTQ format files the OP asked about, the range should be -5 to 40.
#10
Member
Location: Pune, India Join Date: Apr 2008
Posts: 21
Yes, that's right, because in a Solexa prb file the probability of A, C, G or T is given separately and can be really low, whereas in FASTQ the lowest probability is 0.25, implying equal probability for any nucleotide.
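The argument is that four per-base probabilities sum to 1, so the largest of them (the called base) is never below 0.25. A Python sketch of that reasoning, assuming a prb entry for one cycle is four Solexa-scaled scores in A, C, G, T order. The column order, the example scores and the helper names are assumptions for illustration, not taken from this thread:

```python
import math

BASES = "ACGT"  # assumed column order of the four scores

def call_base(scores):
    """Return (base, score) for the highest of the four per-base scores."""
    best = max(range(4), key=lambda i: scores[i])
    return BASES[best], scores[best]

def score_to_prob(q):
    """Invert Q = 10*log10(P/(1-P)) to recover the probability P."""
    return 10 ** (q / 10) / (1 + 10 ** (q / 10))

# Hypothetical cycle: strong evidence for C, very low scores for the others.
print(call_base([-40, 40, -40, -40]))
print(round(score_to_prob(-40), 4))  # an individual base can be very unlikely

# Worst case for the called base: all four bases equally likely, P = 0.25.
floor = 10 * math.log10(0.25 / 0.75)
print(round(floor, 2))
```

Only the called base's score ends up in the FASTQ file, so the FASTQ value stays at or above this floor, while individual prb entries can go much lower.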
#11
Senior Member
Location: Canada Join Date: Jun 2008
Posts: 103
$sQ = -10 * log($e / (1 - $e)) / log(10)
When $sQ = 40, $e = 0.0001.
When $sQ = 0, $e = 0.5, and 0.5 > 0.25.
When $sQ = -4, $e = 0.72.
What is the probability of error?
#12
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
If you are talking about FASTQ format and have a quality of -4, then the probability of the called base is 0.28 and the probability that it is any one of the other 3 bases is 0.72.
If you see a -4 in a prb format file, then the probability of that base is 0.28, and the other bases will each have their own prb/qual value.
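The 0.28 and 0.72 figures follow from solving the Solexa formula for the probability. A short derivation, writing $P$ for the probability that the called base is correct and $e = 1 - P$ for the error probability:

```latex
\[
\begin{aligned}
Q &= 10 \log_{10}\frac{P}{1-P}
  \quad\Longrightarrow\quad
  P = \frac{10^{Q/10}}{1 + 10^{Q/10}}, \\
e &= 1 - P = \frac{1}{1 + 10^{Q/10}}, \\
e\big|_{Q=-4} &= \frac{1}{1 + 10^{-0.4}} \approx 0.715,
  \qquad P\big|_{Q=-4} \approx 0.285 .
\end{aligned}
\]
```

So an error probability above 0.5 is possible: the called base only has to be more likely than each of the other three individually, not more likely than all of them combined.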
#13
Junior Member
Location: Israel Join Date: Sep 2008
Posts: 1
The output of a Solexa run generated a "quala" file of the following format:
>sequence_0
40 40 40 19 7 40 40 40 40 40 31 40 40 40 40 40 40 40 40 40 40 11 40 40 40 36 40 12 40 21 39 1 4 40 40 15 40 40 4 40 40 10 40 40 40 40 40 2 4 10 1
>sequence_1
40 40 8 13 12 40 40 40 40 17 27 40 25 17 4 40 40 40 21 40 40 37 40 40 37 4 40 33 40 25 40 3 20 40 40 20 40 40 4 40 8 7 40 40 15 4 10 1 5 20 1
etc...
Does anybody know what those numbers mean? Are they simply the Solexa quality scores per base pair? The range seems to be 1-40. Why isn't it -5 to 40 as in FASTQ?
#14
Member
Location: US Join Date: Feb 2008
Posts: 13
Hi,
Does anyone know an easy way or an existing program to convert all the .prb files from one particular lane into one FASTQ file, similar to the s_1_sequence.txt file but with no filters applied?
We have tried hacking around the Perl scripts within GERALD, but it looks like you need an intermediate seqpre.tmp file, which I think gets deleted after GERALD completes. We know this is possible by just running GERALD with the fastq parameter. However, we would like to generate a FASTQ file that is not affected by GERALD's filters, so that we can set up our own quality filters.
Any ideas? Do I go ahead and write one?
Thanks,
Victor
#15
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
I made my own very simple script, but here's a script of James Bonfield's:
http://seqanswers.com/forums/showthread.php?t=282
The only problem that I see is this line:
foreach (glob("$fn/*seq.txt")) {
which is going to get every single seq.txt file in the directory, not just the ones from a single lane. So you'll have to fix that.
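One way to restrict the match to a single lane is to put the lane number into the glob pattern. A Python sketch of the idea (the linked script itself is Perl), assuming the sequence files are named `s_<lane>_<tile>_seq.txt`. That naming scheme and the helper name `seq_files_for_lane` are assumptions, not confirmed in this thread:

```python
import glob
import os

def seq_files_for_lane(directory, lane):
    """Return sorted *_seq.txt paths for one lane, assuming names like s_<lane>_<tile>_seq.txt."""
    pattern = os.path.join(directory, f"s_{lane}_*_seq.txt")
    return sorted(glob.glob(pattern))

# Example: list the lane 1 files in the current directory (may be empty).
for path in seq_files_for_lane(".", 1):
    print(path)
```

The same change in the Perl line would be to narrow the pattern passed to `glob` in the same way, if the file names follow that scheme.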
#16
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
Victor,
Run GERALD including the following line in the GERALD configuration file:
QF_PARAMS '(1==1)'
This is a conditional which is true 100% of the time; in other words, GERALD passes every read. (This technique comes from the Pipeline User Guide.)
Last edited by kmcarr; 10-07-2008 at 10:44 AM. Reason: To correct line and add attribution.
#17
Member
Location: University Park, PA Join Date: Apr 2008
Posts: 27
As said above (and also in the Solexa documentation), Solexa quality scores in their FASTQ-like format are given by 10*log_10(P/(1-P)). I thought it might be useful for some people if I posted a lookup table based on this. Note that I'm giving the probability that a base is erroneous, rounded to four decimal places. Please post a reply if you think this table is an incorrect translation:
Char  ASCII  Char-64  P(error)
;     59     -5       0.7597
<     60     -4       0.7153
=     61     -3       0.6661
>     62     -2       0.6131
?     63     -1       0.5573
@     64     0        0.5000
A     65     1        0.4427
B     66     2        0.3869
C     67     3        0.3339
D     68     4        0.2847
E     69     5        0.2403
F     70     6        0.2008
G     71     7        0.1663
H     72     8        0.1368
I     73     9        0.1118
J     74     10       0.0909
K     75     11       0.0736
L     76     12       0.0594
M     77     13       0.0477
N     78     14       0.0383
O     79     15       0.0307
P     80     16       0.0245
Q     81     17       0.0196
R     82     18       0.0156
S     83     19       0.0124
T     84     20       0.0099
U     85     21       0.0079
V     86     22       0.0063
W     87     23       0.0050
X     88     24       0.0040
Y     89     25       0.0032
Z     90     26       0.0025
[     91     27       0.0020
\     92     28       0.0016
]     93     29       0.0013
^     94     30       0.0010
_     95     31       0.0008
`     96     32       0.0006
a     97     33       0.0005
b     98     34       0.0004
c     99     35       0.0003
d     100    36       0.0003
e     101    37       0.0002
f     102    38       0.0002
g     103    39       0.0001
h     104    40       0.0001
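For anyone who wants to check or regenerate the table, here is a minimal Python sketch. It assumes the 64 offset and the formula quoted in this post, with P(error) = 1 - P; the function names are hypothetical:

```python
def error_probability(q):
    """P(error) for Solexa quality q, from Q = 10*log10(P/(1-P)) with error = 1 - P."""
    return 1 / (1 + 10 ** (q / 10))

def lookup_rows(offset=64, q_min=-5, q_max=40):
    """Yield (char, ascii_code, quality, error probability to 4 decimals) per quality value."""
    for q in range(q_min, q_max + 1):
        code = q + offset
        yield chr(code), code, q, f"{error_probability(q):.4f}"

print("Char  ASCII  Char-64  P(error)")
for char, code, q, p_err in lookup_rows():
    print(f"{char:<5} {code:<6} {q:<8} {p_err}")
```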
#18
Member
Location: US Join Date: Feb 2008
Posts: 13
Hello,
We are looking a little closer at the quality of one of our runs. Interestingly, we see a pattern in most of our runs right at the 30th cycle. The information in the graph below comes from the s_N_export.txt files. Please ignore the graph for lane 4; this was a failed lane. The others, however, including our control (lane 8), show this pattern. This was an IPAR run with the upgraded GAII and was one of our best runs. Other runs also show this pattern at the 30th cycle.
Does anyone know why the qualities drop so much after the 30th cycle? Have you seen this before in any of your runs?
Thanks in advance.
Victor
[Graph image not preserved]
Last edited by vruotti; 10-10-2008 at 02:52 PM.
#19
Member
Location: Riverside, CA Join Date: Oct 2008
Posts: 13
Quote:
Take a look at the un-normalized scores (s_<lane>_qraw.txt) instead; I think you'll find that the curve is more continuous between cycles.
#20
Senior Member
Location: Canada Join Date: Jun 2008
Posts: 103
fastq file:
@I326_2_FC306FCAAXX:8:1:50:985
ATGTCCGAAGGGCAGTCTCAAGTGGTAAAATGGAT
+I326_2_FC306FCAAXX:8:1:50:985
hhhWhhhchhhhhahShh\PO]LgXZXPNLUTZNO
MAQ alignment output:
I326_2_FC306FCAAXX:8:1:50:985 1 1 + 0 0 99 99 99 0 0 1 0 35 ATGTCCGAAGGGCAGTCTCAAGTGGTAAAATGGAT ```W```````````S``\PO]L`XZXPNLUTZNO
What is the meaning of ```W```````````S``\PO]L`XZXPNLUTZNO? It is not the same as hhhWhhhchhhhhahShh\PO]LgXZXPNLUTZNO.
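The thread does not answer this question. One thing that can be checked directly is that the two strings appear to differ only where the FASTQ character has an ASCII code above 96 (the backtick character): each such character shows up as a backtick in the MAQ output. The Python sketch below reproduces that pattern. Reading it as MAQ treating the input as offset-33 FASTQ and capping qualities at 63 (96 - 33) is an assumption, not something confirmed in this thread:

```python
CAP = 96  # ASCII code of the backtick; with a +33 offset this would be quality 63

def cap_quality_chars(qual, cap=CAP):
    """Replace every character whose ASCII code exceeds cap with the cap character."""
    return "".join(chr(min(ord(ch), cap)) for ch in qual)

fastq_qual = "hhhWhhhchhhhhahShh\\PO]LgXZXPNLUTZNO"
print(cap_quality_chars(fastq_qual))
```

If that reading is right, the input was probably still in the Solexa (offset 64) encoding when it was aligned. MAQ provides a `sol2sanger` command for converting such files beforehand.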