SEQanswers

Old 04-15-2009, 01:21 PM   #1
dlepp
Junior Member
 
Location: Canada

Join Date: Mar 2009
Posts: 5
Illumina quality scores

I wonder if someone with more intimate knowledge of the Solexa pipeline could shed some light on the different varieties of quality scores it produces and how they relate to one another. Just to be clear, I'm not referring to the difference between Solexa and Phred scores or to the conversion to ASCII. From my limited knowledge, there appear to be at least two types of Q-scores produced by the pipeline: intensity-based (found in .prb files from Bustard) and alignment-based (found in fastq files from Gerald). There also seems to be some kind of quality calibration going on (using a "precalculated calibration table"?).
To give some context, I am working with paired-end reads from a bacterial genome using the v1.3 pipeline. I am finding that the fastq quality scores are much lower than those from the .prb files (almost entirely Q22, compared to Q40). I'm wondering which scores better represent the quality, and why Q22 would be so over-represented in the fastq.

Thanks!

BTW, here is a snippet of my fastq file in case my interpretation is wrong:

@Paired_run:7:1:305:1931/1
GAAATAGATGAAGATTTAATTATTGCTCCTAAAT
+Paired_run:7:1:305:1931/1
VVVVVVVVVVVVVVVVVVVVVVVVUVVVVVUUUU
@Paired_run:7:1:315:1920/1
GACTAAACTGTAGCAATGGTTTAAATGATGATCT
+Paired_run:7:1:315:1920/1
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVUUUUU
@Paired_run:7:1:341:1932/1
GCTAATGATGTTCTTGATAATTTAAACAAAATTG
+Paired_run:7:1:341:1932/1
VVVVVVVVVVVVVVVUVVVVVVVVVVVVVVUUUS
@Paired_run:7:1:302:1939/1
GAAATAGATGAAGATTTAATTATTGCTCCTAAAT
+Paired_run:7:1:302:1939/1
VVVVVVVVVVVVVVVVVVVVVVVVUVVVVVUUUU
@Paired_run:7:1:212:1540/1
GTTAGAATTAATCAAATTGTATGGATGTGTGTAG
+Paired_run:7:1:212:1540/1
VUVVVVVVVVVVVVVVVVVUVVUUVVRVSVRUUS
@Paired_run:7:1:173:757/1
GTAGACGTATCAGGAGTTTCTAAAGGTAAGGGAT
+Paired_run:7:1:173:757/1
VVVVVVVVVVVUVVVVVVVVVVVVVUVVVVUUUU
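
One way to see how skewed the scores are is to tally the decoded values across the file. The sketch below is only an illustration: it assumes four-line FASTQ records and that each quality character is chr(score + 64), as the v1.3 pipeline writes them. The function name `quality_histogram` is made up for this example.

```python
from collections import Counter

def quality_histogram(fastq_lines, offset=64):
    """Tally decoded quality scores over four-line FASTQ records.

    Assumes each record is exactly four lines (header, sequence, '+' line,
    quality) and that scores are stored as chr(score + offset).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the fourth line of every record holds the qualities
            counts.update(ord(ch) - offset for ch in line.strip())
    return counts

# One record taken from the snippet above.
record = [
    "@Paired_run:7:1:341:1932/1",
    "GCTAATGATGTTCTTGATAATTTAAACAAAATTG",
    "+Paired_run:7:1:341:1932/1",
    "VVVVVVVVVVVVVVVUVVVVVVVVVVVVVVUUUS",
]
for score, n in sorted(quality_histogram(record).items()):
    print(f"Q{score}: {n}")
```

With offset 64, 'V' decodes to 22, 'U' to 21 and 'S' to 19, which matches the "almost entirely Q22" reading.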
Old 04-17-2009, 06:30 AM   #2
SillyPoint
Member
 
Location: Frederick MD, USA

Join Date: May 2008
Posts: 39

I didn't know Gerald could produce fastq files directly. We use a Perl script to extract information from the *_ub_custom_qseq.txt files produced by Gerald and convert it to fastq format (discarding the non-PF reads in the process). The ASCII scores in the qseq files are offset by 64.
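
A rough Python sketch of this kind of conversion (not the Perl script itself) might look like the following. It assumes the usual 11-column, tab-separated qseq layout with the filter flag in the last column; the read-name format and the function name `qseq_to_fastq` are choices made for this example only.

```python
def qseq_to_fastq(qseq_lines):
    """Yield four-line FASTQ records for reads that pass the filter.

    Assumes the 11-column tab-separated qseq layout: machine, run, lane, tile,
    x, y, index, read number, sequence, quality, filter flag (1 = pass).
    Quality characters are copied unchanged, so they keep their +64 offset.
    """
    for line in qseq_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 11:
            continue  # skip lines that do not match the assumed layout
        machine, run, lane, tile, x, y, index, read, seq, qual, pf = fields
        if pf != "1":
            continue  # discard non-PF reads
        # Hypothetical naming scheme; adjust to whatever downstream tools expect.
        name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read}"
        seq = seq.replace(".", "N")  # qseq writes uncalled bases as '.'
        yield f"@{name}\n{seq}\n+\n{qual}"
```

Because the quality string is passed through untouched, the output is still in the offset-64 encoding and may need re-encoding for tools that expect offset 33.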

Can you post the Gerald config file you used to create the fastq?

SillyPoint
Old 04-17-2009, 10:45 AM   #3
cbrennan
Member
 
Location: Ann Arbor

Join Date: Dec 2008
Posts: 28

Gerald can generate fasta, fastq, or scarf (default) files.

For fastq files, put the line:

12345678:SEQUENCE_FORMAT --fastq

in your Gerald config file.

Christine
Old 02-25-2011, 01:54 AM   #4
Sylphide
Member
 
Location: France

Join Date: Feb 2011
Posts: 11

I looked for the meaning of Illumina quality scores and couldn't find a direct translation table, so here is one (in case it is of use to someone else).

Illumina quality score dictionary:

ASCII character / numeric score (ASCII code minus 64) / probability that the base call is wrong
@ 0 1
A 1 0.7943282347
B 2 0.6309573445
C 3 0.5011872336
D 4 0.3981071706
E 5 0.316227766
F 6 0.2511886432
G 7 0.1995262315
H 8 0.1584893192
I 9 0.1258925412
J 10 0.1
K 11 0.0794328235
L 12 0.0630957344
M 13 0.0501187234
N 14 0.0398107171
O 15 0.0316227766
P 16 0.0251188643
Q 17 0.0199526231
R 18 0.0158489319
S 19 0.0125892541
T 20 0.01
U 21 0.0079432823
V 22 0.0063095734
W 23 0.0050118723
X 24 0.0039810717
Y 25 0.0031622777
Z 26 0.0025118864
[ 27 0.0019952623
\ 28 0.0015848932
] 29 0.0012589254
^ 30 0.001
_ 31 0.0007943282
` 32 0.0006309573
a 33 0.0005011872
b 34 0.0003981072
c 35 0.0003162278
d 36 0.0002511886
e 37 0.0001995262
f 38 0.0001584893
g 39 0.0001258925
h 40 0.0001
i 41 7.94328234724282E-005
j 42 6.30957344480193E-005
k 43 5.01187233627272E-005
l 44 3.98107170553497E-005
m 45 3.16227766016837E-005
n 46 2.51188643150957E-005
o 47 1.99526231496888E-005
p 48 1.58489319246111E-005
q 49 1.25892541179417E-005
r 50 0.00001
s 51 7.94328234724281E-006
t 52 6.30957344480192E-006
u 53 5.01187233627272E-006
v 54 3.98107170553497E-006
w 55 3.16227766016838E-006
x 56 2.51188643150958E-006
y 57 1.99526231496888E-006
z 58 1.58489319246111E-006
{ 59 1.25892541179417E-006
| 60 0.000001
} 61 7.9432823472428E-007
~ 62 6.30957344480193E-007
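
The table can be regenerated with a few lines of Python. This sketch assumes Phred scaling, as used by Illumina 1.3+, where the score is Q = -10 * log10(P), so P = 10^(-Q/10). Older Solexa scores are defined on the odds P/(1-P) instead and would not match these probabilities at low values. The function name `error_probability` is invented for this example.

```python
def error_probability(char, offset=64):
    """Return (score, probability of a wrong call) for one quality character."""
    score = ord(char) - offset
    return score, 10 ** (-score / 10)

# Print one row per character from '@' (ASCII 64) through '~' (ASCII 126).
for code in range(64, 127):
    char = chr(code)
    score, p = error_probability(char)
    print(f"{char}\t{score}\t{p:.10g}")
```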

Last edited by Sylphide; 02-27-2011 at 11:57 PM.
Old 02-25-2011, 12:18 PM   #5
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52
For converting SCARF format to fastq

Quote:
Originally Posted by Sylphide
I looked for the meaning of illumina quality scores and couldn't find any direct translation so here it is (in case it is of any use to someone else)

Illumina quality score dictionary :

text illumina_score
@ 0
A 1
B 2
.
.
.
Hello Sylphide,
Just to confirm: can I use this conversion table to convert the quality scores in a SCARF file from ASCII to numeric form, so that I can then use 'fq_all2std.pl' (from the Maq site) to generate standard fastq format? The script assumes the quality scores in the .scarf file are numeric, whereas my files have the scores in ASCII form.
I'm a beginner in sequencing data analysis, so any help is appreciated.
Thanks
Old 02-27-2011, 11:54 PM   #6
Sylphide
Member
 
Location: France

Join Date: Feb 2011
Posts: 11

Hello,
I'm also a beginner, but I'll try to help.
You can use the conversion table I wrote to convert ASCII to numeric if you want to program it yourself. There must be some tool that does the conversion automatically, but I couldn't find one.
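
Programming it yourself could look roughly like this in Python. The sketch assumes a colon-separated SCARF line of the form machine:lane:tile:x:y:sequence:quality, with one quality character per base stored as chr(score + 64); check both assumptions against your own files. The function name `scarf_ascii_to_numeric` and the example read are made up for illustration.

```python
def scarf_ascii_to_numeric(line, offset=64):
    """Rewrite one SCARF line so its quality field is space-separated integers.

    Assumes the colon-separated layout machine:lane:tile:x:y:sequence:quality,
    with quality stored as one character per base, chr(score + offset).
    """
    # Split from the right so only the last two fields are peeled off.
    head, seq, qual = line.rstrip("\n").rsplit(":", 2)
    scores = " ".join(str(ord(ch) - offset) for ch in qual)
    return f"{head}:{seq}:{scores}"

print(scarf_ascii_to_numeric("HWI-EAS:7:1:305:1931:GAAAT:VVVUS"))
# prints HWI-EAS:7:1:305:1931:GAAAT:22 22 22 21 19
```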

PS: I added the probability of a base being wrong to my previous message.
Old 02-28-2011, 11:09 PM   #7
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52

Hello Sylphide,
I cleared up my confusion from here. Basically, what I understood is that standard (Sanger) FASTQ quality is encoded in ASCII with an offset of 33, whereas Solexa and Illumina 1.3+ quality use an offset of 64. Now I can parse the .scarf file if I have to.
There are many tools to convert between quality encodings, but I know of only one that is free and accepts .scarf input: "fq_all2std.pl" from the Maq site.
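
For the offset shift on its own, a minimal Python sketch is below. It assumes the input holds Phred scores stored as chr(score + 64), as in Illumina 1.3+, and writes them as chr(score + 33). Older Solexa-scaled scores would also need a Solexa-to-Phred conversion, which this does not do. The function name `phred64_to_phred33` is invented for this example.

```python
def phred64_to_phred33(qual):
    """Re-encode a Phred+64 quality string as Phred+33."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in qual)

print(phred64_to_phred33("VVVUS"))  # prints 77764
```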
Thanks anyway! Your post is what started me hunting around for information on quality encoding :-)