
**SillyPoint** · 04-17-2009, 06:30 AM

I didn't know Gerald could produce fastq files directly. We use a Perl script to extract information from the *_ub_custom_qseq.txt files produced by Gerald and convert it to fastq format (discarding the non-PF reads in the process). The ASCII quality scores in the qseq files are offset by 64.

Can you post the Gerald config file you used to create the fastq?

SillyPoint
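
A minimal Python sketch of the kind of qseq-to-fastq conversion described above (this is not the Perl script mentioned in the post). It assumes the usual 11 tab-separated qseq columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter), that a filter value of `1` marks a PF read, and that `.` in the sequence stands for an uncalled base. The function name and the read-name layout are hypothetical, and the quality string is passed through unchanged (still offset 64).

```python
def qseq_line_to_fastq(line):
    """Convert one qseq line to a fastq record, or return None for non-PF reads.

    Assumes 11 tab-separated columns with the PF filter flag in the last one.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 qseq columns, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, pf = fields
    if pf != "1":
        return None  # discard reads that did not pass the filter
    # Hypothetical read-name layout; adjust to whatever downstream tools expect.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    # qseq marks uncalled bases with '.', fastq conventionally uses 'N'.
    return f"@{name}\n{seq.replace('.', 'N')}\n+\n{qual}\n"


if __name__ == "__main__":
    example = "HWI\t1\t2\t3\t4\t5\t0\t1\tAC.T\thhhh\t1"
    print(qseq_line_to_fastq(example), end="")
```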

**cbrennan** · 04-17-2009, 10:45 AM

Gerald can generate fasta, fastq, or scarf (default) files.

For fastq files, put the line:

12345678:SEQUENCE_FORMAT --fastq

in your Gerald config file.

Christine

**Sylphide** · 02-25-2011, 02:54 AM

I looked for the meaning of Illumina quality scores and couldn't find any direct translation, so here it is (in case it is of any use to someone else).

Illumina quality score dictionary:

ASCII character / numeric score / probability that the base call is wrong
@ 0 1
A 1 0.7943282347
B 2 0.6309573445
C 3 0.5011872336
D 4 0.3981071706
E 5 0.316227766
F 6 0.2511886432
G 7 0.1995262315
H 8 0.1584893192
I 9 0.1258925412
J 10 0.1
K 11 0.0794328235
L 12 0.0630957344
M 13 0.0501187234
N 14 0.0398107171
O 15 0.0316227766
P 16 0.0251188643
Q 17 0.0199526231
R 18 0.0158489319
S 19 0.0125892541
T 20 0.01
U 21 0.0079432823
V 22 0.0063095734
W 23 0.0050118723
X 24 0.0039810717
Y 25 0.0031622777
Z 26 0.0025118864
[ 27 0.0019952623
\ 28 0.0015848932
] 29 0.0012589254
^ 30 0.001
_ 31 0.0007943282
` 32 0.0006309573
a 33 0.0005011872
b 34 0.0003981072
c 35 0.0003162278
d 36 0.0002511886
e 37 0.0001995262
f 38 0.0001584893
g 39 0.0001258925
h 40 0.0001
i 41 7.94328234724282E-005
j 42 6.30957344480193E-005
k 43 5.01187233627272E-005
l 44 3.98107170553497E-005
m 45 3.16227766016837E-005
n 46 2.51188643150957E-005
o 47 1.99526231496888E-005
p 48 1.58489319246111E-005
q 49 1.25892541179417E-005
r 50 0.00001
s 51 7.94328234724281E-006
t 52 6.30957344480192E-006
u 53 5.01187233627272E-006
v 54 3.98107170553497E-006
w 55 3.16227766016838E-006
x 56 2.51188643150958E-006
y 57 1.99526231496888E-006
z 58 1.58489319246111E-006
{ 59 1.25892541179417E-006
| 60 0.000001
} 61 7.9432823472428E-007
~ 62 6.30957344480193E-007
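
The table above can be regenerated rather than looked up. The sketch below assumes what the table itself implies: the numeric score is the ASCII code minus 64, and the error probability is 10^(-score/10). The function name is hypothetical.

```python
def illumina_quality_table(offset=64, first="@", last="~"):
    """Return (character, score, error_probability) rows for an ASCII range.

    score = ASCII code - offset; error probability = 10 ** (-score / 10).
    """
    rows = []
    for code in range(ord(first), ord(last) + 1):
        score = code - offset
        error_prob = 10 ** (-score / 10)
        rows.append((chr(code), score, error_prob))
    return rows


if __name__ == "__main__":
    for char, score, prob in illumina_quality_table():
        print(f"{char} {score} {prob:.10g}")
```

Passing `offset=33` with a different character range would give the equivalent table for offset-33 encoded files.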

**amitm** · 02-25-2011, 01:18 PM

for converting SCARF format to fastq

Originally posted by Sylphide:

I looked for the meaning of illumina quality scores and couldn't find any direct translation so here it is (in case it is of any use to someone else)

Illumina quality score dictionary :

text illumina_score
@ 0
A 1
B 2
.
.
.

Hello Sylphide,
Just to confirm: can I use this conversion table to convert the quality scores in a SCARF file from ASCII to numeric form, so that I can then use 'fq_all2std.pl' (from the Maq site) to generate standard fastq format? The script assumes the quality scores in the .scarf file are numeric, whereas my files have the scores in ASCII form.
I'm a beginner in sequencing data analysis. Kindly help out.

thanks

**Sylphide** · 02-28-2011, 12:54 AM

Hello,
I'm also a beginner, but I'll try to help.
You can use the conversion table I wrote to convert ASCII to numeric if you want to program it yourself. There must be some tool that does the conversion automatically, but I couldn't find one.

PS: I added the probability of a base being wrong to my previous message.

**amitm** · 03-01-2011, 12:09 AM

Hello Sylphide,
I cleared up my confusion from here. Basically, what I understood is that standard (Sanger) fastq quality in ASCII is encoded with an offset of 33, whereas Solexa and Illumina 1.3+ quality use an offset of 64. Now I can parse the .scarf file if I have to.
There are many tools to convert between qualities, but I know of only one that is free and accepts .scarf input: "fq_all2std.pl" from the Maq site.
Thanks anyway! Your post got me started hunting around for information on quality encoding :-)
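
For anyone who wants to parse an ASCII-quality .scarf file directly, here is a rough Python sketch (not a replacement for "fq_all2std.pl"). It assumes each SCARF line has seven colon-separated fields (machine, lane, tile, x, y, sequence, quality) and that the qualities are Phred scores with an offset of 64, so only the offset needs to change to reach offset-33 fastq. Older Solexa-scaled scores would need an extra score conversion that is not shown here. The function names are hypothetical.

```python
def shift_quality_offset(qual, in_offset=64, out_offset=33):
    """Re-encode a quality string from one ASCII offset to another."""
    return "".join(chr(ord(c) - in_offset + out_offset) for c in qual)


def scarf_to_fastq(line):
    """Convert one ASCII-quality SCARF line to an offset-33 fastq record.

    Assumes 7 colon-separated fields: machine:lane:tile:x:y:sequence:quality.
    """
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 SCARF fields, got {len(fields)}")
    *name_parts, seq, qual = fields
    name = "_".join(name_parts)
    return f"@{name}\n{seq}\n+\n{shift_quality_offset(qual)}\n"


if __name__ == "__main__":
    print(scarf_to_fastq("HWI-EAS:1:1:100:200:ACGT:hhhh"), end="")
```

With offset 64, `h` is score 40, which becomes `I` at offset 33.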


Illumina quality scores
