SEQanswers

Old 11-04-2011, 10:43 AM   #1
fongchun
Member
 
Location: Vancouver, BC

Join Date: May 2011
Posts: 55
Base qualities for Illumina Sequencing

Hi all,

I have a question regarding the base qualities for Illumina sequencing that I am trying to wrap my head around. I have qseq files, and the qseq README file says "quality: the calibrated quality string (encoded in ASCII as 64+score)."

I've been doing some research, and Wikipedia states that Illumina has its own way of encoding base qualities, but as of Illumina 1.3+ (assuming they mean the pipeline version) Illumina switched from Solexa/Illumina qualities to Phred qualities. And as of Illumina 1.8+, the base qualities are the same as Sanger, which is Phred+33.
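For illustration, the "ASCII as offset+score" encoding can be sketched in a few lines of Python. This is a minimal sketch, not Illumina's code: the function name `decode_qualities` and the example strings are made up here, and the offset (33 or 64) has to be supplied by the caller.

```python
def decode_qualities(quality_string, offset):
    """Return the integer quality scores for an ASCII-encoded quality string.

    offset is 33 for Sanger / Illumina 1.8+ FASTQ and 64 for the older
    Illumina encodings (hypothetical helper, for illustration only).
    """
    return [ord(ch) - offset for ch in quality_string]


# The same scores written in the two encodings look very different as text.
print(decode_qualities("hhhB", 64))  # offset 64
print(decode_qualities("III#", 33))  # offset 33
```

Both calls give the same list of scores, which is why the offset has to be known before the characters can be interpreted.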

I was told my qseq files were generated with Illumina 1.8+, but the base qualities are encoded in ASCII as 64+score, which makes no sense to me, since Illumina 1.8+ should be Phred+33 and not Phred+64. Additionally, it is ambiguous because it doesn't say whether they are Phred or Solexa/Illumina base qualities, since both are encoded in ASCII.

Can anyone shed light on the discrepancy?

Thanks!
Old 11-04-2011, 11:13 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

If your files were generated with Illumina pipeline v1.8.x, then the quality scores should be in the Sanger FASTQ format.

It is possible that your sequencing facility switched to using the new pipeline recently and the relevant help files may not have been updated.

Edited: Indeed. As HESmith indicates below, this data must have been produced with an earlier version of the pipeline if "qseq" files were produced. You should contact your sequencing facility and double-check.

Last edited by GenoMax; 11-04-2011 at 11:39 AM.
Old 11-04-2011, 11:27 AM   #3
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Actually, it's more complicated than GenoMax indicates for CASAVA v1.8. The fastq files are indeed Phred+33, but the export files are Phred+64. If the quality strings contain any lowercase characters (i.e., ASCII codes >96), it's Phred+64.
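The lowercase-character rule of thumb can be sketched as a small check. This is a hedged sketch only: `guess_offset` is a hypothetical helper, the ">96" threshold is the one given above, and the "<59" threshold is an added assumption based on ';' (59) being the lowest character any of the +64 encodings can produce.

```python
def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from a sample of quality strings.

    Returns None when the sample is empty or the characters seen are
    compatible with both offsets. Heuristic only; thresholds are assumptions.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters below ';' cannot occur in any +64 encoding.
        return 33
    if max(codes) > 96:
        # Lowercase letters: scores this high are only plausible with +64.
        return 64
    return None


print(guess_offset(["hhhhgggB", "ffffeeeB"]))
print(guess_offset(["IIII5#", "IIHG2#"]))
```

A sample containing only mid-range characters (for example all 'D') stays undecided, so it helps to check many reads, including low-quality ones.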

Also, CASAVA v1.8 doesn't produce qseq files; they've been replaced by gzipped fastq. It sounds like your data were generated with an earlier version.

Last edited by HESmith; 11-04-2011 at 11:30 AM. Reason: corrected typos
Old 11-04-2011, 11:36 AM   #4
fongchun
Member
 
Location: Vancouver, BC

Join Date: May 2011
Posts: 55

Thanks for the replies. Now I am even more confused. I am not 100% sure this is CASAVA 1.8 to be exact. The response I got was:

Illumina Pipeline version 1.8.0

But I can't see how this can't be CASAVA 1.8. Yes, I remember reading that qseq files are no longer produced and that the output should be fastq, but I see no evidence of fastq files. I do have .bam files, which were aligned using BWA, but that aligner, to my understanding, does not use base qualities during the alignment.
Old 11-04-2011, 11:44 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by fongchun View Post
I was told my qseq files were generated with Illumina 1.8+, but the base qualities are encoded in ASCII as 64+score, which makes no sense to me, since Illumina 1.8+ should be Phred+33 and not Phred+64. Additionally, it is ambiguous because it doesn't say whether they are Phred or Solexa/Illumina base qualities, since both are encoded in ASCII.
Three or four different varieties of q-score encoding, multiple software pipeline names and separate version numbering schemes -- Illumina has left researchers with an inscrutable problem simply trying to figure out what format their data are in.

First, there isn't really any software called Illumina 1.8. What is meant here is CASAVA 1.8. Besides the CASAVA pipeline, Illumina also has (or had) RTA (Real Time Analysis, which runs on the instrument computer) and OLB (OffLine Basecalling, no longer really used). You say you have QSEQ files, but Illumina has stopped using these; CASAVA 1.8 does not generate them, so it seems unlikely that your QSEQs were generated by CASAVA 1.8. It is possible that they were created by OLB 1.8 (the second-to-last version of OLB), which did generate QSEQs by default and did use ASCII+64 encoding of Phred-scale quality scores. The Solexa Q-scores (as opposed to Phred) haven't been used in a long time, so it is probably safe to assume your data is not in this format.
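For reference, the two scales differ in how they map the error probability p to a score: Phred uses Q = -10 log10(p), while the old Solexa scale uses Q = -10 log10(p / (1 - p)). A minimal sketch of the conversion that follows from those two definitions (the function name `solexa_to_phred` is made up for this example):

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scale score to the Phred scale.

    Derived from Q_solexa = -10*log10(p / (1 - p)) and
    Q_phred = -10*log10(p), which gives
    Q_phred = 10*log10(10**(Q_solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# The scales diverge at low quality and converge at high quality.
for q in (-5, 0, 10, 40):
    print(q, round(solexa_to_phred(q), 2))
```

In practice the difference only matters for low-quality bases; above roughly Q10 the two scales are nearly identical.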

If you haven't already seen it, look at the FASTQ article on Wikipedia; it contains a lot of useful information about the various quality encodings used by Illumina.
Old 11-04-2011, 02:37 PM   #6
fongchun
Member
 
Location: Vancouver, BC

Join Date: May 2011
Posts: 55

Thanks for the reply, kmcarr. That helped a bit. I dug a little further and it appears that it was not CASAVA 1.8, but actually:

Illumina 1.10.0 RTA 1.10.36.0

I don't understand the version number, but it appears to be RTA (Real Time Analysis), which may explain why I am getting qseq and not fastq files. Does anyone have any information on what "Illumina 1.10.0 RTA 1.10.36.0" is?
Old 11-29-2011, 12:04 AM   #7
airtime
Member
 
Location: Germany

Join Date: Jan 2011
Posts: 14

Hi fongchun,

Do you know which version of SCS is running on the sequencer PC?
Or how did you get the qseq files?
I suppose that the old versions produce qseq files (and so it must be Phred+64), while the new versions create bcl, stats, pos and similar files, which can be converted with CASAVA.
From CASAVA 1.8 onwards the Phred offset is +33 (OLB is still +64).
You can also look at the user guides for the respective tools.
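If the qseq qualities do turn out to be Phred+64 and a downstream tool expects Phred+33, re-encoding the quality string is a simple character shift. A minimal sketch, assuming the input really is Phred+64 (`phred64_to_phred33` is a hypothetical helper, not part of CASAVA or OLB):

```python
def phred64_to_phred33(quality_string):
    """Re-encode a Phred+64 quality string as Phred+33.

    Raises ValueError if a character decodes to a negative score, which
    suggests the input was not Phred+64 in the first place.
    """
    converted = []
    for ch in quality_string:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(score + 33))
    return "".join(converted)


print(phred64_to_phred33("hhhB"))
```

The negative-score check is a cheap guard against converting data that is already Phred+33.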
All times are GMT -8.