SEQanswers

04-12-2013, 01:47 AM   #1
ymwur
Member
 
Location: Taiwan

Join Date: Nov 2012
Posts: 11
Illumina RTA 1.13.48 and ASCII 66 to 105

I am getting Illumina sequencing reads from BGI.
They told me that the version is RTA 1.13.48 and the quality encoding is ASCII 66 to 105.
(a) According to what I have read on Wikipedia and elsewhere, there are generally two types of quality encoding for Illumina: (1) ASCII 64 to 126, Illumina 1.3+; (2) ASCII 33 to 126, Illumina 1.8+.
So I am confused about which encoding my data uses: ASCII 64 or 33?

(b) I want to use BWA and SAMtools to analyze my data. I will run BWA first and then apply SAMtools to the output BAM file.
BWA's default setting is for ASCII 33, but the "-I" option allows data with ASCII 64.
Which option should I use?
SAMtools' default is also ASCII 33 (and "-6" in mpileup allows data with ASCII 64). I have read that BWA converts the qualities in its SAM/BAM output to ASCII 33. So I don't need to worry about the encoding for the SAMtools analysis as long as the input comes from BWA. Is that correct?

Thanks,

Chih
04-12-2013, 04:11 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014

See the "Encoding" section from the Wikipedia Article on FASTQ format: http://en.wikipedia.org/wiki/FASTQ_format. I am quoting two lines from that section below.

Quote:
Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0 to 2 have a slightly different meaning. The values 0 and 1 are no longer used and the value 2, encoded by ASCII 66 "B", is used also at the end of reads as a Read Segment Quality Control Indicator.
Presumably the data you are receiving is in the Illumina 1.5+ format, so its lowest quality character would be ASCII 66 ("B").

Did BGI run this data in the past (more than a year ago), and you are only getting it now?
04-12-2013, 07:35 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,173

Quote:
Originally Posted by ymwur
I am getting Illumina sequencing reads from BGI.
They told me that the version is RTA 1.13.48 and the quality encoding is ASCII 66 to 105.
(a) According to what I have read on Wikipedia and elsewhere, there are generally two types of quality encoding for Illumina: (1) ASCII 64 to 126, Illumina 1.3+; (2) ASCII 33 to 126, Illumina 1.8+.
So I am confused about which encoding my data uses: ASCII 64 or 33?
The various version numbers above refer to different parts of the whole software stack. RTA (Real Time Analysis) is the on-instrument software responsible for taking the data from the raw images to Illumina's BCL files. BCL is a proprietary Illumina binary format containing base calls and quality scores. The current, latest version of RTA is 1.13.48. The version of RTA has no bearing on which ASCII offset is used for quality scores. The ASCII character encoding of quality scores is strictly a feature of the FASTQ file format, and RTA does not produce FASTQ files.

BCL files are most commonly converted to FASTQ files by Illumina's CASAVA pipeline. The current version of CASAVA is 1.8.2. It is this version number which the FASTQ Wikipedia article refers to as the "Illumina" version. (For completeness' sake: Illumina used to call this toolset by names other than CASAVA while keeping the same version number progression, so for simplicity they are all referred to as "Illumina".) If, as BGI says, your FASTQ files have quality scores encoded by ASCII characters in the range 66-105, then it would appear that they are still using a v1.5 toolset. The information in the Wikipedia article is a little misleading with regard to v1.5. With an offset of 64, the quality values run from 2 (ASCII 66 == 'B'), meant to indicate a base whose quality score could not be accurately determined, up to 41 (ASCII 105 == 'i'). So the true ASCII range for v1.5 is 66-105. (Why BGI continues to use this older version is a separate question.)
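
The arithmetic behind that 66-105 range can be sketched in a few lines of Python. This is only an illustration of the offset-64 encoding; the helper names `phred64_char` and `phred64_score` are made up for the example and are not part of any Illumina or third-party tool.

```python
# Phred+64 (Illumina 1.3 to 1.7) stores a quality score Q as chr(Q + 64).
PHRED64_OFFSET = 64


def phred64_char(q: int) -> str:
    """Return the FASTQ character that encodes quality q under Phred+64."""
    return chr(q + PHRED64_OFFSET)


def phred64_score(ch: str) -> int:
    """Return the quality score encoded by a Phred+64 FASTQ character."""
    return ord(ch) - PHRED64_OFFSET


# Lowest and highest scores used by the v1.5 toolset, as described above.
print(phred64_char(2), ord(phred64_char(2)))    # B 66
print(phred64_char(41), ord(phred64_char(41)))  # i 105
```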

But regardless of what BGI (or any other service provider) tells you the encoding is, you can (and probably should) check the FASTQ files yourself to determine the offset. Look at a few records from the file to see what range of characters is present in the quality strings and compare it to the ranges shown in the Wikipedia article. If you see any digits in the quality strings (ASCII 48-57, below the Phred-64 minimum), that is a clear indication of Phred-33 scores. By contrast, if you see any of the lowercase letters in the range a-i (ASCII 97-105, above the usual Phred-33 maximum of 'J'), that indicates Phred-64.
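
One way to automate that check is sketched below, assuming plain four-line FASTQ records (no wrapped sequence lines). The function names `quality_char_range` and `guess_offset` and the cut-offs 59 and 74 are assumptions chosen for this example: 59 is the lowest character the old Solexa+64 encoding can produce, and 74 ('J') is Q41 under Phred+33. Treat the result as a heuristic, since data with Phred+33 scores above 41 would fool it.

```python
import gzip
from itertools import islice


def quality_char_range(fastq_path, max_reads=10000):
    """Return (lowest, highest) ASCII code seen in the quality lines."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lo, hi = 255, 0
    with opener(fastq_path, "rt") as handle:
        # In a four-line FASTQ record the quality string is the 4th line,
        # so take every 4th line starting at index 3.
        for line in islice(handle, 3, max_reads * 4, 4):
            codes = [ord(c) for c in line.rstrip("\r\n")]
            if codes:
                lo = min(lo, min(codes))
                hi = max(hi, max(codes))
    return lo, hi


def guess_offset(lo, hi):
    """Guess 33 or 64 from the observed range; None if it is ambiguous."""
    if lo < 59:   # below anything a +64 encoding can produce
        return 33
    if hi > 74:   # above 'J', i.e. above Q41 under Phred+33
        return 64
    return None


# The range BGI reported for this data set:
print(guess_offset(66, 105))  # 64
```

Calling `guess_offset(*quality_char_range("reads.fastq"))` on a real file (the file name here is a placeholder) should give the same answer FastQC reports in its "Encoding" field, but it is worth cross-checking the two.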

Quote:
(b) I want to use BWA and SAMtools to analyze my data. I will run BWA first and then apply SAMtools to the output BAM file.
BWA's default setting is for ASCII 33, but the "-I" option allows data with ASCII 64.
Which option should I use?
If your FASTQ files are indeed in the v1.5 format then yes, you should use the -I option with "bwa aln".

Quote:
SAMtools' default is also ASCII 33 (and "-6" in mpileup allows data with ASCII 64). I have read that BWA converts the qualities in its SAM/BAM output to ASCII 33. So I don't need to worry about the encoding for the SAMtools analysis as long as the input comes from BWA. Is that correct?
The SAM/BAM file format standard is to represent qualities with Phred-33 offsets, so you are correct. Once you have a BAM file produced by BWA (run with -I for Phred-64 input), you should expect the quality values in it to be Phred-33.
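
To make the re-encoding concrete, here is a minimal Python sketch of what moving a quality string from Phred-64 to Phred-33 amounts to. It is not BWA's actual code, and `phred64_to_phred33` is a hypothetical name used only for this illustration.

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    out = []
    for ch in qual:
        q = ord(ch) - 64  # recover the numeric quality score
        if q < 0:
            # A character below '@' cannot be an Illumina 1.3+ quality value.
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        out.append(chr(q + 33))  # write it back with the SAM/BAM offset
    return "".join(out)


print(phred64_to_phred33("Bhi"))  # #IJ
```

So the 'B' to 'i' range in the FASTQ files would show up as '#' to 'J' in the QUAL column of the SAM output.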
04-12-2013, 07:43 AM   #4
ymwur
Member
 
Location: Taiwan

Join Date: Nov 2012
Posts: 11

GenoMax, thanks for your info.
You are probably right. I just ran FastQC on my reads and it suggests the reads are in Illumina 1.5 encoding.
I got the reads late last year (2012) and this year from BGI. I was expecting them to be in the newest Illumina format.

My concern is that I am using BWA to map these reads. BWA's default setting is for Illumina 1.8+ (ASCII 33), but it also accepts Illumina 1.3+ (ASCII 64) if I use the "-I" option of its "aln" command. Should I treat my reads as Illumina 1.3+ (ASCII 64) for BWA, since there is no option for Illumina 1.5?

Can someone give me advice?
Thanks a lot,

Chih
04-12-2013, 07:49 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014

As stated by kmcarr in the post above you should use -I with BWA.
04-12-2013, 07:53 AM   #6
ymwur
Member
 
Location: Taiwan

Join Date: Nov 2012
Posts: 11

kmcarr and GenoMax,

Thank you for your clear explanations.
Now I know what to do.

Chih