I am currently working with FASTQ files that originated from a PacBio instrument and were converted from their native output format to FASTQ by some process. This was done at a different site, and I haven't been able to find out what this process was yet.
Specifically, I am interested in how the quality scores get converted to Phred-like scores within the quality string of the FASTQ files. There are tools we would like to use that expect quality scores with standard Illumina format offsets. For example, in a standard Illumina FASTQ output file like the length-truncated example below, the quality score encoding of "G" for the first base resolves to 38 with an ASCII offset of 33.
@M01472:163:000000000-AA5WV:1:1101:10006:14422 1:N:0:217
ACTCGGCCCA
+
GFFHGDFEE2
ord("G") - 33 = 38
This falls within the expected quality score range of 0 to 40.
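To check the whole quality string rather than just the first base, the same arithmetic can be applied per character. This is a minimal Python sketch, assuming an ASCII offset of 33; the helper name `decode_quality` is just something I made up for illustration and does not come from any particular tool.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string into a list of integer scores."""
    # Each character encodes one score: ASCII code minus the offset.
    return [ord(ch) - offset for ch in quality]


illumina_quality = "GFFHGDFEE2"
print(decode_quality(illumina_quality))
# [38, 37, 37, 39, 38, 35, 37, 36, 36, 17]
```

Every value for this record lands between 0 and 40, as expected.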
With the PacBio FASTQ data, I'm not seeing scores consistently within this range. Truncated example below:
@m140929_224119_42136_c100670242550000001823127812201400_s1_p0/32918/ccs 1 28
ATCTCAGTCC
+
qqqqqqqq=q
offset 33: ord("q") - 33 = 80 ???
offset 64: ord("q") - 64 = 49 ???
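For completeness, here is a small Python sketch that reports the lowest and highest score in the PacBio quality string under both candidate offsets. The function name `score_range` is hypothetical, and the only assumption is that each character maps to a score as its ASCII code minus the offset.

```python
def score_range(quality, offset):
    """Return (min, max) of the scores decoded with the given ASCII offset."""
    scores = [ord(ch) - offset for ch in quality]
    return min(scores), max(scores)


pacbio_quality = "qqqqqqqq=q"
for offset in (33, 64):
    low, high = score_range(pacbio_quality, offset)
    print(f"offset {offset}: min={low}, max={high}")
# offset 33: min=28, max=80
# offset 64: min=-3, max=49
```

With offset 64 the "=" character would decode to -3, so at least for this one record an offset of 33 looks like the only reading that keeps all scores non-negative. That still leaves the value of 80 well above the 0 to 40 range I expected.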
Are these scores a combination of the bases of multiple reads, or is there something else about the formatting I am missing?