**kmcarr** · 12-02-2009, 01:27 PM

This FASTQ file uses standard Sanger quality encoding: take the ASCII value of each character in the quality string and subtract 33 to get the Q score. The highest character you have is 'C' (ASCII 67) and the lowest is '%' (ASCII 37). These translate to Q scores of 34 and 4, which is an expected range for Phred scores.
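As a minimal sketch of that arithmetic in Python (the function name `decode_quality` is just an illustrative choice, not part of any library):

```python
from typing import List


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into integer Q scores.

    offset=33 is the Sanger convention described above; pass offset=64
    for the Illumina/Solexa convention.
    """
    return [ord(ch) - offset for ch in quality]


# The two extreme characters mentioned in the post: 'C' (67) and '%' (37).
print(decode_quality("C%"))  # [34, 4]
```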

The quickest way to distinguish Sanger Q-score encoding (ASCII-33) from Illumina (Solexa) Q-score encoding (ASCII-64) is to look for numerals [0-9] in the quality string. The numerals have ASCII values from 48 to 57, so subtracting 64 from them would give nonsensical scores of -16 to -7. If there are numerals in your quality string, then the Q-score encoding is Sanger.
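A rough sketch of that check in Python follows. The function name and the return labels are made up for illustration. Note the asymmetry: finding a numeral points to Sanger, but finding none does not prove the file is Illumina/Solexa, so the sketch reports that case as undetermined.

```python
from typing import Iterable


def guess_encoding(quality_strings: Iterable[str]) -> str:
    """Apply the 'look for numerals' rule to some quality strings.

    Returns "sanger" if any digit 0-9 appears (digits cannot occur in
    ASCII-64 quality strings), otherwise "undetermined".
    """
    for quality in quality_strings:
        if any(ch in "0123456789" for ch in quality):
            return "sanger"
    return "undetermined"


print(guess_encoding(["IIII5III", "IIIIIIII"]))  # sanger
print(guess_encoding(["hhhhhhhh"]))              # undetermined
```

Checking more reads makes the result more trustworthy, since a handful of uniformly high-quality reads may contain no numerals at all.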

**crinfante** · 12-02-2009, 01:51 PM

Got it -- thanks for explaining it so clearly.

**MoBi** · 12-10-2009, 08:07 AM

Hello,

I have a related issue: I don't know which FASTQ format my reads are in.

@XXX010005.1 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:1319:692 length=51
ACGATGTGACGTACGCGTATGCTCGTATACACACGCATGACGAGCGACGAT
+XXX010005.1 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:1319:692 length=51
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@I
@XXX10005.2 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:395:487 length=51
TTTTTCGTGTCGGCGGCCCGTCGCCTCTCCACCCCACCACACCCCCCACCC

**Jonathan** · 12-10-2009, 12:19 PM

That would be Illumina/Solexa FASTQ.
But as I can't see the version of the pipeline, it's not possible to tell whether this is the new linear FASTQ metric or the older log-score FASTQ metric.
The difference is small, so it shouldn't matter.

Short question:
Are all your reads of quality "IIIIII"?
Strikes me as funny and mayhaps erroneous
Best
-Jonathan

**swbarnes2** · 12-10-2009, 12:48 PM

That's a kind of old Solexa fastq format. (Old being about a year old with this application!) The characters in the quality line ranged from -5 to 40, with ! being 0 and I being 40.

Fastq format looks like this:

@read name
sequence
+read name again (or just + and nothing, to save file space)
quality score for each letter in above read
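The four-line layout above can be read with a few lines of Python. This is only a sketch: it assumes every record occupies exactly four lines (no wrapped sequence or quality lines), and `read_fastq` is a hypothetical helper name.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        if header.strip() == "":  # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch: " + header.strip())
        yield header[1:].rstrip("\n"), sequence, quality


# Tiny made-up example; the second record repeats its name after '+'.
example = "@read1\nACGT\n+\nIIII\n@read2 extra\nGG\n+read2 extra\n#I\n"
for name, sequence, quality in read_fastq(io.StringIO(example)):
    print(name, sequence, quality)
```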

**MoBi** · 12-11-2009, 06:04 PM

Many thanks Jonathan and swbarnes2.

Yes, frankly, most of my reads have quality values of IIII!

I'm trying to use VAAL to assemble several bacterial genomes against a reference and detect SNPs. VAAL requires that I convert these FASTQ files into .fasta and .qual files.

Do you know an easy way of doing that, given that I'm not an expert in bioinformatics?!

I tried this one:

http://www.bugaco.com/converter/biology/sequences/index.php

But it keeps giving me errors.

Thanks a lot!

**maubp** · 01-22-2010, 05:30 AM

Originally posted by MoBi:

> I tried this one:
>
> http://www.bugaco.com/converter/biology/sequences/index.php
>
> But it keeps giving me errors.
>
> Thanks a lot!

I'm pretty sure from the format names etc that this website is using Biopython internally to do the conversion. However, it looks like there is a bug in the website with quotes (which can occur in FASTQ quality strings) being "escaped" with extra slash characters. As a result, the data given to Biopython is corrupted, and the conversion fails.

You would be better off using Biopython directly (especially for large files, it would be silly to try and upload/convert/download anything bigger than a few megabytes).
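For what it's worth, a minimal sketch of doing the conversion with Biopython directly could look like the following. It relies on `Bio.SeqIO.convert`; the wrapper name `fastq_to_fasta_and_qual` and the file names in the comment are placeholders. The `variant` argument is an assumption you need to set for your own data: `"fastq"` means Sanger (ASCII-33), while `"fastq-solexa"` and `"fastq-illumina"` cover the ASCII-64 variants.

```python
def fastq_to_fasta_and_qual(fastq_path, fasta_path, qual_path, variant="fastq"):
    """Write FASTA and QUAL files from one FASTQ file using Biopython.

    Returns the number of records converted.
    """
    # Imported inside the function so this sketch can be loaded even on a
    # machine where Biopython has not been installed yet.
    from Bio import SeqIO

    n_fasta = SeqIO.convert(fastq_path, variant, fasta_path, "fasta")
    n_qual = SeqIO.convert(fastq_path, variant, qual_path, "qual")
    if n_fasta != n_qual:
        raise RuntimeError("FASTA and QUAL record counts differ")
    return n_fasta


# Example call (placeholder file names):
# fastq_to_fasta_and_qual("reads.fastq", "reads.fasta", "reads.qual")
```

Picking the wrong `variant` will still produce output, just with wrong scores in the .qual file, so it is worth settling the encoding question first.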

another solexa/phred question
