SEQanswers

08-06-2008, 05:12 PM   #1
mchaisso
Member
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84
different Illumina convention in fastq files?

We have a few different Illumina data sets here, and I'm wondering whether there has been a change in the characters used in fastq files, or whether we have just been given bad data.

We have some sequences for which this translation (with c the ASCII code of the quality character) works fine:
10*log(1 + 10**((c - 64)/10.0))/log(10)

@SLXA-EAS1_89:3:1:715:750/1
GTCTTGAAAGCTATGATGTCAAGATTAATTTAATC
+SLXA-EAS1_89:3:1:715:750/1
bbbbbbbbbbbbbbbbb^bbbbbbbbbbbbbbbba
@SLXA-EAS1_89:3:1:747:756/1
GTGTATTGCTCAATCTTCGAACGGGGGGAGGATTG
+SLXA-EAS1_89:3:1:747:756/1
bbbbb^bbbbbbbbbbbbbbbbbbbbbb\bbObbb
@SLXA-EAS1_89:3:1:859:343/1
GTTAATAGATTTAATTGCCACCGCAATACCAGCAA
+SLXA-EAS1_89:3:1:859:343/1
ccccccccccccccccbcccccccbbbbbbbYbb^
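A minimal Python sketch of the same Solexa-to-Phred translation, assuming the quality characters use the Solexa scale with an ASCII offset of 64. The helper name `solexa_char_to_phred` is hypothetical. The offset has to be subtracted before dividing by 10, which is why the parentheses around `c - 64` matter.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset assumed for Solexa-scaled qualities


def solexa_char_to_phred(ch: str) -> float:
    """Translate one Solexa-encoded quality character to a Phred score."""
    q_solexa = ord(ch) - SOLEXA_OFFSET
    # Q_phred = 10 * log10(1 + 10^(Q_solexa / 10))
    return 10 * math.log10(1 + 10 ** (q_solexa / 10.0))


# Decode the quality line of the first record above.
qual = "bbbbbbbbbbbbbbbbb^bbbbbbbbbbbbbbbba"
print([round(solexa_char_to_phred(ch)) for ch in qual])
```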

However, we have other sequences where the qualities come out at 10 or below using the same translation. Is one set using the Sanger Centre convention and the other the Illumina one?

@ILunknown_unknown_1_1_107_394
TATTCCCCGCCCCCCCGTCGTGCCCGGTCTTTGTCC
+
IIIIIIGI/<I8IIIIII.II//I);II)%<>III&
@ILunknown_unknown_1_1_118_376
TTGGGAAGCGCACCCGGCCCGTGTTGGCTTTCGCCT
+
I%I!I(,@+I"%",!#&)"5$'&$&%$.*%$"#"""
@ILunknown_unknown_1_1_342_174
GCGGCGGGTTATCGGGGCACTCCCCCCCTCCCGCAC
+
IIIIII2I<#?II+@CI1/CI0III5%(5'I%0-%$
@ILunknown_unknown_1_1_121_440
TGGCGTCCAGGCCGGCCTGGCTGCCGTCCGGCCCCC
+
II50I5I.&-*+9-&>2$&&3#&#"#%%"&"#&#&&
@ILunknown_unknown_1_1_109_530
TTTGCGTTAGGTAAAATGCTAGAAGCAGGTGAGCAG
+
IIIIIIIIIIIIIIIIIIIIIII&II0IIII%II0A

thanks,
-mark
08-07-2008, 07:22 AM   #2
acnoll
Member
Location: Kansas City

Join Date: Mar 2008
Posts: 14

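The top set is in Illumina/Solexa format (what you would see in s_[1-8]_sequence.txt): Solexa-scaled qualities with an ASCII offset of 64. The bottom set is in Sanger fastq format: Phred qualities with an ASCII offset of 33.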
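The two sets can also be told apart programmatically from the range of quality characters. A heuristic sketch, assuming only the two encodings discussed here: Solexa scores bottom out at -5, i.e. ASCII 59 (`;`), so anything lower must be offset 33; characters above `J` (Phred 41 at offset 33, an assumed upper bound) point to offset 64. `guess_offset` is a hypothetical helper, and files whose characters fit both ranges are reported as ambiguous.

```python
from typing import Iterable, Optional

SOLEXA_MIN_CHAR = 59  # ';' = Solexa score -5 at offset 64
SANGER_MAX_CHAR = 74  # 'J' = Phred 41 at offset 33 (assumed upper bound)


def guess_offset(qual_lines: Iterable[str]) -> Optional[int]:
    """Heuristically guess the ASCII offset (33 or 64) of fastq quality lines.

    Returns None when the characters seen fit both encodings.
    """
    codes = [ord(ch) for line in qual_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < SOLEXA_MIN_CHAR:
        return 33  # too low for an offset-64 encoding
    if max(codes) > SANGER_MAX_CHAR:
        return 64  # too high for typical offset-33 data
    return None


# Quality lines taken from the question.
top = ["bbbbbbbbbbbbbbbbb^bbbbbbbbbbbbbbbba", "ccccccccccccccccbcccccccbbbbbbbYbb^"]
bottom = ["IIIIIIGI/<I8IIIIII.II//I);II)%<>III&"]
print(guess_offset(top), guess_offset(bottom))
```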

To convert Solexa values to Phred scores (pass the quality character as the argument):
perl -e 'my $myChar = shift; my $qual = 10 * log(1 + 10 ** ((ord($myChar) - 64) / 10.0)) / log(10); print int($qual), "\n";' b
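To convert a whole Solexa quality line into Sanger fastq encoding rather than one character at a time, something like the following sketch could work. It assumes Solexa+64 input and writes Phred+33 output; `solexa_line_to_sanger` is a hypothetical name. Unlike the one-liner above, which truncates with `int()`, this rounds to the nearest Phred score, so low scores can differ by one.

```python
import math

SOLEXA_OFFSET = 64  # input: Solexa-scaled qualities
SANGER_OFFSET = 33  # output: Phred qualities, Sanger encoding


def solexa_line_to_sanger(qual: str) -> str:
    """Re-encode a Solexa+64 quality line as a Sanger (Phred+33) line."""
    out = []
    for ch in qual:
        q_solexa = ord(ch) - SOLEXA_OFFSET
        q_phred = 10 * math.log10(1 + 10 ** (q_solexa / 10.0))
        # Round to the nearest integer Phred score before re-encoding.
        out.append(chr(round(q_phred) + SANGER_OFFSET))
    return "".join(out)


print(solexa_line_to_sanger("ccccccccccccccccbcccccccbbbbbbbYbb^"))
```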

To convert Sanger fastq values (already Phred scores, offset 33):
perl -e 'my $myChar = shift; my $qual = ord($myChar) - 33; print $qual, "\n";' I
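The same offset-33 decoding applied to a whole line, as a minimal sketch with a hypothetical helper name. It also shows why the Solexa translation gave such low numbers for the second set: `I` is Phred 40 at offset 33, but only about 9.5 when read as Solexa+64.

```python
from typing import List

SANGER_OFFSET = 33  # Phred+33 (Sanger fastq)


def sanger_line_to_phred(qual: str) -> List[int]:
    """Decode a Sanger fastq quality line into integer Phred scores."""
    return [ord(ch) - SANGER_OFFSET for ch in qual]


# Last quality line from the question.
scores = sanger_line_to_phred("IIIIIIIIIIIIIIIIIIIIIII&II0IIII%II0A")
print(scores)
```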

hope that helps
All times are GMT -8.