SEQanswers

05-19-2017, 09:04 AM   #1
illuminaGA
Member
 
Location: Atlanta

Join Date: Dec 2012
Posts: 55
Help! Strange fastq format

Dear All

I downloaded some fastq reads from SRA, but the format is weird. The reads are all numbers and dots.
Please give me some pointers if you have seen this before. Thank you in advance.

example:
Code:
@SRR1175538.1 1_11_419_F5-RNA length=35
G.12.1.21.2.20.1.3013..321..20233033
+SRR1175538.1 1_11_419_F5-RNA length=35
!!@@!@!@@!@!@@!@!@@@@!!@@@!!@@@?4<@/
@SRR1175538.2 1_12_62_F5-RNA length=35
G.02.0.01.2301.1.1222.1333.111102223
+SRR1175538.2 1_12_62_F5-RNA length=35
!!@@!@!@@!@@@=!@!@@@@!@@2@!@?>?@@@@@
@SRR1175538.3 1_12_580_F5-RNA length=35
G.10.0.13.0032.2.3123.1121.022330003
+SRR1175538.3 1_12_580_F5-RNA length=35
!!<@!@!@?!@@@8!@!@2@@!@;@@!@;@@@2@@@
@SRR1175538.4 1_12_1917_F5-RNA length=35
G.00.2.12.3.10.1.1133.2333.100133010
+SRR1175538.4 1_12_1917_F5-RNA length=35
!!@@!@!@@!@!@@!@!@@@@!@@?@!@@@@@@@@@
@SRR1175538.5 1_13_1110_F5-RNA length=35
G132.2120.22320023122.3200.320102100
+SRR1175538.5 1_13_1110_F5-RNA length=35
!@@@!@?@A!5=833052/3-!@?.2!@-55/@6.?
@SRR1175538.6 1_13_1767_F5-RNA length=35
G002.3231.03320230131.1020.131301132
+SRR1175538.6 1_13_1767_F5-RNA length=35
!@@@!@@8@!@@//76*?<0@!@20@!@62/@@=-?
@SRR1175538.7 1_14_646_F5-RNA length=35
G232.1032.21300133222.2022.233022033
+SRR1175538.7 1_14_646_F5-RNA length=35
!@@@!@@@@!@@@@<@@@@@@!@@@/!@@;@@@/=@
@SRR1175538.8 1_15_1325_F5-RNA length=35
G331.2201.30111132130.0011.202011230
+SRR1175538.8 1_15_1325_F5-RNA length=35
!@@=!@*@@!@@@@@@@@@@>!@@@@!@@@?@@@?@
@SRR1175538.9 1_15_1850_F5-RNA length=35
G110.3000.03303201101.1013.001113220
+SRR1175538.9 1_15_1850_F5-RNA length=35
!:@0!9:@*!=2;.6/6@.68!@*0:!@*.@;@.3:

Last edited by GenoMax; 05-19-2017 at 09:23 AM.
05-19-2017, 09:20 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 658

It's data from the SOLiD platform.

They use colorspace rather than basespace,
so you get a sequence of 0,1,2,3 rather than A,T,G,C.
Each digit encodes the transition between two adjacent bases, not a single base.

The "." characters indicate missing calls.
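
To make the colorspace idea concrete, here is a minimal Python sketch of the standard SOLiD two-base encoding, in which each color is the XOR of the 2-bit codes of two neighbouring bases (A=0, C=1, G=2, T=3). The function name `encode_colorspace` and the example sequence are made up for illustration and do not come from any SOLiD tool.

```python
# Hypothetical helper for illustration only.
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode_colorspace(bases, primer_base="T"):
    """Encode a base sequence as a color-space read.

    The leading letter of a color-space read is the last base of the
    sequencing primer, not part of the read itself. Each digit is the
    XOR of the codes of two adjacent bases, so identical neighbours
    (AA, CC, GG, TT) always give color 0.
    """
    previous = CODE[primer_base]
    colors = []
    for base in bases:
        current = CODE[base]
        colors.append(str(previous ^ current))
        previous = current
    return primer_base + "".join(colors)


print(encode_colorspace("GTCG", primer_base="G"))  # G0123
```

This is why the reads in the first post start with a single letter followed by digits: the letter anchors the encoding, and the digits only describe changes from one base to the next.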
05-19-2017, 10:01 AM   #3
illuminaGA
Member
 
Location: Atlanta

Join Date: Dec 2012
Posts: 55

Thank you.

05-19-2017, 10:13 AM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,208

If you want to map those to a reference:
(1) Don't. Delete the file and pretend like it never existed.
(2) No, really, I mean it. Delete the file.
(3) Okay, the reason there are so many "." in the sequence is that the SOLiD gave you reads from every bead it found, no matter how bad the data was. Also the .fastq's started at the very top of the slide, near the edge, where the data is worst.
(4) But you don't need to know this, right? Please tell me you took my advice and deleted the file?
(5) I mean, you can't just take the numbers and convert them to sequence. Any string of numbers could encode any of 4 sequences. It is the base at the beginning of each sequence that tells you the right conversion path. But if any "base" in the read was incorrect, it corrupts the conversion path. So if you did map these, you would want to use an old-time mapper in "colorspace" mode to do the mapping. BWA and Bowtie probably can still do this type of mapping.
(6) But you don't need to know that because you deleted the file, right? If not, well, I did warn you...
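
A rough Python sketch of point (5), assuming the standard SOLiD two-base encoding (color = XOR of the 2-bit codes of adjacent bases, with A=0, C=1, G=2, T=3). The function `decode_colorspace` and the toy reads are hypothetical, for illustration only. It shows that the same digits decode to four different sequences depending on the starting base, and that one wrong color changes every base after it.

```python
# Hypothetical helper for illustration only; not a replacement for a
# color-space-aware mapper.
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Naively translate a color-space read (leading base + digits) to bases.

    Returns None if the read contains a missing call ("."), because
    everything after the gap would be undetermined.
    """
    current = CODE[read[0]]  # leading base comes from the primer
    decoded = []
    for color in read[1:]:
        if color == ".":
            return None
        current ^= int(color)  # each color is a transition from the previous base
        decoded.append(BASES[current])
    return "".join(decoded)


# Same digits, four possible sequences: only the leading base picks the right one.
for start in BASES:
    print(start, decode_colorspace(start + "0123"))

print(decode_colorspace("G0123"))  # GTCG
print(decode_colorspace("G0023"))  # GGAT: one wrong color changes all later bases
print(decode_colorspace("G.12"))   # None: missing call
```

Because a single color error propagates through the rest of a naive translation, color-space-aware mappers align in color space against a color-encoded reference instead of converting the reads to bases first.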

--
Phillip
05-19-2017, 10:31 AM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,620

Haha

But, yes, pmiguel is completely correct and I suggest you follow his advice.

Well, actually, I wrote really good software for mapping and variant-calling of SOLiD reads. But then I deleted it because, well, the platform was obsolete. And unfortunately, UTSW is very strict about preventing employees from providing knowledge to the rest of the world... their legal department told me very bluntly that all of the taxpayer- and grant-funded development that I did was their property, which they choose to keep secret.

So! The SOLiD platform is terrible. I wrote software that actually makes it useful. But I'm not allowed to share it with you.

Still, no matter how good the software is, Illumina is vastly better than SOLiD, so it's best to discard the SOLiD data and use an alternative platform.

Last edited by Brian Bushnell; 05-19-2017 at 10:55 AM.
05-19-2017, 12:17 PM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 787

I completely agree with pmiguel:

https://groups.google.com/d/msg/trin...c/xS3loMsNcpYJ

http://seqanswers.com/forums/showpos...56&postcount=4