To validate one of my hypotheses, I downloaded some public data from the EMBL-EBI ENA (European Nucleotide Archive) (http://www.ebi.ac.uk/ena/).
The data are from a paper published in Nature Structural & Molecular Biology in 2011 and were generated on an AB SOLiD 4 System.
As described in the ENA entry for this data set, the FASTQ files are available via both FTP and Galaxy.
The problem is that the FASTQ file I downloaded looks very weird,
and I have never come across this before. Details are shown below.
###Eg. 09_public_data$ less ERR042386.fastq
@ERR042386.1 solid0032_385_1_4_20100830_FRAG
T32120132000132211310023202201202002303130332322311
+
!@%62B8?=A690@>><->8=51%:==5521=582<@9>9><,6785.>4&
Generally, in a classic FASTQ file, the first line begins with "@", the second line is the read sequence, the third line is a "+", and the fourth line is the quality string.
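To make the four-line layout concrete, here is a minimal sketch of how one could read such records and list which characters occur in a sequence line. The function names `read_fastq` and `sequence_alphabet` are my own, made up for illustration, and the sketch assumes each record occupies exactly four lines (no wrapped sequences).

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes the simple four-lines-per-record layout described above.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record: %r" % header)
        yield header[1:], seq, qual


def sequence_alphabet(seq):
    """Return the distinct characters of a sequence line, sorted."""
    return "".join(sorted(set(seq)))


if __name__ == "__main__":
    # The record shown in this post, used as in-memory input.
    example = [
        "@ERR042386.1 solid0032_385_1_4_20100830_FRAG\n",
        "T32120132000132211310023202201202002303130332322311\n",
        "+\n",
        "!@%62B8?=A690@>><->8=51%:==5521=582<@9>9><,6785.>4&\n",
    ]
    for name, seq, qual in read_fastq(example):
        print(name, sequence_alphabet(seq))
```

Run on the record above, this reports only the digits 0 to 3 plus the leading "T", rather than the usual A, C, G, T.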
However, in these FASTQ files the read sequences consist of digits ("0,1,2,3"). I really have no idea what this means ...
Do ("0,1,2,3") represent ("A,G,C,T") respectively?
Or is this a format specific to ABI SOLiD sequencer output?
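One way to test the second possibility: in SOLiD color-space encoding, each digit describes the transition between two adjacent bases rather than a base itself, and the leading letter is a known primer base. The sketch below decodes a read under the assumption that this file uses that encoding, which I have not confirmed for this data set. The function name `decode_colorspace` is hypothetical, chosen only for this example.

```python
BASES = "ACGT"  # with A=0, C=1, G=2, T=3, a color is the XOR of two adjacent base codes


def decode_colorspace(read):
    """Translate a color-space read (primer base + digits 0-3) into bases.

    Assumes SOLiD-style encoding: color 0 keeps the previous base,
    1 swaps A<->C and G<->T, 2 swaps A<->G and C<->T, 3 swaps A<->T and C<->G.
    The primer base itself is not part of the returned sequence.
    """
    primer, colors = read[0], read[1:]
    code = BASES.index(primer)  # raises ValueError for an unexpected primer
    decoded = []
    for color in colors:
        if color not in "0123":
            raise ValueError("unexpected color call: %r" % color)
        code ^= int(color)  # apply the transition to the previous base
        decoded.append(BASES[code])
    return "".join(decoded)


if __name__ == "__main__":
    # First few characters of the read shown in this post.
    print(decode_colorspace("T3212"))
```

If the encoding really is color space, note that decoding this way lets a single miscalled color change every base after it, so this is only meant as a quick check and not as a replacement for tools that handle color space directly.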
Does anyone have experience dealing with this kind of data?
All suggestions are appreciated ...