

Old 04-26-2013, 04:27 AM   #1
Junior Member
Location: shanghai

Join Date: Apr 2013
Posts: 2
Format of AB SOLiD 4 System sequencing output

To validate one of my hypotheses, I downloaded some public data from the EMBL-EBI ENA (European Nucleotide Archive).
The data come from a paper published in Nature Structural & Molecular Biology in 2011 and were generated on an AB SOLiD 4 System.
According to the ENA entry for this data set, the FASTQ files are available via either FTP or Galaxy.
The problem is that the FASTQ file I downloaded looks very weird,
and I have never come across this before. Details are shown below.

###Eg. 09_public_data$ less ERR042386.fastq

@ERR042386.1 solid0032_385_1_4_20100830_FRAG

In a classic FASTQ file, the first line begins with "@", the second line is the read sequence, the third line is a "+", and the fourth line is the quality string.
However, in these FASTQ files the read sequences are made up of numbers ("0,1,2,3"). I really have no idea what this means ...
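For anyone who wants to check a file like this programmatically, here is a minimal Python sketch that reads four-line FASTQ records and flags reads that look numeric. It assumes strict four-line records with no blank lines in between, and it assumes a colorspace read is an optional leading primer base followed by the digits 0-3, with "." for a missing call. The function names `read_fastq` and `looks_like_colorspace` are made up for this illustration.

```python
import re
from itertools import islice

# Assumption: optional primer base, then color calls 0-3 ('.' = missing call).
COLORSPACE_READ = re.compile(r"^[ACGT]?[0-3.]+$")


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes strict four-line records (no wrapped sequences, no blank lines).
    """
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = list(islice(stripped, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def looks_like_colorspace(seq):
    """Return True if the sequence line consists of color calls, not bases."""
    return bool(COLORSPACE_READ.match(seq))


if __name__ == "__main__":
    # Invented example record; the read and qualities are not real data.
    demo = ["@demo_read\n", "T0120\n", "+\n", "!!!!\n"]
    for name, seq, qual in read_fastq(demo):
        print(name, seq, looks_like_colorspace(seq))
```

With a real file you could pass an open file handle to `read_fastq` instead of the demo list.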

Do ("0,1,2,3") represent ("A,G,C,T") respectively?
Or is this a format unique to ABI SOLiD sequencing output?

Does anyone have experience dealing with this kind of data?
All suggestions are appreciated ...

christear
Old 04-26-2013, 04:48 AM   #2
Senior Member
Location: uk

Join Date: Mar 2009
Posts: 667
Format of AB SOLiD 4 System sequencing output

It's a format unique to SOLiD; what you're seeing is the sequence in colorspace.

SOLiD uses a dibase (two-base) encoding system, where each color represents a pair of adjacent bases, so consecutive colors overlap by one base.
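To make the dibase idea concrete, here is a small Python sketch of the usual two-base color code as I understand it: with A=0, C=1, G=2, T=3, each color is the XOR of the codes of two adjacent bases, so the digits are transitions between bases and not bases themselves. The helper names `encode` and `decode` and the example read are invented for illustration; this is not meant as a replacement for proper colorspace-aware tools.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def encode(primer_base, bases):
    """Encode a base sequence as primer base + color calls."""
    prev = BASES.index(primer_base)
    colors = []
    for base in bases:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))  # color = transition between neighbors
        prev = cur
    return primer_base + "".join(colors)


def decode(cs_read):
    """Naively decode primer base + colors back to bases.

    Raises ValueError on anything other than 0-3 (e.g. '.' missing calls),
    because the rest of the read cannot be decoded after a missing color.
    """
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color call {color!r}")
        prev ^= int(color)  # apply the transition to the previous base
        out.append(BASES[prev])
    return "".join(out)


if __name__ == "__main__":
    print(decode("T0120"))        # invented read, not real data
    print(encode("T", "TGAA"))
```

So "0,1,2,3" do not map one-to-one onto "A,G,C,T": the same digit can stand for different bases depending on the base before it, which is why the leading primer base is needed to decode anything.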

Have a look at some of the manuals on the Life Technologies website.
mastal
Old 04-26-2013, 04:58 AM   #3
Junior Member
Location: shanghai

Join Date: Apr 2013
Posts: 2

Thanks a lot ...
I know that SOLiD encodes the sequence using dinucleotides.
However, what I downloaded is already a FASTQ file, so I expected it to have been converted to AGCT already...
I have analyzed SOLiD data before, and that was normal base sequence in FASTQ format...
Anyway, thanks a lot ... Do you know of any tools to do the conversion?
christear
Old 04-26-2013, 07:23 AM   #4
Senior Member
Location: Wales

Join Date: May 2008
Posts: 114

With SOLiD data you need to do the mapping in colorspace, not convert the FASTQ to base-space and map that. If you're not familiar with it, I would recommend tracking down an expert.
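A rough Python sketch of why converting first is risky, assuming the usual XOR-style two-base code (A=0, C=1, G=2, T=3): a single miscalled color changes every base after it in a naively converted read. The reads below are invented, and `decode` and `count_mismatches` are hypothetical helper names for this illustration only.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def decode(cs_read):
    """Naively convert primer base + colors (digits 0-3) to bases."""
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)  # each color is a transition from the previous base
        out.append(BASES[prev])
    return "".join(out)


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    true_read = "T0120312"  # invented colorspace read
    bad_read = "T0130312"   # same read with one miscalled color (third call)
    good = decode(true_read)
    bad = decode(bad_read)
    print(good)
    print(bad)
    print(count_mismatches(good, bad))
```

In this toy case one wrong color out of seven turns into five wrong bases, everything from the error to the end of the read. A colorspace-aware mapper sees only the single color mismatch, which is the reason for mapping before converting.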

I don't know if it is still available, but I think the Life Tech software was called BioScope. They now have LifeScope, but I don't know whether that works for v4 machines.
JPC

