SEQanswers

Old 04-26-2013, 04:27 AM   #1
christear
Junior Member
 
Location: shanghai

Join Date: Apr 2013
Posts: 2
format of AB SOLiD 4 System sequencing output

To test one of my hypotheses, I downloaded some public data from the EMBL-EBI ENA (European Nucleotide Archive) (http://www.ebi.ac.uk/ena/).
The data is from a paper published in Nature Structural & Molecular Biology in 2011. It was generated on an AB SOLiD 4 System.
According to the ENA entry for this data set, the FASTQ files are available via both FTP and Galaxy.
The problem is that the FASTQ file I downloaded looks really weird,
and I have never come across this before. Details are shown below.

###Eg. 09_public_data$ less ERR042386.fastq

@ERR042386.1 solid0032_385_1_4_20100830_FRAG
T32120132000132211310023202201202002303130332322311
+
!@%62B8?=A690@>><->8=51%:==5521=582<@9>9><,6785.>4&

Generally, in a classic FASTQ file, the first line begins with "@", the 2nd line is the read sequence, the 3rd line is a "+" and the 4th line is the quality string.
However, in these FASTQ files the read sequences are made of numbers ("0,1,2,3"). I really have no idea what that means ...

Do "0,1,2,3" represent "A,G,C,T" respectively?
Or is this a format unique to ABI SOLiD sequencing output?

Does anyone have experience dealing with this kind of data?
All suggestions are appreciated ...
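
For anyone wanting to check a downloaded file quickly, here is a minimal Python sketch of the four-line record layout described above. It flags reads whose sequence line is one letter followed by digits 0-3, like the record shown (the encoding discussed in the replies). The names `read_fastq` and `looks_like_colorspace` are made up for this illustration, and treating "." as a missing call is an assumption, not something stated in this thread.

```python
import re
from itertools import islice

# Assumption: one leading base letter, then digits 0-3
# ("." is allowed here as a possible missing call).
COLORSPACE_RE = re.compile(r"^[ACGT][0-3.]+$")


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated last record)
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record: " + header)
        yield header[1:], seq, qual


def looks_like_colorspace(seq):
    """True if the sequence line is a letter followed by digits 0-3."""
    return bool(COLORSPACE_RE.match(seq))


example = [
    "@ERR042386.1 solid0032_385_1_4_20100830_FRAG\n",
    "T32120132000132211310023202201202002303130332322311\n",
    "+\n",
    "!@%62B8?=A690@>><->8=51%:==5521=582<@9>9><,6785.>4&\n",
]

for name, seq, qual in read_fastq(example):
    print(name, looks_like_colorspace(seq))
```

For a real file, passing an open file handle instead of the `example` list should work the same way, since the function only iterates over lines.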

Old 04-26-2013, 04:48 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
format of AB SOLiD 4 System sequencing output

It's a format unique to SOLiD; what you're seeing is the sequence in colorspace.

SOLiD uses a dibase encoding system, where each color (0-3) encodes a pair of adjacent bases, so a single digit does not stand for a single base.

Have a look at some of the manuals on the Life Technologies website,

http://www.appliedbiosystems.com/abs...printable.html
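
To make the dibase encoding concrete, here is a small Python sketch of how a colorspace read relates to bases. It relies on the standard SOLiD color table (0 = same base, 1 = A<->C or G<->T, 2 = A<->G or C<->T, 3 = A<->T or C<->G), which works out to an XOR on 2-bit base codes. The function name `decode_colorspace` is made up for this example, and the leading letter is treated as the known primer base that is not part of the read itself. This is for illustration only, not a recommended way to process real data.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Translate a colorspace read (primer base + colors) into bases.

    Each color is the XOR of the 2-bit codes of two adjacent bases,
    so each base is recovered from the previous base and the color.
    """
    primer, colors = read[0], read[1:]
    prev = BASES.index(primer)
    decoded = []
    for color in colors:
        if color not in "0123":
            raise ValueError("cannot decode color call: " + color)
        prev ^= int(color)
        decoded.append(BASES[prev])
    return "".join(decoded)


print(decode_colorspace("T3212"))  # -> AGTC
```

Note that every decoded base depends on all the colors before it, so the digits cannot be swapped one-for-one for letters.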
Old 04-26-2013, 04:58 AM   #3
christear
Junior Member
 
Location: shanghai

Join Date: Apr 2013
Posts: 2

Thanks a lot ...
I know SOLiD uses dinucleotide encoding for the sequence.
However, what I downloaded is already a FASTQ file, so I expected it to have been converted to AGCT...
I have analyzed SOLiD data before, but the FASTQ files contained normal sequence...
Anyway, thanks a lot ... Do you know of any tools to do the conversion?
Old 04-26-2013, 07:23 AM   #4
JPC
Senior Member
 
Location: Wales

Join Date: May 2008
Posts: 114

With SOLiD data you need to do the mapping in color space, not convert the FASTQ to bases and map that. If you're not familiar with it, I would recommend tracking down an expert.

I don't know if it is still available, but I think the Life Tech software was called BioScope. They now have 'LifeScope', but I don't know whether that works with data from v4 machines.
Tags
ena, fastq, sequecne, solid 4
