**mastal** · 05-19-2017, 09:20 AM

It's data from the SOLiD platform.

They use colorspace rather than basespace,
so you get a sequence of 0,1,2,3 rather than A,T,G,C.

The "." characters indicate missing base calls.
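
For anyone who lands here with a similar file, a quick check along these lines can confirm that the reads are colorspace. This is only a sketch: the helper names are made up for illustration, and it assumes each sequence line is an optional leading base followed by the digits 0-3, with "." for missing calls.

```python
import re

# Assumed layout of a colorspace sequence line: an optional leading base,
# then colors 0-3, with "." wherever a call is missing.
COLORSPACE_RE = re.compile(r"^[ACGTacgt]?[0-3.]+$")


def looks_like_colorspace(seq: str) -> bool:
    """Return True if a FASTQ sequence line looks like colorspace data."""
    return bool(COLORSPACE_RE.match(seq.strip()))


def missing_fraction(seq: str) -> float:
    """Fraction of color calls that are "." in a colorspace sequence line."""
    colors = seq.strip().lstrip("ACGTacgt")  # drop the leading base, if any
    return colors.count(".") / len(colors) if colors else 0.0


if __name__ == "__main__":
    for line in ["T0120.1231", "ACGTNACGT", "T01.."]:
        print(line, looks_like_colorspace(line))
```

`missing_fraction` is only meaningful for lines that pass `looks_like_colorspace` first.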

**illuminaGA** · 05-19-2017, 10:01 AM

Thank you.

**pmiguel** · 05-19-2017, 10:13 AM

If you want to map those to a reference:
(1) Don't. Delete the file and pretend like it never existed.
(2) No, really, I mean it. Delete the file.
(3) Okay, the reason there are so many "." in the sequence is that the SOLiD gave you reads from every bead it found, no matter how bad the data was. Also the .fastq's started at the very top of the slide, near the edge, where the data is worst.
(4) But you don't need to know this, right? Please tell me you took my advice and deleted the file?
(5) I mean, you can't just take the numbers and convert them to sequence. Any string of numbers could encode any of 4 sequences. It is the base at the beginning of each sequence that tells you the right conversion path. But if any "base" in the read was incorrect, it corrupts the conversion path. So if you did map these, you would want to use an old-time mapper in "colorspace" mode to do the mapping. BWA and Bowtie probably can still do this type of mapping.
(6) But you don't need to know that because you deleted the file, right? If not, well, I did warn you...
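
To make point (5) concrete, here is a minimal sketch of the decoding, not a replacement for a colorspace-aware mapper. It assumes the usual SOLiD two-base encoding (each color is the XOR of the 2-bit codes of adjacent bases, with A=0, C=1, G=2, T=3) and a read that starts with a leading base; `decode_colorspace` is a made-up name for illustration.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(read: str) -> str:
    """Decode a colorspace read (leading base + colors) into bases.

    The leading base only seeds the decoding and is not included in the
    output. Decoding stops at the first "." because every later base
    depends on it; the remainder is filled with N.
    """
    prev = read[0].upper()
    if prev not in BASES:
        raise ValueError("read must start with a leading base (A, C, G or T)")
    colors = read[1:]
    out = []
    for i, color in enumerate(colors):
        if color == ".":
            out.append("N" * (len(colors) - i))
            break
        if color not in "0123":
            raise ValueError(f"unexpected color call: {color!r}")
        # Each color is the XOR of the previous base code and the next one.
        prev = BASES[BASES.index(prev) ^ int(color)]
        out.append(prev)
    return "".join(out)


if __name__ == "__main__":
    # The same colors decode to four different sequences,
    # depending on the leading base.
    for first in BASES:
        print(first, decode_colorspace(first + "0123"))
    # One wrong color changes every base after it.
    print(decode_colorspace("T0123"), decode_colorspace("T0223"))
```

Under these assumptions `T0123` decodes to `TGAT`, while `T0223` (one wrong color) decodes to `TCTA`: every base from the error onward changes. That is why the mapping has to be done in colorspace rather than on converted reads.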

--
Phillip

**Brian Bushnell** · 05-19-2017, 10:31 AM

Haha

But, yes, pmiguel is completely correct and I suggest you follow his advice.

Well, actually, I wrote really good software for mapping and variant-calling of SOLiD reads. But then I deleted it because, well, the platform was obsolete. And unfortunately, UTSW is very strict about preventing employees from providing knowledge to the rest of the world... their legal department told me very bluntly that all of the taxpayer- and grant-funded development that I did was their property, which they choose to keep secret.

So! The SOLiD platform is terrible. I wrote software that actually makes it useful. But I'm not allowed to share it with you.

Still, no matter how good the software is, Illumina is vastly better than SOLiD, so it's best to discard the SOLiD data and use an alternative platform.

**gringer** · 05-19-2017, 12:17 PM

I completely agree with pmiguel:

https://groups.google.com/d/msg/trinityrnaseq-users/HUOmE-3JgSc/xS3loMsNcpYJ

SOLiD seq process: Covert colorspace to basespace - SEQanswers

http://seqanswers.com/forums/showpost.php?p=59156&postcount=4

Help!, strange fastq format