SEQanswers

Old 05-20-2016, 10:38 AM   #1
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 219
SOLiD for Genomes

Does SOLiD work well for genomes with a lot of repeats? Theoretically it should, but in practice?

Thanks,
Old 05-23-2016, 08:21 AM   #2
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

No. Besides being obsolete, it gave reads that were far too short.
Old 05-23-2016, 09:17 AM   #3
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 219

Hello,

Is it really obsolete? Complete Genomics (BGI) uses sequencing-by-ligation, doesn't it?

URL: http://bgi-international.com/service...her-platforms/

-Andor
Old 05-23-2016, 02:34 PM   #4
cmbetts
Member
 
Location: Bay Area

Join Date: Jun 2012
Posts: 99

They may both use sequencing by ligation, but SOLiD and Complete Genomics are different technologies. As far as I can tell, SOLiD has been discontinued, having been beaten by Illumina and replaced by Ion Torrent long ago.
Either would still be inappropriate for de novo genome sequencing. Complete has always been exclusively for human genome resequencing, and the colorspace reads of SOLiD were best when a reference was available because sequencing errors introduced frameshifts in the base encoding.
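
To illustrate the point about errors in colorspace, here is a minimal sketch of the two-base encoding, assuming the usual scheme in which each colour is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function names and the example sequence are made up for illustration, and the first base is treated as known. It shows that one miscalled colour changes every base downstream of it once the read is translated to base-space.

```python
# 2-bit base codes chosen so that colour = code(base1) XOR code(base2).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Return (first base, colour list) for a base-space sequence."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours

def decode(first_base, colours):
    """Translate colours back to bases, starting from a known first base."""
    bases = [first_base]
    for c in colours:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)

truth = "ATGGCTAAGC"                 # hypothetical example read
first, colours = encode(truth)

bad = list(colours)
bad[3] ^= 1                          # a single miscalled colour
decoded_bad = decode(first, bad)

mismatches = sum(x != y for x, y in zip(truth, decoded_bad))
print(truth)
print(decoded_bad)
print(mismatches, "bases differ after one colour error")
```

This is why colorspace reads were normally aligned in colour-space against a reference, where an isolated colour mismatch can be recognised as an error, rather than being translated to bases first.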
Old 05-24-2016, 02:42 AM   #5
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 403

There are still quite a few SOLiDs out there, see for example this data just into the SRA:

http://www.ncbi.nlm.nih.gov/sra/ERX1488475[accn]

Raw read accuracy is excellent, but keep in mind paired end reads do not really work at all (R1 was ~ 75 bp, 60bp after trimming, and R2 was just pure rubbish).

A 60bp SE read is too short to place accurately in many/most genomes. Also, de novo assembly simply does not work, which rules out everything other than resequencing applications (you need a very good reference genome too).
Old 05-24-2016, 09:12 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

My experience with SOLiD 4 was that it had terrible accuracy... on both read 1 and read 2.
Old 05-24-2016, 11:43 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by colindaven View Post
A 60bp SE read is too short to place accurately in many/most genomes.
Going off the topic here (which is that the SOLiD is not good for de novo work), I wonder where you get that statement. It seems to me that 60 quality bases would be enough to place accurately except for long repeat regions (e.g., LTRs).
Old 05-25-2016, 12:00 AM   #8
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 403

@westerman

It wasn't clear from the start whether the topic was de novo or reference based assembly.

Have a look at the genome mappability score which came out of Mike Schatz's lab as one example (http://bioinformatics.oxfordjournals...8/16/2097.full).

Even with 100bp perfect simulated single reads there are regions which cannot be mapped to reliably. Therefore, 60 bp reads containing errors won't be so nice to deal with. I remember working on human twin genomes and getting ~40-50,000 differences in VCF despite various SNP callers and stringent mapping quality filters.

http://bioinformatics.oxfordjournals...expansion.html

By the way, I work on plant genomes, and repetitive regions can be > 80%, so I thought the original poster might have similar issues.
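
As a rough illustration of the read-length argument, here is a toy sketch that counts how many positions in a sequence have a unique k-mer. It is not the mappability score from the linked paper: it looks at exact, forward-strand matches only, and the genome, the repeat lengths (80 bp and 300 bp) and the function names are all invented for the example.

```python
import random
from collections import Counter

def unique_kmer_fraction(genome, k):
    """Fraction of start positions whose k-mer occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(counts[km] == 1 for km in kmers) / len(kmers)

def random_dna(rng, n):
    """Random sequence of length n over ACGT."""
    return "".join(rng.choice("ACGT") for _ in range(n))

rng = random.Random(0)
short_repeat = random_dna(rng, 80)    # shorter than a 100 bp read
long_repeat = random_dna(rng, 300)    # longer than any read length tried here

# Toy genome: random unique sequence with two copies of each repeat.
genome = "".join([
    random_dna(rng, 500), short_repeat,
    random_dna(rng, 500), long_repeat,
    random_dna(rng, 500), short_repeat,
    random_dna(rng, 500), long_repeat,
    random_dna(rng, 500),
])

fractions = {k: unique_kmer_fraction(genome, k) for k in (30, 60, 100)}
for k, frac in fractions.items():
    print(f"k={k:>3}: {frac:.3f} of positions have a unique k-mer")
```

In this toy case a 60 bp read can fall entirely inside either repeat, while a 100 bp read always spans the 80 bp repeat but can still fall entirely inside the 300 bp one. Real genomes add sequencing errors, diverged repeat copies and both strands, so the numbers here are only indicative.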
Old 06-30-2016, 11:40 AM   #9
RickC7
Member
 
Location: Baton Rouge, Louisiana

Join Date: Feb 2010
Posts: 30

Reagent support for SOLiD until May 2017 or sooner per demand.

We use/used SOLiD for SAGE, great for short reads but more expensive than Illumina runs. Converting everything over to Illumina adapters now...

The couple of times we did targeted reseq or whole transcriptome, reverse read quality was bad.
Old 07-01-2016, 05:38 AM   #10
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 219

Ok, thanks
Old 07-02-2016, 06:07 AM   #11
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 824

Quote:
Originally Posted by westerman View Post
Going off the topic here (which is that the SOLiD is not good for de novo work), I wonder where you get that statement. It seems to me that 60 quality bases would be enough to place accurately except for long repeat regions (e.g., LTRs).
I suspect I've discussed this with you previously, but I might as well say things I haven't said before:

Homopolymers look identical in colour-space, which causes havoc for transcriptome assemblies (e.g. distinguishing between poly-T and poly-A sequences). Other simple repeats would also cause issues for genomic assembly (e.g. ACACACACAC and GTGTGTGTGT are identical, despite having both a base shift and a complementation). The assemblies are only likely to be useful in colour-space, because colour-space errors propagate through as very different sequences in base-space. Also, every contig has four possible base-space representations, which among other things makes it quite difficult to use other genome assemblies as scaffolds for a colour-space assembly.
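
The examples above can be checked with a small sketch, assuming the usual two-base encoding where a colour is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). The helper names are made up for illustration.

```python
# 2-bit base codes chosen so that colour = code(base1) XOR code(base2).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def colours(seq):
    """Colour-space representation of a base-space sequence (no first base)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, cols):
    """Base-space sequence for a colour string, given a starting base."""
    bases = [first_base]
    for c in cols:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)

# Homopolymers: poly-A and poly-T give the same colours.
poly_a = colours("A" * 10)
poly_t = colours("T" * 10)
print(poly_a == poly_t)

# Simple repeats: ACACACACAC and GTGTGTGTGT give the same colours.
ac = colours("ACACACACAC")
gt = colours("GTGTGTGTGT")
print(ac == gt)

# Without a known starting base, one colour string has four translations.
candidates = [decode(b, ac) for b in BASE]
print(candidates)
```

A single read avoids the ambiguity because its first base is known from the adapter, but a contig assembled purely in colour-space has no such anchor.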
Old 07-02-2016, 01:00 PM   #12
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 219

Quote:
Originally Posted by gringer View Post
I suspect I've discussed this with you previously, but I might as well say things I haven't said before:

Homopolymers look identical in colour-space, which causes havoc for transcriptome assemblies (e.g. distinguishing between poly-T and poly-A sequences). Other simple repeats would also cause issues for genomic assembly (e.g. ACACACACAC and GTGTGTGTGT are identical, despite having both a base shift and a complementation). The assemblies are only likely to be useful in colour-space, because colour-space errors propagate through as very different sequences in base-space. Also, every contig has four possible base-space representations, which among other things makes it quite difficult to use other genome assemblies as scaffolds for a colour-space assembly.
I guess I still don't understand the "issues" with deconvoluting colour-space. It seems as though it would be much more accurate than sequencing in basespace (e.g. Illumina). That's if I'm reading this paper correctly (attached).
Attached Files
File Type: pdf nrg.2016.49.pdf (2.11 MB, 4 views)
Old 07-02-2016, 01:14 PM   #13
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 824

Quote:
Originally Posted by cement_head View Post
It seems as though it would be much more accurate than sequencing in basespace (e.g. Illumina). That's if I'm reading this paper correctly.
If our preferred model of DNA were colour-space, then it might have been more accurate with sufficient technology development. As it is, Illumina has had plenty of opportunity to improve the accuracy of their technology, and benefits from their chemical model being almost a direct representation of the DNA model that we use for sequencing.
Old 07-05-2016, 06:55 AM   #14
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

Quote:
Originally Posted by cement_head View Post
I guess I still don't understand the "issues" with deconvoluting colour-space. It seems as though it would be much more accurate than sequencing in basespace (e.g. Illumina). That's if I'm reading this paper correctly (attached).
The quoted error rate (<0.1%) must be after reference-based correction. The problem with SOLiD was the high raw error rate of the ligation-based chemistry (compared to Illumina) and the short read lengths, which make it essentially useless for de novo assembly.

I think the best option today for a large genome and a low budget would be to use the 10x Chromium with HiSeq X (~$2000 for one lane of PE150 linked reads from long fragments).
Old 07-05-2016, 12:09 PM   #15
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 219

Quote:
Originally Posted by Chipper View Post
The quoted error rate (<0.1%) must be after reference-based correction. The problem with SOLiD was the high raw error rate of the ligation-based chemistry (compared to Illumina) and the short read lengths, which make it essentially useless for de novo assembly.

I think the best option today for a large genome and a low budget would be to use the 10x Chromium with HiSeq X (~$2000 for one lane of PE150 linked reads from long fragments).
So I took another look at this and it strikes me that the whole problem is the use of only four fluors for 16 combinations. (Seems odd that this wasn't the primary issue they attempted to solve, i.e., generating 16 distinct fluors.) Once I got that part, it became obvious why there's an issue with colourspace. Curiously, I just found out that MiniSeq and NextSeq from Illumina use only two fluors - seems like a huge potential issue if one isn't resequencing a human genome...
Old 07-05-2016, 12:20 PM   #16
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by cement_head View Post
So I took another look at this and it strikes me that the whole problem is the use of only four fluors for 16 combinations. (Seems odd that this wasn't the primary issue they attempted to solve, i.e., generating 16 distinct fluors.) Once I got that part, it became obvious why there's an issue with colourspace. Curiously, I just found out that MiniSeq and NextSeq from Illumina use only two fluors - seems like a huge potential issue if one isn't resequencing a human genome...
Yes, if Solid had used 16 colors it might have been substantially better, though that would have added its own unique issues (like potentially taking 4x as long to sequence).

Illumina's 2-color chemistry is not like Solid Colorspace, though. It's just a binary encoding of bases -> colors; no information is lost (since no two bases share the same pair of color polarities), except that you can no longer distinguish between no signal and one of the bases. It works fairly well in practice (for de-novo sequencing) and you don't need to align sequences to determine what they are. The 2-color platforms have weaknesses, but it is not clear that the weaknesses are linked to the number of dyes.
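
A small sketch of the difference, for illustration only. The particular base-to-signal assignment below (A = both dyes, C = red only, T = green only, G = no signal) is an assumption about how the 2-colour chemistry is commonly described, not something stated in this thread. The SOLiD side assumes the usual XOR-style two-base encoding.

```python
from collections import Counter
from itertools import product

# Assumed 2-colour scheme: (red, green) signal for each single base.
TWO_COLOUR = {"A": (1, 1), "C": (1, 0), "T": (0, 1), "G": (0, 0)}
DECODE = {signal: base for base, signal in TWO_COLOUR.items()}

def call_bases(signals):
    """Translate a list of (red, green) signals directly into bases."""
    return "".join(DECODE[s] for s in signals)

seq = "GATTACA"
signals = [TWO_COLOUR[b] for b in seq]
roundtrip = call_bases(signals)
print(roundtrip == seq)              # each base has its own signal pair

# The one ambiguity: a cluster emitting no light at all is called as G.
dead_cluster = call_bases([(0, 0)] * 5)
print(dead_cluster)

# SOLiD colour-space for contrast: 16 dinucleotides share 4 colours.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
per_colour = Counter(CODE[a] ^ CODE[b] for a, b in product("ACGT", repeat=2))
print(dict(per_colour))
```

In the 2-colour case each signal pair maps to exactly one base, so reads can be called without a reference. In SOLiD colour-space each colour stands for four different dinucleotides, so a colour only becomes a base once its neighbour is known.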

Last edited by Brian Bushnell; 07-05-2016 at 12:22 PM.
Tags
genomes, repeats, solid
