  • fabio25
    Member
    • Aug 2008
    • 13

    SOLiD vs Solexa

    Dear Everybody,
    I would like to ask if someone can help me understand the difference between the SOLiD machine and the Solexa one. Which tools are used to analyze each type of data, and how does the wet-lab work differ?
    Thanks a lot
  • joa_ds
    Member
    • Dec 2008
    • 52

    #2
    good question.

    First of all, everybody says their machine is the best choice (of course, they bought it...). No real benchmarking has been done in the past.

    They all have their own good points and weak points...

    But a Solexa and a SOLiD are waaaay different. Just the chemistry is totally different: nucleotide space vs color space. I can imagine the SOLiD pipeline will be totally different. Maybe only the image processing can be somewhat the same, but even then...
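To make the chemistry difference concrete, here is a minimal sketch of SOLiD-style two-base colour encoding, assuming the commonly described mapping in which each colour is the XOR of the 2-bit base codes A=0, C=1, G=2, T=3, with a known primer base anchoring the decode (the sequences here are made up):

```python
# Sketch of SOLiD-style two-base "colour space" encoding, assuming the
# standard mapping: colour = XOR of the 2-bit base codes A=0, C=1, G=2, T=3.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = {v: k for k, v in CODE.items()}

def encode(seq, primer="T"):
    """Base sequence -> list of colours, anchored on the known primer base."""
    full = primer + seq
    return [CODE[a] ^ CODE[b] for a, b in zip(full, full[1:])]

def decode(colours, primer="T"):
    """Colours -> base sequence, decoded relative to the primer base."""
    bases, prev = [], CODE[primer]
    for c in colours:
        prev ^= c                 # each colour encodes a base-to-base transition
        bases.append(BASE[prev])
    return "".join(bases)

print(encode("ACGT"))             # [3, 1, 3, 1]
print(decode(encode("ACGT")))     # ACGT
```

In a Solexa/Illumina read each cycle reports a base directly; in a SOLiD read each call reports a transition between two adjacent bases, which is why the downstream pipelines diverge so much.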



    happy reading


    • new300
      Member
      • Mar 2008
      • 50

      #3
      Originally posted by joa_ds:
      good question.

      First of all, everybody says their machine is the best choice (of course, they bought it...). No real benchmarking has been done in the past.
      I think a bunch of benchmarking has been done, by genome centres for example. But I don't think much of it has been published.

      From what I've heard SNP call error rate on a good SOLiD run is about the same as an Illumina. I don't know what the run failure rate is like.

      From my brief look, the single colour change error rate on the SOLiD is somewhere around 7%; that was in the E. coli data release they did: http://www.genographia.org/portal/to...rimer.pdf/view The Illumina error rate is around 1 or 2% on a good run (including contamination in both cases, doing a brute-force alignment). The other issue is that a single error in a SOLiD read effectively corrupts the rest of the read, unless you have a reference. So for anything de novo, you're a bit stuck.
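A small sketch of why a single colour error corrupts the remainder of a decoded read when there is no reference to correct against (assuming the XOR colour mapping A=0, C=1, G=2, T=3; the read values are made up):

```python
# Sketch: why one colour-call error garbles the rest of a decoded SOLiD read.
# Assumes the standard XOR colour mapping (A=0, C=1, G=2, T=3).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def decode(colours, primer="T"):
    bases, prev = [], CODE[primer]
    for c in colours:
        prev ^= c                 # every base depends on the previous one
        bases.append(BASE[prev])
    return "".join(bases)

good = [3, 1, 3, 1, 0, 2, 1]      # encodes ACGTTCA from primer T
bad = list(good)
bad[2] ^= 1                       # one miscalled colour mid-read
print(decode(good))               # ACGTTCA
print(decode(bad))                # ACTGGAC: first two bases survive, rest shift
```

Because each base is decoded relative to the previous one, everything downstream of the bad colour lands in the wrong "phase" until something external (a reference, or extra redundancy) resets it.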

      My feeling is that the market is showing that right now the GA is a more versatile platform, with longer reads and a lower base error rate. If you look at the number of GA publications against the number of SOLiD publications, that gives you a good idea of how useful people are finding the data. There's a neat graph here: www.mrgc.com.my


      • Chipper
        Senior Member
        • Mar 2008
        • 323

        #4
        Originally posted by new300:
        I think a bunch of benchmarking has been done, by genome centres for example. But I don't think much of it has been published.

        From what I've heard SNP call error rate on a good SOLiD run is about the same as an Illumina. I don't know what the run failure rate is like.

        From my brief look, the single colour change error rate on the SOLiD is somewhere around 7%; that was in the E. coli data release they did: http://www.genographia.org/portal/to...rimer.pdf/view The Illumina error rate is around 1 or 2% on a good run (including contamination in both cases, doing a brute-force alignment). The other issue is that a single error in a SOLiD read effectively corrupts the rest of the read, unless you have a reference. So for anything de novo, you're a bit stuck.

        My feeling is that the market is showing that right now the GA is a more versatile platform, with longer reads and a lower base error rate. If you look at the number of GA publications against the number of SOLiD publications that gives you a good idea of how useful people are finding that data. There's a neat graph here: www.mrgc.com.my
        1. Why do you compare the single color change to the base-call error rate on Illumina? 2. SOLiD de novo assembly can be done with error correction, at least with Velvet. 3. The publication numbers reflect more the ratio of Ill:SOLiD in use than anything else, I guess. With the coming updates it will probably be competitive with Ill. in terms of handling as well.


        • new300
          Member
          • Mar 2008
          • 50

          #5
          Originally posted by Chipper:
          1. Why do you compare the single color change to the base call error rate on Illumina?
          Because that's what you'll have to deal with if you're doing de novo stuff. The 2 colour change makes it comparable for SNP calling but not de novo...

          Originally posted by Chipper:
          2. SOLiD de novo assembly can be done with error correction at least with Velvet. 3.
          Error correction only buys you so much, and the quality of your assembly will be a function of the base error rate and read length. GA reads have a lower error rate... so I think for this application they are probably better.

          I've not seen any de novo SOLiD assemblies so if anybody has this I'd be interested in taking a look.

          Originally posted by Chipper:
          The publication numbers reflect more the ratio of Ill:SOLiD in use than anything else I guess.
          I think the fact that there are more GAs in use reflects the fact that people prefer them and get more data out of them...

          Originally posted by Chipper:
          With the coming updates it will probably be competitive with Ill. in terms of handling as well.
          Yep, we'll have to wait and see; the market's always changing.


          • jkbonfield
            Senior Member
            • Jul 2008
            • 146

            #6
            Originally posted by Chipper:
            1. Why do you compare the single color change to the base-call error rate on Illumina? 2. SOLiD de novo assembly can be done with error correction, at least with Velvet. 3. The publication numbers reflect more the ratio of Ill:SOLiD in use than anything else, I guess. With the coming updates it will probably be competitive with Ill. in terms of handling as well.
            When doing de novo sequence assembly you're essentially aligning in colour space, just treating the 0, 1, 2 and 3 as four characters to align (e.g. rename them, albeit misleadingly, A, C, G, T if it makes programs work). In this respect it's incorrect to claim that a single error makes the rest of the read unusable from that point on. However, it's also incorrect to assume you need two adjacent errors for a problem to arise.
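The relabelling trick can be sketched in a couple of lines (assuming a csfasta-style read, i.e. a primer base followed by colour digits; the letter assignment is arbitrary, and the read here is made up):

```python
# Sketch: relabel colour calls 0-3 as A/C/G/T so a nucleotide-space aligner
# will accept them. The letters are arbitrary stand-ins, NOT real bases.
# Assumes a csfasta-style read: one primer base followed by colour digits.
COLOUR_TO_FAKE_BASE = str.maketrans("0123", "ACGT")

def colours_as_letters(csfasta_read):
    primer, colours = csfasta_read[0], csfasta_read[1:]
    return colours.translate(COLOUR_TO_FAKE_BASE)   # drop the primer base

print(colours_as_letters("T3201130"))   # TGACCTA
```

Alignment then works character-for-character in colour space; the caveat is exactly the one raised here, that "matches" between these fake bases say nothing directly about base-space identity.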

            For what it's worth, even when doing mapping experiments the single error rate DOES still matter somewhat. It directly relates to your mapping confidence values. Have too many errors and you'll find it both hard to map and also have a significant chance of placing things in the wrong location.


            • new300
              Member
              • Mar 2008
              • 50

              #7
              Originally posted by jkbonfield:
              When doing de novo sequence assembly you're essentially aligning in colour space, just treating the 0, 1, 2 and 3 as four characters to align (e.g. rename them, albeit misleadingly, A, C, G, T if it makes programs work). In this respect it's incorrect to claim that a single error makes the rest of the read unusable from that point on. However, it's also incorrect to assume you need two adjacent errors for a problem to arise.
              I probably didn't explain myself very well. If you align in colour space your "colour assembly" will probably be OK. However, you need to get back into base space at some point. The two options I can see are:

              1. You translate the assembled colour space contigs into base space. This will be bad because a single error will corrupt the rest of the contig.

              2. You align in colour space, then translate the individual reads back into base space. In this case you limit corruption to the remaining part of that single read. So... the read was useful for building contigs, but not for base calling unless you were able to correct the error.

              In practice I'd expect this to cause significant issues for de novo assembly, but there might be ways round these issues I've not considered.
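A toy illustration of how these routes play out (assuming perfectly aligned reads and the XOR colour mapping A=0, C=1, G=2, T=3; the reads are made up): decoding an erroneous read by itself garbles its tail, while a majority-vote consensus taken in colour space outvotes the isolated error before any decoding happens.

```python
# Sketch: a colour-space majority consensus removes an isolated colour error
# before decoding, so only the bad read's own base translation is garbled.
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def decode(colours, primer="T"):
    bases, prev = [], CODE[primer]
    for c in colours:
        prev ^= c
        bases.append(BASE[prev])
    return "".join(bases)

reads = [
    [3, 1, 3, 1, 0, 2, 1],   # clean
    [3, 1, 3, 1, 0, 2, 1],   # clean
    [3, 1, 2, 1, 0, 2, 1],   # one colour error at position 2
]
# column-wise majority vote in colour space
consensus = [Counter(col).most_common(1)[0][0] for col in zip(*reads)]
print(decode(consensus))     # ACGTTCA: the error was outvoted
print(decode(reads[2]))      # ACTGGAC: garbled from position 2 onward
```

This is why translating reads individually (option 2) limits the damage: the corrupted tail affects one read's base calls, and the other reads at that position can outvote it in the consensus.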


              • jkbonfield
                Senior Member
                • Jul 2008
                • 146

                #8
                Originally posted by new300:
                1. You translate the assembled colour space contigs into base space. This will be bad because a single error will corrupt the rest of the contig.
                You sort of want to avoid doing this until as late as possible - but fundamentally there'll come a time when it needs to be done before the assembly has been finished, e.g. to merge with other data or to start sequence analysis on an unfinished genome.

                Originally posted by new300:
                2. You align in colour space, then translate the individual reads back into base space. In this case you limit corruption to the remaining part of that single read. So... the read was useful for building contigs, but not for base calling unless you were able to correct the error.
                Well that individual read's contribution was poor for the consensus generation, but the rest are hopefully enough to compensate.

                I think there's a third route too, which is a combination of 1 and 2 above. You can compute the consensus from all reads in colour space, like option 1, before converting to DNA space for use in other tools. However, using the known last base of the primer for each read, we can verify whether the sequence matches the consensus. If it doesn't, then it implies that in the last few bases a consensus colour call was incorrect and our colour-to-DNA conversion became out of sync.

                Essentially this is using the last primer base as an auto-correction system to ensure that we always know which of the four "phases" the colour-to-base conversion system should be in. If we have sufficient depth then we'll get the resolution quite high, possibly to the base level (say 25-fold and above). It's not as robust as comparison against a reference sequence and SNP correction, as we only have one correcting factor per read rather than per base, but there's still sufficient information to use. This of course assumes that the assembly is correct; misassemblies would still cause problems.

                Rather messy, and personally not appealing unless we could see some tangible gain for having to go through the extra hoops.
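A sketch of that primer-base phase check (made-up offsets and reads; assumes the XOR colour mapping A=0, C=1, G=2, T=3): decode the colour consensus, then compare each read's known first base against the decoded consensus base at the read's start position; a mismatch flags that the conversion slipped phase somewhere upstream.

```python
# Sketch of the primer-base phase check: each read's known first base acts
# as an anchor that the decoded consensus must agree with at that offset.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def decode(colours, primer="T"):
    bases, prev = [], CODE[primer]
    for c in colours:
        prev ^= c
        bases.append(BASE[prev])
    return "".join(bases)

consensus_bases = decode([3, 1, 3, 1, 0, 2, 1])   # "ACGTTCA" (clean)
slipped = decode([3, 1, 2, 1, 0, 2, 1])           # same colours, one error

# (offset into the contig, known first base of the read starting there)
read_anchors = [(0, "A"), (2, "G"), (4, "T")]
for offset, first_base in read_anchors:
    clean_ok = consensus_bases[offset] == first_base
    slip_ok = slipped[offset] == first_base
    print(offset,
          "clean:", "in phase" if clean_ok else "OUT OF PHASE",
          "| slipped:", "in phase" if slip_ok else "OUT OF PHASE")
```

In the slipped decode, every anchor downstream of the error disagrees, which localises the phase break to somewhere between the last agreeing anchor and the first disagreeing one - hence the point about needing coverage comparable to read length for base-level resolution.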


                • lh3
                  Senior Member
                  • Feb 2008
                  • 686

                  #9
                  I second James' third idea. Both AB and maq implement "reference-based translation" from color space to nucleotide space. Such translation is very robust to color errors: if the read mapping is right, we can confidently correct most color errors. I do not know how AB achieves this; maq does so with a simple O(4*4*L)-time dynamic programming (the DP part is just 50 lines of C code). This DP can also realize James' idea: we take the color contig as a "read" and take the sequence of first nucleotides of the real individual reads as the "reference"; some holes in the "reference" should not matter too much. Translating a color contig in this way is also very robust to color errors.
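A minimal sketch of such a reference-based translation as dynamic programming (my own toy penalties and example, not maq's actual scoring or code): keep a best score for each of the four candidate bases at every position, charging a penalty when the implied colour disagrees with the observed colour and another when the base disagrees with the reference, then backtrack the best path.

```python
# Sketch of reference-based colour->base translation as an O(4*4*L) DP.
# Penalties are illustrative, not maq's; XOR colour mapping A=0,C=1,G=2,T=3.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"
COLOUR_MISMATCH, REF_MISMATCH = 3, 2   # toy penalty weights

def translate(colours, reference, primer="T"):
    """Pick the base sequence best explaining both the observed colours and
    the reference, via a 4-state-per-position Viterbi-style DP."""
    n = len(colours)
    INF = float("inf")
    score = [[INF] * 4 for _ in range(n)]
    back = [[0] * 4 for _ in range(n)]
    p0 = CODE[primer]
    for b in range(4):  # initialise position 0 from the primer base
        score[0][b] = ((colours[0] != (p0 ^ b)) * COLOUR_MISMATCH
                       + (BASE[b] != reference[0]) * REF_MISMATCH)
    for i in range(1, n):
        for b in range(4):          # candidate base at position i
            for p in range(4):      # candidate base at position i-1
                s = (score[i - 1][p]
                     + (colours[i] != (p ^ b)) * COLOUR_MISMATCH
                     + (BASE[b] != reference[i]) * REF_MISMATCH)
                if s < score[i][b]:
                    score[i][b], back[i][b] = s, p
    b = min(range(4), key=lambda x: score[n - 1][x])
    out = [b]
    for i in range(n - 1, 0, -1):   # backtrack the best path
        b = back[i][b]
        out.append(b)
    return "".join(BASE[c] for c in reversed(out))

# one colour error at position 2; the reference pulls the calls back on track
print(translate([3, 1, 2, 1, 0, 2, 1], "ACGTTCA"))   # ACGTTCA
```

The isolated colour error costs one colour-mismatch penalty on the reference-consistent path, which is cheaper than the cascade of reference mismatches incurred by decoding the colours literally, so the DP recovers the correct bases.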


                  • new300
                    Member
                    • Mar 2008
                    • 50

                    #10
                    Originally posted by jkbonfield:
                    I think there's a third route too, which is a combination of 1 and 2 above. You can compute the consensus from all reads in colour space, like option 1, before converting to DNA space for use in other tools. However, using the known last base of the primer for each read, we can verify whether the sequence matches the consensus. If it doesn't, then it implies that in the last few bases a consensus colour call was incorrect and our colour-to-DNA conversion became out of sync.

                    Essentially this is using the last primer base as an auto-correction system to ensure that we always know which of the four "phases" the colour-to-base conversion system should be in.
                    OK, yes. If I've understood correctly:

                    You build a colour space consensus, align colour space reads to it, then convert the consensus and the first bases of the reads to base space (on the fly). If you compare these and get a mismatch, then you know your consensus has gone out of phase (or there was some really horrible error in the read).

                    So... when you do detect an error, the bases between the last known good initial base and the next known good one are in doubt. To get this down to base resolution you'd need coverage equal to read length.

                    I think that's a neat trick that could help out a fair bit, though like you say it's not as robust as the SNP trick.

                    I think there is some probability of error in that first base. It's a single colour change error, so based on the data I've seen I'd guess around 2%, whereas for a normal base it's about 6-8%... So you could end up marking a bad base good, or a good base bad. The resolution would need to be high to avoid this affecting a large number of bases.

                    Originally posted by jkbonfield:
                    If we have sufficient depth then we'll get the resolution quite high, possibly to the base level (say 25 fold and above). It's not as robust as comparison against a reference sequence and SNP correction as we only have one correcting factor per read rather than per base, but there's sufficient information to use still. This of course assumes that the assembly is correct. Misassemblies would cause problems still.

                    Rather messy and personally not appealing unless we could see some tangible gain for having to go through the extra hoops.
                    It sounds like a technique like this would be the right kind of strategy for SOLiD data. I think it's a workaround for the problems caused by two-colour changes in de novo assembly, though; it's not buying you error correction (with respect to base space) like the SNP stuff does.

                    So when you're doing your actual assembly you're still left with the single colour change error rate when overlapping. I think with an error rate this high, and short reads, you'd be lucky to produce a consensus good enough to work from... maybe if you could filter out a lot of the errors...

                    Right now I can't see that the SOLiDs are likely to be competitive for de novo, not compared with the GAs. Read lengths would need to be longer and the single colour change error rate lower. Either that, or they'd need a throughput advantage of at least an order of magnitude.


                    • Chipper
                      Senior Member
                      • Mar 2008
                      • 323

                      #11
                      Is an error rate of 6-8% (single color change) really normal? How is this value calculated, and what is the corresponding error rate for the Illumina? Are the numbers affected by the lack of filtering of empty or mixed beads on the SOLiD, and if so, would it be better to apply quality filtering to SOLiD data before doing de novo assembly?


                      • new300
                        Member
                        • Mar 2008
                        • 50

                        #12
                        Originally posted by Chipper:
                        Is an error rate of 6-8% (single color change) really normal? How is this value calculated, and what is the corresponding error rate for the Illumina? Are the numbers affected by the lack of filtering of empty or mixed beads on the SOLiD, and if so, would it be better to apply quality filtering to SOLiD data before doing de novo assembly?
                        It's what I saw when I did a brute-force alignment of the E. coli data release (http://www.genographia.org/portal/to...rimer.pdf/view). My understanding is that empty beads are filtered early on. There were also no reads with more than 8 errors, which makes me think some filtering had been applied. Some MSc students I was working with also saw similar error rates in the Yoruban dataset.

                        You should also be able to calculate the single colour change error rate from the SNP miscall rate. I've seen this quoted as 0.036, which I think should be roughly equivalent to a single colour change error rate of 6%. Those are the only numbers I have to go on; a comprehensive review would be useful.

                        I think additional filtering would help; it's a trade-off between that and throughput.

                        As delivered by the device/quoted in throughput numbers, the Illumina error rate is around 1%. They apply relatively harsh filtering to remove mixed clusters during primary data analysis. It'd be interesting to see a SOLiD dataset where filtering had been applied to get the single colour change error rate down to 1%; that would make for a useful comparison.


                        • bioinfosm
                          Senior Member
                          • Jan 2008
                          • 483

                          #13
                          I am still curious as to how SOLiD and Solexa compare apples to apples. Both produce short reads, but there's still not much on how similar or complementary they are!

                          Met a few people at AGBT and still could not find the answers...
                          --
                          bioinfosm


                          • westerman
                            Rick Westerman
                            • Jun 2008
                            • 1104

                            #14
                             Wasn't there a paper within the last several months which compared all three platforms and basically came up with the conclusion that all three were equally good -- at least on bacteria? The SOLiD may have come out ahead on SNP calling.

                            I believe the problem is not apples-to-apples but rather the other considerations:

                            (1) Ease of lab prep.
                            (2) Cost of running.
                            (3) Length of reads.
                            (4) Number of reads.
                            (5) Which machines my organization will pony up the money for. :-)

                             My organization has two sequencers -- a 454 and a SOLiD. As a computer guy, which do I like better? It depends on the project. Would I like a Solexa? Sure. Supposedly easier chemistry than the SOLiD, with longer reads, but more expensive to run, with fewer reads and not as good SNP calling as the SOLiD. But heck, if the powers that be want to buy us a Solexa and pay for the service contract... well, I suspect we'd find room in our already over-crowded lab for it.

                            What I really want to see is a paper comparing the sequencing of repetitive eukaryotic organisms (not human!) when given a project with "X" dollars to spend and "Y" weeks to complete it.


                            • new300
                              Member
                              • Mar 2008
                              • 50

                              #15
                               Originally posted by westerman:
                              Wasn't there a paper within the last several months which compared all three platforms and basically came up with the conclusion that all three platforms were equally good -- at least on bacteria. The SOLiD may have come out ahead on SNP calling.
                              I remember seeing this but not having the time to read it, do you have the citation?

                               Originally posted by westerman:
                              I believe the problem is not apples-to-apples but rather the other considerations:

                              (1) Ease of lab prep.
                              (2) Cost of running.
                              (3) Length of reads.
                              (4) Number of reads.
                              (5) Which machines my organization will pony up the money for. :-)
                               Agreed, and it's not always a question of purchasing; freebies also get handed out to promote a product. It all muddies the water somewhat.

                               Originally posted by westerman:
                              My organization has two sequencers -- a 454 and a SOLiD. As a computer guy which do I like better? It depends on the project. Would I like a Solexa? Sure. Supposedly easier chemistry than the SOLiD with longer reads but more expensive to run with fewer reads and not as good SNP calling as the SOLiD. But heck, if the powers that be want to buy us a Solexa and pay for the service contract ... well, I suspect that we find room in our already over-crowded lab for it.
                              How many raw and aligned reads per run do you get out of your Solid?

                               Originally posted by westerman:
                              What I really want to see is a paper comparing the sequencing of repetitive eukaryotic organisms (not human!) when given a project with "X" dollars to spend and "Y" weeks to complete it.
                               I guess what you really want is to look at a variety of sequence structures for a variety of applications (SNP calling, de novo assembly, CNV, structural variant stuff, etc.). That would be very interesting.

                               Most of the genome centers seem to be gearing up with Illuminas at the moment. Sanger have 40-odd, WashU 35... I've not seen much hard evidence to back up the SOLiDs, but then I've mostly worked with Solexa data.

