
  • #16
    Originally posted by bioinfosm View Post
    I am still curious as to how SOLiD and Solexa compare apples-to-apples. Both produce short reads, but still not much about how similar or complementary they are!

    Met a few at AGBT and still could not find the answers..
    It's not easy to compare, since throughput changes so fast on both instruments. For example, the latest Genome Biology RNA-seq paper used 38 lanes to get 138 M aligned reads, a number you can get from one SOLiD slide (1/2 run) today. I do not know what the current numbers are for the GA-II. What sort of apples are you interested in comparing?

    Comment


    • #17
      I am interested in the quality of the data. Using, say, 6 million 35 bp reads on the same sample, which instrument should one prefer, say for SNP calling? From a C. elegans comparison paper, it looks like SOLiD has a slight advantage in calling rare SNPs. Does its 2-base encoding really give more accurate results?
      --
      bioinfosm

      Comment


      • #18
        Originally posted by new300 View Post
        How many raw and aligned reads per run do you get out of your Solid?
        From a project that I have been working on this week, since the data came off the sequencer Monday evening. This is one run: mate-paired 25-base reads mapped to a non-human eukaryotic organism. One region/plate.

        Raw reads: ~142M

        Mapped R3 reads: ~114M for unique & random at 3 mismatches
        Mapped F3 reads: ~118M (ditto)

        Mapped R3 reads: ~77M for uniquely placed reads at 3 mismatches
        Mapped F3 reads: ~75M (ditto)

        Paired F3-R3 reads: ~78M

        So approximately 3,900 Mbases (78M pairs times 50 bases per pair).
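
As a quick check of that arithmetic, here is a minimal Python sketch. The function and constant names are made up for illustration, and it assumes each paired read contributes one 25-base F3 tag and one 25-base R3 tag.

```python
# Figures quoted in the post; names are illustrative only.
PAIRED_READS_M = 78   # paired F3-R3 reads, in millions
TAG_LENGTH = 25       # bases per tag (F3 or R3)
TAGS_PER_PAIR = 2     # assumption: one F3 tag plus one R3 tag per pair


def paired_yield_mbases(pairs_m, tag_length, tags_per_pair=TAGS_PER_PAIR):
    """Return the yield in Mbases: pairs (millions) x bases per pair."""
    return pairs_m * tag_length * tags_per_pair


yield_mb = paired_yield_mbases(PAIRED_READS_M, TAG_LENGTH)
print(f"{yield_mb} Mbases from paired reads")
```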

        SNP analysis is currently in progress on the paired reads. From my work with the mapped but not-paired reads we should obtain quite a few SNPs.

        Comment


        • #19
          Originally posted by bioinfosm View Post
          I am interested in the quality of the data. Using, say, 6 million 35 bp reads on the same sample, which instrument should one prefer, say for SNP calling? From a C. elegans comparison paper, it looks like SOLiD has a slight advantage in calling rare SNPs. Does its 2-base encoding really give more accurate results?
          In theory color-space should give more accurate results for SNP calling. The concept is that it takes two adjacent color-space mismatches to indicate a SNP. If you see a single color-space mismatch then you can flag that read as a sequencer error. Compare this to traditional base-space where, when you see a single mismatch, you have no idea if it arises from a sequencer error or a SNP. Depth of coverage can help resolve the problem, but there are limits to that, especially for rare SNPs.
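
To make the "two adjacent mismatches" idea concrete, here is a small Python sketch of 2-base encoding. It uses the usual A=0, C=1, G=2, T=3 scheme, where each color is the XOR of two neighbouring base codes. The helper names and the toy sequences are invented for illustration. The classification rule is deliberately crude: a real caller would also check that the two color changes are consistent with a single base change.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode bases as colors: one color per pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def mismatch_positions(ref_colors, read_colors):
    """Return the indices where the read's colors differ from the reference."""
    return [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]


def classify(ref_colors, read_colors):
    """Crude rule from the post: exactly two adjacent mismatches suggest a SNP."""
    pos = mismatch_positions(ref_colors, read_colors)
    if not pos:
        return "match"
    if len(pos) == 2 and pos[1] - pos[0] == 1:
        return "candidate SNP"
    return "likely sequencing error"


reference = "ACGTACGTAC"
snp_read = "ACGTGCGTAC"           # toy example: A -> G at base index 4
ref_colors = to_colors(reference)
snp_colors = to_colors(snp_read)

# Simulate a single color-call error by altering one color of the reference.
error_colors = list(ref_colors)
error_colors[5] = (error_colors[5] + 1) % 4

print(classify(ref_colors, snp_colors))
print(classify(ref_colors, error_colors))
```

An interior base change touches the two colors that overlap it, which is why the SNP read shows mismatches at neighbouring color positions, while a single miscalled color shows up alone.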

          In practice the rate of sequencer error could play a major role. Obviously if there is too much sequencer error then too much data will be thrown away and nothing will be found. The SOLiD's error rate may be higher than the Solexa's. I do not have firm numbers on this, however.

          Let's do a couple of thought experiments. Say that there is a common SNP that occurs in 50% of the population. Furthermore, say that the SOLiD has a 0.5% error rate per base while the Solexa's is 1/5 of that, 0.1% per base [note that I am just making up those numbers -- the actual rates are probably much different]. If we pool 100 individuals together in a run of 25-mers and get one read per individual across the SNP site, then -- very roughly, since I am doing simple probability here (error rate per base times 25 bases) --

          The SOLiD run will -- for sequencer errors -- generate 12 - 13 reads with a single mismatch and 0 - 1 reads with adjacent mismatches.

          Co-mingled with the above will be 50 reads with 2 adjacent mismatches that represent the SNPs.

          So overall there will be about:

          44 reads without mismatches -- the non-SNPs
          44 reads with adjacent mismatches -- the SNPs plus *maybe* 1 error read
          12 reads with non-adjacent mismatch(es) -- errors for both non-SNPs and SNPs

          When we look at the data we would toss out the non-adjacent mismatch reads as errors. We would then pick up 44 adjacent-mismatch reads representing the same SNP and maybe 1 read representing a different (and erroneous) SNP.

          For the Solexa there would be:
          52 reads with mismatch(es) -- 50 real SNPs and 2 or maybe 3 reads with errors.
          48 reads without mismatches.

          Once again it is easy to pick up the true SNP, since 50 of the reads all have a mismatch in the same location; the 2 or 3 reads that indicate other SNPs are simply errors and could be tossed.

          Now ... for the rare variant that occurs in 2% of the population.

          The SOLiD has
          86 reads with no mismatches
          12 reads with non-adjacent mismatch(es)
          2 reads with adjacent mismatches and *maybe* 1 adjacent-mismatch error read

          Those two adjacent-mismatch reads are the real SNP. The errors are simply tossed.

          The Solexa has
          96 reads with no mismatches
          4 (maybe 5) reads with mismatches.

          2 of the mismatches are the real SNP while 2 or 3 are errors.

          In neither case does the platform pick up the real SNP unambiguously -- it is hard to do when sequencers generate errors -- but the SOLiD (and color space) does work, in theory, better with the rare variants. It works even better if we assume that the SOLiD's error rate is the same as the Solexa's.
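
The back-of-the-envelope numbers above can be sketched in Python as follows. This is only an illustration under the same made-up assumptions: 100 reads, 25 bases each, and a per-read error chance approximated as per-base rate times read length. The function names are hypothetical, and the rare case of two adjacent errors mimicking a SNP is ignored.

```python
def read_error_prob(per_base_error, read_length=25):
    """Linear approximation from the post: per-base rate x read length."""
    return per_base_error * read_length


def color_space_counts(n_reads, snp_fraction, per_base_error, read_length=25):
    """Expected read counts when isolated mismatches can be flagged as errors."""
    p_err = read_error_prob(per_base_error, read_length)
    snp_reads = n_reads * snp_fraction
    return {
        "no_mismatch": (n_reads - snp_reads) * (1 - p_err),
        "adjacent_pair_snp": snp_reads * (1 - p_err),
        "flagged_as_error": n_reads * p_err,
    }


def base_space_counts(n_reads, snp_fraction, per_base_error, read_length=25):
    """Expected read counts when an error looks just like a SNP mismatch."""
    p_err = read_error_prob(per_base_error, read_length)
    snp_reads = n_reads * snp_fraction
    ref_reads = n_reads - snp_reads
    return {
        "no_mismatch": ref_reads * (1 - p_err),
        "true_snp_mismatch": snp_reads,
        "error_only_mismatch": ref_reads * p_err,
    }


for label, fraction in (("common (50%)", 0.5), ("rare (2%)", 0.02)):
    solid = color_space_counts(100, fraction, 0.005)   # made-up 0.5% per base
    solexa = base_space_counts(100, fraction, 0.001)   # made-up 0.1% per base
    print(label)
    print("  color space:", {k: round(v, 1) for k, v in solid.items()})
    print("  base space: ", {k: round(v, 1) for k, v in solexa.items()})
```

The point of the comparison is the last category in each case: in color space the error reads can be set aside, whereas in base space they sit in the same pile as the true SNP reads, which matters most when the SNP itself contributes only a couple of reads.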

          Next up: color space and indels. Once my head stops hurting.

          Comment


          • #20
            Originally posted by westerman View Post
            So approximately 3,900 Mbases (78M pairs times 50 bases per pair).
            So, I can't really see the throughput advantage of the SOLiD there. GA1 runs I've seen are around 4 Gb. If you look at the Short Read Archive, GA2 runs are around 7 Gb+ with 35 bp reads. For PhiX, around 95% of Illumina reads align within 2 errors. For human, I think you tend to see about 80%. Those are 35 bp reads, I believe. There are 50 bp reads in the SRA which appear to go up to 14 Gb.

            Comment


            • #21
              Thanks, Westerman, those are useful thoughts. I believe the same: the SOLiD may perform better for rare variants even if its error rate is the same as the Illumina's.
              --
              bioinfosm

              Comment
