  • pmiguel
    Senior Member
    • Aug 2008
    • 2328

    dual base encoding and de novo assembly

    AB takes a lot of flak for what is arguably a major strength of their methodology: dual base encoding. Yesterday someone mentioned to me that dual base encoding was only an advantage when one had a reference sequence to map to.

    We haven't tried de novo assembly on SOLiD data sets much, so I did not really argue the point at the time. However, in retrospect, I do not see any reason that the benefits of dual base encoding would not play out in de novo assembly. That is, a single base miscall might be recognized as a miscall in the context of other reads assembling into the same contig.

    But do color space aware de novo assemblers take advantage of dual base encoding? Anyone know?
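
    To make the dual-base argument concrete, here is a minimal sketch (plain Python, not taken from any SOLiD tool). It assumes the usual di-base scheme in which a color is the XOR of the 2-bit codes of two adjacent bases; the sequences and function names are made up for illustration. A real substitution changes two adjacent colors, so an isolated single-color difference between overlapping reads looks more like a miscall than a variant.

```python
# Di-base encoding sketch: color = XOR of the 2-bit codes of adjacent bases.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(bases, primer="T"):
    """Return the color string for `bases`, primed with `primer`."""
    seq = primer + bases
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))

def mismatches(x, y):
    """1-based positions at which two equal-length strings differ."""
    return [i for i, (a, b) in enumerate(zip(x, y), start=1) if a != b]

ref = "ACGGTCA"   # hypothetical reference fragment
snp = "ACGATCA"   # same fragment with a real substitution at base 4 (G -> A)

ref_colors = encode(ref)
snp_colors = encode(snp)
print(ref_colors)                          # 3130121
print(snp_colors)                          # 3132321
print(mismatches(ref_colors, snp_colors))  # [4, 5]: two adjacent colors change
```

    Nothing in this check needs a reference sequence; the same comparison could in principle be made between reads that overlap in a contig.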

    --
    Phillip
  • aguffanti
    Member
    • Dec 2008
    • 29

    #2
    Yes, there is a wrapper around Velvet that does something like that; it is available on the SOLiD web site. I am also working on something like this, but slowly.

    HTH

    Alessandro



    • snetmcom
      Senior Member
      • Oct 2008
      • 159

      #3
      It definitely does when you build your contigs. I'm still trying to wrap my head around their new tool, but it's supposed to do just that.


      • pmiguel
        Senior Member
        • Aug 2008
        • 2328

        #4
        Originally posted by snetmcom
        It definitely does when you build your contigs. I'm still trying to wrap my head around their new tool, but it's supposed to do just that.
        http://solidsoftwaretools.com/gf/project/saet/
        Whoa! Thanks for pointing that out snetmcom!

        From the blurb given on the page, that is not at all what I thought the SOLiD Accuracy Enhancement Tool (SAET) did. I thought it was a tool to discard reads judged to contain too many errors.

        But after reading the PDF, it looks like you are right. The tool actually appears to be using both quality values and dual-base encoding to correct base-calling errors. Interesting.

        --
        Phillip


        • KevinLam
          Senior Member
          • Nov 2009
          • 204

          #5
          SAET looks interesting indeed. Any SOLiD developers here?
          I am curious: if I were to use this tool, would the base-space data obtained after running it be good enough for de novo assembly?

          Would you still need to run SAET if you are already doing de novo assembly with a color-space-aware program?
          http://kevin-gattaca.blogspot.com/


          • aguffanti
            Member
            • Dec 2008
            • 29

            #6
            However, in the current state of the art you still need a conversion from color space to 'pseudo' nucleotides in order to use an assembler like Velvet, because such assemblers are not number-aware. Since I am working with colleagues on porting SSAKE to color space, I was thinking that this might be a good opportunity to work directly in colors with the assembler. But I am worried about the initial T ... Anyone here working on de novo assembly with SOLiD?

            A simpler idea would be to use SAET to obtain a high-quality dataset, convert the sequences to real nucleotide space following the established rules, and then use a 'traditional' assembler. I think I could use this strategy on a small de novo viral genome. If anybody is interested, just give me a whistle.
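
            For anyone unfamiliar with the conversion rules, here is a minimal Python sketch of the color-to-nucleotide step, assuming the usual di-base scheme (each color is the XOR of the 2-bit codes of adjacent bases). The reads and the function name are made up for illustration. It also shows why cleaning up the colors first matters: every base depends on the initial T and on all colors before it, so one miscalled color corrupts every base downstream of it.

```python
# Color space -> base space sketch; the primer base starts the decoding chain.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def decode(csread):
    """Translate a color-space read (primer base + colors) into bases.

    The leading primer base only seeds the chain and is not returned.
    """
    prev = CODE[csread[0]]
    out = []
    for color in csread[1:]:
        if color == ".":
            raise ValueError("cannot decode past a missing color call")
        prev ^= int(color)          # next base = previous base XOR color
        out.append(BASE[prev])
    return "".join(out)

good = "T3130121"   # hypothetical read
bad = "T3131121"    # same read with a single color miscall at position 4

print(decode(good))  # ACGGTCA
print(decode(bad))   # ACGTGAC: every base from position 4 onward is wrong
```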

            Alessandro


            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              Unless I skipped something in reading the documentation, all that SAET does is correct reads with missing (dot/period) color-space calls in them. While that is nice, it is hardly significant. One of our last runs had 678K reads with missing calls out of 65,000K total reads, a missing-call rate of about 1%.
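
              A count like that can be reproduced with a few lines of Python. This is only a sketch: it assumes a csfasta layout with '#' comment lines, '>' header lines and one color string per read, and the demo reads are made up.

```python
def missing_call_rate(lines):
    """Return (reads_with_dots, total_reads) for csfasta-formatted lines."""
    total = with_dots = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", ">")):
            continue  # skip blank lines, comments and read headers
        total += 1
        if "." in line:
            with_dots += 1
    return with_dots, total

demo = [
    "# hypothetical demo data",
    ">1_1_1_F3", "T0320310030001120012311330",
    ">1_1_2_F3", "T03203.0030001120012311330",
]
dots, total = missing_call_rate(demo)
print(dots, total)                        # 1 2

# The figures quoted above: 678K reads with dots out of 65,000K reads.
print(f"{678_000 / 65_000_000:.2%}")      # 1.04%
```

              For a real file, pass an open file handle instead of the demo list.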


              • aguffanti
                Member
                • Dec 2008
                • 29

                #8
                Hi. No, apparently it does more than this, but I am testing it right now with genome resequencing fragment data.

                It is also evident from the doc examples that it actually corrects sequences even without dots:

                Input: reads.csfasta
                >1015_1635_189_F3_I1
                T0320310030001120012311330

                Output: reads.csfasta
                >1015_1635_189_F3_I1
                T0320310030001122012311330
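
                The change is easy to miss by eye. A small Python sketch (the helper name is made up) that lists the color positions differing between the two reads above:

```python
def color_changes(before, after):
    """List (color_position, old, new) for every changed color call.

    Position 1 is the first color after the leading primer base.
    """
    if len(before) != len(after):
        raise ValueError("reads differ in length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(before[1:], after[1:]), start=1)
            if a != b]

before = "T0320310030001120012311330"   # input read from the doc example
after = "T0320310030001122012311330"    # same read after correction

print(color_changes(before, after))     # [(16, '0', '2')]
```

                So a single color call, at position 16, is changed from 0 to 2, and there was no dot in the input read.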

                HTH

                Alessandro




                • westerman
                  Rick Westerman
                  • Jun 2008
                  • 1104

                  #9
                  I see what you mean. That di-base gets changed and the quality goes from 8 to 0.

                  I must say that I am uncomfortable with the idea of changing data without knowing what the data will eventually be used for. However, it is nice to see SOLiD using the quality values. Corona Lite and, I believe, Bioscope do not take QVs into account. SAET does take QVs into account, and in what seems to be a safe manner. The documentation says "... Positions with quality values above 10 should not be corrected."
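
                  As a rough illustration of what that rule implies (this is not SAET's actual algorithm, which I have not seen, and the quality values below are made up), only positions with a QV of 10 or less would even be candidates for correction:

```python
QV_CEILING = 10  # from the quoted documentation: QVs above this are left alone

def correctable_positions(qvs, ceiling=QV_CEILING):
    """1-based color positions whose QV is at or below `ceiling`."""
    return [i for i, q in enumerate(qvs, start=1) if q <= ceiling]

qual_line = "25 30 8 27 12 10 31"        # hypothetical per-color QVs for one read
qvs = [int(x) for x in qual_line.split()]

print(correctable_positions(qvs))        # [3, 6]
```

                  Confident calls are protected regardless of what the surrounding reads suggest, which is why the approach looks conservative.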


                  • westerman
                    Rick Westerman
                    • Jun 2008
                    • 1104

                    #10
                    So I downloaded SAET and put one of our datasets called 'rojo' through it. This is the one that had 678K beads with dots. Out of 65.4 million beads (this was a partial run) the SAET program corrected 30.6 million beads. Very interesting! This is a SNP calling project thus it will be interesting to see what extra SNPs can be called (or are now missing) with the corrected data. I'll report when I can.

                    BTW, SAET took 4 hours to process the 65.4M data using a 16-core, high memory computer.


                    • KevinLam
                      Senior Member
                      • Nov 2009
                      • 204

                      #11
                      Originally posted by westerman View Post

                      BTW, SAET took 4 hours to process the 65.4M data using a 16-core, high memory computer.
                      Neat. I am doing benchmark testing as well, but on sample data. I am going to generate some fake data later to test de novo assembly.

                      How do you get a 16-core machine?
                      Did you run it on a cluster with PBS? How does one do that?
                      http://kevin-gattaca.blogspot.com/


                      • westerman
                        Rick Westerman
                        • Jun 2008
                        • 1104

                        #12
                        1) You buy a 16-core machine. Ours cost about USD $10,000 with 128 GB memory. With the cost of the SOLiD itself hovering around $500,000 and the cost of a SOLiD run hovering around $10,000 it becomes quite easy to convince the powers-that-be to throw a bit of money towards computer hardware. If nothing else I point out that the lab techs can go through $10K of reagents in seconds :-)

                        2) SAET doesn't run under PBS as far as I can tell; i.e., I believe it runs only on a single machine but can use all of the cores on that single machine. Corona Lite and Bioscope do use PBS.

                        3) I still don't have SNP calling done on the SAET-corrected data. I was hoping that this process would complete overnight but no such luck. I think that Bioscope's messaging service crashed on me. :-( So ... probably no SNP results until after the upcoming holidays.
