
  • dual base encoding and de novo assembly

    AB takes a lot of flak for what is arguably a major strength of their methodology: dual base encoding. Yesterday someone mentioned to me that dual base encoding was only an advantage when one had a reference sequence to map to.

    We haven't tried de novo assembly on SOLiD data sets much, so I did not really argue the point at the time. However, in retrospect, I do not see any reason that the benefits of dual base encoding would not play out in de novo assembly. That is, a single base miscall might be recognized as a miscall in the context of other reads assembling into the same contig.

    But do color space aware de novo assemblers take advantage of dual base encoding? Anyone know?

    --
    Phillip
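
To make the question concrete, here is a minimal sketch of di-base (color) encoding. It assumes the standard SOLiD color code, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The sequences and the helper names `encode` and `diff_positions` are hypothetical and not taken from any AB tool. It illustrates that a true single-base difference changes two adjacent colors, while a lone color miscall changes only one, which is the property an assembler could in principle exploit among overlapping reads.

```python
# Standard SOLiD color code: color = XOR of the 2-bit codes of adjacent bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Return the list of colors for each adjacent base pair in seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def diff_positions(a, b):
    """Return the indices at which two equal-length color lists differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical sequences, for illustration only.
reference = "ACGTACGT"
variant = "ACGAACGT"  # true substitution T->A at base index 3

ref_colors = encode(reference)
var_colors = encode(variant)

# A measurement error: one color of the reference read is miscalled.
miscalled = list(ref_colors)
miscalled[2] = 0

print(ref_colors)                              # colors of the reference
print(diff_positions(ref_colors, var_colors))  # two adjacent color changes
print(diff_positions(ref_colors, miscalled))   # one isolated color change
```

Whether a given color-space assembler actually uses this distinction is exactly the open question of the post.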

  • #2
    Yes, there is a wrapper around Velvet that does something like that; it is available on the SOLiD web site. I am also working on something like this, but slowly.

    HTH

    Alessandro




    • #3
      It definitely does when you build your contigs. I'm still trying to wrap my head around their new tool, but it's supposed to do just that.



      • #4
        Originally posted by snetmcom View Post
        It definitely does when you build your contigs. I'm still trying to wrap my head around their new tool, but it's supposed to do just that.
        http://solidsoftwaretools.com/gf/project/saet/
        Whoa! Thanks for pointing that out snetmcom!

        From the blurb given on the page, that is not at all what I thought the SOLiD Accuracy Enhancement Tool did. I thought it was a tool to discard reads judged to contain too many errors.

        But after reading the pdf, it looks like you are right. The tool actually appears to be using both quality values and dual-base encoding to correct base calling errors. Interesting.

        --
        Phillip



        • #5
          Originally posted by pmiguel View Post
          Whoa! Thanks for pointing that out snetmcom!

          From the blurb given on the page, that is not at all what I thought the SOLiD Accuracy Enhancement Tool did. I thought it was a tool to discard reads judged to contain too many errors.

          But after reading the pdf, it looks like you are right. The tool actually appears to be using both quality values and dual-base encoding to correct base calling errors. Interesting.

          --
          Phillip
          SAET does look interesting indeed. Any SOLiD developers here?
          I am curious: if I were to use this tool, would the base-space data produced after running it be good enough for de novo assembly?

          Would you still need to run SAET if you are already doing de novo assembly with a color-space-aware program?
          http://kevin-gattaca.blogspot.com/



          • #6
            However, in the current state of the art you still need a conversion from color space to 'pseudo' nucleotides in order to use an assembler like Velvet; such assemblers are not number-aware. Since I am working with colleagues on porting SSAKE to color space, I was thinking that this might be a good opportunity to work directly in colors within the assembler. But I am worried about the initial T... Is anyone here working on de novo assembly with SOLiD?

            A simpler idea would be to use SAET to obtain a high-quality dataset, convert the sequences to real nucleotide space following the established rules, and then use a 'traditional' assembler. I think I could use this strategy on a small de novo viral genome. If anybody is interested, just give me a whistle.

            Alessandro
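
The conversion to nucleotide space mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard SOLiD color code (next base = previous base XOR color, with A=0, C=1, G=2, T=3), a leading primer base that only seeds the decoding, and reads without missing (dot) calls. The reads and the `decode` helper are hypothetical. It also shows why uncorrected color errors are a concern before conversion: one wrong color alters every base decoded after it.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def decode(csread):
    """Translate a color-space read (primer base + colors) to bases.

    The leading primer base is used only to seed the decoding and is
    not included in the returned sequence.
    """
    current = BASE_TO_BITS[csread[0]]
    bases = []
    for color in csread[1:]:
        current ^= int(color)  # next base = previous base XOR color
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


# Hypothetical reads: identical except for one color (0-based color index 2).
clean = "T0123012"
noisy = "T0133012"

clean_bases = decode(clean)
noisy_bases = decode(noisy)
mismatches = [i for i, (a, b) in enumerate(zip(clean_bases, noisy_bases)) if a != b]

print(clean_bases)
print(noisy_bases)
print(mismatches)  # every base from the miscalled color onward differs
```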



            • #7
              Unless I skipped something in reading the documentation, all that SAET does is correct reads with missing (dot/period) color-space calls in them. While that is nice, it is hardly significant. One of our last runs had 678K reads with missing calls out of 65,000K total reads, a missing-call rate of about 1%.



              • #8
                Hi. No, apparently it does more than this, but I am only testing it just now on genome resequencing fragment data.

                It is also evident from the doc examples that it actually corrects sequences even without dots:

                Input: reads.csfasta
                >1015_1635_189_F3_I1
                T0320310030001120012311330

                Output: reads.csfasta
                >1015_1635_189_F3_I1
                T0320310030001122012311330

                HTH

                Alessandro
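
For anyone who wants to locate the change in the doc example above, here is a small sketch that compares the input and output reads color by color. The `changed_colors` helper is hypothetical and has nothing to do with how SAET works internally; it only diffs two csfasta sequence lines.

```python
def changed_colors(before, after):
    """Return (1-based color position, old color, new color) for each change.

    Both reads must share the same leading primer base and length.
    """
    if len(before) != len(after) or before[0] != after[0]:
        raise ValueError("reads are not comparable")
    return [
        (pos, old, new)
        for pos, (old, new) in enumerate(zip(before[1:], after[1:]), start=1)
        if old != new
    ]


# The two sequence lines from the documentation example quoted above.
read_in = "T0320310030001120012311330"
read_out = "T0320310030001122012311330"

changes = changed_colors(read_in, read_out)
print(changes)  # [(16, '0', '2')]
```

So a single color, the 16th, was changed from 0 to 2, and the read contained no dots.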





                • #9
                  I see what you mean. That di-base gets changed and the quality goes from 8 to 0.

                  I must say that I am uncomfortable with the idea of changing data without knowing what the data will eventually be used for. However, it is nice to see SOLiD using the quality values. Corona Lite and, I believe, Bioscope do not take QVs into account. SAET does take QVs into account, and seemingly in a safe manner. The documentation says "... Positions with quality values above 10 should not be corrected."
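
As a rough illustration of the quoted rule, the sketch below lists which color positions of a read would even be candidates for correction under a QV cutoff of 10. This is an assumption-laden toy, not SAET's algorithm: the `correctable_positions` helper and the quality values are hypothetical, and only the threshold comes from the documentation quote.

```python
def correctable_positions(qvs, max_qv=10):
    """Return 1-based color positions whose QV is at or below max_qv.

    Positions with a QV above max_qv are treated as untouchable,
    following the rule quoted from the documentation.
    """
    return [pos for pos, qv in enumerate(qvs, start=1) if qv <= max_qv]


# Hypothetical per-color quality values for one read.
qvs = [25, 30, 8, 27, 12, 10, 4]

candidates = correctable_positions(qvs)
print(candidates)  # [3, 6, 7]
```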



                  • #10
                    So I downloaded SAET and put one of our datasets, called 'rojo', through it. This is the one that had 678K beads with dots. Out of 65.4 million beads (this was a partial run), the SAET program corrected 30.6 million beads. Very interesting! This is a SNP calling project, so it will be interesting to see what extra SNPs can be called (or are now missing) with the corrected data. I'll report when I can.

                    BTW, SAET took 4 hours to process the 65.4M data using a 16-core, high memory computer.
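
To put those numbers side by side, here is a quick back-of-the-envelope calculation using only the figures quoted in this post (the variable names are mine):

```python
# Figures as reported in the post above.
total_beads = 65.4e6      # beads in the partial run
beads_with_dots = 678e3   # beads containing missing (dot) calls
corrected_beads = 30.6e6  # beads changed by SAET
hours = 4                 # wall-clock time on the 16-core machine

dot_rate = beads_with_dots / total_beads
corrected_rate = corrected_beads / total_beads
beads_per_second = total_beads / (hours * 3600)

print(f"{dot_rate:.1%}")             # 1.0%
print(f"{corrected_rate:.1%}")       # 46.8%
print(f"{beads_per_second:,.0f}")    # 4,542
```

So roughly 47% of beads were altered versus about 1% that had dots, which supports the point that SAET does far more than fill in missing calls.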



                    • #11
                      Originally posted by westerman View Post

                      BTW, SAET took 4 hours to process the 65.4M data using a 16-core, high memory computer.
                      Neat. I am doing benchmark testing as well, but on sample data; I am going to generate some fake data later to test de novo assembly.

                      How do you get a 16-core machine?
                      Did you run it on a cluster with PBS? How does one do that?
                      http://kevin-gattaca.blogspot.com/



                      • #12
                        1) You buy a 16-core machine. Ours cost about USD $10,000 with 128 GB of memory. With the cost of the SOLiD itself hovering around $500,000 and the cost of a SOLiD run hovering around $10,000, it becomes quite easy to convince the powers-that-be to throw a bit of money towards computer hardware. If nothing else, I point out that the lab techs can go through $10K of reagents in seconds :-)

                        2) SAET doesn't run under PBS as far as I can tell; i.e., I believe it runs only on a single machine but can use all of the cores on that single machine. Corona Lite and Bioscope do use PBS.

                        3) I still don't have SNP calling done on the SAET-corrected data. I was hoping that this process would complete overnight but no such luck. I think that Bioscope's messaging service crashed on me. :-( So ... probably no SNP results until after the upcoming holidays.
