  • SOAP denovo assembly results

    Hi,

    I was able to assemble more than 100 million reads into contigs using SOAP. That was cool, as all other tools ran into memory issues.

    However, I wish to understand some of the properties of contigs, like
    - how many reads were actually used in the assembly
    - what kind of depth of coverage do the contigs have from overlapping reads
    - other properties to determine how confident I could be of the contigs

    any pointers... and any help with extracting info from the soap assembly results (except the contig sequences I already have)
    --
    bioinfosm

  • #2
    This is a good question. I would also like to know. Could anyone kindly help us?
    By the way, how much time did it take to de novo assemble more than 100 million reads into contigs?

    Best

    Jing


    • #3
      How much RAM did you need with SOAP? I'm curious ^^
      L. Collado Torres, Ph.D. student in Biostatistics.


      • #4
        Less than 100 GB of RAM; I did not track it exactly (as long as it was not crashing, I was happy).
        As for time, it took less than a day to get through its various steps. A lot of contigs are length 24 and definitely not useful.
        >1 length 24 cvg_0_tip_0
        AAAAAAAAAAAAAAAAAAAAAAAA
        >3 length 24 cvg_0_tip_0
        AAAAAAAAAAAAAAAAAAAAAAAC
        ...
        >347 length 65 cvg_10_tip_0
        TTCAGTAATAACGGCAGACTAATCACCTCAGAAAACACAAAGCACAAGCTTGTGCTTGTCACTTC


        Looking for some documentation to understand this better...
        --
        bioinfosm
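The headers in the excerpt above already carry a length and a `cvg_` value, so one way to drop the uninformative short contigs is to parse those headers directly. This is a minimal sketch: the header layout is taken only from the lines shown in the post, reading the number after `cvg_` as a coverage value is an assumption, and the function names and thresholds are hypothetical.

```python
import re

# Header layout copied from the excerpt, e.g. ">347 length 65 cvg_10_tip_0".
# Treating the number after "cvg_" as a coverage value is an assumption.
HEADER_RE = re.compile(r"^>(\S+)\s+length\s+(\d+)\s+cvg_([0-9.]+)_tip_(\d+)")


def parse_contig_headers(lines):
    """Yield (contig_id, length, cvg) for every header line that matches."""
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            yield match.group(1), int(match.group(2)), float(match.group(3))


def filter_contigs(records, min_length=100, min_cvg=1.0):
    """Keep contigs at or above both thresholds (illustrative values only)."""
    return [r for r in records if r[1] >= min_length and r[2] >= min_cvg]


example = [
    ">1 length 24 cvg_0_tip_0",
    "AAAAAAAAAAAAAAAAAAAAAAAA",
    ">347 length 65 cvg_10_tip_0",
    "TTCAGTAATAACGGCAGACTAATCACCTCAGAAAACACAAAGCACAAGCTTGTGCTTGTCACTTC",
]
records = list(parse_contig_headers(example))
kept = filter_contigs(records, min_length=50, min_cvg=1.0)
print(kept)
```

With the two example headers, only contig 347 passes a 50 bp length cut-off; the right thresholds depend on the k-mer size and the data.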


        • #5
          Hi

          A better way to get help with SOAPdenovo is to join SOAP's mailing list!

          Here is the SOAP site: http://soap.genomics.org.cn. You can submit your email address on the home page.

          PS: SOAP uses a Google group as its mailing list.
          Another developer ! Focus on Bioinformatics,Internet,Math :0 with URL yangzt.com and vbio.info


          • #6
            Hi,

            Can anyone refer me to good documentation on quality assessment of a SOAPdenovo assembly (e.g., N50 values, how to compute coverage, what the output files mean, etc.)?

            Thanks
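For the N50 part of this question, the statistic can be computed from contig lengths alone. The sketch below uses the common definition (the length L such that contigs of length at least L contain at least half of all assembled bases); the function name is hypothetical and nothing here is specific to SOAPdenovo.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches half of all assembled bases; the length of the
    contig that crosses that point is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


print(n50([10, 20, 30, 40]))
```

Because very short contigs inflate the total, it is common to report N50 together with the minimum contig length that was used as a cut-off.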


            • #7
              Denovo assembly pipeline

              I'm curious whether there is a pipeline available for SOAPdenovo assembly. Is there a minimum number of genomes required for de novo assembly?

              Comment


              • #8
                Dear Members

                I am doing de novo assembly of a human genome from FASTQ data files. I get contigs as well as scaffolds from the tools that I use. I know that scaffolds are combinations of contigs with estimated gaps between them. Does this mean that downstream analysis, such as comparing the assembly to another genome like the reference, is more reliable when done with contigs than with scaffolds?

                Aby
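To make the contig/scaffold relationship in this question concrete: if the estimated gaps in a scaffold are written as runs of `N` (the usual FASTA convention, but an assumption about any particular tool's output), the contig-level pieces can be recovered by splitting at those runs. A minimal sketch with a hypothetical function name and a made-up sequence:

```python
import re


def split_scaffold(seq, min_gap=1):
    """Split a scaffold into contig pieces at runs of at least min_gap N's."""
    pattern = "[Nn]{%d,}" % min_gap
    return [piece for piece in re.split(pattern, seq) if piece]


# Made-up scaffold: three sequence pieces joined by two gaps of N's.
scaffold = "ACGTACGTNNNNNNGGCCTTAANNNTTGA"
pieces = split_scaffold(scaffold)
print(pieces)
```

Comparing the pieces rather than the whole scaffold avoids relying on the estimated gap sizes, at the cost of losing the ordering information the scaffold provides.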


                • #9
                  bioinfosm:
                  However, I wish to understand some of the properties of contigs, like
                  - how many reads were actually used in the assembly
                  - what kind of depth of coverage do the contigs have from overlapping reads
                  - other properties to determine how confident I could be of the contigs
                  The way our "lab" (aka office) has looked at the number of reads used in an assembly is to take the raw reads and back align them against the newly created denovo contigs. It should give a pretty good indication. We use Kanga for our back alignments. Damned quick and efficient:
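The back-alignment idea can be sketched independently of the aligner. The example below assumes the aligner writes standard SAM text (whether Kanga does is not stated here), counts how many reads map to the contigs, and estimates a rough mean depth per contig as aligned read bases divided by contig length. The function name and the tiny SAM records are hypothetical, and using the read length as the aligned length ignores clipping and indels.

```python
from collections import defaultdict


def summarize_sam(sam_lines):
    """Return (total_reads, mapped_reads, mean_depth_per_contig) from SAM lines."""
    contig_len = {}
    aligned_bases = defaultdict(int)
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            # @SQ header lines carry the contig name (SN) and length (LN).
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                contig_len[tags["SN"]] = int(tags["LN"])
            continue
        cols = line.rstrip("\n").split("\t")
        flag = int(cols[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        total += 1
        if flag & 0x4:  # 0x4 marks an unmapped read
            continue
        mapped += 1
        # Approximation: count the full read length as aligned bases.
        aligned_bases[cols[2]] += len(cols[9])
    depth = {name: aligned_bases[name] / length for name, length in contig_len.items()}
    return total, mapped, depth


sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:347\tLN:65",
    "r1\t0\t347\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\t347\t20\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
total, mapped, depth = summarize_sam(sam)
print(total, mapped, depth)
```

The fraction `mapped / total` approximates the share of reads represented in the assembly, and contigs with unusually low or high depth are candidates for closer inspection.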


                  • #10
                    I am confused about the contig numbers reported by SOAP. I was trying to run SOAPdenovo on small microbial genomes (sizes vary from 5 to 10 Mb).

                    However, the number of contigs reported in the .contig files is very high (always in the thousands), whereas other assemblers give me fewer than 500 contigs. The SOAP scaffolding output, though, was better than that of the other assemblers.

                    I am not sure whether I am looking at some intermediate contig file, or whether the contig number is always high in SOAP. In my experience, contig and scaffold numbers differ by only a few hundred (at least for small microbial genomes); there is no drastic change. In SOAP, however, contigs were in the 2000-3000 range while scaffolds were in the 200-500 range. Please explain.
