  • SOAP denovo assembly results

    Hi,

    I was able to assemble more than 100 million reads into contigs using SOAP. That was cool, since all the other tools ran into memory issues.

    However, I would like to understand some properties of the contigs, such as:
    - how many reads were actually used in the assembly
    - what depth of coverage the contigs have from overlapping reads
    - any other properties that would tell me how confident I can be in the contigs

    Any pointers would be welcome, as would any help with extracting information from the SOAP assembly results (other than the contig sequences, which I already have).
    --
    bioinfosm

  • #2
    This is a good question; I would also like to know. Could anyone kindly help us?
    By the way, how long did it take to assemble more than 100 million reads de novo into contigs?

    Best

    Jing

    • #3
      How much RAM did you need with SOAP? I'm curious ^^
      L. Collado Torres, Ph.D. student in Biostatistics.

      • #4
        Less than 100 GB of RAM, though I did not track it (as long as it was not crashing, I was happy).
        As for time, it took less than a day to get through its various steps. A lot of the contigs are only 24 bp long and definitely not useful:
        >1 length 24 cvg_0_tip_0
        AAAAAAAAAAAAAAAAAAAAAAAA
        >3 length 24 cvg_0_tip_0
        AAAAAAAAAAAAAAAAAAAAAAAC
        ...
        >347 length 65 cvg_10_tip_0
        TTCAGTAATAACGGCAGACTAATCACCTCAGAAAACACAAAGCACAAGCTTGTGCTTGTCACTTC


        I'm looking for some documentation to understand this output better.
        --
        bioinfosm
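The contig headers in the excerpt above follow the pattern ">ID length LEN cvg_COV_tip_TIP". A minimal Python sketch for pulling those fields out and filtering short or low-coverage contigs might look like the following. It assumes every header in the .contig file matches that pattern; the meaning of the cvg and tip values is not documented in this thread (cvg is presumably SOAPdenovo's coverage estimate for the contig), and the cutoffs and the names read_contigs and summarize are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Assumed header layout, taken from the excerpt above:
#   >ID length LEN cvg_COV_tip_TIP
# COV is parsed as a float in case some output prints decimals.
HEADER_RE = re.compile(r"^>(\S+) length (\d+) cvg_([\d.]+)_tip_(\d+)")


class Contig(NamedTuple):
    contig_id: str
    length: int
    cvg: float  # value after "cvg_"; its exact meaning is an assumption
    tip: int    # value after "tip_"; parsed but not interpreted here
    seq: str


def _make_contig(header: str, chunks: list) -> Contig:
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unrecognised contig header: {header!r}")
    contig_id, length, cvg, tip = m.groups()
    return Contig(contig_id, int(length), float(cvg), int(tip), "".join(chunks))


def read_contigs(lines: Iterable[str]) -> Iterator[Contig]:
    """Yield one Contig per FASTA record in an iterable of lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_contig(header, chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield _make_contig(header, chunks)


def summarize(contigs, min_len=100, min_cvg=1.0):
    """Count contigs and bases passing hypothetical length/coverage cutoffs."""
    contigs = list(contigs)
    kept = [c for c in contigs if c.length >= min_len and c.cvg >= min_cvg]
    return {
        "total": len(contigs),
        "kept": len(kept),
        "kept_bases": sum(c.length for c in kept),
    }


if __name__ == "__main__":
    # Example usage: python contig_stats.py assembly.contig
    with open(sys.argv[1]) as handle:
        print(summarize(read_contigs(handle)))
```

Length and coverage are filtered together here because the 24 bp contigs in the excerpt also report cvg_0; either cutoff can be set to 0 to disable it.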

        • #5
          Hi

          A better way to get help with SOAPdenovo is to join SOAP's mailing list!

          Here is the SOAP site: http://soap.genomics.org.cn. You can submit your email address on the home page.

          PS: SOAP uses a Google Group as its mailing list.
          Another developer! Focus on Bioinformatics, Internet, Math :0 with URL yangzt.com and vbio.info

          • #6
            Hi,

            Can anyone point me to good documentation on quality assessment of SOAPdenovo assemblies (e.g., N50 values, how to compute coverage, what the output files mean, etc.)?

            Thanks
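Two of the numbers asked about here can be computed directly from the assembly. The Python sketch below takes a list of contig or scaffold lengths (for example, the length field of headers like ">347 length 65 cvg_10_tip_0" quoted earlier in the thread) and reports the N50, i.e. the length L such that sequences of length L or longer contain at least half of all assembled bases. It also gives the textbook Lander-Waterman estimate of average depth, C = N * L / G (number of reads times read length, divided by genome size). The function names and the example read length and genome size are illustrative assumptions, not SOAPdenovo settings.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty).

    Lengths are sorted longest first and summed until the running total
    reaches half of all bases; the length at that point is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(lengths, min_len=0):
    """Basic assembly summary, ignoring sequences shorter than min_len."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "count": len(kept),
        "total_bases": sum(kept),
        "longest": max(kept, default=0),
        "n50": n50(kept),
    }


def expected_depth(n_reads, read_length, genome_size):
    """Lander-Waterman average depth: C = N * L / G."""
    return n_reads * read_length / genome_size


if __name__ == "__main__":
    print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    # 100 million reads of a hypothetical 75 bp over a hypothetical 3 Gb genome
    print(expected_depth(100_000_000, 75, 3_000_000_000))
```

The min_len argument matters in practice: including or excluding many very short contigs changes both the contig count and the N50.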

            • #7
              De novo assembly pipeline

              I'm curious whether there is a pipeline available for SOAPdenovo assembly. Is there a requirement for the number of genomes needed for de novo assembly?

              • #8
                Dear Members

                I am doing a de novo assembly of a human genome from FASTQ files. The tools I use give me both contigs and scaffolds. I know that scaffolds are contigs joined together with estimated gaps between them. Does this mean that downstream analysis, such as comparison against another genome like the reference, is more reliably done with contigs than with scaffolds?

                Aby
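If a contig-level comparison is wanted but only scaffolds are at hand, each scaffold can be cut back into its pieces at the gaps. The Python sketch below assumes gaps are written as runs of N (a common convention in scaffold FASTA output, but check your assembler's documentation); the min_gap threshold and the name split_scaffold are hypothetical. Each piece is returned with its 0-based, half-open coordinates in the scaffold so that results can be mapped back.

```python
import re


def split_scaffold(seq, min_gap=1):
    """Split a scaffold into (start, end, sequence) pieces at runs of N.

    Coordinates are 0-based and half-open. Runs of N/n shorter than
    min_gap are kept inside the pieces rather than treated as gaps.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap_re = re.compile("[Nn]{%d,}" % min_gap)
    pieces = []
    start = 0
    for gap in gap_re.finditer(seq):
        if gap.start() > start:
            pieces.append((start, gap.start(), seq[start:gap.start()]))
        start = gap.end()
    if start < len(seq):
        pieces.append((start, len(seq), seq[start:]))
    return pieces


if __name__ == "__main__":
    for piece in split_scaffold("ACGTNNNNGGCCNAT", min_gap=2):
        print(piece)
```

Raising min_gap keeps isolated ambiguous bases (single Ns) inside a piece instead of splitting on them.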

                • #9
                  Quoting bioinfosm:
                  However, I wish to understand some of the properties of contigs, like
                  - how many reads were actually used in the assembly
                  - what kind of depth of coverage do the contigs have from overlapping reads
                  - other properties to determine how confident I could be of the contigs

                  The way our "lab" (aka office) has looked at the number of reads used in an assembly is to take the raw reads and back-align them against the newly created de novo contigs. It should give a pretty good indication. We use Kanga for our back alignments. Damned quick and efficient:
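Once the back alignment has been run, the read count and a rough per-contig depth can be summarised from the aligner's output. The Python sketch below assumes the alignments are in SAM format (whether Kanga writes SAM, and with which options, is not covered in this thread), that unmapped reads are included with flag 0x4 (if the aligner omits them, take the total from the read files instead), and that contig lengths come from the @SQ header lines. It counts primary records only, skipping secondary (0x100) and supplementary (0x800) ones, treats each mate of a pair as a separate read, and estimates mean depth as aligned read bases (CIGAR M, = and X operations) divided by contig length. The function names are hypothetical.

```python
import re
import sys
from collections import Counter

# CIGAR operations as (count, op) pairs, e.g. "10S20M" -> [("10", "S"), ("20", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar):
    """Read bases aligned to the reference (CIGAR M, = and X operations)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=X")


def back_alignment_summary(sam_lines):
    """Summarise reads back-aligned to contigs from SAM-format lines."""
    contig_len = {}
    reads = Counter()
    bases = Counter()
    total = mapped = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # @SQ lines carry tags such as SN:<contig name> and LN:<length>
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                contig_len[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800) record
            continue
        total += 1
        if flag & 0x4:  # read unmapped
            continue
        mapped += 1
        reads[fields[2]] += 1
        bases[fields[2]] += aligned_bases(fields[5])
    mean_depth = {name: bases[name] / length
                  for name, length in contig_len.items() if length > 0}
    return {
        "total_reads": total,
        "mapped_reads": mapped,
        "reads_per_contig": dict(reads),
        "mean_depth": mean_depth,
    }


if __name__ == "__main__":
    # Example usage: python back_align_stats.py alignments.sam
    with open(sys.argv[1]) as handle:
        summary = back_alignment_summary(handle)
    print(summary["mapped_reads"], "of", summary["total_reads"], "reads mapped")
```

Contigs with a mean depth near zero, like the cvg_0 contigs shown earlier in the thread, are natural candidates for closer inspection or removal.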

                  • #10
                    I am confused about the contig numbers reported by SOAP. I was trying to
                    run SOAPdenovo on small microbial genomes (sizes vary from 5 to 10 Mb).

                    However, the number of contigs reported in the .contig files is very high
                    (always in the thousands), whereas other assemblers give me fewer than
                    500 contigs. Yet SOAP's scaffolding output was better than that of the
                    other assemblers.

                    I am not sure whether I am looking at some intermediate contig file or
                    whether the contig number is always high in SOAP. In my experience, contig
                    and scaffold numbers differ by only a few hundred (at least for small microbial genomes), with no drastic change, but in SOAP the contigs were in the 2000-3000 range while the scaffolds were in the 200-500 range. Please explain.
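One way to check whether the thousands of entries in the .contig file are mostly very short fragments, like the 24 bp contigs shown earlier in the thread, is to tabulate how many sequences in each file pass a few minimum-length cutoffs. The Python sketch below does this for any FASTA file; the cutoffs and the file names in the usage comment are arbitrary examples, not SOAPdenovo settings, and fasta_lengths and count_by_cutoff are hypothetical helpers. Comparing counts at the same cutoff across assemblers also helps if some of them apply a minimum output length by default (check each tool's documentation).

```python
import sys


def fasta_lengths(lines):
    """Return the sequence length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_by_cutoff(lengths, cutoffs=(0, 100, 500, 1000)):
    """Map each minimum-length cutoff to the number of sequences at least that long."""
    return {cutoff: sum(1 for n in lengths if n >= cutoff) for cutoff in cutoffs}


if __name__ == "__main__":
    # Example usage (hypothetical file names): python count_contigs.py contigs.fa scaffolds.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, count_by_cutoff(fasta_lengths(handle)))
```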
