  • Weird stats output of SOAPdenovo

    Dear all,

    I have run SOAPdenovo to assemble a nematode genome.
    SOAPdenovo wrote a stats file, .scafStatistics, that I do not understand. I am especially confused about the N50 values: why are there two of them?

    Here is the output:

    <-- Information for assembly Scaffold 'SoapOutput-SB372.scafSeq'.(cut_off_length < 100bp) -->

    Size_includeN 76698637
    Size_withoutN 65796073
    Scaffold_Num 15259
    Mean_Size 5026
    Median_Size 160
    Longest_Seq 1079942
    Shortest_Seq 100
    Singleton_Num 11712
    Average_length_of_break(N)_in_scaffold 714

    Known_genome_size NaN
    Total_scaffold_length_as_percentage_of_known_genome_size NaN

    scaffolds>100 15047 98.61%
    scaffolds>500 4059 26.60%
    scaffolds>1K 3004 19.69%
    scaffolds>10K 688 4.51%
    scaffolds>100K 216 1.42%
    scaffolds>1M 1 0.01%

    Nucleotide_A 18790176 24.50%
    Nucleotide_C 14159122 18.46%
    Nucleotide_G 14204857 18.52%
    Nucleotide_T 18641918 24.31%
    GapContent_N 10902564 14.21%
    Non_ACGTN 0 0.00%
    GC_Content 43.11% (G+C)/(A+C+G+T)

    N10 488263 12
    N20 319382 32
    N30 235181 60
    N40 182496 96
    N50 138908 144
    N60 102282 210
    N70 73846 298
    N80 43901 429
    N90 5795 899

    NG50 NaN NaN
    N50_scaffold-NG50_scaffold_length_difference NaN

    <-- Information for assembly Contig 'SoapOutput-SB372.contig'.(cut_off_length < 100bp) -->

    Size_includeN 66764916
    Size_withoutN 66764916
    Contig_Num 69780
    Mean_Size 956
    Median_Size 458
    Longest_Seq 33978
    Shortest_Seq 100

    Contig>100 69392 99.44%
    Contig>500 33098 47.43%
    Contig>1K 20004 28.67%
    Contig>10K 138 0.20%
    Contig>100K 0 0.00%
    Contig>1M 0 0.00%

    Nucleotide_A 19146203 28.68%
    Nucleotide_C 14420728 21.60%
    Nucleotide_G 14387230 21.55%
    Nucleotide_T 18810755 28.17%
    GapContent_N 0 0.00%
    Non_ACGTN 0 0.00%
    GC_Content 43.15% (G+C)/(A+C+G+T)

    N10 6141 779
    N20 4338 2094
    N30 3326 3858
    N40 2586 6144
    N50 2011 9076
    N60 1536 12880
    N70 1122 17959
    N80 755 25179
    N90 410 37034

    NG50 NaN NaN
    N50_contig-NG50_contig_length_difference NaN

    Number_of_contigs_in_scaffolds 58068
    Number_of_contigs_not_in_scaffolds(Singleton) 11712
    Average_number_of_contigs_per_scaffold 16.4

    I have looked all over for the answer but didn't manage to find it.

    All the best,
    Sophie

  • #2
    The first one is the scaffold N50, the second is the contig N50. Look for the header lines
    <-- Information for assembly Scaffold 'SoapOutput-SB372.scafSeq'.(cut_off_length < 100bp) -->
    and
    <-- Information for assembly Contig 'SoapOutput-SB372.contig'.(cut_off_length < 100bp) -->
    to see which type of sequence each block of statistics refers to.
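
    For reference, here is a minimal Python sketch of how an Nx value such as N50 is usually computed from a list of sequence lengths. The function name nx_stats is made up for illustration, and this is not SOAPdenovo's own code: in particular, whether SOAPdenovo counts the N gaps in the total and how it applies the 100 bp cut-off are assumptions here. The second number on each N10 to N90 line in the output above looks like the count of sequences needed to reach that fraction of the total length (it fits the scaffolds>1K and scaffolds>10K counts), so the sketch returns both values.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx length, number of sequences needed) for a given fraction.

    With fraction=0.5 this is the usual N50: sort the sequences from longest
    to shortest and walk down the list until the running total reaches half
    of the summed length. The length of the sequence at which that happens
    is the N50, and its position in the sorted list is the count.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example with five sequences (total length 30)
print(nx_stats([10, 8, 6, 4, 2]))        # N50 length and count
print(nx_stats([10, 8, 6, 4, 2], 0.9))   # N90 length and count
```

    Running this separately on the scaffold lengths and on the contig lengths gives two different N50 values, which is why the stats file reports one per block.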
