  • some trouble with SOAPdenovo, my assembly and the BGI output

    Hi there,
    this is my first post on SEQanswers, although I have been following the community for a long time.
    Over the last 6 months I have started working on metagenomics, using BGI's sequencing and analysis service for some samples (by training I'm a biochemist with good computer skills).

    To learn about assembly strategies, I tried to reproduce BGI's output for my first sample. I ran SOAPdenovo with the k-mer values listed in their final report and compared the N50 and N90 scores, the lengths, etc. to find the best assembly.

    So I prepared my SOAP_config_file with the illumina_clean_reads:

    ---------
    #maximal read length
    max_rd_len=90
    [LIB]
    #average insert size
    avg_ins=170
    #if sequence needs to be reversed
    reverse_seq=0
    #in which part(s) the reads are used
    asm_flags=3
    #use only first 90 bps of each read
    rd_len_cutoff=90
    #in which order the reads are used while scaffolding
    rank=1
    # cutoff of pair number for a reliable connection (at least 3 for short insert size)
    pair_num_cutoff=3
    #minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
    map_len=32
    #a pair of fastq file, read 1 file should always be followed by read 2 file
    q1=clean_read_1.fq
    q2=clean_read_2.fq
    ---------

    BGI (testing 21<kmer<63) reports that the best value for this sample was kmer=33, with:
    sequence n°: 1543
    total length: 6169928
    max length: 126110
    min length: 500
    N50: 15863
    N90: 1181
    Then I ran SOAPdenovo-63mer over the same range, 21<kmer<63.

    However, my statistics for the same sample with kmer=33 were:
    sequence n°: 188926
    total length: 16591907
    max length: 15246
    min length: 34
    N50: 74
    N90: 42
    Then I applied a filter to remove the sequences shorter than 500 bp, and the result was:

    sequence n°: 2529
    total length: 3094882
    max length: 15246
    min length: 500
    N50: 1384
    N90: 584
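
For anyone who wants to recompute these numbers independently, here is a minimal sketch of the statistics and the 500 bp filter. It assumes the common definition of Nxx (the length L such that sequences of length >= L cover at least xx% of the total bases); the exact definition used in the BGI report is not stated in this thread, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(path: str) -> List[int]:
    """Return the length of every sequence in a FASTA file."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nxx(lengths: Iterable[int], fraction: float) -> int:
    """Length L such that sequences >= L cover at least `fraction` of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences to summarize")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


def summarize(lengths: Iterable[int], min_len: int = 0) -> Dict[str, int]:
    """Assembly statistics after dropping sequences shorter than `min_len`."""
    kept = [n for n in lengths if n >= min_len]
    if not kept:
        raise ValueError("no sequences left after filtering")
    return {
        "sequences": len(kept),
        "total_length": sum(kept),
        "max_length": max(kept),
        "min_length": min(kept),
        "N50": nxx(kept, 0.5),
        "N90": nxx(kept, 0.9),
    }
```

Calling `summarize(read_fasta_lengths(path), min_len=500)` on an assembly FASTA file gives the filtered statistics. Note that filtering changes the total length, so N50 and N90 are recomputed on the filtered set rather than carried over from the unfiltered one.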
    Does anybody know why these results are so different?

    I tried running SOAPdenovo both in step-by-step mode and as a single command, but the result does not change. The same differences appear at the other k-mer values when I compare my runs with BGI's comparison of assembly results across k-mers.
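
The single-command k-mer sweep described above can be sketched as follows. This only builds the command lines and does not run them. It assumes the `SOAPdenovo-63mer all -s <config> -K <kmer> -o <prefix>` form of the command; the output prefix `asm`, the step of 2, and the inclusive endpoints 21 and 63 are illustrative assumptions, since the thread does not say which k values BGI actually tested.

```python
import shlex
from typing import List


def kmer_sweep(low: int = 21, high: int = 63, step: int = 2) -> List[int]:
    """Odd k-mer values from `low` to `high` inclusive (endpoints are an assumption)."""
    return [k for k in range(low, high + 1, step) if k % 2 == 1]


def soapdenovo_all_command(
    kmer: int,
    config: str = "SOAP_config_file",
    binary: str = "SOAPdenovo-63mer",
    prefix: str = "asm",  # hypothetical output prefix
) -> List[str]:
    """Build the argument list for one single-command ("all") assembly run."""
    if kmer % 2 == 0:
        raise ValueError("SOAPdenovo expects an odd k-mer size")
    return [binary, "all", "-s", config, "-K", str(kmer), "-o", f"{prefix}_K{kmer}"]


if __name__ == "__main__":
    # Print one command per k value so the sweep can be inspected before running it.
    for k in kmer_sweep():
        print(shlex.join(soapdenovo_all_command(k)))
```

Each list can be passed to `subprocess.run` once the printed commands look right. Keeping a separate output prefix per k value avoids one run overwriting the files of another.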

  • #2
    Did you get a more complete report on all the commands and options used by BGI? It's possible they did a few parameter sweeps and different preprocessing of your reads (i.e. error correction, quality trimming).


    • #3
      Originally posted by Wallysb01
      Did you get a more complete report on all the commands and options used by BGI? It's possible they did a few parameter sweeps and different preprocessing of your reads (i.e. error correction, quality trimming).
      Hi Wally,
      the BGI report only covers the k-mer screening and gives no information about other parameters. In any case, for my trial I used the clean reads, which should already have been trimmed and "cleaned" by BGI.

      Maybe the name of the output file gives an indication: K33_L90.scafSeq.more500.fa
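
One way to read that file name is sketched below. This is a guess at BGI's naming scheme, not something documented in the report: `K33` is taken as the k-mer size, `scafSeq` as the SOAPdenovo scaffold sequence output (as opposed to the `.contig` output), and `more500` as a minimum-length filter of 500 bp. The meaning of `L90` is not stated anywhere in the thread (it may be the read length, matching `max_rd_len=90`), so it is kept as an uninterpreted number.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for names like K33_L90.scafSeq.more500.fa
NAME_PATTERN = re.compile(
    r"^K(?P<kmer>\d+)_L(?P<l_value>\d+)"
    r"\.(?P<kind>scafSeq|contig)"
    r"(?:\.more(?P<min_len>\d+))?\.fa$"
)


def parse_output_name(name: str) -> Optional[Dict[str, object]]:
    """Split an output file name into its parts, or return None if it does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    min_len = match.group("min_len")
    return {
        "kmer": int(match.group("kmer")),
        "l_value": int(match.group("l_value")),  # meaning unknown, possibly read length
        "kind": match.group("kind"),             # scafSeq = scaffolds, contig = contigs
        "min_len": int(min_len) if min_len is not None else None,
    }
```

If this reading is right, the reported statistics would describe scaffolds of at least 500 bp, so they should be compared against the same kind of output file from a local run.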

      Any idea?
