  • some trouble with SOAPdenovo, my assembly and the BGI output

    Hi there,
    this is my first post on SEQanswers, but I have been following the community for a long time.
    Over the last six months I have started working on metagenomics, using BGI's sequencing and analysis service for some samples (by training I'm a biochemist with good computer skills).

    To learn about assembly strategies, I tried to reproduce BGI's output for my first sample by running SOAPdenovo with the k-mer values listed in the final report and checking which one gives the best N50 and N90, the best lengths, etc.

    So I prepared my SOAP_config_file with the Illumina clean reads:

    ---------
    #maximal read length
    max_rd_len=90
    [LIB]
    #average insert size
    avg_ins=170
    #if sequence needs to be reversed
    reverse_seq=0
    #in which part(s) the reads are used
    asm_flags=3
    #use only first 90 bps of each read
    rd_len_cutoff=90
    #in which order the reads are used while scaffolding
    rank=1
    # cutoff of pair number for a reliable connection (at least 3 for short insert size)
    pair_num_cutoff=3
    #minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
    map_len=32
    #a pair of fastq file, read 1 file should always be followed by read 2 file
    q1=clean_read_1.fq
    q2=clean_read_2.fq
    ---------

    BGI (testing 21<kmer<63) reports that for this sample the best value was kmer=33, with:
    sequence n°: 1543
    total length: 6169928
    max length: 126110
    min length: 500
    N50: 15863
    N90: 1181
    Then I ran SOAPdenovo-63mer with 21<kmer<63.
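
    A k-mer sweep like this can be scripted. The following is a minimal sketch, not the commands BGI or the poster actually used: it only builds one `SOAPdenovo-63mer all` command line per k value, using the documented `-s`, `-K` and `-o` options. The output prefix `K<k>` is a hypothetical naming choice, and the restriction to odd k values strictly between 21 and 63 is an assumption.

```python
def sweep_commands(config="SOAP_config_file", k_min=21, k_max=63,
                   binary="SOAPdenovo-63mer"):
    """Build one SOAPdenovo command per odd k with k_min < k < k_max.

    Nothing is executed here; the function only returns argument lists.
    The output prefix "K<k>" is a hypothetical naming convention.
    """
    commands = []
    for k in range(k_min + 1, k_max):
        if k % 2 == 0:
            continue  # assumption: only odd k values are tested
        commands.append([binary, "all", "-s", config,
                         "-K", str(k), "-o", f"K{k}"])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be inspected before running them.
    for cmd in sweep_commands():
        print(" ".join(cmd))
```

    Each list can be passed to `subprocess.run` once the command lines look right; keeping a separate output prefix per k makes it easier to compare the resulting assemblies afterwards.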

    However, my statistics for the same sample with kmer=33 were:
    sequence n°: 188926
    total length: 16591907
    max length: 15246
    min length: 34
    N50: 74
    N90: 42
    Then I applied a filter to remove sequences shorter than 500 bp, and the result was:

    sequence n°: 2529
    total length: 3094882
    max length: 15246
    min length: 500
    N50: 1384
    N90: 584
    Does anybody know why these results are so different?

    I tried running SOAPdenovo both step by step and as a single command, but the result does not change, and the same differences appear for the other kmer values when compared with BGI's table of assembly results across kmers.
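
    For reference, the statistics quoted above (sequence count, total, max and min length, N50, N90, with an optional minimum-length filter) can be computed as in the sketch below. It assumes a plain FASTA input and the usual definition of Nx: the length of the sequence at which the running total, taken from longest to shortest, first reaches x% of the total length. The function names are illustrative, and this is not necessarily the script BGI used.

```python
def read_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction):
    """Length at which the cumulative sum (longest first) reaches
    `fraction` of the total length, e.g. fraction=0.5 for N50."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0


def summarize(lengths, min_len=0):
    """Summary statistics for sequences of at least `min_len` bp."""
    kept = [n for n in lengths if n >= min_len]
    if not kept:
        return {"count": 0, "total": 0, "max": 0, "min": 0, "N50": 0, "N90": 0}
    return {
        "count": len(kept),
        "total": sum(kept),
        "max": max(kept),
        "min": min(kept),
        "N50": nx(kept, 0.5),
        "N90": nx(kept, 0.9),
    }
```

    Calling `summarize(read_lengths(path), min_len=500)` on both the `.contig` and the `.scafSeq` output of the same run is one way to check whether the two sets of numbers were computed on the same kind of file, since contig and scaffold statistics can differ substantially.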

  • #2
    Did you get a more complete report on all the commands and options used by BGI? It's possible they did a few parameter sweeps and different preprocessing of your reads (i.e. error correction, quality trimming).

    • #3
      Originally posted by Wallysb01
      Did you get a more complete report on all the commands and options used by BGI? It's possible they did a few parameter sweeps and different preprocessing of your reads (i.e. error correction, quality trimming).
      Hi Wally,
      the BGI report contains only the kmer screening, with no information about other parameters. In any case, for my trial I used the clean reads, which should already have been trimmed and "cleaned" by BGI.

      Maybe the name of the output file is a hint: K33_L90.scafSeq.more500.fa

      Any ideas?
