  • Velvet 1.2.10: why the big difference in results with -long vs -short w/ 250bp reads?

    Greetings.

    I am doing de novo assembly of a 23 Mb genome using Illumina MiSeq paired-end reads (250 bp reads, 400 bp insert, SD 130). These reads, however, have been quality-trimmed and vary widely in final length, with most around 190 bp. Assembling with -long/-longPaired instead of -short/-shortPaired gives surprisingly different final results. Any ideas why this is happening, or which results are more reliable?

    Thanks!

    Commands:
    Code:
    # Run 1: unpaired (.se) and paired (.pe) reads loaded as short reads, k = 71
    velveth Genome1_71 71 -short -fastq reads_R1.trimmed.fastq.se reads_R2.trimmed.fastq.se -shortPaired -separate -fastq reads_R1.trimmed.fastq.pe reads_R2.trimmed.fastq.pe
    velvetg Genome1_71 -exp_cov 43 -ins_length 407 -ins_length_sd 130
    
    # Run 2: the same reads loaded as long reads (reuses the Genome1_71 directory, so it overwrites run 1)
    velveth Genome1_71 71 -long -fastq reads_R1.trimmed.fastq.se reads_R2.trimmed.fastq.se -longPaired -separate -fastq reads_R1.trimmed.fastq.pe reads_R2.trimmed.fastq.pe
    velvetg Genome1_71 -exp_cov 43 -ins_length_long 407 -ins_length_long_sd 130
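    The -exp_cov value is the expected k-mer coverage of unique regions, not the nucleotide coverage. The Velvet manual relates the two as Ck = C * (L - k + 1) / L. The sketch below is a rough sanity check of the value 43, assuming every read is about 190 bp after trimming (the post only says "most"), a genome of exactly 23 Mb, and the total read count of 7,501,247 that velvetg reports in the results. The names are illustrative, not part of Velvet.

```python
# Rough sanity check of -exp_cov for a k = 71 Velvet run.
# Assumptions: all reads ~190 bp after trimming, genome exactly 23 Mb.

GENOME_SIZE_BP = 23_000_000   # "23 Mb genome" from the post
TOTAL_READS = 7_501_247       # total read count reported by velvetg
MEAN_READ_LEN = 190           # assumed typical trimmed read length
K = 71                        # hash length passed to velveth


def nucleotide_coverage(n_reads: int, read_len: float, genome_size: float) -> float:
    """Standard coverage C = total sequenced bases / genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(c: float, read_len: float, k: int) -> float:
    """Conversion from the Velvet manual: Ck = C * (L - k + 1) / L."""
    return c * (read_len - k + 1) / read_len


c = nucleotide_coverage(TOTAL_READS, MEAN_READ_LEN, GENOME_SIZE_BP)
ck = kmer_coverage(c, MEAN_READ_LEN, K)
print(f"nucleotide coverage ~ {c:.1f}x, k-mer coverage ~ {ck:.1f}x")
```

    Under these assumptions this gives roughly 62x nucleotide coverage and 39x k-mer coverage, in the same range as the -exp_cov 43 used in both runs. Because trimmed read lengths vary, the real per-read k-mer yield (L - k + 1) varies too, so treat this only as a ballpark figure.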
    Results, short:
    Code:
    Final graph has 128642 nodes and n50 of 17324, max 332339, total 26267561, using 6042595/7501247 reads
    Results, long:
    Code:
    Final graph has 148426 nodes and n50 of 1610, max 28675, total 26984545, using 6083488/7501247 reads
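    For reference, N50 is the length L such that pieces of length L or longer together cover at least half of the total assembly length; in the log lines above it is computed over the nodes of the final graph. A minimal, generic sketch of the calculation (the function name is illustrative, not Velvet's code):

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that pieces of length >= L
    together account for at least half of the total length."""
    if not lengths:
        raise ValueError("no contigs/nodes given")
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running always reaches total")


print(n50([100, 80, 60, 40, 20]))
```

    Because N50 is driven by the largest pieces, losing a few long contigs moves it a lot even when the total barely changes. That matches the numbers here: the totals are similar (26,267,561 vs 26,984,545), but the maximum drops from 332,339 to 28,675 and N50 falls from 17,324 to 1,610.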

  • #2
    Zerbino tells us there shouldn't be any difference, but what you've found is interesting.

    Have you tried this without the singletons and just the paired reads?

    It's interesting that your -long flag increases read utilization and, in turn, affects your N50. It appears to be fragmenting the assembly, since you've lost every contig larger than 28675...
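    To put the read-utilization difference in perspective, here is the arithmetic on the counts velvetg printed for the two runs. It is only a quick sketch; it does not show whether the extra reads actually cause the N50 change.

```python
# Read-utilization comparison using the "using X/Y reads" counts from velvetg.
TOTAL = 7_501_247
used = {"short": 6_042_595, "long": 6_083_488}

fractions = {mode: n / TOTAL for mode, n in used.items()}
for mode, frac in fractions.items():
    print(f"{mode}: {used[mode]:,} of {TOTAL:,} reads used ({frac:.2%})")

extra = used["long"] - used["short"]
print(f"-long uses {extra:,} more reads "
      f"({fractions['long'] - fractions['short']:.2%} of all reads)")
```

    The -long run uses about 81.1% of the reads versus about 80.6% for -short, a gain of roughly half a percentage point, which is small next to the roughly tenfold drop in N50.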

    Is that 28675 contig also present in your -short assembly?
