  • Oases questions

    Hello people!

    I am doing a de novo transcriptome assembly and, after reading the literature, I went for Oases. Although it seemed to give very good results there, I am not getting good values for my assemblies.
    The maximum contig length is very low: the highest is around 7000 for one of the individual k-mer runs, but after the Oases merge this value always drops to around 2500 at most.
    The N50 values are also very low (between 26 and 200). The k-mer interval I am using is 17 to 65 with a step of 8, cov_cutoff is 5 and the merging k-mer is 27. I started with very high-quality Illumina reads (100 bp) with an 80 bp insert length.
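    As a rough illustration of what that k-mer range means for 100 bp reads, the sketch below lists the k values covered by 17 to 65 in steps of 8 and, using the k-mer coverage relation from the Velvet manual, Ck = C * (L - k + 1) / L, the fraction of nucleotide coverage that survives as k-mer coverage at each k. The function names are hypothetical; this is a back-of-the-envelope helper, not part of Oases.

```python
def kmer_sizes(k_min: int, k_max: int, step: int) -> list[int]:
    """All k values from k_min to k_max inclusive, in increments of step."""
    return list(range(k_min, k_max + 1, step))


def kmer_coverage_fraction(read_len: int, k: int) -> float:
    """Ck / C = (L - k + 1) / L: share of base coverage kept as k-mer coverage."""
    if k > read_len:
        return 0.0  # a read shorter than k contributes no k-mers at all
    return (read_len - k + 1) / read_len


if __name__ == "__main__":
    # 100 bp reads, k from 17 to 65 in steps of 8 (values from the post above)
    for k in kmer_sizes(17, 65, 8):
        print(f"k={k:2d}  Ck/C={kmer_coverage_fraction(100, k):.2f}")
```

    Larger k values resolve repeats better but keep less coverage (at k = 65 only about a third of the base coverage remains as k-mer coverage), which is one reason a wide k sweep is usually merged rather than relying on a single k.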

    Could anyone give me some tips on the assembly, and explain why I am getting such small contigs? I am also struggling with run time: the runs take several days to finish (7 days on 10 cores, 50 GB per core).

    Finally, I want to re-run Oases on the existing Velvet runs but with different k-mers. Is that possible? For instance, after running the Python script with a k-mer interval of 17-65, could I do another merge using only the runs from the 25-55 interval?
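    If the per-k runs from the 17-65 sweep are still on disk, picking the subset that falls inside 25-55 is simple. The sketch below assumes, hypothetically, that each run directory name ends in `_<k>` (e.g. `oasesPipeline_25`); adjust the pattern to whatever your run actually produced. It only selects directories: the merge itself still has to go through Oases' own merge procedure, and no particular pipeline flags are assumed here.

```python
import re


def runs_in_range(run_dirs, k_low, k_high):
    """Return run directories whose trailing _<k> lies in [k_low, k_high], sorted by k.

    The '_<k>' naming convention is an assumption, not a documented Oases layout.
    """
    selected = []
    for d in run_dirs:
        match = re.search(r"_(\d+)/?$", str(d))
        if match:
            k = int(match.group(1))
            if k_low <= k <= k_high:
                selected.append((k, str(d)))
    return [d for _, d in sorted(selected)]


if __name__ == "__main__":
    # Hypothetical directory names for the 17-65, step 8 sweep
    sweep = [f"oasesPipeline_{k}" for k in range(17, 66, 8)]
    print(runs_in_range(sweep, 25, 55))
```

    With a step of 8 starting at 17, the 25-55 window keeps k = 25, 33, 41 and 49; k = 57 falls just outside it.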

    Many thanks in advance!!

    Susana

  • #2
    You'll have to give us more information on your starting reads. For example, you left out the critical point of exactly how many reads you have. Also, what type of contamination do you have? Does FastQC show anything unusual about your reads? What size contigs are you expecting? I'll agree that 2500 bases is low, but it could simply be a coverage problem, or perhaps it is your organism.
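    One quick way to put a read count in context is nominal (average) coverage: reads × read length ÷ target size. For a transcriptome the target size is uncertain and coverage is very uneven across transcripts, so this is only a rough average. The 50 Mb target below is a hypothetical placeholder; the 30M reads of 100 bp are the figures given later in this thread.

```python
def nominal_coverage(n_reads: int, read_len: int, target_size_bp: int) -> float:
    """Average per-base coverage: total sequenced bases divided by target size."""
    return n_reads * read_len / target_size_bp


if __name__ == "__main__":
    # 30M reads of 100 bp against a hypothetical 50 Mb transcriptome
    print(nominal_coverage(30_000_000, 100, 50_000_000))  # 60.0
```

    Highly expressed transcripts will sit far above this average and rare ones far below it, so low-coverage transcripts can still assemble into short fragments even when the average looks comfortable.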



    • #3
      Hello!
      Sorry for the late reply.

      I figured out what was happening. To check the parameters, I calculated the average and maximum contig length and the N50 from the actual contig lengths (not the values in the stats or log files). I got completely different values, more or less what I was expecting: the N50 is in fact ~2000, the average is ~1500 and the maximum is ~28000 (a bit longer than expected, but only a small number of contigs are that long).
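      Recomputing the statistics from the assembled sequences themselves avoids any ambiguity about units. A minimal sketch, assuming the contigs are in a FASTA file (for Oases this would typically be transcripts.fa in the output directory; pass your own path on the command line):

```python
import sys


def fasta_lengths(lines):
    """Return the sequence length (in bp) of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths):
    """Count, mean, max and N50 of a list of contig lengths."""
    if not lengths:
        return {"count": 0, "mean": 0.0, "max": 0, "n50": 0}
    return {
        "count": len(lengths),
        "mean": sum(lengths) / len(lengths),
        "max": max(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, summarize(fasta_lengths(fh)))
```

      For example, `python contig_stats.py transcripts.fa` (script name hypothetical) prints the summary in bp, which can be compared directly with the expected transcript sizes.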
      After contacting the author, I realised that the lengths the software used to calculate these statistics were measured in k-mers (not bp), which explained why they didn't match.
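      The Velvet manual documents this for stats.txt: node lengths there are given in k-mers, and adding k - 1 gives the length in nucleotides. A small sketch of that conversion, assuming a tab-separated stats.txt whose header contains an `lgth` column (check the header of your own file), with k being the hash length the run used:

```python
import csv
import sys


def node_lengths_bp(stats_lines, k):
    """Convert Velvet stats.txt node lengths from k-mers to bp (length + k - 1)."""
    reader = csv.DictReader(stats_lines, delimiter="\t")
    return [int(row["lgth"]) + k - 1 for row in reader]


if __name__ == "__main__":
    # Usage (hypothetical script name): python kmer_to_bp.py stats.txt 31
    if len(sys.argv) == 3:
        with open(sys.argv[1], newline="") as fh:
            print(node_lengths_bp(fh, int(sys.argv[2])))
```

      Because the offset depends on k, statistics from runs at different k values are only comparable after this conversion.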

      Also, according to the author, the runs were taking too long because very short k-mers can produce too many false overlaps, which overwhelm the system. I was running a k-mer range of 17 to 65 (for 100 bp reads).
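      A crude way to see why short k-mers cause trouble: in a random sequence with uniform base composition, a given k-mer is expected to occur by chance about (G - k + 1) / 4^k times, so every base removed from k makes chance matches roughly 4 times more likely. Real transcriptomes (repeats, gene families, sequencing errors) are far less forgiving than this model, so treat it only as an illustration of the 4^k scaling; the 50 Mb size below is hypothetical.

```python
def expected_chance_hits(target_size_bp: int, k: int) -> float:
    """Expected chance occurrences of one specific k-mer in a uniformly random
    sequence of target_size_bp bases (single strand, no repeats modelled)."""
    positions = max(target_size_bp - k + 1, 0)
    return positions / 4 ** k


if __name__ == "__main__":
    for k in (17, 25, 33):
        # Hypothetical 50 Mb transcriptome
        print(k, expected_chance_hits(50_000_000, k))
```

      Going from k = 25 down to k = 17 multiplies the chance-match rate by 4^8 (about 65,000), which is consistent with the author's point that the shortest k values generate most of the spurious overlaps.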

      Just to let you know, FastQC showed that the read quality was very good, and all reads were filtered with scythe/sickle to remove contaminants. There are roughly 30M reads.

      Thanks for the interest!! I was starting to despair a bit when the assemblies weren't getting anywhere, and when they did finish they showed (apparently) really bad results :/
      All sorted out now! It was my fault for not checking the N50 (and other) values before posting here...

      All the best!
