  • advice from more experienced users

    Hi
    I am new to using NGS and need advice. Let me start by thanking all of you who post replies to the forums- it takes time to share, but believe me, we newbies really appreciate it! The learning curve on this stuff is steep!

    I am trying to assemble a 5Mb bacterial genome from Illumina 40bp single reads. FastQC says that we got about 30 million reads. The base quality on all of the reads is above 30, so I do not believe that I have to trim or filter.
    I would like to try de novo assembly since the genes I am interested in are novel and likely to reside on a plasmid, hence it would be difficult to use a reference genome to make the contigs. In addition, they are likely flanked by non-unique DNA. I am not computer savvy, but I do have access to Velvet and have run it a few times before. I have been using Tablet to view the .afg output file.

    Here are my questions:
    1. Apart from changing the k-mer length, what other parameters should be manipulated to optimize assembly?

    2. On a related note, until we get the recommendations back from projects like GAGE, is there a post somewhere, or a set of hints anyone can forward, that will help with this kind of analysis for single-read data?

    3. Is there free software that can take any of the output files from Velvet and calculate the N50 value, so that as we do our iterations, we can figure out what works better? As I said, I am not a programmer, so I am looking for something that is plug-and-play.

    4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly? In other words, if I get about 1700 contigs that are longer than 100 bp with a few that are 50-70 kb, is this considered good, or do I have a long way to go for optimization?

    Many thanks

  • #2
    I'm no Velvet expert, but here goes:

    Originally posted by salmonella View Post
    1. Apart from changing the k-mer length, what other parameters should be manipulated to optimize assembly?
    The coverage parameters - expected coverage (velvetg's -exp_cov) and coverage cutoff (-cov_cutoff).

    Originally posted by salmonella View Post
    3. Is there free software that can take any of the output files from Velvet and calculate the N50 value, so that as we do our iterations, we can figure out what works better? As I said, I am not a programmer, so I am looking for something that is plug-and-play.
    Curtain, a project related to Velvet, comes with a fairly simple program, statsContigAll, which gives a reasonable set of stats: min, max, N10 .. N90, etc.
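
    If you want something even more plug-and-play, N50 is simple enough to compute yourself from Velvet's contigs.fa. Below is a minimal Python sketch (the file path is hypothetical; this is an illustration, not a polished tool):

```python
def contig_lengths(fasta_path):
    """Yield the length of each sequence in a FASTA file."""
    length = 0
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                if length:
                    yield length
                length = 0
            else:
                length += len(line.strip())
    if length:
        yield length

def n50(lengths):
    """Smallest length L such that contigs of length >= L
    contain at least half of all assembled bases."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for contig_len in lengths:
        running += contig_len
        if running * 2 >= total:
            return contig_len
    return 0

# Toy example: total is 250, and 80 + 70 = 150 covers half, so N50 is 70.
print(n50([80, 70, 50, 30, 20]))  # -> 70
```

    Usage would then be something like n50(contig_lengths("velvet_out/contigs.fa")).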

    Originally posted by salmonella View Post
    4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly? In other words, if I get about 1700 contigs that are longer than 100 bp with a few that are 50-70 kb, is this considered good, or do I have a long way to go for optimization?
    Depends on the target genome - if you have a relatively simple genome (and I guess you have), you should get only a few large contigs. You should also get a total assembly size in the right range (unless you have a lot of repeats).

    That said, you could really use longer, paired reads - 40 bases is on the short side, and paired data is so much better for de novo assembly.

    BTW, I wouldn't rule out the need to filter/trim - most assemblers ignore quality scores entirely, so why not help them by removing the crud, even if it is a relatively small percentage of the data. Removing adapters is also strongly recommended.
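
    To illustrate how cheap a basic 3' quality trim is, here is a minimal Python sketch for single-end FASTQ reads (Phred+33 quality encoding assumed; in practice a dedicated trimming tool is the better choice):

```python
def trim_3prime(seq, qual, min_phred=20):
    """Trim low-quality bases off the 3' end of one read.

    `seq` is the read sequence and `qual` the matching FASTQ
    quality string (Phred+33 encoding assumed).
    """
    cut = len(seq)
    # Walk back from the 3' end while the quality is below the threshold.
    while cut > 0 and ord(qual[cut - 1]) - 33 < min_phred:
        cut -= 1
    return seq[:cut], qual[:cut]

# Two trailing '#' characters (Phred 2) get trimmed; 'I' is Phred 40.
print(trim_3prime("ACGTACGT", "IIIIII##"))  # -> ('ACGTAC', 'IIIIII')
```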


    • #3
      As has been suggested, the coverage parameters are important, particularly when you want to separate the chromosome and plasmids (plasmids tend to have much higher coverage).
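
      Since Velvet writes the k-mer coverage of each contig into its FASTA headers (NODE_<id>_length_<len>_cov_<cov>), one quick way to act on that is to flag contigs whose coverage sits well above the median. A rough Python sketch (the two-fold threshold is only an illustration, not a recommendation):

```python
import re
import statistics

# Velvet's default contig header, e.g. ">NODE_7_length_60518_cov_24.32"
HEADER = re.compile(r">NODE_(\d+)_length_(\d+)_cov_([\d.]+)")

def read_coverages(fasta_path):
    """Return (node_id, length, kmer_coverage) tuples from Velvet headers."""
    records = []
    with open(fasta_path) as fh:
        for line in fh:
            m = HEADER.match(line)
            if m:
                records.append((int(m.group(1)), int(m.group(2)),
                                float(m.group(3))))
    return records

def plasmid_candidates(records, fold=2.0):
    """Flag contigs whose coverage exceeds `fold` times the median coverage."""
    median_cov = statistics.median(cov for _, _, cov in records)
    return [r for r in records if r[2] > fold * median_cov]
```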

      N50, maximum contig size, and some other assembly statistics should also be readily available in the Velvet log file.

      Hope this helps.
