  • advice from more experienced users

    Hi
    I am new to using NGS and need advice. Let me start by thanking all of you who post replies to the forums - it takes time to share, but believe me, we newbies really appreciate it! The learning curve on this stuff is steep!

    I am trying to assemble a 5 Mb bacterial genome from Illumina 40 bp single-end reads. FastQC says that we got about 30 million reads. The base quality on all of the reads is above 30, so I do not believe I have to trim or filter.
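    For what it's worth, those numbers imply very deep coverage. A quick back-of-the-envelope check, using only the figures quoted in the post:

    ```python
    # Rough depth-of-coverage estimate from the numbers quoted above.
    n_reads = 30_000_000     # ~30 million single-end reads
    read_len = 40            # bp per read
    genome_size = 5_000_000  # ~5 Mb bacterial genome

    coverage = n_reads * read_len / genome_size
    print(f"Approximate base coverage: {coverage:.0f}x")  # ~240x
    ```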
    I would like to try de novo assembly, since the genes I am interested in are novel and likely reside on a plasmid, so it would be difficult to use a reference genome to build the contigs. In addition, they are likely flanked by non-unique DNA. I am not computer savvy, but I do have access to Velvet and have run it a few times before. I have been using Tablet to view the .afg output file.

    Here are my questions:
    1. Apart from changing the k-mer length, what other parameters should be manipulated to optimize the assembly?

    2. On a related note, until we get the recommendations back from projects like GAGE, are there any tips anyone can share, or a post somewhere with hints, that would help with this kind of analysis for single-read data?

    3. Is there free software that can take any of the output files from Velvet and calculate the N50 value, so that as we do our iterations, we can figure out what works better? As I said, I am not a programmer so I am looking for something that is plug and play.

    4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly? In other words, if I get about 1700 contigs that are longer than 100 bp, with a few that are 50-70 kb, is this considered good, or do I have a long way to go for optimization?

    Many thanks

  • #2
    I'm no Velvet expert, but here goes:

    Originally posted by salmonella:
    1. Apart changing k-mer length, what other parameters should be manipulated to optimize assembly?
    The coverage parameters - expected coverage and coverage cutoff.
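    One gotcha worth noting: Velvet's expected coverage is k-mer coverage, not base coverage. The Velvet manual gives the conversion Ck = C * (L - k + 1) / L, where C is base coverage, L is read length, and k is the k-mer size. A small sketch (the k value here is just an illustrative assumption):

    ```python
    # Convert base coverage to k-mer coverage per the Velvet manual:
    # Ck = C * (L - k + 1) / L
    def kmer_coverage(base_cov, read_len, k):
        return base_cov * (read_len - k + 1) / read_len

    # With ~240x base coverage from 40 bp reads and an assumed k of 31:
    print(kmer_coverage(240, 40, 31))  # 60.0
    ```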

    Originally posted by salmonella:
    3. Is there free software that can take any of the output files from Velvet and calculate the N50 value, so that as we do our iterations, we can figure out what works better? As I said, I am not a programmer so I am looking for something that is plug and play.
    Curtain, a project related to Velvet, comes with a fairly simple program, statsContigAll, which gives a reasonable set of stats: min, max, N10 .. N90, etc.
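    If you just want N50 with zero installation, it's also only a few lines of code. A minimal sketch that reads contig lengths from a Velvet contigs.fa (the filename here is just the usual Velvet default):

    ```python
    # Minimal N50 calculator for a FASTA file of contigs.
    def contig_lengths(fasta_path):
        """Return the length of each sequence in a FASTA file."""
        lengths, current = [], 0
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    if current:
                        lengths.append(current)
                    current = 0
                else:
                    current += len(line.strip())
        if current:
            lengths.append(current)
        return lengths

    def n50(lengths):
        """Length of the contig at which half the total assembly size is reached."""
        half = sum(lengths) / 2
        running = 0
        for length in sorted(lengths, reverse=True):
            running += length
            if running >= half:
                return length

    # Usage: n50(contig_lengths("contigs.fa"))
    ```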

    Originally posted by salmonella:
    4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly. In other words, if I get about 1700 contigs that are longer than 100 bp with a few that are 50-70 kb, is this considered good, or do I have a long way to go for optimization?
    Depends on the target genome - if you have a relatively simple genome (and I guess you do), you should get only a few large contigs. You should also get a total size in the right range (unless you have a lot of repeats).

    That said, you could really use longer paired reads - 40 bases is on the short side, and paired data is so much better for de novo assembly.

    BTW, I wouldn't rule out the need to filter/trim - most assemblers ignore quality scores entirely, so why not help them by removing the crud, even if it is a relatively small percentage of the data? Removing adapters is also strongly recommended.
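    To illustrate the kind of "crud removal" meant here, a toy 3'-end quality trimmer for Phred+33 FASTQ data (just a sketch of the idea - dedicated trimming tools are the better choice in practice):

    ```python
    # Trim a read back to the last base whose Phred+33 quality meets min_q.
    def trim_3prime(seq, qual, min_q=20):
        cut = 0
        for i, q in enumerate(qual):
            if ord(q) - 33 >= min_q:  # decode Phred+33 quality character
                cut = i + 1
        return seq[:cut], qual[:cut]

    # 'I' encodes Q40, '#' encodes Q2, so the last low-quality base is dropped:
    print(trim_3prime("ACGT", "III#"))  # ('ACG', 'III')
    ```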



    • #3
      As has been suggested, the coverage parameters are important, particularly when you want to separate the chromosome and plasmids (plasmids tend to have much higher coverage).

      The N50, maximum contig size, and some other information about the assembly should be readily available in the Velvet log file.

      Hope this helps.
