  • How do I go about evaluating my assembly?

    I have made an assembly.

    Here are my tasks. I do not know where to go after step 1, and I do not know how to even attempt steps 2 and 3.

    1. Align assembly to reference genome.

    Grab the coordinates of the set of sequences that align to the reference genome, and the coordinates of the set of sequences that DO NOT align to the reference genome.

    I used MUMmer for this and got a .coords file.



    2. Take the sequences that did not align and map them against a given plasmid database, to differentiate between nuclear genome and plasmids. Then take what's left over and map it against a virulence gene database to see what the virulence genes are.

    I was told to use BLAST for this but I have no idea what to do.

    If there are still unaligned sequences left over, then I have to use a new reference to align the remaining sequences.



    3. Gene annotation, obtain gene locations



    4. SNP calling


    Edit:

    I have Step 4 down.
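
For step 1, one way to split the contigs into aligned and unaligned sets is to read the query IDs out of the .coords file. The sketch below assumes the file was written by `show-coords -T -H` (tab-delimited, no header), that the assembly was the query in the MUMmer run so the contig ID is the last column, and that the list of all contig IDs is taken from the assembly FASTA. The function name and the example IDs are hypothetical. A contig counts as aligned here if any part of it aligns.

```python
def split_contigs(coords_text, all_contig_ids):
    """Split contigs into aligned and unaligned using tab-delimited coords.

    Returns (aligned, unaligned), where aligned maps a contig ID to a list
    of (start, end) intervals on that contig, and unaligned is a sorted
    list of contig IDs with no alignment at all.
    """
    aligned = {}
    for line in coords_text.splitlines():
        row = line.rstrip("\n").split("\t")
        # Skip blank lines and any header lines (first field not a number).
        if len(row) < 9 or not row[0].isdigit():
            continue
        # Columns 3 and 4 are the start/end on the query (the assembly).
        q_start, q_end = int(row[2]), int(row[3])
        contig = row[-1]  # assumption: query tag is the last column
        # Reverse-strand matches have start > end, so normalise the order.
        interval = (min(q_start, q_end), max(q_start, q_end))
        aligned.setdefault(contig, []).append(interval)
    unaligned = sorted(set(all_contig_ids) - set(aligned))
    return aligned, unaligned


example = (
    "1\t500\t1\t500\t500\t500\t99.80\tchr1\tcontig_1\n"
    "1000\t1200\t300\t100\t201\t201\t98.00\tchr1\tcontig_3\n"
)
aligned, unaligned = split_contigs(example, ["contig_1", "contig_2", "contig_3"])
print(aligned)
print(unaligned)
```

The unaligned IDs can then be used to pull those contigs out of the assembly FASTA for step 2.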

  • #2
    BLAST can be run online (through NCBI) for single FASTA sequences, or locally: download and compile BLAST on your end along with the database, then run a search of your contigs against the database.
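
A rough sketch of the local route, assuming BLAST+ is installed and the unaligned contigs are in a FASTA file. A nucleotide database can be built with `makeblastdb -in plasmids.fasta -dbtype nucl -out plasmids` and searched with `blastn -query unaligned.fasta -db plasmids -outfmt 6 -out hits.tsv`. The file names are placeholders, and the identity, length and e-value cutoffs below are illustrative values, not recommendations. The Python part only reads the standard 12-column tabular output and reports which contigs have at least one hit passing the cutoffs.

```python
def contigs_with_hits(blast_tsv, min_identity=90.0, min_length=500,
                      max_evalue=1e-10):
    """Return the set of query IDs with at least one passing BLAST hit.

    Expects default -outfmt 6 columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    hits = set()
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])    # percent identity
        length = int(fields[3])      # alignment length
        evalue = float(fields[10])   # expectation value
        if (pident >= min_identity and length >= min_length
                and evalue <= max_evalue):
            hits.add(fields[0])
    return hits


example = (
    "contig_2\tplasmidA\t98.5\t1200\t18\t0\t1\t1200\t50\t1249\t0.0\t2100\n"
    "contig_5\tplasmidB\t82.0\t150\t27\t0\t1\t150\t10\t159\t1e-20\t120\n"
)
unaligned = {"contig_2", "contig_5", "contig_7"}
plasmid_like = contigs_with_hits(example)
leftover = unaligned - plasmid_like
print(sorted(plasmid_like))
print(sorted(leftover))
```

The leftover contigs can go through the same search against the virulence gene database.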

    Curious if something like MetaPhlAn, PhyloPhlAn or Kraken, run against your assemblies (and your raw reads, just to check), would tell you what you have. Of course, "clade-specific markers" and k-mer searches are prone to some degree of noise.


    • #3
      How did you make your assembly (de novo or reference-guided)?
      Is this a meta project?


      • #4
        If you have a reference (and it appears that you do), I recommend QUAST; it's quite effective!


        Even if you don't have a reference, it still tells you things like the number of predicted genes of size >= X; better assemblies tend to have more long genes and fewer short genes.

        Also, you could try ALE (Assembly Likelihood Evaluator), which does not need a reference and estimates the correctness of an assembly from a SAM file, based on statistics of variations, coverage, and insert size:
        ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python.


        ALE is not designed to evaluate the quality of a single assembly, but rather, the relative quality of multiple assemblies from the same set of reads. But that's still quite useful when you have several assemblies and need to pick the best one.

        EST capture is also a good method when you have EST data.

        You can also capture metrics like the percent of source reads that align to the assembly, and the rate of substitutions/insertions/deletions in those reads. The higher the mapping rate and the lower the error count, the better the assembly is. For this you should use a normal read aligner, not MUMmer.
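
As a minimal sketch of those metrics, the function below reads SAM text and reports the mapping rate plus substitution, insertion and deletion rates per aligned base. It assumes the aligner writes the optional `NM:i:` edit-distance tag (not all do), and it estimates substitutions as NM minus inserted and deleted bases. Secondary and supplementary records are skipped so each read is counted once. The function name and the example records are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mapping_stats(sam_text):
    """Summarise mapping rate and per-base error rates from SAM text."""
    total = mapped = 0
    aligned_bases = subs = ins = dels = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        total += 1
        if flag & 0x4:
            continue  # read is unmapped
        mapped += 1
        read_ins = read_del = 0
        for count, op in CIGAR_OP.findall(fields[5]):
            count = int(count)
            if op in "M=X":
                aligned_bases += count
            elif op == "I":
                read_ins += count
            elif op == "D":
                read_del += count
        ins += read_ins
        dels += read_del
        # NM counts mismatches plus inserted and deleted bases.
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                subs += max(int(tag[5:]) - read_ins - read_del, 0)
                break
    per_base = (lambda n: n / aligned_bases) if aligned_bases else (lambda n: 0.0)
    return {
        "reads": total,
        "mapped": mapped,
        "mapping_rate": mapped / total if total else 0.0,
        "substitution_rate": per_base(subs),
        "insertion_rate": per_base(ins),
        "deletion_rate": per_base(dels),
    }


example = "\n".join([
    "@SQ\tSN:contig_1\tLN:1000",
    "r1\t0\tcontig_1\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tcontig_1\t50\t60\t5M1I4M2D3M\t*\t0\t0\tACGTACGTACGTA\tIIIIIIIIIIIII\tNM:i:4",
])
print(mapping_stats(example))
```

Running this for each candidate assembly against the same read set gives numbers that can be compared directly.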
        Last edited by Brian Bushnell; 01-23-2014, 10:08 PM.
