
  • How do I go about evaluating my assembly?

    I have made an assembly.

    Here are my tasks. I do not know where to go from step 1, and I do not even know how to attempt steps 2 and 3.

    1. Align assembly to reference genome.

    Grab coordinates of the set of sequences that aligns to the reference genome and grab coordinates of the set of sequences that DO NOT align to the reference genome.

    I used MUMmer for this and got a .coords file.
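A minimal sketch of how the .coords file could be split into aligned and unaligned contigs. It assumes the file is tab-delimited output from `show-coords -T` (optionally with `-H`), where columns 3 and 4 are the start and end on the query (the assembly) and the last column is the query sequence ID. The function name and the list of contig IDs are hypothetical; the ID list would come from your assembly FASTA headers.

```python
from collections import defaultdict


def split_contigs_by_alignment(coords_lines, all_contig_ids):
    """Return (aligned, unaligned).

    aligned maps contig ID -> list of (start, end) intervals on the contig;
    unaligned lists the contig IDs that have no alignment at all.
    """
    aligned = defaultdict(list)
    for line in coords_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # blank line, file-path line, or other non-data line
        try:
            q_start, q_end = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # column header line when -H was not used
        contig_id = fields[-1]  # assumption: query ID is the last column
        # Reverse-strand hits are reported with start > end; store low..high.
        aligned[contig_id].append((min(q_start, q_end), max(q_start, q_end)))
    unaligned = [c for c in all_contig_ids if c not in aligned]
    return dict(aligned), unaligned
```

Note that a contig can appear in `aligned` while still having long stretches that did not align; comparing the stored intervals with the contig length would reveal those.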



    2. Take the sequences that did not align and map them against a given plasmid database, to differentiate between the nuclear genome and plasmids. Then take what is left over and map it against a virulence gene database to see which virulence genes are present.

    I was told to use BLAST for this but I have no idea what to do.

    If there are still unaligned sequences left over, then I have to use a new reference to align the remaining sequences.
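A rough sketch of how the BLAST results for step 2 could be sorted into "hit" and "left over" contigs. It assumes BLAST+ was run with `-outfmt 6`, whose default 12 tab-separated columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The function name and the identity and length cutoffs are hypothetical and would need tuning for real data.

```python
def contigs_with_hits(blast_tab_lines, min_identity=90.0, min_align_len=500):
    """Map each query contig to its best-scoring subject that passes the cutoffs."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, length, bitscore = float(f[2]), int(f[3]), float(f[11])
        if pident < min_identity or length < min_align_len:
            continue  # hit too weak or too short (hypothetical thresholds)
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {q: subject for q, (subject, _) in best.items()}


def leftover_contigs(contig_ids, hits):
    """Contigs with no accepted hit; candidates for the next database."""
    return [c for c in contig_ids if c not in hits]
```

The same two functions could be applied twice: first to the results against the plasmid database, then to the results of the leftover contigs against the virulence gene database.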



    3. Gene annotation, obtain gene locations



    4. SNP calling


    Edit:

    I have Step 4 down.

  • #2
    BLAST can be run either online (through NCBI) for single FASTA files, or locally by downloading and installing BLAST along with the database and then searching your contigs against that database.

    I'm curious whether something like MetaPhlAn, PhyloPhlAn, or Kraken, run against your assemblies (and your raw reads, just to check), would tell you what you have. Of course, "clade-specific marker" and k-mer searches are prone to some degree of noise.



    • #3
      How did you make your assembly (de novo or reference-guided)?
      Is this a meta project?



      • #4
        If you have a reference (and it appears that you do), I recommend QUAST; it's quite effective!


        Even if you don't have a reference, it still tells you things like the number of predicted genes of size >= X; better assemblies tend to have more long genes and fewer short genes.

        Also, you could try ALE (Assembly Likelihood Evaluator), which does not need a reference and estimates the correctness of an assembly from a SAM file, based on statistics of variations, coverage, and insert size:
        ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python.


        ALE is not designed to evaluate the quality of a single assembly, but rather, the relative quality of multiple assemblies from the same set of reads. But that's still quite useful when you have several assemblies and need to pick the best one.

        EST capture is also a good method when you have EST data.

        You can also capture metrics like the percentage of source reads that align to the assembly, and the rate of substitutions/insertions/deletions in those reads. The higher the mapping rate and the lower the error count, the better the assembly is. For this you should use a normal read aligner, not MUMmer.
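As an illustration of those metrics, here is a minimal sketch that computes the mapping rate and error counts from SAM text. It assumes the aligner writes the optional `NM:i:` tag (edit distance, as defined in the SAM specification), so substitutions are estimated as NM minus inserted and deleted bases. The function name is hypothetical, and each primary SAM record is counted as one read.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mapping_stats(sam_lines):
    """Mapping rate and substitution/insertion/deletion counts from SAM lines."""
    total = mapped = aligned_bases = subs = ins = dels = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        f = line.rstrip("\n").split("\t")
        flag, cigar = int(f[1]), f[5]
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        mapped += 1
        read_ins = read_del = 0
        for count, op in CIGAR_OP.findall(cigar):
            n = int(count)
            if op in "M=X":
                aligned_bases += n
            elif op == "I":
                read_ins += n
            elif op == "D":
                read_del += n
        ins += read_ins
        dels += read_del
        # NM = mismatches + inserted bases + deleted bases, when present.
        nm = next((int(t[5:]) for t in f[11:] if t.startswith("NM:i:")), None)
        if nm is not None:
            subs += max(nm - read_ins - read_del, 0)
    return {
        "total_reads": total,
        "mapped_reads": mapped,
        "mapping_rate": mapped / total if total else 0.0,
        "aligned_bases": aligned_bases,
        "substitutions": subs,
        "insertions": ins,
        "deletions": dels,
    }
```

Running this on the alignments of the same read set against each candidate assembly gives numbers that can be compared directly between assemblies.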
        Last edited by Brian Bushnell; 01-23-2014, 10:08 PM.
