  • ORF prediction in assembled NGS metagenomic/metatranscriptomic data

    Hi, Sorry I'm relatively new to this and am just looking for some qualified input.
    I have sequenced several metatranscriptomes using an Illumina GA system and assembled them using MetaVelvet. Now I would like to do a reliable gene prediction on the assembled contigs.

    Ordinarily I would use Orphelia or FragGeneScan for this. However, from their descriptions I gather that they were designed specifically to identify coding regions in short single reads, not in assembled data, whereas classical ORF-prediction tools like GLIMMER assume homogeneous and more or less complete sequence data.
    My data, however, is very heterogeneous and my largest contigs are just over 2 kb.

    Therefore, what would be the best approach for ORF prediction in assembled metatranscriptome/metagenome data?

    Can I use Orphelia or FragGeneScan for this, or are they unreliable for datasets of highly varying sequence lengths? Do you know of any better suited tools?

    EDIT:
    I've tested FragGeneScan on my data and do get peptides of varying lengths, which is encouraging. But since it's metatranscriptomic data, I can't really validate how many false positives I get or how much of the genetic potential is missed.

    Any experience or opinions on whether I should optimize my data for such gene predictions? For example, by sorting the contigs by size and running separate predictions for each contig size range (e.g. <100 bp, <500 bp and >500 bp)?
    Last edited by someperson; 07-03-2013, 07:34 AM. Reason: further detailing the question
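The size-binning idea raised in the question could be sketched roughly as follows. This is only an illustration, not a recommendation that binning improves prediction: the function names (`read_fasta`, `size_bin`, `bin_contigs`) and the bin labels are hypothetical, and the edges of 100 and 500 bp are simply the example ranges from the post.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def size_bin(length, edges=(100, 500)):
    """Label the length bin; edges are the example cut-offs from the post."""
    lower = 0
    for edge in edges:
        if length < edge:
            return f"{lower}-{edge - 1}bp"
        lower = edge
    return f">={lower}bp"


def bin_contigs(handle, edges=(100, 500)):
    """Group contigs by length bin so each bin can be predicted separately."""
    bins = defaultdict(list)
    for header, seq in read_fasta(handle):
        bins[size_bin(len(seq), edges)].append((header, seq))
    return dict(bins)


# Tiny in-memory example; a real run would pass open("contigs.fa") instead.
example = StringIO(">c1\nACGT\n>c2\n" + "A" * 120 + "\n>c3\n" + "G" * 600 + "\n")
bins = bin_contigs(example)
for label, contigs in bins.items():
    print(label, len(contigs))
```

Each bin could then be written to its own FASTA file and passed to the gene predictor separately, which would make it possible to compare predictions across size ranges.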

  • #2
    I am not answering your question but wanted to mention MG-RAST, just in case http://metagenomics.anl.gov/.

    • #3
      Thanks for the reply. I am aware of MG-RAST, which also does gene-prediction based on FragGeneScan. But here also, as far as I know, this is basically done on unassembled data (MG-RAST does not assemble, does it?).

      Also, I don't want an overview of the functions present (as MG-RAST gives); I want to get to the actual peptide sequences to do further in-depth comparative analyses at the amino-acid sequence level.

      • #4
        Most people use this http://www.bioinformatics.org/sms2/orf_find.html
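For readers unfamiliar with what a simple ORF finder of this kind does, here is a minimal sketch of a naive six-frame scan for ATG-to-stop ORFs. It is not the code behind the linked tool; `find_orfs`, its `min_len` default and the returned tuple layout are assumptions made for illustration. Note that it only reports ORFs with both a start and a stop codon, so genes that run off the end of a short contig are missed, which is one reason fragment-aware predictors exist.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Scan all six frames for complete ATG..stop ORFs.

    Returns tuples (strand, frame, start, end, orf_sequence), where start and
    end are 0-based offsets on the scanned strand, end is exclusive, and the
    stop codon is included in the reported sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy contig with one short ORF on the forward strand, frame 2.
toy = "CCATGAAATTTGGGTAACC"
for orf in find_orfs(toy, min_len=9):
    print(orf)
```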

        • #5
          FGS works just fine on assemblies; it's what I've been using on Illumina assemblies. Be aware, though, that the latest version, 1.16, has a bug which leads to memory allocation failures on random sets. However, someone from the MG-RAST team has fixed it. It's available here. Just remember to use the correct parameters, I believe '-complete=1 -train=complete', for predicting from contigs.
          savetherhino.org
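As a rough sketch of how the parameters mentioned above might be assembled into a command line, the snippet below only builds and prints the argument list; it does not run anything. Only `-complete=1` and `-train=complete` come from the post. The wrapper script name `run_FragGeneScan.pl`, the `-genome=` and `-out=` options, and the file names are assumptions that should be checked against the usage message of the installed FragGeneScan version.

```python
import shlex


def fgs_command(contigs_fasta, out_prefix, script="run_FragGeneScan.pl"):
    """Build an argument list for a FragGeneScan run on assembled contigs.

    Assumed (not from the post): the script name and the -genome=/-out=
    options. From the post: -complete=1 and -train=complete.
    """
    return [
        script,
        f"-genome={contigs_fasta}",  # assumed option: input FASTA of contigs
        f"-out={out_prefix}",        # assumed option: output file prefix
        "-complete=1",               # from the post: treat input as complete sequence
        "-train=complete",           # from the post: training model for contigs
    ]


# Hypothetical file names, for illustration only.
cmd = fgs_command("contigs.fa", "contigs_fgs")
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the options have been verified locally, the list could be passed to `subprocess.run(cmd, check=True)`.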

          • #6
            Thanks, that's good to know!
