  • Metagenomics w/ 454 tips?

    Hi y'all,

    Quick question for you. Does anyone have tips, tricks or recommendations for metagenomic assembly and binning programs? I'm working with two 200K read (100-350bp) datasets from microbial communities that are relatively simple (predicted to have fewer than 100 taxa, with a handful of dominant organisms). What are your favorites? Any pitfalls to avoid?

    Cheers,
    Lizzy

  • #2
    assembly for 454 data

    454 data is a mess, but it's the only long-read technology as of today.
    Before you try assembly, be strict about the front-end cleaning of your data. Screen your reads thoroughly and, if you barcoded any samples, use TagCleaner to remove the tags. Removing reads with Ns and low quality scores would also help. You could try a denoising program if it is amplicon data, but I have not tried that for metas.
    Once you have removed the homopolymer errors etc., Forge or MIRA would be a good start for your assembly.
    What percentage of your reads are 100 bp?
    If 50%, then try ABySS or Velvet.

    More details would help.
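As a rough illustration of the screening step described above, here is a minimal Python sketch that drops reads containing Ns, reads with a low mean quality score, and very short reads. The thresholds (`min_len`, `min_mean_q`, `max_n`) and the function names are assumptions chosen for illustration, not values recommended in this thread. It does no tag removal or denoising and is not a substitute for a dedicated cleaning tool.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is (read id, bases, per-base Phred quality scores).
Read = Tuple[str, str, List[int]]


def passes_filter(seq: str, quals: List[int],
                  min_len: int = 100,        # assumed cutoff, tune for your data
                  min_mean_q: float = 20.0,  # assumed cutoff, tune for your data
                  max_n: int = 0) -> bool:
    """Return True if the read is long enough, has few Ns and decent quality."""
    if not seq or len(seq) != len(quals):
        return False  # empty or malformed record
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") > max_n:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def filter_reads(reads: Iterable[Read], **thresholds) -> Iterator[Read]:
    """Yield only the reads that pass the filter."""
    for read_id, seq, quals in reads:
        if passes_filter(seq, quals, **thresholds):
            yield read_id, seq, quals


if __name__ == "__main__":
    demo = [
        ("good", "ACGT" * 30, [30] * 120),
        ("has_n", "ACGN" * 30, [30] * 120),
        ("low_q", "ACGT" * 30, [10] * 120),
        ("short", "ACGT", [30] * 4),
    ]
    for read_id, _, _ in filter_reads(demo):
        print(read_id)
```

Parsing the actual read and quality files into `(id, bases, qualities)` tuples is left out here and would depend on the file format you have.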


    • #3
      You can try QIIME to process the data: http://qiime.sourceforge.net/index.html


      • #4
        QIIME is not for metas!!!

        Not for metas!


        • #5
          Ah yes, it's not 16S metagenomics. Definitely need another cup of coffee.


          • #6
            I have been following the literature, and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

            I have used MEGAN, which I think is one of the more widely used tools. It parses BLASTX results using the NCBI taxonomy, SEED, and KEGG. The BLASTX search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.
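To get a feel for what the 1600-hour figure implies for other data set sizes, here is a back-of-the-envelope sketch in Python. It assumes that "computer time" means CPU hours, that the cost scales linearly with the number of bases searched, and that the same database and hardware are used. None of that is stated in the post, and read length also affects search time, so treat the output as a very rough guide. The 50 Mb target and the 64-core cluster are hypothetical examples.

```python
def scaled_cpu_hours(ref_hours: float, ref_megabases: float,
                     target_megabases: float) -> float:
    """Linearly scale a measured search time to a different data set size."""
    return ref_hours * target_megabases / ref_megabases


def wall_clock_hours(cpu_hours: float, n_cores: int) -> float:
    """Idealized wall-clock time, assuming the job splits evenly over cores."""
    return cpu_hours / n_cores


if __name__ == "__main__":
    # Reference point from the post: 275 Mb took about 1600 hours.
    estimate = scaled_cpu_hours(1600, 275, 50)  # hypothetical 50 Mb data set
    print(f"Estimated CPU hours: {estimate:.0f}")
    print(f"On 64 cores (hypothetical): {wall_clock_hours(estimate, 64):.1f} h")
```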


            • #7
              I would not try to assemble the data at all. 200k 454 reads seems too few to get any decent assembly, even in very simple communities (or even in single genomes).

              200,000 reads x 250 bp read length = 50 Mb of sequence.

              50 Mb of sequence = 10x coverage of one genome of about 5 Mb.
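The arithmetic above can be written as a small Python helper. The 5 Mb genome size is an assumption implied by the 10x figure rather than stated in the post, and the `abundance` parameter (the fraction of reads coming from a given organism) is a hypothetical addition to show how coverage drops once the reads are shared across a community.

```python
def expected_coverage(n_reads: int, mean_read_len: int,
                      genome_size: int, abundance: float = 1.0) -> float:
    """Average fold coverage of one genome.

    abundance is the fraction of all reads that come from this genome
    (1.0 means every read comes from a single organism).
    """
    total_bases = n_reads * mean_read_len
    return total_bases * abundance / genome_size


if __name__ == "__main__":
    # Whole data set on one 5 Mb genome (assumed size): 10x coverage.
    print(expected_coverage(200_000, 250, 5_000_000))
    # A dominant organism making up 30% of the reads (hypothetical): about 3x.
    print(round(expected_coverage(200_000, 250, 5_000_000, abundance=0.3), 2))
```

With fewer than 100 taxa sharing the same 50 Mb, only the most abundant organisms would reach even a few-fold coverage, which is the point being made here.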

              The easy way is to upload your data to the MG-RAST server (http://metagenomics.anl.gov/).

              It automatically annotates your sample to various databases and allows for comparison with a lot of public metagenomes.

              In addition to MG-RAST, I've been using MEGAN, and I very much like the reasoning behind the approach. But if you do not have a reasonable computer cluster available, it will take too long to BLASTX 200k reads against e.g. NCBI nr.

              rgds
              Mads


              • #8
                Metagenomic binning?

                Originally posted by cliffbeall
                I have been following the literature, and it seems a new metagenome binning or taxonomy program comes out every month. It would be nice to see a comparison.

                I have used MEGAN, which I think is one of the more widely used tools. It parses BLASTX results using the NCBI taxonomy, SEED, and KEGG. The BLASTX search is computationally intensive: a 275-megabase Illumina data set took about 1600 hours of computer time on our local cluster.

                Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
                BLASTX has a hard time with 76 bp or 100 bp read lengths.
                MetaVelvet looks like a sexy new way to assemble short-read meta data.
                MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. There are many others, however. I would be interested in a program that can take both 454 and Illumina data as input without the flowgrams from 454.
                The other way I have thought about it is to assemble, then find protein ORFs, then use BLASTP to compare various binning programs.
                BLASTX takes forever!!


                • #9
                  Originally posted by raw937
                  Cliff, did you assemble the Illumina data set with ABySS or Velvet first?
                  BLASTX has a hard time with 76 bp or 100 bp read lengths.
                  MetaVelvet looks like a sexy new way to assemble short-read meta data.
                  MEGAN is a good one and is the most used, but it does have a HIGH false positive rate. For microbes, SOrt-ITEMS and various IMER binning programs are around. ProViDE is great for viral metas. There are many others, however. I would be interested in a program that can take both 454 and Illumina data as input without the flowgrams from 454.
                  The other way I have thought about it is to assemble, then find protein ORFs, then use BLASTP to compare various binning programs.
                  BLASTX takes forever!!

                  In the example I was quoting, I didn't assemble first. I have done assembly with SOAPdenovo, but I didn't have enough coverage except for the most abundant sequences. Fortunately I get free time on the cluster (way to go, Ohio!).


                  • #10
                    To add a data point, I did a quick benchmark with USEARCH. In my hands it is about 10x faster than BLASTX for searching Illumina reads against nr.

                    The drawbacks are that it uses more memory than blast so I had to split the database, and the results are not directly importable into MEGAN, though that should be doable with some work.
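For anyone who also needs to split a database to fit in memory, here is a minimal Python sketch that cuts a FASTA file into chunks of bounded size without breaking a record in two. It is a generic illustration, not the procedure used for the benchmark above and not a USEARCH feature. The function names and the `max_residues` limit are hypothetical, and a single record larger than the limit simply gets a chunk of its own.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line including ">", sequence)


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA lines into (header, sequence) records."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records: Iterable[Record],
                  max_residues: int) -> Iterator[List[Record]]:
    """Group records into chunks holding at most max_residues residues."""
    chunk, size = [], 0
    for header, seq in records:
        # Start a new chunk if adding this record would exceed the limit.
        if chunk and size + len(seq) > max_residues:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += len(seq)
    if chunk:
        yield chunk


if __name__ == "__main__":
    fasta = [">a", "MKV", ">b", "LLAA", "GG", ">c", "M"]
    for i, chunk in enumerate(split_records(read_records(fasta), 8)):
        print(i, [header for header, _ in chunk])
```

Each chunk would then be written to its own file and searched separately, with the hits merged afterwards. Note that E-values depend on database size, so results from separate chunks are not directly comparable without adjustment.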
