  • how to use sam files in MEGAN metagenomics

    Hello everybody,

    I have two environmental bacterial samples sequenced on Illumina for metagenomics (approx. 14 million paired-end reads in one dataset and 16 million in the other, ~70 bp read length). Since I knew that the sequences consist only of bacteria, I downloaded all bacterial sequences from NCBI (14 GB file) instead of the nr/nt database and started a standalone BLAST as suggested in the MEGAN manual. It ran continuously for 9 days, and then I had to stop the process because the BLAST result file had grown to more than 45 GB. I know this is not a memory issue. I then did the alignment with Bowtie (bowtie-0.12.7), which gave me a SAM alignment file (7 GB; 12 percent of the reads aligned to the reference). I also downloaded the GI-to-NCBI-taxon-ID mapping file from the MEGAN website (the bin file). I then loaded both files (SAM and bin) exactly as described in the manual, and MEGAN gives me no result.

    Can you please help me figure out what I did wrong?

    I appreciate your help.

    Christopher

  • #2
    Hi Chris,

    Running BLAST on Illumina reads is not recommended because of the extreme computational cost. Before getting into your experimental design, can you share what you intended to achieve with your sequencing project?

    Best regards,
    Douglas


    • #3
      Perhaps run something like QIIME first. It will do 16S identification and will reduce the size of your dataset (as a FASTA file) so you can run it in MEGAN. I assume you're using MEGAN for functional analysis?


      • #4
        MetaPhlAn may be the right tool for this.

        Best regards,
        Douglas


        • #5
          Thanks for the replies.

          Well, I want a complete metagenomic analysis: how many and which species are in the sample, and their phylogeny too. Is that what this program lets me do?

          chris


          • #6
            MetaPhlAn can do that.

            Best regards,
            Douglas


            • #7
              I suspect MEGAN might not be able to parse the taxon IDs from your alignment results because the identifier format in the database you're using is slightly different. You might be able to tweak it to get it working.
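
One way to test that suspicion is to look at the reference names in the SAM file itself. The sketch below is only an illustration, not anything MEGAN provides: it assumes the reference FASTA headers carry a `gi|<number>|` prefix and that the aligner copied the first token of each header into the SAM RNAME column. The function name `summarize_sam_references` and the category labels are made up for this example.

```python
import re
from collections import Counter

# Assumption: reference names look like "gi|<number>|ref|<accession>|".
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def summarize_sam_references(sam_lines):
    """Count SAM records by whether their reference name carries a GI number.

    sam_lines is any iterable of text lines, e.g. an open SAM file.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, ...) are not alignment records.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM record has 11 mandatory columns; skip anything shorter.
            continue
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":
            # Bit 0x4 of the FLAG column marks an unmapped read.
            counts["unmapped"] += 1
        elif GI_PATTERN.search(rname):
            counts["mapped_with_gi"] += 1
        else:
            counts["mapped_without_gi"] += 1
    return counts
```

Calling it as `summarize_sam_references(open("reads.sam"))` (the file name is a placeholder) returns the three counts. If most mapped reads fall under `mapped_without_gi`, the GI-to-taxon mapping file has nothing to look up, which would be consistent with an empty result.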

              BLASTX against nr might be doable if you have access to a cluster: I BLASTed an Illumina dataset about the size of yours by chopping it into small pieces and farming them out to separate nodes. I had to buy more memory to run MEGAN on the results, though.
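
The chopping step could look roughly like the following sketch. It is not the poster's actual script; it assumes the reads are already in FASTA format, and the names `split_fasta` and `write_chunks` and the output file naming are hypothetical. How the chunks are then submitted to cluster nodes depends on the local scheduler and is left out.

```python
def split_fasta(lines, records_per_chunk):
    """Yield lists of FASTA lines, each holding at most records_per_chunk records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk, n_records = [], 0
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; emit the current chunk if it is full.
            if n_records == records_per_chunk:
                yield chunk
                chunk, n_records = [], 0
            n_records += 1
        chunk.append(line)
    if chunk:
        # Emit the final, possibly smaller, chunk.
        yield chunk


def write_chunks(fasta_path, out_prefix, records_per_chunk):
    """Write each chunk to its own file and return the list of paths written."""
    paths = []
    with open(fasta_path) as handle:
        for i, chunk in enumerate(split_fasta(handle, records_per_chunk)):
            out_path = f"{out_prefix}.{i:04d}.fasta"
            with open(out_path, "w") as out:
                out.writelines(chunk)
            paths.append(out_path)
    return paths
```

Each resulting file can be handed to a separate BLAST job, and the per-chunk outputs concatenated afterwards before loading them into MEGAN.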


              • #8
                Thank you all for your replies.

                I used MetaPhlAn with the marker database they provide and am very happy with the results. But is it possible to map against the database that I've downloaded from NCBI? As far as I understand, their database comprises ~2800 genome markers, so we might be losing information on genomes that are not currently in that list. I'm sorry if I am completely wrong; I'm a novice and trying to understand it.

                christopher


                • #9
                  Hi Chris,

                  Please read the paper on MetaPhlAn. The authors screened for representative genes in each family/class. If you use a general database, I am not sure whether the results would be useful. I recommend you contact the author(s) to discuss.

                  Best regards,
                  Douglas
