
  • Illumina bacterial metagenomics

    Hello SEQ-users,

    After working for some time on 16S microbiota analysis with 454 data, I recently moved to Illumina and shotgun metagenomics. And the problems begin...

    I have a dataset from an Illumina run. BGI, which performed the actual sequencing, has already quality-filtered it (QC) to remove low-quality bases and reads. The result is a 35 Gb file of 2 × 100 bp paired-end reads. I want to analyse gene pathways and taxa and compare them with other samples.
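
    To get a feel for the scale of a 35 Gb, 2 × 100 bp dataset, here is a rough Python sketch of the arithmetic. It is only an estimate under stated assumptions: the post does not say whether "35 Gb" means bytes of uncompressed FASTQ or sequenced bases, so both readings are computed, and the 60-character header length is a guess, not a value from the data.

```python
"""Back-of-the-envelope read counts for a paired-end FASTQ dataset.

Assumptions (not stated in the thread): "35 Gb" is read either as
35e9 bytes of uncompressed FASTQ (R1 + R2 together) or as 35e9
sequenced bases, and each FASTQ header line is about 60 characters.
"""


def fastq_record_bytes(read_len: int = 100, header_len: int = 60) -> int:
    """Bytes in one 4-line FASTQ record, counting a newline on each line."""
    header = header_len + 1   # '@read_name ...' line
    sequence = read_len + 1   # one character per base
    separator = 1 + 1         # bare '+' line
    quality = read_len + 1    # one quality character per base
    return header + sequence + separator + quality


def pairs_from_file_size(total_bytes: int, read_len: int = 100,
                         header_len: int = 60) -> int:
    """Approximate read pairs if the size refers to uncompressed FASTQ bytes."""
    return total_bytes // (2 * fastq_record_bytes(read_len, header_len))


def pairs_from_bases(total_bases: int, read_len: int = 100) -> int:
    """Read pairs if the size refers to sequenced bases (two reads per pair)."""
    return total_bases // (2 * read_len)


if __name__ == "__main__":
    size = 35_000_000_000
    print(f"if bytes: ~{pairs_from_file_size(size):,} read pairs")
    print(f"if bases: ~{pairs_from_bases(size):,} read pairs")
```

    Under these assumptions the file holds roughly 66 million read pairs (bytes reading) or 175 million read pairs (bases reading), which gives some sense of why the assemblers ran short of memory.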

    The first suggestion I got for handling the data was to assemble the reads into contigs. I tried MetaVelvet and SPAdes, but without success so far. If I read the error logs correctly, they are using too much memory. I am running them on the university cluster, but apparently 24 GB of RAM is not enough... And this is just a test run: we expect to receive microbiota sequences from hundreds of patients any time now...

    Is there something wrong with how I am using the programs?

    Is there a better approach than assembly for this kind of data?

    Should I just move to a more powerful computing environment? If so, which one?

    Any information would be welcome at this point.

  • #2
    MetaVelvet and SPAdes would typically need much more than 24 GB of RAM; something like 128 GB or more would be appropriate.

    • #3
      You can also align the reads directly against the nr database with DIAMOND in blastx mode, then compare samples using MEGAN.

      • #4
        For the pathway analysis you can try RTG mapx and then feed the results to Megan. mapx is significantly faster than blastx.

        For taxonomy-aware composition analysis, you could try RTG species or JHU's Kraken, although you may be pushing things RAM-wise.
        Len Trigg, Ph.D.
        Real Time Genomics
        www.realtimegenomics.com

        • #5
          Thanks for the answers!

          I got confirmation from the SPAdes developers that I would need something like 200 GB of RAM to assemble my file. I am trying to get access to a bigger compute node, and I will also split my file into several pieces (I have the feeling the sequencing depth was too high anyway).
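
          Splitting a paired-end dataset into pieces only works if both mates of each pair end up in the same piece. Below is a minimal Python sketch of that, assuming uncompressed 4-line FASTQ and R1/R2 files that list mates in the same order; the function names are illustrative and not part of any tool mentioned in this thread.

```python
"""Split a paired-end FASTQ dataset into N pieces without separating mates.

A minimal sketch, assuming uncompressed 4-line FASTQ records and that
the R1 and R2 files list mates in the same order.
"""
import io
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, '+', quality) tuples from a FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("malformed or truncated FASTQ record")
        yield lines[0], lines[1], lines[2], lines[3]


def write_record(handle: TextIO, record: Record) -> None:
    handle.write("\n".join(record) + "\n")


def split_pairs(r1_in: TextIO, r2_in: TextIO,
                r1_outs: List[TextIO], r2_outs: List[TextIO]) -> List[int]:
    """Deal read pairs round-robin into the output pieces; return pairs per piece."""
    if not r1_outs or len(r1_outs) != len(r2_outs):
        raise ValueError("need the same, non-zero number of R1 and R2 outputs")
    counts = [0] * len(r1_outs)
    pairs = zip_longest(read_fastq(r1_in), read_fastq(r2_in))
    for i, (rec1, rec2) in enumerate(pairs):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        piece = i % len(r1_outs)
        write_record(r1_outs[piece], rec1)
        write_record(r2_outs[piece], rec2)
        counts[piece] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open(...) file handles instead.
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGCC\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTGA\n+\nIIII\n@b/2\nCCAA\n+\nIIII\n")
    outs1 = [io.StringIO(), io.StringIO()]
    outs2 = [io.StringIO(), io.StringIO()]
    print(split_pairs(r1, r2, outs1, outs2))
```

          Dealing pairs round-robin, rather than cutting the file into consecutive blocks, makes each piece a roughly uniform 1/N sample of the whole run, so assembling a single piece is also a quick way to check whether the full depth is really needed.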

          Thanks also for the information on the other possible analysis tools. I am going to take a closer look at those.

          Cheers!

          • #6
            With IDBA_UD you can set a threshold on the "support" per node in the assembly. You can first assemble with a high value for that threshold, then add the contigs from that run as "long reads" to a new run with a lower value, and so on.

            • #7
              You could try using mothur (http://www.mothur.org/wiki/MiSeq_SOP). I have had success using it for 454 data, and colleagues are using it for MiSeq 2x250 sequencing with good results.

              • #8
                Hi all!

                Thanks for all your answers and suggestions. I analyzed smaller parts of my dataset and found that it was heavily contaminated with human DNA. I removed the human sequences using DeconSeq, so the assemblers should now work better.
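
                With paired-end data, a detail worth checking after decontamination is that R1 and R2 stay in sync: if one mate is flagged as human, it is generally safest to drop its partner too. The Python sketch below assumes you already have a set of flagged read IDs (how to extract them from DeConSeq's output is not covered in the thread and is not shown here); `filter_pairs` and `read_id` are hypothetical helper names.

```python
"""Drop read pairs in which either mate was flagged as a contaminant.

A minimal sketch, assuming a set of flagged read IDs is already available
(for example collected from a decontamination tool's contaminant output;
obtaining that list is tool-specific and not shown here).
"""
import io
from itertools import islice, zip_longest
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, '+', quality) tuples from a FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("malformed or truncated FASTQ record")
        yield lines[0], lines[1], lines[2], lines[3]


def read_id(header: str) -> str:
    """Normalise '@name/1', 'name/2' or '@name 1:N:0:...' to 'name'."""
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO,
                 flagged: Iterable[str]) -> Tuple[int, int]:
    """Write only pairs where neither mate is flagged; return (kept, removed)."""
    flagged_ids = {read_id(x) for x in flagged}  # match both mates by base name
    kept = removed = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_id(rec1[0]) in flagged_ids or read_id(rec2[0]) in flagged_ids:
            removed += 1
            continue
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, removed


if __name__ == "__main__":
    # Tiny in-memory demo with one mate of pair "b" flagged.
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGCC\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTGA\n+\nIIII\n@b/2\nCCAA\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(filter_pairs(r1, r2, out1, out2, {"b/2"}))
```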
