
03-19-2015, 04:55 AM   #1
Junior Member
Location: Amsterdam

Join Date: Mar 2015
Posts: 8
Illumina bacterial metagenomics

Hello SEQ-users,

After working for some time on 454 16S microbiota analysis, I recently moved to Illumina and metagenomics. And the problems begin...

I have this dataset from an Illumina run. BGI, which performed the sequencing, has already run QC to remove low-quality bases and sequences. The result is a 35 Gb file of 2 x 100 bp paired-end reads. I want to analyse gene pathways and taxa and compare them with other samples.

The first suggestion I received was to assemble the reads into contigs. I tried MetaVelvet and SPAdes, but to no avail so far. If I understand the error log correctly, they are using too much memory. I am running them on the university cluster, but apparently 24 GB of RAM is not enough... And this is just a test run: we are supposed to receive microbiota sequences from hundreds of patients any time soon...

Is there something wrong with how I am using the programs?

Is there a better solution than assembly for this kind of data?

Should I just move on to a more powerful environment? If so, which one?

Any information would be welcome at this point.
smatamoros
03-19-2015, 05:10 AM   #2
Senior Member
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137

MetaVelvet and SPAdes would typically need much more than 24 GB RAM, something like >= 128 GB would be appropriate.
sarvidsson
03-19-2015, 06:14 AM   #3
Location: florida

Join Date: Jan 2013
Posts: 67

You can also try aligning the reads directly against the nr database with DIAMOND (blastx mode), then compare samples using MEGAN.
yzzhang
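Whichever aligner is used for the reads-versus-proteins search suggested above, a common step before comparing samples is to reduce the hits to one best hit per read. The following is only a minimal Python sketch, assuming the aligner wrote standard 12-column BLAST tabular output with the bit score in the last column. The function names `best_hits` and `subject_counts` are illustrative and not part of DIAMOND or MEGAN.

```python
import csv
from collections import Counter


def best_hits(tabular_lines):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bit score per query.

    Assumes 12-column BLAST tabular rows:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def subject_counts(best):
    """Count how many reads have each subject as their best hit."""
    return Counter(subject for subject, _ in best.values())
```

Calling `subject_counts(best_hits(open("sample.tsv")))` once per sample (the file name is a placeholder) gives per-sample count tables that can be compared directly. A tool such as MEGAN does considerably more than this, for example assigning reads to taxa by lowest common ancestor.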
03-22-2015, 01:25 PM   #4
Len Trigg
Registered Vendor
Location: New Zealand

Join Date: Jun 2011
Posts: 29

For the pathway analysis you can try RTG mapx and then feed the results to MEGAN. mapx is significantly faster than blastx.

For taxonomy-aware composition analysis, perhaps try RTG species or JHU's Kraken, although you may be pushing things RAM-wise.
Len Trigg, Ph.D.
Real Time Genomics
03-23-2015, 12:49 AM   #5
Junior Member
Location: Amsterdam

Join Date: Mar 2015
Posts: 8

Thanks for the answers!

I got confirmation from the SPAdes developers that I would need something like 200 GB of RAM to assemble my file. I am trying to gain access to a larger compute node, and I will also split my file into several pieces (I have the feeling the sequencing depth was too high anyway).

Thanks also for the info on the other possible analysis tools. I am going to have a closer look at those.

Cheers!
smatamoros
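If the sequencing depth really is higher than needed, one way to cut the data down is to keep a random fraction of read pairs rather than slicing the file by position. Here is a minimal Python sketch of that idea, assuming two plain 4-line-per-record FASTQ files (forward and reverse) whose records are in the same order. The function names, the `fraction` parameter, and the fixed `seed` are illustrative choices, not taken from any tool mentioned in this thread.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as tuples of 4 lines (assumes unwrapped 4-line records)."""
    lines = iter(handle)
    while True:
        record = tuple(islice(lines, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(fwd_in, rev_in, fwd_out, rev_out, fraction, seed=0):
    """Write roughly `fraction` of the read pairs, keeping both mates together.

    One random draw is made per pair, so the two output files stay in sync.
    Returns the number of pairs written.
    """
    rng = random.Random(seed)
    kept = 0
    for rec1, rec2 in zip(read_fastq(fwd_in), read_fastq(rev_in)):
        if rng.random() < fraction:
            fwd_out.writelines(rec1)
            rev_out.writelines(rec2)
            kept += 1
    return kept
```

The four arguments are open text file handles. Running the function with different seeds gives independent subsets, which makes it possible to check whether an assembly or profile is stable at a given depth.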
04-08-2015, 08:08 AM   #6
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

With IDBA_UD you can set restrictions on the "support" per node in an assembly. You can first assemble with a high value for that setting, then add the contigs from that run as "long reads" and decrease the value, and so on.
bastianwur
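The staged approach described above can be laid out as a sequence of command lines. The sketch below only builds the commands and does not run them. It assumes that the "support" setting corresponds to idba_ud's `--min_support` option, that earlier contigs are passed with `-l` (long reads), and that each run writes `contig.fa` into its output directory. The schedule `(5, 3, 1)`, the directory naming, and the function name are hypothetical, so check them against `idba_ud` help output for your version.

```python
def idba_ud_passes(reads_fa, out_prefix, supports=(5, 3, 1)):
    """Build one idba_ud command per pass, lowering --min_support each time.

    From the second pass on, the previous pass's contigs are added as long reads.
    The `supports` schedule is an illustrative assumption, not a recommendation.
    """
    commands = []
    previous_contigs = None
    for number, support in enumerate(supports, start=1):
        out_dir = f"{out_prefix}_pass{number}"
        cmd = ["idba_ud", "-r", reads_fa, "-o", out_dir,
               "--min_support", str(support)]
        if previous_contigs is not None:
            cmd += ["-l", previous_contigs]
        commands.append(cmd)
        # Assumed location of the contigs written by this pass.
        previous_contigs = f"{out_dir}/contig.fa"
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` in order. Note that idba_ud expects its paired reads as a single interleaved FASTA file, so FASTQ input needs converting first.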
05-08-2015, 07:02 AM   #7
Junior Member
Location: Memphis, TN

Join Date: Mar 2015
Posts: 4

You could try using mothur. I had success using it for 454 data, and colleagues are using it for MiSeq 2x250 sequencing with good results.
jmhopkins
05-08-2015, 07:34 AM   #8
Junior Member
Location: Amsterdam

Join Date: Mar 2015
Posts: 8

Hi all!

Thanks for all your answers and suggestions. I analyzed smaller parts of my dataset and found out it was heavily contaminated with human DNA. I got rid of the human sequences using DeconSeq, so the assemblers should now work better.
smatamoros
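With paired-end data, a host-removal step has to drop both mates whenever either one is flagged, otherwise the two files fall out of sync and assemblers may reject them. This rough Python sketch shows that bookkeeping. It assumes a set of flagged read IDs is already available from whatever screening step was used; it does not reproduce how DeconSeq itself works or what files it writes. All function names are illustrative.

```python
from itertools import islice


def fastq_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    lines = iter(handle)
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        yield record


def pair_id(header):
    """Normalise a header such as '@read7/1 extra' to the pair name 'read7'."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def drop_flagged_pairs(fwd_in, rev_in, fwd_out, rev_out, flagged_ids):
    """Copy read pairs to the outputs unless either mate's ID is flagged.

    Assumes both inputs list the pairs in the same order.
    Returns (pairs_kept, pairs_dropped).
    """
    kept = dropped = 0
    for rec1, rec2 in zip(fastq_records(fwd_in), fastq_records(rev_in)):
        if pair_id(rec1[0]) in flagged_ids or pair_id(rec2[0]) in flagged_ids:
            dropped += 1
            continue
        fwd_out.writelines(rec1)
        rev_out.writelines(rec2)
        kept += 1
    return kept, dropped
```

The ratio of dropped to total pairs also gives a quick estimate of how heavily a sample is contaminated, which is worth checking on a small subset before committing cluster time to a full assembly.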
