#1
Junior Member
Location: La Jolla
Join Date: Mar 2017
Posts: 2
I need to assemble a large metagenomic dataset from Illumina NextSeq reads. My read depth is approximately 20 million reads per sample (28 samples), and the concatenated R1 and R2 files are 130 GB each. I'm using 64 threads and it's still not enough.

I've been using metaSPAdes, which has been doing a great job.

I do not want to assemble in stages because it is difficult to collapse the data back into a single dataset. We have thought about randomly subsampling the R1 and R2 reads, but is there another method? This method, which does unsupervised clustering of the reads beforehand, seems interesting, but I haven't seen any application-based implementations.

Last edited by jol.espinoz; 03-01-2017 at 12:16 PM.
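A typical metaSPAdes invocation for paired-end Illumina data looks roughly like the following sketch; the file names, thread count, and memory cap (in GB) are placeholders, not the poster's exact settings:

Code:
# Illustrative metaSPAdes run on paired-end reads (placeholder paths, threads, and RAM cap)
metaspades.py -1 paired_1.fastq -2 paired_2.fastq -o metaspades_out -t 64 -m 500

Note that -m only sets an upper RAM limit for SPAdes; it does not reduce how much memory the assembly itself requires.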
#2
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707
There are several possible approaches here. First, you can try other assemblers:
Megahit - we use this routinely for metagenome assemblies because its resource requirements (time and memory) are much lower than SPAdes'; a sketch of a typical invocation is included after the normalization command below.
Disco - an overlap-based assembler designed for metagenomes, which uses roughly as much memory as the size of the input data.

Second, you can reduce the memory footprint of the data through preprocessing. This involves filtering and trimming the data, and potentially error-correcting it and/or discarding reads with very high coverage or with too little coverage to assemble. An example is posted here; at least, the first 5 steps. For a large metagenome, I also recommend removing human reads (just prior to error correction) as a way to reduce memory consumption; a trimming sketch is included below as well.

Normalization can be done like this:

Code:
bbnorm.sh in1=./paired_1.fastq in2=./paired_2.fastq out=normalized.fq target=100 min=3
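For the Megahit suggestion above, a minimal sketch of an invocation on already-trimmed paired-end reads would be something like this (file names and thread count are placeholders):

Code:
# Illustrative Megahit run; --presets meta-large selects k-mer settings for large, complex metagenomes
megahit -1 trimmed_1.fastq -2 trimmed_2.fastq -o megahit_out -t 64 --presets meta-large

Megahit expects the -o output directory not to exist yet, and it accepts comma-separated lists of files for -1 and -2, so the per-sample files would not necessarily need to be concatenated first.

The filtering/trimming step could likewise be sketched with BBDuk; the adapter reference and cutoffs here are illustrative defaults, not a prescribed protocol:

Code:
# Illustrative adapter and quality trimming with BBDuk ahead of error correction and normalization
# (adapters.fa refers to the adapter reference that ships with BBTools, in its resources directory)
bbduk.sh in1=paired_1.fastq in2=paired_2.fastq out1=trimmed_1.fastq out2=trimmed_2.fastq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe qtrim=rl trimq=10 minlen=50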
Tags
assemblers, assembly, big data, large dataset, metagenomics