Old 04-24-2017, 12:13 PM   #7
Brian Bushnell
Super Moderator
 

The flags/documentation for BBNorm might be a little misleading in this case... BBNorm operates in 3 phases per pass:

1) Prefilter construction (which finished). This uses all reads.
2) Main hashtable construction (which finished). This uses all reads.
3) Normalization (which started but did not finish). This uses the number of reads you specify with the "reads" flag.

So, the reason it took so long is that it was using all reads in phases 1 and 2 (otherwise, the depths would be incorrect).

To reduce the number of reads used in the first two phases as well, you would need to specify "tablereads"; e.g. "reads=10000000 tablereads=10000000" (a single-command sketch of this is shown after the code below). Or, most simply, just pull out the first 10M reads first:

Code:
reformat.sh in=reads.fq out=10m.fq reads=10m
bbnorm.sh prefilter=t passes=1 target=50 in=10m.fq out=normalized.fq
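For reference, the single-command "tablereads" variant mentioned above might look like this; it is just a sketch reusing the flags from the example, with 10000000 as the read budget:

Code:
bbnorm.sh prefilter=t passes=1 target=50 in=reads.fq out=normalized.fq reads=10000000 tablereads=10000000
In that case the hash tables are built from only the first 10M reads, so the depth estimates will reflect that subset rather than the full dataset.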
For binning, you might try MetaBAT. Binning tools generally work better on assemblies, since tetramer frequencies are more accurate on longer sequences. But they can work with reads as well, particularly when you have multiple samples of the same metagenome (preferably from slightly different locations / conditions / times), as they can use read-depth covariance to assist in binning. I don't know how well it will work or how long it will take, though. Also, the longer the reads are, the better; so if the read pairs mostly overlap, it's prudent to merge them first (see the sketch below).
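For merging, BBMerge (from the same BBTools package) is one option; a minimal sketch assuming paired-end input, with placeholder file names:

Code:
bbmerge.sh in=reads_R1.fq in2=reads_R2.fq out=merged.fq outu=unmerged.fq
The merged reads in merged.fq (plus whatever could not be merged, in unmerged.fq) can then go into normalization or binning.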