02-02-2017, 12:15 PM   #2
Brian Bushnell

Generally, I would recommend assembling them all together, and using error-correction if necessary to deal with the introduction of errors. 20x is also a very low target for normalization prior to assembly; if you want to normalize, I typically recommend a target of 100x.

For optimal assembly, I recommend a bit of preprocessing first. Using BBMap, and starting with the raw reads (assuming these are 2x150bp Illumina data):

Code:
in=r1.fq in2=r2.fq out=trimmed.fq minlen=90 ktrim=r k=23 mink=11 hdist=1 tbo tpe ref=adapters.fa maxns=0 qtrim=r trimq=10
in=trimmed.fq out=filtered.fq ref=phix174_ill.ref.fa.gz,sequencing_artifacts.fa.gz k=31
in=filtered.fq out=ecco.fq ecco mix strict adapters=default
in=ecco.fq out=ecct.fq ecc

#Normalization may or may not be helpful; it depends on the dataset and assembler.
#So, I suggest assembling both with and without to see which is better.
in=ecct.fq out=normalized.fq target=100 min=2
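The command lines above show only arguments, so here is a rough sketch of how the steps might be chained in one script. The tool names (bbduk.sh, bbmerge.sh, tadpole.sh, bbnorm.sh) are my assumption based on which BBTools programs accept these flags; they are not given in the post. The run, preprocess and normalize helpers are hypothetical names used only for illustration. By default the script just prints the commands (DRY_RUN=1), so you can check them before running anything.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: the tool names (bbduk.sh, bbmerge.sh, tadpole.sh,
# bbnorm.sh) are inferred from the flags; the post lists arguments only.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

preprocess() {
    # 1. Adapter-trim (ktrim=r) and quality-trim (qtrim=r trimq=10) the raw pairs;
    #    reads shorter than 90 bp or containing Ns are discarded.
    run bbduk.sh in=r1.fq in2=r2.fq out=trimmed.fq minlen=90 ktrim=r k=23 mink=11 \
        hdist=1 tbo tpe ref=adapters.fa maxns=0 qtrim=r trimq=10 || return 1
    # 2. Filter out reads matching PhiX or known sequencing artifacts.
    run bbduk.sh in=trimmed.fq out=filtered.fq \
        ref=phix174_ill.ref.fa.gz,sequencing_artifacts.fa.gz k=31 || return 1
    # 3. Error-correct using the overlap between paired reads (ecco).
    run bbmerge.sh in=filtered.fq out=ecco.fq ecco mix strict adapters=default || return 1
    # 4. Kmer-based error correction (ecc).
    run tadpole.sh in=ecco.fq out=ecct.fq ecc || return 1
}

normalize() {
    # Optional: normalize to a target depth of 100x. Try assembling both
    # ecct.fq and normalized.fq and keep whichever assembles better.
    run bbnorm.sh in=ecct.fq out=normalized.fq target=100 min=2
}

preprocess && normalize
```

Set DRY_RUN=0 to actually execute the steps; each step stops the chain if the previous one fails, so a later tool never reads a partial output file.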
Then try assembling. If you assemble the libraries separately rather than together, you can use Dedupe to remove duplicate contigs:

Code: in=a.fa,b.fa,c.fa out=deduped.fa s=5
This will remove duplicate and contained sequences, allowing up to 5 substitutions.
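If you have a variable number of assemblies, the Dedupe call above could be wrapped roughly like this. I am assuming the script is invoked as dedupe.sh (the post only says "Dedupe"), and dedupe_contigs and run are hypothetical helper names for this sketch. As written it only prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: Dedupe is invoked as dedupe.sh; the post gives
# only the arguments. DRY_RUN=1 (the default) prints instead of executing.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: dedupe_contigs OUTPUT.fa ASSEMBLY1.fa [ASSEMBLY2.fa ...]
dedupe_contigs() {
    local out="$1"
    shift
    local inputs
    # Join the input files with commas, as in in=a.fa,b.fa,c.fa.
    # IFS is changed only inside the subshell, so run() is unaffected.
    inputs="$(IFS=,; printf '%s' "$*")"
    # s=5 allows up to 5 substitutions when matching duplicates/containments.
    run dedupe.sh "in=$inputs" "out=$out" s=5
}

dedupe_contigs deduped.fa a.fa b.fa c.fa
```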