Does the level of sequence duplication in Illumina reads affect the way Velvet works?
I'm trying to use Velvet to assemble some bacterial Illumina data. It's not doing well: it produces a lot of small contigs rather than fewer, larger ones.
The data are 10-105 nt reads that have been trimmed of adapters with cutadapt (hence the range of read lengths).
The nominal coverage, (read length x number of reads) / genome length, is reasonable (range 21-243x, median 100x).
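For reference, the coverage arithmetic can be sketched in Python. This is a minimal sketch under my own assumptions: the function names are hypothetical, and the k-mer coverage follows the formula given in the Velvet manual, Ck = C * (L - k + 1) / L, generalised to trimmed reads of varying length by summing the k-mers each read contributes. Reads shorter than k contribute no k-mers at all.

```python
from typing import Iterable


def nominal_coverage(read_lengths: Iterable[int], genome_length: int) -> float:
    """Nominal (base) coverage: total sequenced bases divided by genome length."""
    return sum(read_lengths) / genome_length


def kmer_coverage(read_lengths: Iterable[int], genome_length: int, k: int) -> float:
    """k-mer coverage: each read of length L yields L - k + 1 k-mers (none if L < k).

    For reads of a single length L this reduces to Ck = C * (L - k + 1) / L.
    """
    return sum(max(0, length - k + 1) for length in read_lengths) / genome_length


if __name__ == "__main__":
    # Hypothetical example: 100x nominal coverage of a 5 Mb genome with 105 nt reads.
    genome = 5_000_000
    reads = [105] * (100 * genome // 105)
    print(nominal_coverage(reads, genome))
    for k in (31, 71):
        print(k, kmer_coverage(reads, genome, k))
```

With 105 nt reads at 100x nominal coverage, k = 71 leaves roughly a third of that (about 33x) as k-mer coverage, and every trimmed read shorter than 71 nt is simply ignored, so a high k can thin the graph considerably.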
However, all the data show quite a lot of sequence duplication according to FastQC (range 23-80%, median 58%).
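As a rough cross-check on those duplication figures, the fraction of exact-duplicate reads can be counted directly. This is a simple exact-match measure of my own, not FastQC's algorithm (FastQC computes its duplication estimate differently, so the numbers will not match); the function names are hypothetical and the parser assumes plain 4-line FASTQ records. Reads that differ by a single sequencing error are counted as distinct here.

```python
from typing import Iterable, List


def sequences_from_fastq(lines: Iterable[str]) -> List[str]:
    """Return the sequence line of each record, assuming unwrapped 4-line FASTQ records."""
    return [line.strip() for i, line in enumerate(lines) if i % 4 == 1]


def duplicate_fraction(sequences: List[str]) -> float:
    """Fraction of reads whose exact sequence already occurs elsewhere in the set.

    Computed as (total reads - distinct sequences) / total reads.
    """
    if not sequences:
        return 0.0
    distinct = len(set(sequences))
    return (len(sequences) - distinct) / len(sequences)


if __name__ == "__main__":
    # Tiny hypothetical input: three reads, two of them identical.
    fastq = ["@r1", "ACGT", "+", "IIII",
             "@r2", "ACGT", "+", "IIII",
             "@r3", "GGCC", "+", "IIII"]
    print(duplicate_fraction(sequences_from_fastq(fastq)))
```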
VelvetOptimiser picks a k-mer length of 69-71, but the output is still lots of small contigs.
Would reducing the duplication level help Velvet? If so, what software would you recommend? (Not FastX; its author told me it wasn't designed for today's read lengths, which is why it couldn't clip the adapters in my libraries.)
Thanks, I look forward to everyone's suggestions, and have a great weekend.
m