Dear all,
I am working on de novo assembly of bacterial genomes. The genomes were sequenced on a MiSeq with 2x250 libraries, at an average coverage of ~40X. Adapters were trimmed with cutadapt by the sequencing provider; I then performed quality trimming with BBDuk and am now doing the de novo assembly with SPAdes 3.9.
I ran some genomes with and without the --cov-cutoff auto option and I've been comparing both assemblies.
I've seen that for some genomes, the assembly produced with --cov-cutoff auto still keeps some contigs (actually, I am looking at scaffolds) with low coverage (~1), whereas for other genomes the minimum coverage is much higher (~14).
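For reference, this is roughly how the per-scaffold coverage can be summarised for each assembly. It is only a minimal sketch: it assumes the usual SPAdes header layout (`NODE_<id>_length_<len>_cov_<cov>`, optionally followed by an `_ID_<n>` suffix), and the function names and the `min_length` parameter are made up for illustration. Note that the `cov` value in SPAdes headers is k-mer coverage, not read coverage, so it is not directly comparable to the ~40X figure.

```python
import re

# Assumed SPAdes header layout: >NODE_<id>_length_<len>_cov_<cov>[_ID_<n>]
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9]+(?:\.[0-9]+)?)")


def parse_header(header):
    """Return (node_id, length, coverage) or None if the header does not match."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def coverage_summary(fasta_lines, min_length=0):
    """Summarise coverage over all scaffolds at least min_length long.

    fasta_lines can be an open file handle or any iterable of lines.
    Returns None if no header could be parsed.
    """
    records = []
    for line in fasta_lines:
        if line.startswith(">"):
            parsed = parse_header(line)
            if parsed is not None and parsed[1] >= min_length:
                records.append(parsed)
    if not records:
        return None
    coverages = [cov for _, _, cov in records]
    total_length = sum(length for _, length, _ in records)
    weighted_sum = sum(length * cov for _, length, cov in records)
    return {
        "n": len(records),
        "min_cov": min(coverages),
        "max_cov": max(coverages),
        "length_weighted_mean_cov": weighted_sum / total_length,
    }
```

Calling `coverage_summary(open("scaffolds.fasta"))` on each genome makes it easy to see which assemblies still contain scaffolds with a coverage far below the length-weighted mean.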
The genomes will be compared using various tools, including ANI calculation. I'd rather not introduce bias by throwing away too much data from some genomes and not from others or, conversely, by keeping "junk" in some genomes and not in others.
I BLASTed some of the low-coverage contigs to see whether they are contamination, but I cannot tell, because they align to various sequences within the genus I am working on.
So my questions are basically:
1. How does SPAdes choose the minimum coverage when the --cov-cutoff auto option is used?
2. Is there a minimum contig coverage that people recommend as a threshold for keeping or discarding contigs and, if so, what is it?
Thanks!