SEQanswers

Old 09-10-2016, 12:54 AM   #1
sebl
Member
 
Location: Israel

Join Date: Mar 2014
Posts: 26
Minimum contig coverage to keep

Dear all,

I am working on de novo assembly of bacterial genomes. Genomes were sequenced with MiSeq 2x250 libraries at an average coverage of ~40X. Adapters were trimmed with cutadapt by the sequencing provider; I then performed quality trimming with BBDuk and am now running de novo assembly with SPAdes 3.9.

I ran some genomes with and without the --cov-cutoff auto option and have been comparing both assemblies.

I've seen that for some genomes, --cov-cutoff in auto mode still keeps some contigs (actually, I am looking at scaffolds) with low coverage (~1), whereas for other genomes the minimum is much higher (~14).

Genomes will be compared using various tools, including ANI calculation. I'd rather not introduce bias by throwing away too much data from some genomes and not from others, or the opposite (keeping "junk" in some genomes and not in others).

I BLASTed some of the low-coverage contigs to see whether they are contamination, but I cannot tell, because they align to various sequences in the genus I am working on.

So the questions are basically:

1. How does SPAdes choose the minimum coverage when using the --cov-cutoff auto option?

2. Is there a minimum contig coverage that people suggest for keeping/discarding contigs, and if so, what is it?


Thanks!
Old 09-10-2016, 10:24 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Are these single-cell libraries or isolate libraries? With isolate libraries there is no reason to expect highly variable coverage, and contigs with much lower coverage than normal are most likely contaminants (or "junk"). If they align to the same genus then they are less likely to be contamination and more likely to be ... well, some other variety of thing that you don't want, like assembly from chimeric reads or some other artifact. With single-cell it's much more difficult to determine.
Old 09-10-2016, 10:44 PM   #3
sebl
Member
 
Location: Israel

Join Date: Mar 2014
Posts: 26

These are isolate libraries.

When I BLAST short contigs (<200 bp) with high coverage (tens of X), these are often parts of ribosomal genes. When I BLAST contigs of 200 to ~500 bp, where most of the low-coverage contigs are, they often match different species in the genus, or isolates not identified to species level.

So the question is then: what is "coverage much lower than normal"? Should I stick to a standard cutoff for all genomes? What is this cutoff usually? Or should the "auto" mode in SPAdes be enough?
Old 09-10-2016, 11:36 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

"coverage much lower than normal" is on a per-library basis; you can't have a single cutoff value that's always appropriate. That said, while it's difficult to make strict rules... if a contig has coverage under 25% of the median coverage, in an isolate, it's probably best to discard it. Bear in mind, though, that bacterial isolates can have differential coverage by up to ~50% biased toward the origin of replication if you gathered the DNA during exponential growth phase.
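
For what it's worth, the 25%-of-median rule of thumb could be sketched roughly as below. This is only an illustration under some assumptions: it reads the length and coverage from SPAdes-style contig names (NODE_<n>_length_<len>_cov_<cov>, with an optional trailing _ID_ field as in older releases), it uses a length-weighted median so that a handful of tiny high-coverage repeats do not shift the reference value (an unweighted median would be the more literal reading), and the function names are made up for this example. Keep in mind that the cov value in SPAdes names is k-mer coverage, not read coverage, so it is only meaningful relative to other contigs in the same assembly.

```python
import re

# SPAdes-style name: NODE_<n>_length_<len>_cov_<cov>, optionally followed by _ID_<id>.
HEADER_RE = re.compile(r"^>?NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (length, coverage) from a contig name, or None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def weighted_median(pairs):
    """Coverage value at which half of the assembled bases have been accumulated."""
    ordered = sorted(pairs, key=lambda p: p[1])
    total = sum(length for length, _ in ordered)
    running = 0
    for length, cov in ordered:
        running += length
        if running * 2 >= total:
            return cov
    raise ValueError("no parsable contigs")


def low_coverage_contigs(headers, fraction=0.25):
    """Return (cutoff, names of contigs whose coverage is below fraction * median)."""
    parsed = {h: parse_header(h) for h in headers}
    stats = [p for p in parsed.values() if p is not None]
    cutoff = fraction * weighted_median(stats)
    flagged = [h for h, p in parsed.items() if p is not None and p[1] < cutoff]
    return cutoff, flagged


if __name__ == "__main__":
    # Made-up contig names, for illustration only.
    example = [
        "NODE_1_length_500000_cov_40.2",
        "NODE_2_length_300000_cov_38.7",
        "NODE_3_length_450_cov_1.3",
        "NODE_4_length_180_cov_95.0",
    ]
    cutoff, flagged = low_coverage_contigs(example)
    print(cutoff, flagged)
```

Because the cutoff is derived from each assembly's own median, the same fraction can be applied to every genome without forcing one absolute value on all of them. Contigs flagged this way are candidates for a closer look rather than automatic deletion, especially if the DNA came from an exponentially growing culture.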
Tags
bacteria genome, coverage, de novo assemby, quality check, spades