SEQanswers

02-02-2018, 12:05 AM   #1
berthenet
Junior Member
 
Location: France

Join Date: Jan 2018
Posts: 4
Low K-mer coverage from a SPAdes assembly

Hi everyone!

I kind of asked the question on my presentation thread, but I think this is a better place to make sure I reach people who know about my issue.


So my lab ordered sequencing of a large number of bacterial strains so that I can do gene-by-gene genomics on them. The sequencing was done on a NextSeq 500 (Illumina) and the assembly with SPAdes. For each strain I get the assembly in multi-FASTA format.

Most of my assemblies look fine in terms of number of contigs (<100 contigs) once I filter out the smallest ones (<1000 bp), and their total size matches the expected genome size. However, for some of them the number of contigs remains really high, and when I check the total assembly length I obtain 3 genomes of more than 2.4 Mb when I expect approximately 1.65 Mb. For one of these outlier strains I checked the 30 largest contigs by running blastn against the NCBI database. I noticed that some of the contigs don't match the species of interest. These contigs have a low K-mer coverage (indicated in the name of the contig): around 1, against more than 200 for contigs matching the species of interest.

The cut-off between high coverage and low coverage is extremely clear in all the samples I checked, so I was thinking of simply filtering out every contig that is shorter than 1000 bp or has a coverage below 50. Does that seem reasonable? If yes, can anyone explain to me what is in those low-coverage contigs? What exactly am I getting rid of? Is it contamination?

Many thanks for your help!
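
For readers who want to try the filter described above, here is a minimal sketch in Python. It assumes the contig headers follow the usual SPAdes pattern (NODE_<id>_length_<bp>_cov_<coverage>) and uses the thresholds proposed in the post (1000 bp, coverage 50). The function names and defaults are illustrative only, not part of SPAdes.

```python
import re

# Assumed SPAdes header layout, e.g. "NODE_1_length_250000_cov_212.4"
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_length=1000, min_cov=50.0):
    """Keep contigs that are at least min_length bp long AND have
    k-mer coverage (taken from the header) of at least min_cov."""
    kept = []
    for header, seq in read_fasta(lines):
        match = HEADER_RE.search(header)
        if match is None:
            # Header is not in the assumed format; this sketch drops it.
            continue
        cov = float(match.group(2))
        if len(seq) >= min_length and cov >= min_cov:
            kept.append((header, seq))
    return kept
```

Passing an open file handle for the assembly (for example a file named contigs.fasta, a hypothetical path) to filter_contigs returns the contigs that pass both thresholds. The thresholds should be checked against the coverage distribution of each sample before anything is discarded.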
02-02-2018, 06:57 AM   #2
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 258

This looks like contamination.

You are expecting a 1.65 Mb genome.

The sum of contig length is 2.4 Mb.


When you searched the low-coverage contigs against the NCBI database, did they have any hits or no hits?

If they did have good hits, you could align all your contigs against that hit to make a list of everything that aligns to it.
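
As a rough illustration of that last step, the sketch below lists the contigs that align to a chosen hit. It assumes the contigs were aligned with BLAST and the results saved in the default 12-column tabular format (for example blastn with -outfmt 6), where the first four columns are query ID, subject ID, percent identity and alignment length. The function name, the subject ID and the identity and length cut-offs are hypothetical and would need tuning.

```python
import csv


def contigs_matching_subject(rows, subject_id, min_identity=90.0, min_align_len=500):
    """Return the sorted names of contigs with at least one alignment
    to subject_id that passes the identity and length cut-offs.

    rows: iterable of tab-separated BLAST tabular lines
    (qseqid, sseqid, pident, length, ...).
    """
    matched = set()
    for fields in csv.reader(rows, delimiter="\t"):
        # Skip blank lines, comment lines and anything too short to parse.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        if sseqid == subject_id and pident >= min_identity and length >= min_align_len:
            matched.add(qseqid)
    return sorted(matched)
```

The resulting list can then be compared with the list of low-coverage contigs to see how many of them the suspected contaminant explains.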
02-04-2018, 11:59 PM   #3
berthenet

Hi Seb,

I see your point, but not all my low-coverage contigs have a hit. The ones that do don't all have the same hit, and the hits are not very good (the largest contig matching another species matches with 90% identity, but over only 63% coverage; the next contig has 78% identity over only 21%).

Of the 30 largest contigs I tested, 10 matched another species (but these were not very good matches, and not always the same species), and 9 did not match anything. All of these 19 contigs had low coverage (around 1). Contigs with high coverage (>200) matched my species of interest with good hits.
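
One way to check whether the low-coverage contigs account for the extra sequence is to sum contig lengths on each side of the coverage cut-off. If they do, the high-coverage total should come out near the expected 1.65 Mb and the low-coverage total near the excess (roughly 0.75 Mb or more for a 2.4 Mb assembly). The sketch below assumes SPAdes-style headers (NODE_<id>_length_<bp>_cov_<coverage>) and a threshold of 50. The function name and threshold are illustrative.

```python
import re

# Assumed SPAdes header layout, e.g. "NODE_1_length_250000_cov_212.4"
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def length_by_coverage_class(headers, cov_threshold=50.0):
    """Sum contig lengths (bp, as written in the headers) for contigs
    at or above the coverage threshold ("high") and below it ("low")."""
    totals = {"high": 0, "low": 0}
    for header in headers:
        match = HEADER_RE.search(header)
        if match is None:
            # Header is not in the assumed format; ignored in this sketch.
            continue
        length, cov = int(match.group(1)), float(match.group(2))
        totals["high" if cov >= cov_threshold else "low"] += length
    return totals
```

The headers can be collected from the lines of the assembly file that start with ">".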

Tags
assembly, contigs, coverage, spades
