#1
Junior Member
Location: France Join Date: Jan 2018
Posts: 4
Hi everyone!
I kind of asked this question in my presentation thread, but I think this is a better place to reach people who know about my issue.

My lab ordered sequencing for a lot of bacterial strains so that I can do gene-by-gene genomics on them. The sequencing was done on a NextSeq 500 (Illumina) and the assembly with SPAdes, so I get each assembly in multi-FASTA format. Most of my assemblies look fine in terms of number of contigs (<100 contigs) once I filter out the smallest ones (<1000 bp), and they have a correct total genome size. However, for some of them the number of contigs remains really high, and when I check the total assembly length, I obtain 3 genomes of more than 2.4 Mb when I expect approximately 1.65 Mb.

For one of these outlier strains, I checked the 30 largest contigs with a blastn search against the NCBI database. I noticed that some of the contigs don't match the species of interest. These contigs have a low k-mer coverage (indicated in the name of the contig): around 1, against more than 200 for contigs matching the species of interest. The cut-off between high coverage and low coverage is extremely clear in all the samples I checked, so I was thinking of simply filtering out every contig that is shorter than 1000 bp or has a coverage below 50.

Do you think that makes sense? If yes, can anyone explain to me what is in those low-coverage contigs? What exactly am I getting rid of? Is it contamination?

Many thanks for your help!
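A minimal sketch of the filter described above, reading it as "drop a contig if it is shorter than 1000 bp or has a coverage below 50". It assumes the contig names follow the usual SPAdes pattern `NODE_<id>_length_<len>_cov_<cov>`; the function names `read_fasta` and `filter_contigs` are made up for illustration, and the thresholds are only the ones proposed in the post, not recommended values.

```python
import re

# Assumed SPAdes header pattern: NODE_<id>_length_<len>_cov_<cov>
HEADER_RE = re.compile(r"^NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_length=1000, min_cov=50.0):
    """Keep contigs that are at least min_length long AND have coverage >= min_cov."""
    kept = []
    for header, seq in read_fasta(lines):
        match = HEADER_RE.match(header)
        if match is None:
            # Header does not follow the assumed naming scheme.
            raise ValueError(f"unexpected contig name: {header}")
        coverage = float(match.group(3))
        if len(seq) >= min_length and coverage >= min_cov:
            kept.append((header, seq))
    return kept
```

To use it on a real file, pass an open file handle, for example `filter_contigs(open("contigs.fasta"))` (the file name is an assumption), and write the kept records back out in FASTA format. Looking at the coverage values of all contigs before fixing `min_cov` would confirm that the gap between the two groups is as clear as described.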
#2
Senior Member
Location: Québec, Canada Join Date: Jul 2008
Posts: 260
This looks like contamination.
You are expecting a 1.65 Mb genome. The sum of contig lengths is 2.4 Mb. When you searched the low-coverage contigs against the NCBI database, did they have any hits or no hits? If they did have good hits, you could align all your contigs against that hit to make a list of everything that aligns to it.
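One way to build such a list, as a rough sketch: assuming the contigs were searched with BLAST and the results saved in the default tabular format (`-outfmt 6`, whose first three columns are query ID, subject ID, and percent identity), the contigs can be grouped by the subject they hit. The function name `contigs_by_subject` and the `min_identity` parameter are hypothetical.

```python
from collections import defaultdict


def contigs_by_subject(blast_lines, min_identity=0.0):
    """Group query contig names by the subject sequence they align to.

    Expects BLAST tabular lines (outfmt 6): qseqid, sseqid, pident, ...
    """
    hits = defaultdict(set)
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity:
            hits[sseqid].add(qseqid)
    # Sort the contig names so the output is stable.
    return {subject: sorted(queries) for subject, queries in hits.items()}
```

Every contig listed under the contaminant's subject ID is then a candidate for removal, ideally after checking its coverage as well.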
#3
Junior Member
Location: France Join Date: Jan 2018
Posts: 4
Hi Seb,
I see your point, but not all my low-coverage contigs have a hit. The ones that do don't all have the same hit, and the hits are not very good: the largest contig matching another species matches with an identity of 90% but on only 63% cover, and the next contig has an identity of 78% on only 21%. Of the 30 largest contigs I tested, 10 matched another species (but these were not very good matches, and not always the same species), and 9 did not match anything. All 19 of these contigs had low coverage (around 1). Contigs with high coverage (>200) matched my species of interest with good hits.
Tags: assembly, contigs, coverage, spades