
  • Low K-mer coverage from a SPAdes assembly

    Hi everyone!

    I kind of asked the question on my presentation thread, but I think this is a better place to make sure I reach people who know about my issue.

    My lab ordered sequencing for a lot of bacterial strains so that I can do gene-by-gene genomics on them. The sequencing was done on a NextSeq 500 (Illumina), and the assembly with SPAdes. For each strain I get the assembly in multi-FASTA format.

    Most of my assemblies look fine in terms of number of contigs (<100 contigs) once I filter out the smallest ones (<1000 bp), and their total size matches the expected genome size. However, for some of them the number of contigs remains really high, and when I check the total assembly length, I obtain 3 genomes of more than 2.4 Mb when I expect approximately 1.65 Mb. For one of these outlier strains, I checked the 30 largest contigs with a blastn search against the NCBI database. I noticed that some of the contigs don't match the species of interest. These contigs have a low k-mer coverage (indicated in the name of the contig): around 1, against more than 200 for contigs matching the species of interest.

    The cut-off between high coverage and low coverage is extremely clear in all the samples I checked, so I was thinking of simply filtering out every contig that is shorter than 1000 bp or has a coverage below 50. Does that seem reasonable? If yes, can anyone explain to me what is in those low-coverage contigs? What exactly am I getting rid of? Is it contamination?
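A minimal sketch of the filter described above, in Python with only the standard library. It assumes the contig names follow the usual SPAdes pattern (for example `NODE_1_length_152340_cov_213.7`) and that a contig is dropped if it fails either threshold. The function names and the default cut-offs (1000 bp, coverage 50) are illustrative choices taken from the numbers in this post, not anything SPAdes provides.

```python
import re

# Assumed SPAdes-style contig name: NODE_<n>_length_<bp>_cov_<k-mer coverage>
HEADER_RE = re.compile(r"length_(\d+)_cov_(\d+(?:\.\d+)?)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_length=1000, min_cov=50.0):
    """Keep contigs that are long enough AND have high enough k-mer coverage."""
    kept = []
    for header, seq in read_fasta(lines):
        match = HEADER_RE.search(header)
        if match is None:
            raise ValueError(f"unexpected contig name: {header}")
        coverage = float(match.group(2))
        # Length is measured on the sequence itself, not read from the name.
        if len(seq) >= min_length and coverage >= min_cov:
            kept.append((header, seq))
    return kept
```

To run it on a real file, something like `with open("contigs.fasta") as handle: kept = filter_contigs(handle)` would work, where `contigs.fasta` is a placeholder file name. The thresholds are worth checking against each sample's own coverage distribution before discarding anything.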

    Many thanks for your help!

  • #2
    This looks like contamination.

    You are expecting a 1.65 Mb genome.

    The sum of contig length is 2.4 Mb.

    When you searched the low-coverage contigs against the NCBI database, did they have any hits or no hits?

    If they did have good hits, you could align all your contigs against that hit to make a list of everything that aligns to it.


    • #3
      Hi Seb,

      I see your point, but not all my low-coverage contigs have a hit. The ones that do don't all have the same hit, and the hits are not very good (the largest contig matching another species matches with an identity of 90%, but over only 63% coverage; the next contig has an identity of 78% over only 21%).

      Of the 30 largest contigs I tested, 10 matched another species (but these were not very good matches, and not always the same species), and 9 did not match anything. All 19 of these contigs had low coverage (around 1). Contigs with high coverage (>200) matched my species of interest with good hits.
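One way to check whether the low-coverage contigs account for the extra sequence is to total the contig lengths on each side of the coverage cut-off and compare the high-coverage total with the expected 1.65 Mb. This is a rough sketch only: it assumes SPAdes-style contig names (`NODE_<n>_length_<bp>_cov_<coverage>`), and `summarize_by_coverage` and the cut-off of 50 are hypothetical choices for illustration.

```python
import re

# Assumed SPAdes-style contig name: NODE_<n>_length_<bp>_cov_<k-mer coverage>
HEADER_RE = re.compile(r"length_(\d+)_cov_(\d+(?:\.\d+)?)")


def summarize_by_coverage(headers, cov_cutoff=50.0):
    """Count contigs and total bases above and below a coverage cut-off."""
    summary = {
        "high": {"contigs": 0, "bases": 0},
        "low": {"contigs": 0, "bases": 0},
    }
    for header in headers:
        match = HEADER_RE.search(header)
        if match is None:
            raise ValueError(f"unexpected contig name: {header}")
        length = int(match.group(1))
        coverage = float(match.group(2))
        group = "high" if coverage >= cov_cutoff else "low"
        summary[group]["contigs"] += 1
        summary[group]["bases"] += length
    return summary
```

If the "high" total lands near the expected genome size and the "low" total is close to the excess, that would be consistent with the low-coverage contigs being foreign or spurious sequence, although it does not say where they came from.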

