  • Weird Kmer Content in Illumina PE data

    We are analyzing a paired-end Illumina library with 150 bp reads, and FastQC is reporting weird kmer content.

    There are several over-represented heptamers, and they appear at the same positions in both the R1 and R2 reads.

    We used Trimmomatic to remove adapter sequences and low-quality bases, but this didn't fix the issue. We went ahead with a de novo assembly using ABySS, which produced a highly fragmented assembly of more than 60 million short contigs.

    We know something is wrong, but we can't explain it.

    Is the fragmented assembly a result of this kmer problem? If so, can anyone tell us what is causing it and what we can do about it?
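To check whether the heptamer spikes are still there after trimming, the positional k-mer counts can be tallied directly from the reads. The following is only a minimal sketch under stated assumptions: it is not FastQC's actual statistical test, the function names are made up for illustration, and it compares each k-mer's count at its most common start position with a naive expectation that assumes the k-mer is spread evenly along the read.

```python
import gzip
from collections import Counter, defaultdict
from itertools import islice


def read_fastq_sequences(path, limit=100_000):
    """Yield up to `limit` sequence lines from a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the second line of every 4-line record.
        for line in islice(handle, 1, 4 * limit, 4):
            yield line.strip()


def positional_kmer_counts(reads, k=7):
    """Map each k-mer to a Counter of the 0-based start positions where it occurs."""
    counts = defaultdict(Counter)
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing uncalled bases
                counts[kmer][i] += 1
    return counts


def position_biased_kmers(reads, k=7, min_ratio=5.0, min_count=3):
    """List (kmer, position, observed, expected) for k-mers piling up at one position.

    `expected` is the k-mer's total count divided by the number of possible
    start positions, i.e. a flat distribution along the read (an assumption).
    """
    reads = list(reads)
    if not reads:
        return []
    n_positions = max(len(r) for r in reads) - k + 1
    if n_positions < 1:
        return []
    hits = []
    for kmer, by_pos in positional_kmer_counts(reads, k).items():
        expected = sum(by_pos.values()) / n_positions
        pos, observed = by_pos.most_common(1)[0]
        if observed >= min_count and observed / expected >= min_ratio:
            hits.append((kmer, pos, observed, expected))
    # Strongest positional enrichment first.
    return sorted(hits, key=lambda h: h[2] / h[3], reverse=True)


# Toy example: the same heptamer starts at position 2 in both reads.
toy_reads = ["AAACGTACGTT", "CCACGTACGGG"]
for kmer, pos, obs, exp in position_biased_kmers(toy_reads, k=7, min_ratio=4.0, min_count=2):
    print(kmer, pos, obs, round(exp, 2))
```

On real data, something like `position_biased_kmers(read_fastq_sequences("R1.trimmed.fastq.gz"))` (the file name is a placeholder) could be run on R1 and R2 separately, before and after trimming, to see whether the same heptamers stay flagged at the same positions.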
    Last edited by rooksie; 02-26-2016, 12:58 PM.

  • #2
    See this paper:


    They focus on MiSeq, but please also check the papers they link to for other studies.
    The authors note that the "library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns."

    They conclude that "the library preparation method together with the choice of primers causes an extensive bias towards certain motifs causing substitutions, insertions and deletions,"

    "Every time a molecule fails to elongate properly or advances too fast, the overall signal for the cluster suffers from interference. So as the read length increases, the cluster signal can get weaker due to an accumulation of these events resulting in higher error rates towards the end of the read (16). This explains the gradual increase of errors that we observed in the position and nucleotide-specific distributions in addition to the spikes caused by the motifs."
    Last edited by Richard Finney; 02-26-2016, 12:46 PM.

    • #3
      I wonder which library prep kit was used, how many read pairs there are, and whether the kmer plots come from the raw reads or from the reads after processing. I assume the sequencing was done on a HiSeq 3000/4000.
