  • Kmer Distribution Problem

    Hi there,

    I have sequenced four fungal strains on an Illumina HiSeq. Three of them assembled well, but one did not. I made a kmer distribution of the reads with Jellyfish and found that there is no peak on the curve. The total amount of data is more than 50X. What could cause this result? Can anyone offer a suggestion? Thanks!
    [Attached image: A5_GCCAAT_L004_R1_001.fastq.hist.png]

  • #2
    Originally posted by cyyuan
    Hi there,

    I have sequenced four fungal strains on an Illumina HiSeq. Three of them assembled well, but one did not. I made a kmer distribution of the reads with Jellyfish and found that there is no peak on the curve. The total amount of data is more than 50X. What could cause this result? Can anyone offer a suggestion? Thanks!

    It looks like you have a little bit of a peak at about 15. But have you done any quality trimming? If so, what kind? You might want to play around with different quality cutoffs, or simply trim the last X bp off each read, or some combination of both. Usually a high number of unique or low-occurrence kmers is simply a product of sequencing errors.
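To make the "is there a peak?" question less of an eyeball judgment, here is a minimal sketch in plain Python. It assumes the reads are already loaded as strings, and the function names `kmer_histogram` and `main_peak` are made up for illustration; this is not how Jellyfish works internally. It builds the same kind of histogram (multiplicity versus number of distinct kmers), walks down the initial error slope to the first valley, and reports the tallest bin after it.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())


def main_peak(hist):
    """Return the multiplicity of the main peak past the first valley, or None.

    Starting at multiplicity 1, step right while the histogram keeps falling
    (the slope usually attributed to sequencing errors). The tallest bin to
    the right of that valley is reported as the peak. If the curve never
    rises again, there is no peak and None is returned.
    """
    if not hist:
        return None
    max_mult = max(hist)
    m = 1
    while m < max_mult and hist.get(m + 1, 0) <= hist.get(m, 0):
        m += 1
    if m == max_mult:
        return None
    return max(range(m + 1, max_mult + 1), key=lambda x: hist.get(x, 0))
```

A two-column histogram file (multiplicity, count) could be read into a dict and passed straight to `main_peak`. Real histograms are noisy, so some smoothing before looking for the valley would probably be needed; the sketch skips that.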

    The other option is heterozygosity/ploidy. So, if you're sequencing a very diverse set of individuals, you'll have lower occurrence kmers, in general. Of course, depending on what you're sequencing, you might not be able to get around this. But usually people try to sequence one individual, or a clonal set of individuals in order to create their reference genome.


    • #3
      Thanks for your reply!

      Originally posted by Wallysb01
      It looks like you have a little bit of a peak at about 15. But have you done any quality trimming? If so, what kind? You might want to play around with different quality cutoffs, or simply trim the last X bp off each read, or some combination of both. Usually a high number of unique or low-occurrence kmers is simply a product of sequencing errors.
      This is the original data; I haven't done any quality trimming on it. The read quality is similar to that of the other three strains.

      The other option is heterozygosity/ploidy. So, if you're sequencing a very diverse set of individuals, you'll have lower occurrence kmers, in general. Of course, depending on what you're sequencing, you might not be able to get around this. But usually people try to sequence one individual, or a clonal set of individuals in order to create their reference genome.
      We always extract DNA from a single colony, but I am not sure whether the strain is heterozygous. I will check that later. Could this be caused by a poorly constructed sequencing library?


      • #4
        It's hard to know without more information, though it's interesting that the other libraries are not producing the same thing while you seem confident the quality is similar between them.

        I can only guess that something less than ideal happened during the Illumina library prep or during the run itself, which is not at all uncommon, and that you are getting some strange bias that won't show up in the quality scores. So, you might look at the nucleotide distribution across the length of the reads. If you see the base frequencies bouncing around at certain positions, you should trim off those bases.
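As a rough illustration of that check, here is a small Python sketch. It assumes the reads are already in memory as strings; `base_composition` and `unstable_positions` are hypothetical helper names, and the tolerance is an arbitrary value chosen for the example, not a recommended cutoff. It tabulates the fraction of each base at every read position and flags positions that stray from the read-wide average.

```python
from collections import Counter


def base_composition(reads):
    """Return a list with one dict per read position: base -> fraction."""
    per_position = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(per_position):
                per_position.append(Counter())
            per_position[i][base] += 1
    fractions = []
    for counts in per_position:
        total = sum(counts.values())
        fractions.append({b: counts[b] / total for b in "ACGTN"})
    return fractions


def unstable_positions(fractions, tolerance=0.10):
    """Return 0-based positions where any of A/C/G/T differs from its mean
    fraction over all positions by more than `tolerance` (arbitrary)."""
    if not fractions:
        return []
    means = {b: sum(f[b] for f in fractions) / len(fractions) for b in "ACGT"}
    return [
        i
        for i, f in enumerate(fractions)
        if any(abs(f[b] - means[b]) > tolerance for b in "ACGT")
    ]
```

Positions flagged at the start or end of the reads would be the obvious candidates for trimming. Comparing against the mean is a simplification: a long biased stretch shifts the mean itself, so a plot of the per-position fractions is still worth looking at.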

        I might be able to help more if you can give me information about each Illumina run (i.e., did you barcode, what went into each lane) and some basic quality stats. I know absolutely nothing about any fungus-specific issues, however, so if the problem is related to that, you'll have to hope someone else stops by.
