  • Genome Survey using KmerFreq

    Hi everyone. I am doing a genome survey with KmerFreq for two species of wild goat, and I have run into a problem.

    According to the data statistics, Goat A has more bases (bp) than Goat B. However, the KmerFreq log file reports only about half as many k-mers for Goat A as for Goat B. Looking further at the log, the low k-mer count comes from the number of processed reads: KmerFreq processed only about half as many reads for Goat A as for Goat B.

    I am confused. How can Goat A have more bases and more reads, yet far fewer reads processed by KmerFreq?

    Thank you.

  • #2
    It might help if you gave some actual numbers - read lengths, platform, read count, kmer counts, and ideally kmer frequency histograms.



    • #3
      Originally posted by Brian Bushnell
      It might help if you gave some actual numbers - read lengths, platform, read count, kmer counts, and ideally kmer frequency histograms.
      The sequencing platform is HiSeq 2000 or 2500; I am not sure which at the moment. I chose the library with the largest number of bases to report these numbers.

      For goat A: in the data statistics, read length = 101 bp, read number = 393 Mb. In the KmerFreq log, processed reads = 180 Mb (90 Mb for the left reads and 90 Mb for the right).
      For goat B: in the data statistics, read length = 101 bp, read number = 366.7 Mb. In the KmerFreq log, processed reads = 360 Mb (180 Mb for the left reads and 180 Mb for the right).
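A quick way to see the size of the gap is to divide processed reads by reported reads for each goat. This minimal Python sketch does that with the numbers above. It assumes "Mb" means the same unit in both the data statistics and the KmerFreq log, and it assumes k = 17 purely for illustration, since the k used in KmerFreq is not stated. For a read of length L with no N bases, there are L - k + 1 overlapping k-mers, so the total k-mer count scales directly with the number of processed reads.

```python
# Figures from the thread (largest library of each goat).
# Assumption: "Mb" is the same unit in the data statistics and the KmerFreq log.
READ_LEN = 101
K = 17  # assumed k-mer size for illustration; the thread does not state k

SAMPLES = {
    "goat A": {"reported": 393.0, "processed": 180.0},
    "goat B": {"reported": 366.7, "processed": 360.0},
}


def kmers_per_read(read_len: int, k: int) -> int:
    """Overlapping k-mers in one read that contains no N bases."""
    return max(read_len - k + 1, 0)


def processed_fraction(reported: float, processed: float) -> float:
    """Share of the reported data that KmerFreq actually processed."""
    return processed / reported


for name, s in SAMPLES.items():
    frac = processed_fraction(s["reported"], s["processed"])
    print(f"{name}: {frac:.1%} of the reported data was processed")

print(f"k-mers per {READ_LEN} bp read at k={K}: {kmers_per_read(READ_LEN, K)}")
```

With these figures it prints 45.8% for goat A, 98.2% for goat B, and 85 k-mers per read. Since goat A's processed-read count (180) is exactly half of goat B's (360), a k-mer count of about half is what the processed-read count alone would predict; the open question is why so much of goat A's input was skipped.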


      I also ran FastQC on the files of this library for goat A.
      Both the left and right end fq files: warnings for Per base sequence content and Per sequence GC content; fail for Per base N content (only the last 2 positions of the reads, which have 40% N).
      Right end fq file: warning for Per tile sequence quality; fail for Kmer Content, as follows:
      Sequence   Count   PValue   Obs/Exp Max   Max Obs/Exp Position
      TCGGAAT    55620   0.0      5.3323565     94
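To look at the Per base N content result in more detail, or to decide whether trimming the last two bases is worthwhile, the per-position N rate can be recomputed directly from a fq file. This is a minimal Python sketch assuming the standard 4-line FASTQ layout; the function names are hypothetical. Whether KmerFreq skips k-mers or reads that contain N is an assumption to check in its own documentation, not something this thread confirms.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def n_fraction_by_position(handle: TextIO) -> List[float]:
    """Fraction of reads that have an N at each 0-based position."""
    n_counts: List[int] = []
    totals: List[int] = []
    for _, seq, _ in read_fastq(handle):
        for i, base in enumerate(seq.upper()):
            if i == len(totals):  # first read long enough to reach position i
                n_counts.append(0)
                totals.append(0)
            totals[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [n / t for n, t in zip(n_counts, totals)]


# Tiny in-memory example; for a real file, pass open("goatA_1.fq") instead.
demo = io.StringIO("@r1\nACGN\n+\nIIII\n@r2\nACGT\n+\nIIII\n")
print(n_fraction_by_position(demo))  # [0.0, 0.0, 0.0, 0.5]
```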


      I suspect that something goes wrong when KmerFreq loads reads from the fq files, which might explain why fewer reads were processed for Goat A. But I don't know what is wrong with Goat A's fq reads.
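One way to test the idea that something goes wrong while reading goat A's fq files is to check their structure directly: count the records in each mate file and flag truncated records, malformed header or separator lines, and sequence/quality length mismatches. This Python sketch assumes plain (uncompressed) 4-line FASTQ; `check_fastq` and `FastqReport` are hypothetical names, not part of KmerFreq or FastQC. If the two files of a pair disagree in record count, or one ends in a truncated record, a pair-aware tool could plausibly stop early, though whether KmerFreq behaves that way is not established here.

```python
import io
from dataclasses import dataclass, field
from typing import List, TextIO


@dataclass
class FastqReport:
    records: int = 0  # complete 4-line records seen
    problems: List[str] = field(default_factory=list)


def check_fastq(handle: TextIO) -> FastqReport:
    """Walk a 4-line-per-record FASTQ and note structural problems."""
    report = FastqReport()
    line_no = 1  # line number of the current record's header
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        where = f"record at line {line_no}"
        if not qual:
            # File ended mid-record, e.g. an interrupted copy or download.
            report.problems.append(f"{where}: truncated record")
            break
        if not header.startswith("@"):
            report.problems.append(f"{where}: header does not start with '@'")
        if not plus.startswith("+"):
            report.problems.append(f"{where}: separator does not start with '+'")
        s, q = seq.rstrip("\n"), qual.rstrip("\n")
        if len(s) != len(q):
            report.problems.append(
                f"{where}: sequence/quality length mismatch ({len(s)} vs {len(q)})"
            )
        report.records += 1
        line_no += 4
    return report


# In-memory stand-ins for a left/right pair; the right file is damaged.
good = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
bad = io.StringIO("@r1/2\nACGT\n+\nIII\n@r2/2\nAC")
mate1, mate2 = check_fastq(good), check_fastq(bad)
print(mate1.records, mate2.records)  # 2 1
for problem in mate2.problems:
    print(problem)
if mate1.records != mate2.records:
    print("mate files have different numbers of complete records")
```

For real data, pass `open("goatA_1.fq")`, or `gzip.open(path, "rt")` for gzipped files, instead of the `io.StringIO` stand-ins.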



      • #4
        1) When you say, for example, "393Mb", do you mean 393 million reads or 393 megabases?
        2) In both cases, the amount of data processed appears to be less than the total amount of data, which is strange.
        3) Before doing any sort of kmer analysis, you should adapter-trim your reads. I'm guessing that you have not, based on FastQC indicating there is an overexpressed kmer, but I'm not sure.
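On point 3: dedicated trimmers (for example BBDuk, which ships with BBTools) handle mismatches, quality, and both reads of a pair. The core idea of 3' adapter trimming is simple: cut the read where the adapter starts, including a partial adapter hanging off the read end. This Python sketch assumes the adapter is `AGATCGGAAGAGC`, the common Illumina TruSeq adapter prefix; the thread does not identify which adapter, if any, is behind the overrepresented TCGGAAT k-mer. `MIN_OVERLAP` is a hypothetical parameter, and matching is exact only.

```python
from typing import Tuple

# Assumption: the common Illumina TruSeq adapter prefix; the thread does not
# say which adapter (if any) is present in these libraries.
ADAPTER = "AGATCGGAAGAGC"
MIN_OVERLAP = 5  # hypothetical: shortest partial adapter trimmed at the 3' end


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                 min_overlap: int = MIN_OVERLAP) -> Tuple[str, str]:
    """Trim a read (and its qualities) where the adapter begins.

    Exact matching only: first a full adapter anywhere in the read, then the
    longest adapter prefix (>= min_overlap bases) ending at the read's 3' end.
    """
    cut = seq.find(adapter)
    if cut == -1:
        longest = min(len(adapter) - 1, len(seq))
        for overlap in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:overlap]):
                cut = len(seq) - overlap
                break
    if cut == -1:
        return seq, qual  # no adapter found; read unchanged
    return seq[:cut], qual[:cut]


reads = [
    "ACGTACGTAC" + "AGATCGGAAGAGCACAC",  # full adapter after a 10 bp insert
    "TTTTTTTTTTAGATC",                    # partial adapter at the 3' end
    "ACGTACGTACGT",                       # no adapter
]
for r in reads:
    print(trim_adapter(r, "I" * len(r))[0])
```

Sequence and quality are cut at the same index so the record stays valid FASTQ; real tools also drop reads that become too short after trimming.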

        Perhaps you could install BBTools (which includes khist.sh) and run khist, then post both the console output and the histogram files, since KmerFreq seems to be dropping some of the reads:

        khist.sh in1=goatA_1.fq in2=goatA_2.fq hist=goatA_hist.txt

        khist.sh in1=goatB_1.fq in2=goatB_2.fq hist=goatB_hist.txt
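For anyone unfamiliar with what khist produces: a k-mer frequency histogram counts every k-mer in the reads, then records how many distinct k-mers occur at each depth. This toy Python sketch shows the idea using canonical k-mers (a k-mer and its reverse complement counted together) and skips k-mers containing non-ACGT bases such as N. It holds all counts in memory, so it is only suitable for small inputs and is not how khist.sh is implemented.

```python
from collections import Counter
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, skipping any k-mer with a non-ACGT base such as N."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def depth_histogram(counts: Counter) -> Dict[int, int]:
    """Map depth -> number of distinct k-mers observed exactly that many times."""
    return dict(sorted(Counter(counts.values()).items()))


reads = ["ACGTAC", "GTACGT", "ACGNAC"]
counts = count_kmers(reads, k=3)
print(depth_histogram(counts))  # {4: 1, 5: 1}
```

If goat A's reads were really only half processed, its main histogram peak would be expected near half the depth of goat B's when counted by KmerFreq, but not when counted by a tool that reads all of the data.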
