  • Genome Survey using KmerFreq

    Hi, everyone. I am doing a genome survey with KmerFreq for two species of wild goat, and I have run into a problem.

    According to the data statistics, Goat A has more base pairs than Goat B. But in the KmerFreq log file, the kmer count for Goat A is about 1/2 that of Goat B. The log file also shows where the discrepancy comes from: the number of processed reads for Goat A is 1/2 that of Goat B.

    I am confused. How can Goat A have more base pairs and more reads, yet have far fewer reads processed by KmerFreq?

    Thank you.

  • #2
    It might help if you gave some actual numbers - read lengths, platform, read count, kmer counts, and ideally kmer frequency histograms.


    • #3
      Originally posted by Brian Bushnell
      It might help if you gave some actual numbers - read lengths, platform, read count, kmer counts, and ideally kmer frequency histograms.
      The sequencing platform is HiSeq 2000 or 2500; I am not sure which at the moment. I chose the library with the largest number of base pairs to report these numbers.

      For goat A: in the data statistics, read length = 101 bp, number of reads = 393Mb. In the KmerFreq log file, processed reads = 180Mb (90Mb for left and 90Mb for right).
      For goat B: in the data statistics, read length = 101 bp, number of reads = 366.7Mb. In the KmerFreq log file, processed reads = 360Mb (180Mb for left and 180Mb for right).


      I also ran FastQC on the library files of goat A.
      For both the left and right end fq files: warnings for Per base sequence content and Per sequence GC content; failure for Per base N content (only the last 2 positions of the reads have 40% N).
      Right end fq file: warning for Per tile sequence quality; failure for Kmer Content, as follows:
      Sequence   Count   PValue   Obs/Exp Max   Max Obs/Exp Position
      TCGGAAT    55620   0.0      5.3323565     94


      I suspect that something goes wrong when KmerFreq loads reads from the fq files, which would explain why fewer reads are processed for Goat A. But I don't know what is wrong with the fq reads of Goat A.
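
      One way to cross-check the read counts independently of KmerFreq is to count the records in each fq file directly. The sketch below is a minimal example, assuming the usual four-lines-per-record FASTQ layout (no wrapped sequences); `fastq_stats` and `open_fastq` are hypothetical helper names, and `goatA_1.fq` is a placeholder file name. It also counts reads containing N, since FastQC flagged N content, although whether KmerFreq treats such reads differently is not established in this thread.

```python
import gzip
import io


def fastq_stats(handle):
    """Count reads, bases, and reads containing N in a FASTQ text stream.

    Assumes the common 4-lines-per-record layout (no wrapped sequences).
    """
    reads = bases = reads_with_n = 0
    line_count = 0
    for line_count, line in enumerate(handle, start=1):
        if line_count % 4 == 2:  # second line of each record is the sequence
            seq = line.strip()
            reads += 1
            bases += len(seq)
            if "N" in seq.upper():
                reads_with_n += 1
    return {
        "reads": reads,
        "bases": bases,
        "reads_with_n": reads_with_n,
        # False hints at a truncated or malformed file
        "complete": line_count % 4 == 0,
    }


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Tiny in-memory example; a real run might use open_fastq("goatA_1.fq").
example = io.StringIO(
    "@r1\nACGTACGTNN\n+\nIIIIIIII##\n"
    "@r2\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
stats = fastq_stats(example)
print(stats)
```

      If the left and right files report different read counts, or `complete` is false for either of them, that would point to a problem in the files rather than in KmerFreq.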


      • #4
        1) When you say, for example, "393Mb", do you mean 393 million reads or 393 megabases?
        2) In both cases, the amount of data processed appears to be less than the total amount of data, which is strange.
        3) Before doing any sort of kmer analysis, you should adapter-trim your reads. I'm guessing that you have not, based on FastQC indicating there is an overrepresented kmer, but I'm not sure.
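
        To illustrate why the unit in question 1 matters, here is a small arithmetic sketch of the two readings of "393Mb" for 101 bp reads. The read length comes from the thread; the kmer size `K = 17` is an assumption, since k is not stated anywhere above, and `kmers_per_read` is a hypothetical helper name.

```python
def kmers_per_read(read_len, k):
    """Number of kmers in one read of length read_len (0 if the read is shorter than k)."""
    return max(read_len - k + 1, 0)


READ_LEN = 101  # read length reported in the thread
K = 17          # assumption: the kmer size is not stated in the thread

# Reading 1: "393Mb" means 393 million reads.
reads_if_million = 393_000_000
bases_if_million = reads_if_million * READ_LEN

# Reading 2: "393Mb" means 393 megabases in total.
bases_if_megabases = 393_000_000
reads_if_megabases = bases_if_megabases // READ_LEN

for label, n_reads in [
    ("393 million reads", reads_if_million),
    ("393 megabases", reads_if_megabases),
]:
    total_kmers = n_reads * kmers_per_read(READ_LEN, K)
    print(f"{label}: {n_reads} reads, {total_kmers} kmers expected")
```

        The two readings differ by a factor of about 100 (roughly 39.7 gigabases versus roughly 3.9 million reads), so the expected kmer totals cannot be compared with the KmerFreq log until the unit is settled.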

        Perhaps you could install this and run khist, then post both the console output and the histogram files, since KmerFreq seems to be dropping some of the reads:

        khist.sh in1=goatA_1.fq in2=goatA_2.fq hist=goatA_hist.txt

        khist.sh in1=goatB_1.fq in2=goatB_2.fq hist=goatB_hist.txt
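
        Once histogram files exist, a rough way to compare the two goats is to locate the main coverage peak in each. The following is only a sketch under stated assumptions: it assumes a whitespace-separated text file whose first column is depth and whose second column is the number of distinct kmers at that depth, with header lines starting with `#`. The actual column layout should be checked against the header of the real file, and `count_col` adjusted if needed. The function names and the `min_depth` cutoff are hypothetical, and the example data is synthetic.

```python
def parse_hist(lines, count_col=1):
    """Parse (depth, count) pairs from histogram lines, skipping '#' headers.

    Assumption: column 0 is depth and column `count_col` is the number of
    distinct kmers observed at that depth.
    """
    hist = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        hist.append((int(fields[0]), int(fields[count_col])))
    return hist


def main_peak(hist, min_depth=5):
    """Depth with the highest count, ignoring depths below min_depth
    (low depths are usually dominated by error kmers)."""
    candidates = [(count, depth) for depth, count in hist if depth >= min_depth]
    return max(candidates)[1]


def estimate_genome_size(hist, min_depth=5):
    """Rough estimate: total kmer occurrences divided by the peak depth."""
    peak = main_peak(hist, min_depth)
    total = sum(depth * count for depth, count in hist if depth >= min_depth)
    return total / peak


# Synthetic histogram, not real goat data.
synthetic = ["#Depth Count", "1 1000", "2 200", "5 10", "19 50", "20 100", "21 50"]
hist = parse_hist(synthetic)
print(main_peak(hist), estimate_genome_size(hist))
```

        If the peak depth for Goat A comes out at about half the depth expected from its total base count, that would be consistent with only half of the reads having been counted.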
