  • K-mer content failed on 5' end - advice needed

    Hi folks,

    I am trying to do adapter and low-quality trimming on reads from a fungal genome (library prepared with the Illumina DNA nano kit and sequenced on a HiSeq 2000, 100PE). I used BBDuk to trim adapters and low-quality bases as follows:

    >./bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1_q25.fastq.gz out2=R2_q25.fastq.gz ktrim=r k=21 mink=11 hdist=2 tpe tbo ref=resources/adapters.fa qtrim=rl trimq=25

    FastQC still showed a K-mer content warning for both R1 and R2 reads [ https://goo.gl/photos/Lsyt7YJeQnjB8HQq5 ]. Could I have your opinion on how I should handle my data? Should I just remove the first 20 bases to be on the safe side? Or is this normal behavior for a library prepared with the nano kit?

    Thanks in advance and have a great day!
    Last edited by Vinn; 04-21-2017, 06:47 AM.

  • #2
    What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.


    • #3
      Originally posted by GenoMax:
      What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.
      Hi GenoMax, thanks for your reply. I would like to do de novo assembly.


      • #4
        Take a look at @Brian's suggestions in this thread. I have provided a link for a specific post but take a look at the whole thread. He should be along with more later.


        • #5
          Thank you, I will read the thread through.


          • #6
            K-mer content spikiness at the beginning of the read is normal for many fragmentation methods, and those bases should not be trimmed. I'm not sure what's going on at the end, though...


            • #7
              Thanks for your reply, Brian. Just to be on the safe side, do you think it is better to trim the end off?


              • #8
                Excessive trimming reduces accuracy, and will degrade the results of any experiment. If you want to be confident that bases are genomic rather than artificial, I suggest you follow this methodology:

                1) Map the reads to the reference (if you don't have a reference, you can make a quick assembly with Tadpole) with BBMap like this:

                Code:
                bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt qhist=qhist.txt
                2) Plot mhist.txt with R or Excel, using a log-scale Y-axis, to look at the positional error rates.

                If there is not an increased error rate in a region of the read, there is no reason to trim it. And conversely, it is prudent to trim if there is a high error rate at one end or the other.
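
As a rough complement to eyeballing the plot, here is a minimal Python sketch that reads an mhist-style table and lists the read positions whose error rate stands well above the median. It is only an illustration under assumptions: the file is taken to be tab-separated with a '#'-prefixed header, the first column is taken to be the base position, and a column named "Match1" is assumed to hold the per-position match fraction for read 1. The function names and the 3x-median threshold are hypothetical choices, not part of BBMap, so check your own file's header before relying on it.

```python
import statistics

# ASSUMPTION: mhist.txt is tab-separated, its header line starts with '#',
# the first column is the base position, and a column named "Match1" holds
# the fraction of read-1 bases matching the reference at that position.
# Check your own file's header and adjust MATCH_COLUMN if it differs.
MATCH_COLUMN = "Match1"


def positional_error_rates(lines, match_column=MATCH_COLUMN):
    """Return a list of (position, error_rate) tuples, error_rate = 1 - match."""
    match_index = None
    rates = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Header line: find which column holds the match fraction.
            names = line.lstrip("#").split("\t")
            if match_column in names:
                match_index = names.index(match_column)
            continue
        if match_index is None:
            raise ValueError(f"no header column named {match_column!r}")
        fields = line.split("\t")
        rates.append((int(fields[0]), 1.0 - float(fields[match_index])))
    return rates


def elevated_positions(rates, factor=3.0):
    """Return positions whose error rate exceeds factor x the median rate.

    The factor is an arbitrary screening threshold (hypothetical choice).
    """
    if not rates:
        return []
    baseline = statistics.median(rate for _, rate in rates)
    return [pos for pos, rate in rates if rate > factor * baseline]
```

With a real file you could pass an open file handle, for example `elevated_positions(positional_error_rates(open("mhist.txt")))`. Positions reported at one end of the read would be candidates for trimming; an empty list suggests there is no region with an obviously increased error rate. This is a quick screen and does not replace looking at the log-scale plot.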


                • #9
                  Thanks so much Brian for your advice. I will try as you suggested.
