  • Vinn
    Member
    • Nov 2014
    • 21

    K-mer content failed on 5' end - advice needed

    Hi folks,

    I am trying to do adapter and low-quality trimming on a fungal genome (library prepared with the Illumina DNA nano kit and sequenced on a HiSeq 2000, 100 bp paired-end). I used BBDuk to trim adapters and low-quality bases as follows:

    >./bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1_q25.fastq.gz out2=R2_q25.fastq.gz ktrim=r k=21 mink=11 hdist=2 tpe tbo ref=resources/adapters.fa qtrim=rl trimq=25

    FastQC still showed a K-mer content warning for both R1 and R2 reads [ https://goo.gl/photos/Lsyt7YJeQnjB8HQq5 ]. What is your opinion on how I should handle my data? Should I just remove the first 20 bases to be on the safe side? Or is this normal behavior for a library prepared with the nano kit?

    Thanks in advance and have a great day!
    Last edited by Vinn; 04-21-2017, 06:47 AM.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.


    • Vinn
      Member
      • Nov 2014
      • 21

      #3
      Originally posted by GenoMax:
      What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.
      Hi GenoMax, thanks for your reply. I would like to do de novo assembly.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Take a look at @Brian's suggestions in this thread. I have provided a link for a specific post but take a look at the whole thread. He should be along with more later.


        • Vinn
          Member
          • Nov 2014
          • 21

          #5
          Thank you, I will read the thread through.


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            K-mer content spikiness at the beginning of the read is normal for many fragmentation methods, and those bases should not be trimmed. I'm not sure what's going on at the end, though...


            • Vinn
              Member
              • Nov 2014
              • 21

              #7
              Thanks for your reply, Brian. Just to be on the safe side, do you think it is better to trim the end off?


              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                Excessive trimming reduces accuracy, and will degrade the results of any experiment. If you want to be confident that bases are genomic rather than artificial, I suggest you follow this methodology:

                1) Map the reads to the reference (if you don't have a reference, you can make a quick assembly with Tadpole) with BBMap like this:

                Code:
                bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt qhist=qhist.txt
                2) Plot mhist with R or Excel with a log-scale Y-axis to look at the positional error rates.

                If there is not an increased error rate in a region of the read, there is no reason to trim it. And conversely, it is prudent to trim if there is a high error rate at one end or the other.
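As a rough sketch of step 2 for anyone who prefers a script to R or Excel, the positional error rate can be computed from the mhist file in Python. This assumes the file is tab-separated with a `#`-prefixed header and the column order BaseNum, Match1, Sub1, Del1, Ins1, N1, Other1, followed by the same columns for read 2; check the header of your own mhist.txt before relying on it. The function names and the "5x the median" cutoff are hypothetical choices for illustration, not part of BBMap.

```python
import io


def positional_error_rates(mhist_text, read=1):
    """Return [(position, error_rate)] parsed from mhist-style text.

    ASSUMPTION: tab-separated columns are
    BaseNum, Match1, Sub1, Del1, Ins1, N1, Other1, Match2, Sub2, ...
    Verify this against the header line of your own file.
    """
    offset = 1 if read == 1 else 7  # first column of the chosen read
    rates = []
    for line in io.StringIO(mhist_text):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split("\t")
        match, sub, dele, ins = (float(x) for x in fields[offset:offset + 4])
        total = match + sub + dele + ins
        # Error rate = mismatching events over all aligned events.
        rate = (sub + dele + ins) / total if total > 0 else 0.0
        rates.append((int(fields[0]), rate))
    return rates


def flag_elevated(rates, factor=5.0):
    """Positions whose error rate exceeds `factor` times the median rate.

    The factor is an arbitrary illustrative threshold, not a standard.
    """
    if not rates:
        return []
    ordered = sorted(r for _, r in rates)
    median = ordered[len(ordered) // 2]
    if median <= 0:
        return []
    return [pos for pos, r in rates if r > factor * median]
```

To use it, read the file with `open("mhist.txt").read()`, pass the text to `positional_error_rates`, and look at what `flag_elevated` returns. A cluster of flagged positions at one end of the read would be the kind of region worth trimming; an empty list suggests there is nothing to gain. Plotting the rates on a log-scale Y-axis is still the better way to judge the overall shape.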


                • Vinn
                  Member
                  • Nov 2014
                  • 21

                  #9
                  Thanks so much for your advice, Brian. I will try what you suggested.
