SEQanswers

Old 04-21-2017, 07:39 AM   #1
Vinn
Member
 
Location: Sweden

Join Date: Nov 2014
Posts: 17
K-mer content failed on 5' end - advice needed

Hi folks,

I am trying to do adapter and low-quality trimming on reads from a fungal genome (library prepared with the Illumina DNA nano kit and sequenced on a HiSeq 2000, 100 bp paired-end). I used BBDuk to trim adapters and low-quality bases as follows:

>./bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1_q25.fastq.gz out2=R2_q25.fastq.gz ktrim=r k=21 mink=11 hdist=2 tpe tbo ref=resources/adapters.fa qtrim=rl trimq=25

FastQC still showed a K-mer content warning for both R1 and R2 reads [ https://goo.gl/photos/Lsyt7YJeQnjB8HQq5 ]. How should I handle my data? Should I just remove the first 20 bases to be on the safe side, or is this normal behavior for a library prepared with the nano kit?

Thanks in advance and have a great day!

Last edited by Vinn; 04-21-2017 at 07:47 AM.
Old 04-21-2017, 08:14 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,548

What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.
Old 04-21-2017, 08:17 AM   #3
Vinn
Member
 
Location: Sweden

Join Date: Nov 2014
Posts: 17

Quote:
Originally Posted by GenoMax
What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.
Hi GenoMax, thanks for your reply. I would like to do de novo assembly.
Old 04-21-2017, 08:43 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,548

Take a look at @Brian's suggestions in this thread. I have provided a link for a specific post but take a look at the whole thread. He should be along with more later.
Old 04-21-2017, 08:48 AM   #5
Vinn
Member
 
Location: Sweden

Join Date: Nov 2014
Posts: 17

Thank you, I will read the thread through.
Old 04-24-2017, 11:16 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

K-mer content spikiness at the beginning of the read is normal for many fragmentation methodologies, and those bases should not be trimmed. I'm not sure what's going on at the end, though...
Old 04-25-2017, 07:48 AM   #7
Vinn
Member
 
Location: Sweden

Join Date: Nov 2014
Posts: 17

Thanks for your reply, Brian. Just to be on the safe side, do you think it is better to trim the end off?
Old 04-25-2017, 10:58 AM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

Excessive trimming reduces accuracy, and will degrade the results of any experiment. If you want to be confident that bases are genomic rather than artificial, I suggest you follow this methodology:

1) Map the reads to the reference (if you don't have a reference, you can make a quick assembly with Tadpole) with BBMap like this:

Code:
bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt qhist=qhist.txt

2) Plot mhist with R or Excel with a log-scale Y-axis to look at the positional error rates.

If there is not an increased error rate in a region of the read, there is no reason to trim it. And conversely, it is prudent to trim if there is a high error rate at one end or the other.
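
For anyone who would rather script step 2 than use R or Excel, here is a rough Python sketch. It is not from the BBMap documentation. It assumes mhist.txt is tab-delimited, has a header line starting with '#', has the base position in the first column, and has columns named Sub1, Del1 and Ins1 holding per-position rates for read 1. Check the header of your own file and adjust the names if they differ. The function names and the 5-fold cutoff are made up for illustration.

```python
from statistics import median

# Assumed column names for read 1; check the header of your own mhist.txt.
ERROR_COLUMNS = ("Sub1", "Del1", "Ins1")


def read_mhist(lines):
    """Return a list of (position, error_rate) pairs from mhist-style lines.

    Assumes a tab-delimited layout with a '#'-prefixed header line and the
    base position in the first column. The error rate is taken as the sum
    of the substitution, deletion and insertion columns.
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data")
        fields = line.split("\t")
        record = dict(zip(header, fields))
        error_rate = sum(float(record[name]) for name in ERROR_COLUMNS)
        rows.append((int(fields[0]), error_rate))
    return rows


def elevated_positions(rows, fold=5.0):
    """Positions whose error rate exceeds `fold` times the median rate.

    The cutoff is an arbitrary illustration, not a recommended threshold.
    """
    baseline = median(rate for _, rate in rows)
    if baseline <= 0:
        return []
    return [pos for pos, rate in rows if rate > fold * baseline]


def plot_error_rates(rows, out_png="mhist_errors.png"):
    """Save a log-scale plot of error rate by position (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([pos for pos, _ in rows], [rate for _, rate in rows])
    ax.set_yscale("log")
    ax.set_xlabel("Position in read")
    ax.set_ylabel("Error rate (sub + del + ins)")
    fig.savefig(out_png)
```

To use it, open mhist.txt, pass the file object to read_mhist, then hand the result to plot_error_rates or elevated_positions. If elevated_positions returns nothing near either end of the read, that matches the "no reason to trim" case above.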
Old 04-26-2017, 02:56 PM   #9
Vinn
Member
 
Location: Sweden

Join Date: Nov 2014
Posts: 17

Thanks so much Brian for your advice. I will try as you suggested.

Tags
fastqc, genome assembly, illumina, quality control
