SEQanswers

Old 04-11-2014, 12:16 PM   #1
modi2020
Member
 
Location: New York

Join Date: May 2012
Posts: 22
Default Kmers remain even after trimmomatic trimming

Hi Guys,

I have Illumina NGS DNA 150 bp paired-end reads.
My initial FastQC report indicated the presence of kmers towards the end of the reads, at positions 145-147.
I used Trimmomatic to trim them off. I trimmed 5 bp from the start of each read and 5 bp from the end, which makes my read length 140 bp (and should remove the kmers). However, when I looked at the FastQC report after filtering, it showed that the kmers still exist but are now at positions 135-136. I have attached the pre- and post-filtering FastQC reports in case they help to visualize this.


My trimmomatic trimming command was as follows:
java -Xmx15g -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 4 -phred33 -trimlog trimmlog_log.txt input_R1.fastq input_R2.fastq output_R1.fq unpaired_output1.fq output_R2.fq unpaired_output2.fq HEADCROP:5 CROP:140 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:60

I would really appreciate it if anyone could guide me through this issue, as I couldn't figure it out.
Attached Files
File Type: pdf post-filtering.fastq.pdf (575.4 KB, 65 views)
File Type: pdf pre-filtering.fastq.pdf (615.0 KB, 44 views)
Old 04-11-2014, 12:28 PM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

You should do some sequence-specific adapter trimming using the ILLUMINACLIP command.
Old 04-11-2014, 12:37 PM   #3
modi2020
Member
 
Location: New York

Join Date: May 2012
Posts: 22
Default

Dear Mastal,

Thank you for your prompt reply.
I honestly thought that the ILLUMINACLIP command was only for overrepresented sequences, e.g., Illumina adapters. I will try using it and see what I get.
Thank you
Old 04-11-2014, 01:55 PM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

It is only for the Illumina adapters.

The adapters are not in the same position in each read, and they are not present in every read, only when the DNA insert is shorter than the length of one read, so that you read through into some of the adapter sequence.
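
To illustrate the point above, here is a minimal Python sketch. It is not Trimmomatic's actual algorithm: it assumes 150 bp reads, uses the common Illumina adapter prefix AGATCGGAAGAGC purely as an example, and swaps Trimmomatic's seed-and-score matching for an exact string search. All function names are hypothetical. It shows that the adapter starts wherever the insert ends, so position-based cropping (HEADCROP/CROP) only shifts it, while sequence-based clipping removes it.

```python
import random

ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, for illustration only
READ_LEN = 150             # read length assumed from the first post


def simulate_read(insert_len, rng):
    """Return the first READ_LEN bases of insert + adapter + filler."""
    # The insert is drawn without G so it can never contain the adapter by chance.
    insert = "".join(rng.choice("ACT") for _ in range(insert_len))
    filler = "A" * READ_LEN  # stands in for whatever follows the adapter
    return (insert + ADAPTER + filler)[:READ_LEN]


def crop_fixed(read, head=5, keep=140):
    """Position-based trimming, in the spirit of HEADCROP:5 then CROP:140."""
    return read[head:][:keep]


def clip_adapter(read, adapter=ADAPTER):
    """Sequence-based trimming: cut at the first exact adapter match, if any."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


rng = random.Random(0)
for insert_len in (90, 120, 200):
    read = simulate_read(insert_len, rng)
    cropped = crop_fixed(read)
    clipped = clip_adapter(read)
    # Columns: insert length, adapter start in raw read (-1 = absent),
    # adapter still present after fixed cropping, read length after clipping.
    print(insert_len, read.find(ADAPTER), ADAPTER in cropped, len(clipped))
```

For the 90 bp and 120 bp inserts the adapter survives the fixed crop at a shifted position, while the 200 bp insert never contains adapter at all, which is why a fixed crop cannot solve the problem.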
Old 04-11-2014, 07:46 PM   #5
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 337
Default

The pre-filter FastQC kmer plot shows that these kmers are about 50-fold enriched even at base 110 of your reads.
As mentioned, it is better to try normal adapter trimming. Depending on the purpose of the experiments, the FastQC reports also indicate that the reads could use some quality trimming or filtering.
Old 04-13-2014, 01:43 PM   #6
modi2020
Member
 
Location: New York

Join Date: May 2012
Posts: 22
Default

I have used the ILLUMINACLIP command to remove the kmers. My command was as follows:
java -Xmx15g -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 4 -phred33 -trimlog trimmlog_log.txt input_R1.fastq input_R2.fastq output_R1.fq unpaired_output1.fq output_R2.fq unpaired_output2.fq ILLUMINACLIP:adapters3.fasta:2:30:10 HEADCROP:5 CROP:140 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:60

I also attached the adapters3.fasta file in which I specified the kmers.
The program ran at first for a few minutes, but its progress has been stalled for six hours now, even though it's running on 4 cores.

Do you think that my command and adapter file format are correct?

Attached Files
File Type: txt adapters3.txt (404 Bytes, 40 views)
Old 04-13-2014, 02:50 PM   #7
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Your trimmomatic command looks OK, but it looks like you are using Illumina barcodes instead of adapter sequences.

Just use the adapters.fasta file that comes with trimmomatic.

Do you know what type of Illumina kit was used for library prep?
Old 04-14-2014, 02:54 PM   #8
modi2020
Member
 
Location: New York

Join Date: May 2012
Posts: 22
Default

Dear Mastal,

After following your advice, I am glad to say that it worked! I have attached the FastQC output here.
I basically used the file TruSeq3-PE-2.fa as the adapter sequences and it worked wonderfully. I have asked the people who did the library prep about the kit but haven't gotten a reply yet. If they reply, I will let you know. I am just glad it works now.
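
For anyone reading later, here is a rough sketch of what the working invocation may have looked like. The exact command was not posted, so this assumes the earlier command was kept unchanged apart from swapping adapters3.fasta for TruSeq3-PE-2.fa, and that the adapter file sits in the working directory. The helper name is hypothetical, and the script only prints the command rather than running it. The three numbers after the adapter file are the seed mismatches, the palindrome clip threshold and the simple clip threshold.

```python
import shlex


def build_trimmomatic_pe_command(adapters="TruSeq3-PE-2.fa"):
    """Assemble the paired-end command from the thread as an argument list."""
    # Steps are applied in the order given, so adapter clipping comes first.
    steps = [
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "HEADCROP:5",
        "CROP:140",
        "LEADING:20",
        "TRAILING:20",
        "SLIDINGWINDOW:4:20",
        "MINLEN:60",
    ]
    return [
        "java", "-Xmx15g",
        "-classpath", "trimmomatic-0.22.jar",
        "org.usadellab.trimmomatic.TrimmomaticPE",
        "-threads", "4", "-phred33",
        "-trimlog", "trimmlog_log.txt",
        # Inputs, then paired/unpaired outputs for R1 and R2 (names from the thread).
        "input_R1.fastq", "input_R2.fastq",
        "output_R1.fq", "unpaired_output1.fq",
        "output_R2.fq", "unpaired_output2.fq",
        *steps,
    ]


command = build_trimmomatic_pe_command()
print(shlex.join(command))
```

Building the command as a list keeps each trimming step as a separate argument, which makes it easy to check that ILLUMINACLIP comes before the cropping steps.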

Again, thank you so much for your follow up with me
Attached Files
File Type: pdf TruSeq3-PE-2 filtered.pdf (506.7 KB, 40 views)
Old 04-14-2014, 03:02 PM   #9
modi2020
Member
 
Location: New York

Join Date: May 2012
Posts: 22
Default

Dear Luc,

Thank you for your advice. I had actually trimmed the reads down to 110 bp previously, but the kmers in the FastQC report persisted. When I used the TruSeq3-PE-2.fa adapter set supplied with Trimmomatic, it removed them completely. I attached the result of that run to my previous reply to mastal.

Again thank you so much
Old 03-12-2015, 06:38 PM   #10
vromanr_2015
Junior Member
 
Location: Australia

Join Date: Mar 2015
Posts: 5
Default It didn't work for me

I tried trimming with the TruSeq3 fasta file, and even after that the first 9 bp still show kmers. I tried cropping those bases, but then I got overrepresented sequences (which I didn't have before).

Any help is highly appreciated.
Old 03-12-2015, 06:40 PM   #11
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Perhaps you could post your fastqc results? The first 9 bp are unlikely to be related to adapters, but some systems (like Nextera) will have highly biased initial bases.
Old 03-12-2015, 06:48 PM   #12
vromanr_2015
Junior Member
 
Location: Australia

Join Date: Mar 2015
Posts: 5
Default It didn't work for me

Sure! Sorry about that.
Attached Files
File Type: pdf After trimming.pdf (309.1 KB, 51 views)
Old 03-12-2015, 06:59 PM   #13
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

You need to ask what kind of library this was. The spikiness at the beginning is similar to Nextera libraries (in which case you should not trim the first 9bp), but I'm not quite sure.

If it is a Nextera library, then it makes sense that trimming Truseq adapters will not change anything. I don't understand fastqc's relative enrichment graph, though, so I won't comment on that.

The Nextera adapter sequences are included with BBTools in the resources directory (nextera.fa.gz). But I recommend that you consult the source of the library to find out what adapters were used before simply trying that to see if it works.
Old 03-12-2015, 07:50 PM   #14
vromanr_2015
Junior Member
 
Location: Australia

Join Date: Mar 2015
Posts: 5
Default

This is RNA-seq data from wheat. Could it be an adapter that I don't have?
Old 03-12-2015, 10:56 PM   #15
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Yes. The organism and data source are generally independent of the adapter type, although there is an RNA-specific set of TruSeq adapters, which is also included with BBTools as truseq_rna.fa.gz.

Last edited by Brian Bushnell; 03-13-2015 at 08:50 AM.
Old 03-13-2015, 12:06 AM   #16
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137
Default

It could also be rRNA; what does the GC plot (per sequence GC content) look like?
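
In case it helps to see what that plot summarizes, here is a minimal Python sketch of a per-read GC histogram. It is not FastQC's implementation; the function names and the 10% bin width are assumptions made for illustration. Several separate peaks are often read as a mixture of sequence populations (rRNA being one candidate), but that is an interpretation, not a diagnosis.

```python
from collections import Counter


def gc_percent(seq):
    """GC percentage of one read, ignoring anything that is not A, C, G or T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(sequences, bin_width=10):
    """Count reads per GC bin; keys are the lower edge of each bin in percent."""
    hist = Counter()
    for seq in sequences:
        # Reads with exactly 100% GC are placed in the last bin.
        start = min(int(gc_percent(seq) // bin_width) * bin_width, 100 - bin_width)
        hist[start] += 1
    return dict(sorted(hist.items()))


# Tiny made-up example reads, only to show the shape of the output.
reads = ["ATATATATAT", "ATGCATGCAT", "ATGCATGCAA", "GCGCGCGCAT", "GGGGCCCCGG"]
for start, count in gc_histogram(reads).items():
    print(f"{start:3d}-{start + 10:3d}% GC: {'#' * count}")
```

With real data the sequences would come from the FASTQ file, and a single-source library would usually show one roughly bell-shaped peak.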
Old 03-13-2015, 03:09 AM   #17
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

The problem that you describe in the first few bases is often seen with RNA-Seq data, and is thought to be due to the random priming not being so random in practice.

This has previously been discussed in various threads, one of which is here:

http://seqanswers.com/forums/showthr...m+priming+bias

The trimmomatic adapter trimming only trims adapter sequences from the 3' ends of the reads.
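
The bias in the first few bases mentioned above can be checked directly from the reads. Here is a minimal Python sketch that tabulates base fractions per position for the start of each read. The function names, the tiny inline FASTQ records and the choice of 4 positions are all made up for illustration; it also assumes plain 4-line FASTQ records.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def base_fractions(sequences, n_positions=15):
    """Return one dict per position with the fraction of A, C, G and T."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append(
            {base: (counter[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions


# Made-up records: position 1 is mostly G, position 2 is mostly T.
example = [
    "@r1", "GTACGTTA", "+", "IIIIIIII",
    "@r2", "GTCCATGA", "+", "IIIIIIII",
    "@r3", "GAACTTGC", "+", "IIIIIIII",
    "@r4", "CTACGAGT", "+", "IIIIIIII",
]

for pos, frac in enumerate(base_fractions(read_fastq_sequences(example), 4), start=1):
    print(pos, {base: round(value, 2) for base, value in frac.items()})
```

If the fractions are strongly skewed over the first several positions and then flatten out, that fits the priming-bias explanation rather than adapter contamination, which would show up towards the 3' ends.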
Old 03-16-2015, 04:47 AM   #18
vromanr_2015
Junior Member
 
Location: Australia

Join Date: Mar 2015
Posts: 5
Default

Thanks for all the answers and the link!
sarvidsson: The GC plot looks terrible. There are 3 peaks, so it is not a normal distribution.
Tags
fastqc, ngs, trimmomatic
