SEQanswers

Old 04-18-2016, 08:14 AM   #1
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7
Trimmomatic paired end - dropped reverse reads

Hi,
I am working on a re-sequencing project and have sequenced some whole genomes on an Illumina HiSeq 2000 (150 bp paired-end reads), which I hope to align to an existing reference genome later. I would like to remove any possible adapter contamination with Trimmomatic, but have run into a problem: in 70-80% of my read pairs the reverse read is dropped and only the forward read survives. When I use the "keep both reads" parameter, both reads survive for about 97% of pairs. So my question is: does this mean that more than 70% of my read pairs have adapter read-through, or have I done something wrong in my adapter file?

The adapters used were:

P5 adapter: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
P7 adapter: CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT


The adapter file I created looks as follows:

>PrefixPE/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>P5
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>P7
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT


An example of my script (using purposefully lenient quality control)...

java -jar ~/bin/trimmomatic.jar PE -phred33 -trimlog ten_trimLog 10_R1.gz 10_R2.gz Ten_out_1P.fq.gz Ten_out_1U.fq.gz Ten_out_2P.fq.gz Ten_out_2U.fq.gz ILLUMINACLIP:P5_P7.fa:2:30:10 LEADING:2 TRAILING:2 MAXINFO:40:0.2 MINLEN:36

... and the resulting output:

ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

Input Read Pairs: 41173498 Both Surviving: 9680654 (23.51%) Forward Only Surviving: 30355811 (73.73%) Reverse Only Surviving: 26285 (0.06%) Dropped: 1110748 (2.70%)
TrimmomaticPE: Completed successfully


I'm new to Trimmomatic, so I apologize in advance if this is something obvious!
Thanks!
Meli
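For reference, the summary line above can be checked with a few lines of Python. This is only a sketch: the helper name `parse_summary` is made up for illustration, and the regular expression assumes the exact wording of the line as posted.

```python
import re

# Summary line copied from the Trimmomatic output posted above.
SUMMARY = ("Input Read Pairs: 41173498 Both Surviving: 9680654 (23.51%) "
           "Forward Only Surviving: 30355811 (73.73%) "
           "Reverse Only Surviving: 26285 (0.06%) Dropped: 1110748 (2.70%)")

def parse_summary(line):
    """Return the read-pair counts from a TrimmomaticPE summary line."""
    pattern = (r"Input Read Pairs: (\d+) Both Surviving: (\d+) .*?"
               r"Forward Only Surviving: (\d+) .*?"
               r"Reverse Only Surviving: (\d+) .*?Dropped: (\d+)")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a TrimmomaticPE summary line")
    keys = ("input", "both", "forward_only", "reverse_only", "dropped")
    return dict(zip(keys, map(int, match.groups())))

counts = parse_summary(SUMMARY)
outcomes = ("both", "forward_only", "reverse_only", "dropped")

# The four outcome categories should add up to the number of input pairs.
assert sum(counts[k] for k in outcomes) == counts["input"]

for key in outcomes:
    print(f"{key}: {100 * counts[key] / counts['input']:.2f}%")
```

The four categories account for every input pair, so nearly all of the loss is in the "forward only" category rather than in pairs dropped outright.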
Old 04-18-2016, 08:28 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

As a guess, I think the Ns in the adapter file could cause problems, since the program could match any base at those positions. But I am not sure about this.
Old 04-18-2016, 08:37 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

@Meli: Why are you not using the adapter file provided with Trimmomatic?

Did the FastQC analysis show adapter contamination in your data (indicative of shorter-than-expected inserts)?
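If FastQC is not at hand, a rough version of the same check is to count how many reads contain the start of the expected read-through sequence. The sketch below is an assumption-laden stand-in for FastQC, not a replacement: the function names are hypothetical, it looks only for exact matches, and it uses `AGATCGGAAGAGC`, the reverse complement of the `GCTCTTCCGATCT` end shared by both adapters posted in this thread.

```python
import gzip
from itertools import islice

# Start of the read-through sequence expected at the 3' end of short-insert
# reads: the reverse complement of the shared adapter end ...GCTCTTCCGATCT.
ADAPTER_START = "AGATCGGAAGAGC"

def fraction_with_adapter(sequences, adapter=ADAPTER_START):
    """Fraction of read sequences containing an exact copy of `adapter`."""
    total = hits = 0
    for seq in sequences:
        total += 1
        hits += adapter in seq
    return hits / total if total else 0.0

def fastq_sequences(path, limit=100_000):
    """Yield up to `limit` sequence lines from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        # In a four-line FASTQ record the sequence is the second line.
        for line in islice(handle, 1, 4 * limit, 4):
            yield line.strip()

# Toy reads (made up for illustration): two of the three carry the adapter.
toy_reads = [
    "ACGTACGT" + ADAPTER_START + "ACAC",
    "TTTTGGGGCCCCAAAA",
    ADAPTER_START + "GT",
]
print(fraction_with_adapter(toy_reads))
```

On real data the call would look like `fraction_with_adapter(fastq_sequences("10_R1.gz"))`, using the input file name from the first post. A high fraction would point to short inserts; a low one would suggest the adapter file, rather than the library, is the problem.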
Old 04-18-2016, 09:03 PM   #4
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 97

I would just limit your P7 to GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. If Trimmomatic encounters this sequence it will clip the read, so no need to specify the rest, and the barcode with Ns (which might complicate things as suggested earlier).
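As a quick check, the shortened sequence is simply the part of the posted P7 adapter that follows the index placeholder. A minimal sketch (the helper name `after_index` is made up for illustration):

```python
# P7 adapter as posted above, including the six-base index placeholder.
P7_FULL = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

def after_index(adapter):
    """Return the part of the adapter that follows the last N."""
    return adapter.rsplit("N", 1)[-1]

p7_short = after_index(P7_FULL)
print(p7_short)
```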
Old 04-26-2016, 07:21 AM   #5
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7

Thanks all! Sorry I'm only seeing your replies now; I thought I'd get notifications and assumed no one had answered. I also suspected the N's, but wasn't sure whether the "keep both reads" parameter forces the program to keep both reads regardless of why they were cut (adapter contamination, low quality, etc.) or only when the reverse read would be thrown out as redundant with the forward read because of adapter read-through. I will try the shortened adapter sequence as suggested... thanks!
Old 04-26-2016, 07:33 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

'Keep both reads' refers to adapter read-through. Earlier versions of Trimmomatic did not have the 'keep both reads' option, and the default behaviour was to drop R2 entirely when adapters were trimmed because of read-through, the reasoning being that in those cases R2 provided no additional information, since the insert for that read pair was shorter than the length of one read. Hope this helps.
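The redundancy argument can be illustrated with a toy read pair. This is a simplified model, not Trimmomatic's algorithm: it assumes an error-free 100 bp insert, 150 bp reads, and a library molecule laid out as P5 side + insert + reverse complement of the P7 side, using the adapter sequences posted earlier in the thread.

```python
import random

def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

# Adapters as posted earlier in the thread (P7 without the index part).
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_SHORT = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

READ_LEN = 150
random.seed(0)
# Toy insert, deliberately shorter than one read.
insert = "".join(random.choice("ACGT") for _ in range(100))

# Each read covers the whole insert and then runs on into the adapter
# on the far side of the molecule.
r1 = (insert + revcomp(P7_SHORT))[:READ_LEN]
r2 = (revcomp(insert) + revcomp(P5))[:READ_LEN]

# Clip everything past the insert, as adapter trimming would.
r1_clipped = r1[:len(insert)]
r2_clipped = r2[:len(insert)]

# After clipping, R2 is just the reverse complement of R1.
print(r2_clipped == revcomp(r1_clipped))
```

In this model the clipped reverse read carries the same bases as the clipped forward read, which is the reasoning given above for dropping it by default.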
Old 04-26-2016, 07:53 AM   #7
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7

Quote:
Originally Posted by mastal View Post
'Keep both reads' refers to adapter read-through. Earlier versions of Trimmomatic did not have the 'keep both reads' option, and the default behaviour was to drop R2 entirely when adapters were trimmed because of read-through, the reasoning being that in those cases R2 provided no additional information, since the insert for that read pair was shorter than the length of one read. Hope this helps.
OK, that is also what I understood from the manual... but in that case, if 70% of my reverse reads are rescued when using this option, that must mean that the reason they were dropped in the first place was because of read through and NOT because of the N's in my adapter sequence if I understand correctly?
Old 04-29-2016, 05:35 AM   #8
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7

Quote:
Originally Posted by wdecoster View Post
I would just limit your P7 to GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. If Trimmomatic encounters this sequence it will clip the read, so no need to specify the rest, and the barcode with Ns (which might complicate things as suggested earlier).

Using the truncated adapter sequence just led to even more reads being dropped:

TrimmomaticPE: Started with arguments: -phred33 -trimlog five_trunc_Log R1_zcat.gz R2_zcat.gz 5_trunc_1P.fq.gz 5_trunc_1U.fq.gz 5_trunc_2P.fq.gz 5_trunc_2U.fq.gz ILLUMINACLIP:P5_P7_trunc.fa:2:30:10 LEADING:2 TRAILING:2 MAXINFO:40:0.2 MINLEN:36
Multiple cores found: Using 16 threads
Using PrefixPair: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 130953595 Both Surviving: 21891860 (16.72%) Forward Only Surviving: 105313891 (80.42%) Reverse Only Surviving: 5086 (0.00%) Dropped: 3742758 (2.86%)
TrimmomaticPE: Completed successfully
Old 04-29-2016, 07:13 AM   #9
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Do one run of Trimmomatic using only the ILLUMINACLIP step; then you will know how many of the reads are being dropped because of adapters rather than other quality issues.
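One way to build such an adapter-only command is to take the arguments from the first post and leave out every step other than ILLUMINACLIP. This is a sketch only: it prints the command rather than running it, and it reuses the original output file names, which a real run would probably want to change to avoid overwriting earlier results.

```python
# Arguments from the command in the first post (everything after "PE").
full_args = [
    "-phred33", "-trimlog", "ten_trimLog",
    "10_R1.gz", "10_R2.gz",
    "Ten_out_1P.fq.gz", "Ten_out_1U.fq.gz",
    "Ten_out_2P.fq.gz", "Ten_out_2U.fq.gz",
    "ILLUMINACLIP:P5_P7.fa:2:30:10",
    "LEADING:2", "TRAILING:2", "MAXINFO:40:0.2", "MINLEN:36",
]

# Steps unrelated to adapter clipping that the test run should leave out.
NON_ADAPTER_STEPS = ("LEADING:", "TRAILING:", "MAXINFO:", "MINLEN:")

adapter_only = [arg for arg in full_args if not arg.startswith(NON_ADAPTER_STEPS)]

print(" ".join(["java", "-jar", "~/bin/trimmomatic.jar", "PE"] + adapter_only))
```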
Old 05-03-2016, 01:05 AM   #10
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7

Quote:
Originally Posted by mastal View Post
Do one run of Trimmomatic using only the ILLUMINACLIP step; then you will know how many of the reads are being dropped because of adapters rather than other quality issues.
I still got 83% forward-only surviving when I ran just the ILLUMINACLIP step.
Old 05-03-2016, 02:23 AM   #11
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

OK, so it looks like a large percentage of your reads have adapter read-through.
Old 05-09-2016, 03:17 AM   #12
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7

Quote:
Originally Posted by wdecoster View Post
I would just limit your P7 to GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. If Trimmomatic encounters this sequence it will clip the read, so no need to specify the rest, and the barcode with Ns (which might complicate things as suggested earlier).
Using the OTHER half of the adapter saved the reverse reads, but I'm not sure if it is at all correct to do so...
Old 05-11-2016, 03:28 AM   #13
Meli
Junior Member
 
Location: Lund, Sweden

Join Date: Apr 2016
Posts: 7

Not sure if this thread can be of use to anyone else, but just in case...

I finally figured out what was wrong with my adapter file by running the "identify adapters" tool in AdapterRemoval:

https://github.com/MikkelSchubert/ad...terRemoval.pod

I found that the adapters in my reads were the reverse complement of what I'd been given, and also that they were on the opposite read (fwd<-->rev). Now Trimmomatic seems to be running smoothly and isn't throwing away the reverse reads.

Thanks for all your help!
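For anyone wanting to reproduce the observation, the sequences that show up inside the reads can be derived from the adapters posted at the top of the thread. The sketch below assumes the usual short-insert picture, where read 1 runs off the insert into the P7 side and read 2 into the P5 side, each seen as a reverse complement. It only shows what to look for in the reads; it does not say how the Trimmomatic adapter FASTA should be laid out.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

# Adapters as posted in the first message of the thread.
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7 = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

# Assumed read-through content: read 1 sees the reverse complement of P7,
# read 2 sees the reverse complement of P5.
seen_in_r1 = revcomp(P7)
seen_in_r2 = revcomp(P5)

print(">seen_in_R1", seen_in_r1, ">seen_in_R2", seen_in_r2, sep="\n")
```

Both derived sequences begin with `AGATCGGAAGAGC`, so a search for that string in the raw reads is a quick way to confirm the orientation.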
Old 05-11-2016, 08:10 AM   #14
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Yes, the fasta adapter sequences in trimmomatic are designed to work that way for paired-end mode.
Tags
adapter trimming, trimmomatic