SEQanswers

04-20-2012, 07:08 AM   #1
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595
Quality-, adapter- and RRBS-trimming with Trim Galore!

We have just updated Trim Galore! so that it now has built-in paired-end functionality, which means that trimmed files no longer need to be validated separately. The update also adds a few more things that make it more convenient to use.

As there hasn't been a dedicated thread yet, here are a few more details: Trim Galore! is a wrapper script to automate quality and adapter trimming, with some added functionality to remove biased methylation positions for MspI-digested RRBS sequence files (for directional, non-directional or paired-end sequencing). Its main features are:

- For adapter trimming, Trim Galore! uses the first 13 bp of Illumina standard adapters ('AGATCGGAAGAGC') by default (suitable for both ends of paired-end libraries), but accepts other adapter sequences, too
- For MspI-digested RRBS libraries, Trim Galore! performs quality and adapter trimming in two subsequent steps. This allows it to remove 2 additional bases that contain a cytosine which was artificially introduced in the end-repair step during library preparation
- For any kind of FastQ file other than MspI-digested RRBS, Trim Galore! can perform single-pass adapter- and quality trimming
- The Phred quality of basecalls and the stringency for adapter removal can be specified individually
- Trim Galore! can remove sequences if they become too short during the trimming process. For paired-end files Trim Galore! removes entire sequence pairs if one (or both) of the two reads became shorter than the set length cutoff. Reads of a read-pair that are longer than a given threshold but for which the partner read has become too short can optionally be written out to single-end files. This ensures that the information of a read pair is not lost entirely if only one read is of good quality
- Trim Galore! can trim paired-end files by 1 additional bp from the 3' end of all reads to avoid problems with invalid alignments with Bowtie 1
- Trim Galore! accepts and produces standard or gzip compressed FastQ files
- FastQC can be run on the resulting output files once trimming has completed (optional)
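
A minimal Python sketch of the paired-end length validation described in the list above, for illustration only. It is not Trim Galore's actual code: the function name `validate_pairs`, the tuple layout of a read, the use of `>=` for the singleton threshold, and the `unpaired_cutoff` value of 35 are all assumptions. Only the 20 bp pair cutoff is stated in this thread as the default.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string); layout is an assumption


def validate_pairs(
    pairs: List[Tuple[Read, Read]],
    length_cutoff: int = 20,        # default pair cutoff mentioned in this thread
    retain_unpaired: bool = False,
    unpaired_cutoff: int = 35,      # hypothetical value for singleton reads
):
    """Keep a pair only if both reads reach length_cutoff.

    If retain_unpaired is set, a read whose partner failed is written to a
    single-end list when the read itself reaches unpaired_cutoff.
    """
    kept, single_r1, single_r2 = [], [], []
    for r1, r2 in pairs:
        ok1 = len(r1[1]) >= length_cutoff
        ok2 = len(r2[1]) >= length_cutoff
        if ok1 and ok2:
            kept.append((r1, r2))
        elif retain_unpaired:
            # The pair is dropped; rescue whichever read is still long enough.
            if len(r1[1]) >= unpaired_cutoff:
                single_r1.append(r1)
            if len(r2[1]) >= unpaired_cutoff:
                single_r2.append(r2)
    return kept, single_r1, single_r2
```

The point of the sketch is that the length decision is made per pair, not per file, so both output files always stay in the same order with the same number of reads.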

Trim Galore! and its User Guide can be found here: http://www.bioinformatics.babraham.a...s/trim_galore/

07-27-2012, 08:12 AM   #2
gwilkie
Member
Location: Glasgow
Join Date: Dec 2011
Posts: 27

Thanks so much for writing this incredibly useful program

I was struggling with the problem of trimming and removing low quality reads whilst maintaining read pairs - trim galore makes this straightforward and fast!

07-27-2012, 11:44 AM   #3
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

I am glad that you found it useful!

01-17-2013, 04:56 AM   #4
blanco
Member
Location: Iceland
Join Date: Apr 2012
Posts: 28

Hi, I am wondering about the -a and -a2 parameters for paired end reads

For example my libraries were made using the Illumina TruSeq adapters, one of which is:
(5')-adapter-(3')=GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

Should -a2 be the reverse of this or should I not use the a2 option at all?

Thanks,
blanco

01-17-2013, 05:11 AM   #5
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

Quote:
Originally Posted by blanco View Post
Hi, I am wondering about the -a and -a2 parameters for paired end reads

For example my libraries were made using the Illumina TruSeq adapters, one of which is:
(5')-adapter-(3')=GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

Should -a2 be the reverse of this or should I not use the a2 option at all?

Thanks,
blanco

You don't need to use -a2 at all because Trim Galore will by default search for the first 13 bp of the Illumina adapter. This portion is shared between the read 1 and read 2 adapters, and they start to diverge only after this position. Normally it should thus be sufficient to run it with the default parameters.
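
The shared 13 bp start can be checked with a few lines of Python. This is only a sketch: the two full-length adapter sequences below are not given in this thread. They are quoted from memory of Illumina's published TruSeq adapter sequences and should be treated as assumptions to verify against the official document. `common_prefix` is a helper written for this example.

```python
def common_prefix(a: str, b: str) -> str:
    """Return the longest prefix shared by two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


# Assumed read-through adapter sequences (not taken from this thread).
READ1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
READ2_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

shared = common_prefix(READ1_ADAPTER, READ2_ADAPTER)
print(shared, len(shared))
```

Under these assumptions the shared prefix is the default sequence named in the first post, which is why a single `-a` value covers both reads.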

01-17-2013, 05:22 AM   #6
gwilkie
Member
Location: Glasgow
Join Date: Dec 2011
Posts: 27

I have also found that when using Nextera sample prep, you should trim at CTGTCTCTTATACACATCT instead of the usual AGATCGGAAGAGC.

Best wishes, Gavin

01-17-2013, 06:44 AM   #7
blanco
Member
Location: Iceland
Join Date: Apr 2012
Posts: 28

Thanks again for your quick reply fkrueger.

What, however, if I do not want to use the default 13bp adapter but instead want to use the whole 63 bp index-adapter (index = red):
(5')-adapter-(3')=GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

Would I still be safe to just use the -a option? I am assuming yes.

01-17-2013, 06:57 AM   #8
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

Quote:
Originally Posted by blanco View Post
Thanks again for your quick reply fkrueger.

What, however, if I do not want to use the default 13bp adapter but instead want to use the whole 63 bp index-adapter (index = red):
(5')-adapter-(3')=GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

Would I still be safe to just use the -a option? I am assuming yes.

Yes, that should work. However, if you look closely at the sequence you posted you'll find that it starts with GATCGGAAGAGC.... A-tailing of the library prior to adapter ligation will add another A to the 5' end, like so:
AGATCGGAAGAGC, which happens to be pretty much what Trim Galore is using by default:
AGATCGGAAGAGC.

If you still want to use the entire sequence, you would need to specify different adapters for the two reads, since their sequences start diverging after the first 13 bp. Unless you want to trim only a subset of indexed adapters, it is probably best to just run it in default mode.
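
The A-tailing argument above is easy to verify in Python. The sketch below uses only the two sequences quoted in this thread; the variable names are arbitrary.

```python
# Default adapter named in the first post of this thread.
DEFAULT_ADAPTER = "AGATCGGAAGAGC"

# Indexed adapter exactly as posted by blanco.
POSTED_ADAPTER = (
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    "ATCACG"
    "ATCTCGTATGCCGTCTTCTGCTTG"
)

# Prepend the A that A-tailing contributes, then compare the first 13 bp.
a_tailed = "A" + POSTED_ADAPTER
print(a_tailed[:len(DEFAULT_ADAPTER)] == DEFAULT_ADAPTER)
```

So the default sequence is simply the first 13 bases that a read runs into once it passes the end of the insert.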

03-01-2013, 05:52 AM   #9
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

We have just released Trim Galore v0.2.7 that adds an option '--dont_gzip' which takes precedence over .gz file endings.

The download can be found here: http://www.bioinformatics.babraham.a...s/trim_galore/.

04-17-2013, 02:06 PM   #10
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

We have just released Trim Galore v0.2.8, which is more of a cosmetic update. The changes in detail are:

- Trim Galore will now compress output files with GZIP on the fly instead of compressing the trimmed file once trimming has completed. In the interest of time, temporary files are not compressed
- Added a small sanity check to exit if no files were supplied for trimming. Thanks to P. for 'bringing this to my attention'
- The download also includes the updated RRBS guide

The download can be found here: http://www.bioinformatics.babraham.a...s/trim_galore/

07-16-2013, 06:25 AM   #11
Vasilisa
Junior Member
Location: Germany
Join Date: Feb 2012
Posts: 6
paired-end data filtering with FastQC

Hi!
I have a question about the usage of Trim Galore (v0.3.0) for paired-end data filtering.

I used FastQC to generate the quality assessment report. Based on it, I want to trim my sequences both at the 3'- and 5'-ends (from "Per base sequence quality") and also filter them based on quality (from "Per sequence quality scores"). Is there a way to somehow integrate FastQC with Trim Galore for paired-end data?

Thanks a lot!
Vasilisa

07-16-2013, 06:36 AM   #12
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

Quote:
Originally Posted by Vasilisa View Post
Hi!
I have a question about the usage of Trim Galore (v0.3.0) for paired-end data filtering.

I used FastQC to generate the quality assessment report. Based on it, I want to trim my sequences both at the 3'- and 5'-ends (from "Per base sequence quality") and also filter them based on quality (from "Per sequence quality scores"). Is there a way to somehow integrate FastQC with Trim Galore for paired-end data?

Thanks a lot!
Vasilisa
Hi Vasilisa,

FastQC and Trim Galore are independent pieces of software, so they can't really be 'integrated'. Trim Galore will by default use a Phred score cutoff of 20 for quality trimming (the implementation is the same as the one used by BWA; please see the Cutadapt documentation for more information). As of the latest version of Trim Galore you can also trim sequences on their 5' ends if intended. If you are simply experiencing somewhat lower qualities at the first couple of positions, this is probably due to the quality score calibration that takes place during the first cycles on the HiSeq/MiSeq. Normally it is sufficient to run Trim Galore in its default setting.
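
The quality trimming mentioned above can be sketched in Python, following the way the Cutadapt documentation describes the BWA-style algorithm: subtract the cutoff from every quality value, accumulate the result from the 3' end, and cut where the running sum is lowest. This is an illustration, not the code Trim Galore or Cutadapt runs. The function names are made up for the example, and Phred+33 encoding is assumed.

```python
from typing import List, Tuple


def quality_trim_index(qualities: List[int], cutoff: int) -> int:
    """Return the position at which to cut the 3' end of a read."""
    running = 0
    lowest = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        running += qualities[i] - cutoff
        if running > 0:
            # Quality has recovered; stop scanning towards the 5' end.
            break
        if running < lowest:
            lowest = running
            cut = i
    return cut


def trim_read(seq: str, qual: str, cutoff: int = 20) -> Tuple[str, str]:
    """Trim a read given its Phred+33 quality string (encoding assumed)."""
    scores = [ord(c) - 33 for c in qual]
    cut = quality_trim_index(scores, cutoff)
    return seq[:cut], qual[:cut]
```

A single good base inside a poor 3' tail does not stop the trimming, because only the cumulative sum decides where to cut.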

07-16-2013, 01:41 PM   #13
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

A new version of Trim Galore (v0.3.0) is now available for download from http://www.bioinformatics.babraham.a...s/trim_galore/.

This new release adds the options '--clip_R1' and '--clip_R2' to trim off a fixed number of bases from the 5' end of reads. This can be useful if the quality is unusually low at the start, or whenever there is an undesired bias at the start of reads. An example for this could be PBAT-Seq in general (where the first 4bp show a very strong sequence (and methylation) bias), or the start of Read 2 of every shotgun Bisulfite-Seq paired-end library where the end repair procedure introduces unmethylated cytosines. For more information on this please refer to the M-bias section of the Bismark User Guide.

Together with the '--ignore' and '--ignore_r2' options in the Bismark methylation extractor, one may now choose whether to remove potentially biased positions at the start of reads before mapping, or to ignore them later on using the Bismark methylation extractor.
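
What a fixed 5' clip such as '--clip_R1' does to a single FastQ record can be sketched in a few lines of Python. This is a conceptual illustration only, and `clip_5prime` is a hypothetical helper, not part of Trim Galore. The key detail is that the sequence and the quality string must be shortened together.

```python
from typing import Tuple


def clip_5prime(seq: str, qual: str, n: int) -> Tuple[str, str]:
    """Remove the first n bases and their quality values from a read."""
    if n < 0:
        raise ValueError("number of bases to clip must not be negative")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    return seq[n:], qual[n:]


# Example: drop the first 4 bp, as one might for a PBAT-style bias.
print(clip_5prime("ACGTACGTAC", "####IIIIII", 4))
```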

07-18-2013, 09:20 AM   #14
yifangt
Member
Location: Canada
Join Date: Feb 2011
Posts: 61
trim galore error

Hello,
When I tried Trim Galore (v0.3.0) on mate-pair reads (my first time using it), I always got an error:
Code:
Use of uninitialized value $length_cutoff in numeric gt (>) at /usr/bin/trim_galore line 1068.
Use of uninitialized value $length_cutoff in numeric gt (>) at /usr/bin/trim_galore line 1077.
Failed to write to file: No such file or directory
And this is my command line:
Code:
trim_galore --paired -phred33 -q 20 -a CTGTCTCTTATACACATCT -a2 AGATGTGTATAAGAGACAG -e 0.15 --clip_R1 14 --clip_R2 15 --retain_unpaired  -t -r1 20 -r2 20  -o MP03-05k_R1+R2trim_galored.fq test_R1.fq test_R2.fq
I checked the script lines and it seems to be related to my read length. What did I miss? Thank you!

07-18-2013, 01:54 PM   #15
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

Apologies for that, this was a bug in the code: the default length cutoff had not yet been defined when the check against the length of unpaired reads was performed. I have now fixed that and put up a new version. Just as a side note, the retain length for unpaired singleton reads has to be greater than the paired-end length cutoff (which is 20 bp by default).

Please download the hot-fixed version 0.3.1 from the Trim Galore project page.
Best,
Felix

07-18-2013, 03:31 PM   #16
yifangt
Member
Location: Canada
Join Date: Feb 2011
Posts: 61

Thanks Felix, I will give it a try again. Is it possible to provide multiple adaptor sequences in a single line like cutadapt?
Quote:
cutadapt -a PRIMER1 -b ADAPTOR1 -b ADAPTOR2
It seems there is no description about this in the Trim_galore guide. Thanks a lot!

07-18-2013, 03:36 PM   #17
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

No, if you wanted to use many adapter sequences you would have to use Cutadapt itself. For standard (Illumina) sequencing libraries this is probably not needed though. Do you have a reason to try many adapters as the first attempt instead of just going with the default (which is Illumina adapters)?

07-18-2013, 04:06 PM   #18
yifangt
Member
Location: Canada
Join Date: Feb 2011
Posts: 61

Yes, for Illumina mate-pair reads there are multiple scenarios that need to be handled at the same time, at least the junction sequence and its reverse complement. Maybe Trim Galore can do it in multiple steps, is that right?
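
For reference, the two sequences passed to -a and -a2 in the command earlier in this thread are reverse complements of each other, which a short Python helper confirms. `reverse_complement` is written for this example and handles only A, C, G, T and N.

```python
# Translation table for the four bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


junction = "CTGTCTCTTATACACATCT"  # value given to -a in the command above
print(reverse_complement(junction))
```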

07-19-2013, 01:28 AM   #19
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 595

Do you mean mate-pair or paired-end libraries? At least paired-end libraries typically share the same starting sequence of the adapters on both ends of each fragment, so you wouldn't have to supply different sequences or reverse complements but simply run Trim Galore in default mode. If you really wanted to run several consecutive steps it can't be guaranteed that you only trim off one adapter per sequence.

08-19-2013, 01:28 AM   #20
gerald2545
Member
Location: Toulouse
Join Date: Nov 2008
Posts: 21

Hi all,
We noticed something strange in the Trim Galore! (0.2.8) result file for a WGBS 2x101 paired-end run:
Quote:
RUN STATISTICS FOR INPUT FILE: /work/ng6/jflow/methylSeq/wf000619/ConcatenateFilesGroups_default/A3_CACGAT_L002_R1.fastq.gz
=============================================
26711665 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 0 (0.0%)
Quote:
RUN STATISTICS FOR INPUT FILE: /work/ng6/jflow/methylSeq/wf000619/ConcatenateFilesGroups_default/A3_CACGAT_L002_R2.fastq.gz
=============================================
26711665 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 0 (0.0%)

Total number of sequences analysed for the sequence pair length validation: 26711665

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 2697629 (10.10%)

It seems that 0 reads were removed for R1 due to length and 0 reads were removed for R2 due to length, but 2697629 pairs were removed because at least one read was shorter than the length cutoff.

Is there something that I don't understand well?

We have just installed the latest version but haven't tried it yet (you don't mention this in your release notes, so I don't think the behaviour will be different).

Thank you for your answer

Another thing, although it may be a cutadapt output problem. In the cutadapt section of the report file:
Quote:
cutadapt version 1.2.1
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /work/ng6/jflow/methylSeq/wf000619/ConcatenateFilesGroups_default/A3_CACGAT_L002_R1.fastq.gz
Maximum error rate: 10.00%
No. of adapters: 1
Processed reads: 26711665
Processed bases: 2671166500 bp (2671.2 Mbp)
Trimmed reads: 10940767 (41.0%)
Quality-trimmed: 168748516 bp (177.0 Mbp) (6.32% of total)
Trimmed bases: 177017106 bp (177.0 Mbp) (6.63% of total)

The Quality-trimmed and Trimmed bases lines report different numbers of bp, but the number in Mbp is the same (177.0). It seems that the Mbp value shown for Quality-trimmed is taken from the Trimmed bases result.


Gerald
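
The figures quoted in the report above can be re-derived from the raw counts with a short Python check. The helper names are arbitrary, and the sketch only redoes the arithmetic; it says nothing about how cutadapt computes its report internally.

```python
def mbp(bp: int) -> float:
    """Convert a base-pair count to megabase pairs."""
    return bp / 1_000_000


def percent(part: int, total: int) -> float:
    """Express part as a percentage of total."""
    return 100.0 * part / total


processed_bases = 2671166500
quality_trimmed = 168748516
trimmed_bases = 177017106

# Mbp values recomputed from the bp counts in the report.
print(f"Quality-trimmed: {mbp(quality_trimmed):.1f} Mbp")
print(f"Trimmed bases:   {mbp(trimmed_bases):.1f} Mbp")

# Percentages, which agree with the report.
print(f"{percent(quality_trimmed, processed_bases):.2f}% quality-trimmed")
print(f"{percent(2697629, 26711665):.2f}% of pairs removed")
```

The percentages match the report, while the Mbp value recomputed for the Quality-trimmed line differs from the 177.0 Mbp printed there, consistent with the observation in the post.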

All times are GMT -8.