SEQanswers

09-10-2012, 08:59 AM #1
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83
trim galore error

Hi
I tried to run Trim Galore but received an error message (pasted below). It says "cutadapt ... failed at /usr/local/bin/trim_galore line 420". Does anyone know what it means?

thanks


###############################
$ No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)


SUMMARISING RUN PARAMETERS
==========================
Input filename: Sample_C1.R1.fastq.gz
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG'
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 20 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir ./'
Output file will be GZIP compressed

Writing final adapter and quality trimmed output to Sample_C1.R1_trimmed.fq


>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG' from file Sample_C1.R1.fastq.gz <<<
open3: exec of cutadapt -f fastq -e 0.1 -q 20 -O 20 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG Sample_C1.R1.fastq.gz failed at /usr/local/bin/trim_galore line 420

RUN STATISTICS FOR INPUT FILE: Sample_C1.R1.fastq.gz
=============================================
0 sequences processed in total
Illegal division by zero at /usr/local/bin/trim_galore line 506.
^C
[3]- Exit 255 trim_galore --fastqc_args "--outdir ./" -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG -s 20 Sample_C1.R1.fastq.gz
09-10-2012, 10:58 AM #2
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

In order to run Trim Galore you need to install Cutadapt first and set the filepath to the cutadapt executable within Trim Galore. Have you done that?
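A quick way to check whether Cutadapt (and, optionally, FastQC) is actually reachable is sketched below in Python. An unreachable executable would presumably explain the "open3: exec of cutadapt ... failed" line in post #1, followed by "0 sequences processed" and the division by zero. The sketch assumes only that the executables are named cutadapt and fastqc and accept --version. The helper names are hypothetical, and this does not edit the path setting inside Trim Galore.

```python
import shutil
import subprocess
from typing import Optional


def find_executable(name: str = "cutadapt") -> Optional[str]:
    """Return the full path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


def tool_version(executable: str) -> str:
    """Run `<executable> --version` and return what it prints."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    # Some tools print their version to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    for tool in ("cutadapt", "fastqc"):
        path = find_executable(tool)
        if path is None:
            print(f"{tool}: not found on PATH")
        else:
            print(f"{tool}: {path} ({tool_version(path)})")
```

If cutadapt is reported as missing here, Trim Galore has nothing to launch, regardless of its own settings.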
09-10-2012, 11:08 AM #3
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

Our IT admin installed Trim Galore. Let me ask him.
But when I tried $ cutadapt -h, it said "command not found". Is there any way I can check this myself?

I'm afraid we probably need to install FastQC too?

09-10-2012, 11:25 AM #4
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

Not sure if you have the permissions to install Cutadapt on your system, but once it is installed you need to edit a line at the top of Trim Galore to specify its path.
Running FastQC within Trim Galore is optional (Cutadapt is mandatory), but should you choose to install it, you need to modify its path likewise.
09-10-2012, 11:44 AM #5
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

I actually don't, but I will pass this on to him. Thanks.

09-11-2012, 06:57 AM #6
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

Hi Felix,

Our IT fixed the problem. It is running now. Thanks for the pointers.

I have a few more questions for you after reading the documentation:
1. -s option: what would be a good minimum overlap length for trimming 50 bp single-end reads? I chose 20; what would you use?
2. -e option: if we use -s 20, an error rate of 0.1 means two mismatches are allowed, correct?

Can Trim Galore run FastQC without doing any trimming? I just wonder whether someone who doesn't want to trim can still use FastQC through Trim Galore. I tried $ fastqc -h, but it doesn't seem to recognize the command.

thanks!


09-11-2012, 07:04 AM #7
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

It just finished running, and it is quite fast. I trimmed off about 5.6% of reads, which I think is about right: the FastQC report says 5.1% of my reads are TruSeq Adapter, Index 3 (100% match). Given that we allow 10% mismatches and -s 20, that seems right.

I used fastx_clipper a few days ago, and it trimmed off 20% of reads, which I think is a bit too much.

Just one slight problem: I can't seem to unzip the FastQC report. It is in a folder, and the file was named x.gz (I didn't name it). When I tried $ gunzip x.gz, it said "gzip: x.gz: not in gzip format". Can you help please?

Thanks for the great program!
09-11-2012, 07:08 AM #8
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

Hi JQL,

(1) The stringency you want to use depends a bit on your application. The Cutadapt default is 3, meaning that if it finds 3 bases at the 3' end that look like adapter, it will trim them. For BS-Seq applications, for which Trim Galore was initially intended, any kind of adapter sequence is detrimental to mapping, methylation calling, or both. I have thus lowered the default to 1, so that it trims off virtually anything that looks like adapter. Choosing a value of 20 will probably not remove much adapter sequence at all. For non-bisulfite applications you can probably get away with 3 or so, but personally I would use the default of 1.

(2) the calculation is sound
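Both points can be put into numbers with a small Python sketch. It rests on two assumptions that are not stated in this thread: that the number of tolerated errors is the error rate times the overlap length, rounded down; and that reads consist of independent, equally likely bases (each with probability 1/4). The helper names are hypothetical.

```python
import math


def allowed_errors(error_rate: float, overlap: int) -> int:
    """Mismatches tolerated over `overlap` adapter bases
    (assumed: error_rate * overlap, rounded down)."""
    return math.floor(error_rate * overlap)


def random_match_probability(overlap: int) -> float:
    """Chance that the last `overlap` bases of a random read exactly match
    the first `overlap` adapter bases, assuming uniform random bases."""
    return 0.25 ** overlap


for s in (1, 3, 20):
    print(f"-s {s:2d}: {allowed_errors(0.1, s)} errors allowed, "
          f"chance exact match {random_match_probability(s):.2e}")
```

Under these assumptions, -s 20 with -e 0.1 gives the two mismatches from question 2, and chance matches become negligible, but any adapter overlap shorter than 20 bp at a read end is left in place. With -s 1, roughly a quarter of reads lose a single base by chance, which is the cheap price of removing virtually all adapter.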

About your last comment: if you only want to run FastQC without any trimming, why not run FastQC on its own? Your IT guys should know where it is installed (maybe you can try "locate fastqc"?)
09-11-2012, 07:11 AM #9
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

(Replying to JQL, post #7)
FastQC does not produce any .gz files. Instead, it produces one folder with all files in the correct folder structure, plus a .zip archive of it. The two outputs should look something like this:

H1.fq_fastqc
H1.fq_fastqc.zip
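The "not in gzip format" message can be checked by looking at a file's first bytes: gzip files start with 1f 8b, zip archives with PK\x03\x04. The Python sketch below does that and then reads the per-module PASS/WARN/FAIL lines from a FastQC .zip. The archive layout it expects (a <name>_fastqc/summary.txt file with tab-separated lines) is an assumption about FastQC's output. The function names are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip file
ZIP_MAGIC = b"PK\x03\x04"  # first four bytes of a zip archive with entries


def sniff_compression(path: Path) -> str:
    """Guess the container format from the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def read_fastqc_summary(zip_path: Path) -> List[Tuple[str, str]]:
    """Return (status, module) pairs from summary.txt inside a FastQC zip.

    Assumed line format: "PASS<TAB>Basic Statistics<TAB>H1.fq".
    """
    with zipfile.ZipFile(zip_path) as archive:
        members = [n for n in archive.namelist() if n.endswith("/summary.txt")]
        if not members:
            raise FileNotFoundError(f"no summary.txt in {zip_path}")
        text = archive.read(members[0]).decode("utf-8")
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            rows.append((fields[0], fields[1]))
    return rows
```

On the command line, unzip H1.fq_fastqc.zip does the job that gunzip was expected to do here.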
09-11-2012, 07:44 AM #10
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

Yep, he just installed FastQC. I will try it later.

I don't think I quite get the -s option yet, sorry.

The TruSeq adapter index from Illumina has the following 63 bases:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG

My RNA-seq data are 50 bp single-end. So let's say I use -s 3 and have a read like this: 5' xxxx....xxxxTTG 3' (x: any base). Since its last three bases match the last three bases of the adapter (TTG), is this read trimmed? And in another case, if a read looks like this: xxx...xxxGCC (GCC also occurs inside the adapter), will this one be trimmed as well?

If this is true, wouldn't it be a bit too non-specific? Or have I misunderstood?


thanks
John
09-11-2012, 08:04 AM #11
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

(Replying to JQL, post #10)
Your first example is correct: if the read is 5' xxxx....xxxxTTG 3' and -s is 3, the last 3 bases would be trimmed. If it were 5' xxxx....xxxxTTT 3', the read would not be trimmed, as the overlap has to include the most 3' bases. Similarly, if some part of the adapter is found within the read, the read is only trimmed at that position if the full adapter sequence is found in the rest of the read (this is where -e applies). So your xxx...xxxGCC would not be trimmed.

As another note: you need to select the other end of the primer to remove, here GATCGGAAGAG. As Illumina fragments are A-tailed, you need to add an additional A at the start, resulting in the sequence AGATCGGAAGAG.

You will also find that this is already the adapter sequence Trim Galore uses by default. I have tried to make Trim Galore a pretty straightforward tool that does the right thing automatically, so if you just run

./trim_galore your_file.fq

you should find that it is doing exactly the right thing in one simple command. I would only modify the parameters if I would like it to do something extra special.
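The two matching cases above can be illustrated with a simplified Python sketch. It assumes exact matching only (no mismatches or indels, so -e is ignored) and picks the longest qualifying overlap. It covers a full adapter copy inside the read, or a partial overlap at the read's 3' end. For the partial case it compares the read's end against the beginning of the adapter, which is how Cutadapt treats 3' adapters and is why the start of the adapter (AGATCGGAAGAG) is the part to supply. trim_position is a hypothetical helper, not Cutadapt's API.

```python
from typing import Optional

DEFAULT_ADAPTER = "AGATCGGAAGAG"  # adapter start quoted in this thread


def trim_position(read: str, adapter: str, min_overlap: int) -> Optional[int]:
    """Return the index at which to cut `read`, or None to keep it whole.

    Exact-match sketch (no error rate); `min_overlap` (like -s) must be >= 1.
      1. A complete adapter copy anywhere in the read: cut at its start.
      2. Otherwise the read's 3' end must equal the beginning of the adapter
         over at least `min_overlap` bases; the longest such overlap wins.
    """
    full = read.find(adapter)
    if full != -1:
        return full
    longest = min(len(read), len(adapter) - 1)
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return len(read) - length
    return None


for read in ("ACGTACGTAGA", "ACGTACGTTTT", "ACGTAGATCGGAAGAGTTTT"):
    cut = trim_position(read, DEFAULT_ADAPTER, min_overlap=3)
    print(read, "->", read if cut is None else read[:cut])
# ACGTACGTAGA -> ACGTACGT
# ACGTACGTTTT -> ACGTACGTTTT
# ACGTAGATCGGAAGAGTTTT -> ACGT
```

With min_overlap=1, a read ending in a single A would already lose that base, which is the trade-off behind the stringency discussion earlier in the thread.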
09-11-2012, 08:06 AM #12
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

I just ran FastQC. It shows a great improvement over the untrimmed reads in several metrics, except for the sequence duplication level, which increased from 61% to 71%.

I will try -s 3 and see the difference.
09-11-2012, 08:13 AM #13
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

Thanks Felix! That's a very nice explanation.

09-12-2012, 09:54 AM #14
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

Hi Felix,

After running trim_galore and FastQC, I compared the FastQC reports before and after trimming, and I have a couple of questions. I ran trim_galore with all default settings.

1. In one of my 8 samples, the TruSeq adapter reads were not removed completely (see the attached PDF). That adapter read starts with 5' AGAGCxxx..., which overlaps with the last five bases of the default 13 bp adapter sequence. I can't figure out why this one was not trimmed while the others were. Can you explain? In the other 7 samples, TruSeq adapter reads are no longer reported in the FastQC report.

2. After trimming, in all 8 samples, I started seeing big changes in the last 3 bases: the sequence and GC content behave strangely. Do you suggest removing the last three bases from all reads? See the pictures.

thanks John
Attached images: C2_per_base_gc_content.png, C2_trim_per_base_gc_content.png, C2_per_base_sequence_content.png, C2_trim_per_base_sequence_content.png
Attached file: overRep.pdf
09-12-2012, 10:49 AM #15
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

(Replying to JQL, post #14)
The one sequence that escaped differs slightly from the standard adapter sequence and was probably missed for that reason. Not sure why this is, but you could either run it again using the end of that very sequence, or just not bother, because a full-length adapter sequence is not going to align anyway.

The ratio of the last 3 bases changes due to the trimming, so if you remove e.g. the A from the read then the A content will go down while other bases contents go up. The bias in the start looks very much like the bias from random hexamer priming which is quite typical and has been discussed in several threads already. Overall I think you should be good to start aligning your reads now!
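The composition shift Felix describes can be reproduced with a toy per-position base composition calculation. The sketch assumes each position is computed only over the reads long enough to reach it, which is how such per-base plots are typically built. per_base_composition is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def per_base_composition(reads: List[str]) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T at each position, over reads covering that position."""
    max_len = max((len(r) for r in reads), default=0)
    table = []
    for pos in range(max_len):
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())  # > 0, since some read reaches max_len
        table.append({base: counts[base] / total for base in "ACGT"})
    return table


before = per_base_composition(["ACGTA", "ACGTA", "ACGTC"])
after = per_base_composition(["ACGT", "ACGT", "ACGTC"])  # trailing A trimmed
print(before[4])  # last position: 2/3 A, 1/3 C
print(after[4])   # last position: only the untrimmed read remains, 100% C
```

Trimming does not change the bases that remain. It only changes which reads still contribute to the last positions, which is enough to make the tail of the plot look odd.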
09-12-2012, 11:07 AM #16
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 83

(Replying to fkrueger, post #15)
Thanks Felix. Great, I am ready for my very first RNA-seq analysis then!
04-30-2014, 01:00 PM #17
magisterh
Junior Member
Location: Vienna
Join Date: Apr 2014
Posts: 1
Please help! (Trim Galore of RRBS data)

Hi!

I want to use Trim Galore to process RRBS data before working with Bismark.
System: Windows 7 Ultimate, 64-bit.
However, I get the following error message:

Traceback (most recent call last):
  File "F:\RRBSCML\cutadapt\scripts\cutadapt.py", line 71, in <module>
    from cutadapt import seqio, __version__
ImportError: cannot import name seqio

What am I doing wrong?

Thanks for help!!
04-30-2014, 02:38 PM #18
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 625

Hi magisterh,

The error you are seeing is caused by Cutadapt not being able to find required packages under Windows. I would suggest either contacting Marcel Martin (the developer of Cutadapt) directly, or following BB's suggestion to use his suite of tools.
01-12-2015, 10:58 PM #19
Nikhil
Junior Member
Location: coimbtore
Join Date: Jan 2015
Posts: 2
trim galore and cutadapt software errors

Hi,

This is the error I am getting while running Trim Galore and cutadapt. What exactly is this error, and how can I solve this problem?

cutadapt/_align.c:8:22: fatal error: pyconfig.h: No such file or directory
 #include "pyconfig.h"
                      ^
compilation terminated.

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_manman/cutadapt/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-V_zM1D-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_manman/cutadapt
Storing debug log for failure in /home/manman/.pip/pip.log
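The key line is "fatal error: pyconfig.h: No such file or directory": Cutadapt's C extension (_align.c) could not be compiled because that header was missing. A minimal check, sketched in Python and assuming it is run with the same interpreter pip used (here /usr/bin/python), asks sysconfig where pyconfig.h should be. On Debian/Ubuntu systems like this one (note the x86_64-linux-gnu-gcc compiler), the header usually comes from the python-dev (or python3-dev) package; treat that as a likely cause rather than a confirmed fix. pyconfig_header_present is a hypothetical helper.

```python
import os
import sysconfig


def pyconfig_header_present() -> bool:
    """True if pyconfig.h exists where this interpreter expects it.

    C extensions such as cutadapt's _align.c #include it when they are built.
    """
    return os.path.isfile(sysconfig.get_config_h_filename())


if __name__ == "__main__":
    header = sysconfig.get_config_h_filename()
    if pyconfig_header_present():
        print(f"Found {header}")
    else:
        print(f"Missing {header}: Python development headers probably not installed")
```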
01-12-2015, 11:02 PM #20
Nikhil
Junior Member
Location: coimbtore
Join Date: Jan 2015
Posts: 2

(Replying to JQL, post #1, about the "cutadapt ... failed at /usr/local/bin/trim_galore line 420" error)
Hi,
I am also getting the same error. Could you please tell me how you solved this problem?
All times are GMT -8.