SEQanswers

Old 07-11-2014, 12:51 PM   #41
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

First of all, apologies for not having released any Trim Galore updates lately; I seem to have always postponed them and then forgotten them entirely...

A new version of Trim Galore (v0.3.6) is now available from its project page (http://www.bioinformatics.babraham.a...s/trim_galore/), which adds several features and fixes:

- Added the new options '--three_prime_clip_r1' and '--three_prime_clip_r2' to clip any number of bases from the 3' end after adapter/quality trimming has completed
- Added a check to see if Cutadapt exits fine. Otherwise, Trim Galore will bail as well
- The option '--stringency' needs to be spelled out now since using -s was ambiguous because of '--suppress_warn'
- Added the Trim Galore version number to the summary report
- Added single-end or paired-end mode to the summary report
- In paired-end mode, the Read 1 summary report will no longer state that no sequences have been discarded due to trimming. This will be stated in the trimming report of Read 2 once the validation step has been completed

(Edit: The manual needs a little updating, too, I'll work on that...)
Old 07-16-2014, 07:06 AM   #42
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

I have just released a small fix to Trim Galore (v0.3.7) that makes paired-end trimming work again (which I had accidentally broken by introducing a small change...). The manual has now also been updated.

Please find the latest release here: http://www.bioinformatics.babraham.a...s/trim_galore/
Old 07-28-2014, 09:17 AM   #43
shawpa
Member
 
Location: Pittsburgh

Join Date: Aug 2011
Posts: 63

I have used Trim Galore before running Bismark many times. It just occurred to me, though, that there might be a problem in my use of the pipeline. What I usually do is trim the reads (both adapter and quality trimming), then align with Bismark, remove duplicate reads using the deduplicate_bismark script provided, then proceed with methylation calling. However, if I am trimming for quality, I am changing the start and end coordinates of the read, which I think would affect the detection of duplicate reads. Could someone please let me know if this is correct? Is trimming for quality going to adversely affect the detection of duplicate reads?
Old 07-28-2014, 09:41 AM   #44
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

Quote:
Originally Posted by shawpa
I have used Trim Galore before running Bismark many times. It just occurred to me, though, that there might be a problem in my use of the pipeline. What I usually do is trim the reads (both adapter and quality trimming), then align with Bismark, remove duplicate reads using the deduplicate_bismark script provided, then proceed with methylation calling. However, if I am trimming for quality, I am changing the start and end coordinates of the read, which I think would affect the detection of duplicate reads. Could someone please let me know if this is correct? Is trimming for quality going to adversely affect the detection of duplicate reads?
No, trimming should not affect the deduplication:

Single-end deduplication uses the chromosome, the start coordinate and the orientation of a read. Since you are trimming from the 3' end of a read, this has no influence on the start coordinate. (For reverse reads, the start coordinate is calculated by adding the read length to the mapped position, using the CIGAR string for gapped alignments if required.)

Paired-end deduplication uses the chromosome, the start coordinate of read 1, the end coordinate of read 2 and the orientation of the read pair (determined by read 1). Again, since you are trimming from the 3' end of both reads the relevant parameters are not affected.
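The two rules above can be sketched in Python. This is only an illustration, not the code of the Bismark deduplication script: the `Alignment` type, the function names and the 1-based coordinate convention are assumptions made for the example, and "end coordinate of read 2" is interpreted here as the 5' end of read 2.

```python
from typing import NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not a Bismark data structure)."""
    chrom: str
    leftmost: int    # 1-based leftmost reference position of the alignment
    ref_length: int  # reference bases covered (would come from the CIGAR string)
    strand: str      # "+" or "-"


def five_prime_position(aln: Alignment) -> int:
    """Reference coordinate of the read's 5' end (its 'start' in read orientation)."""
    if aln.strand == "+":
        return aln.leftmost
    # Reverse-strand read: add the aligned length to the leftmost position.
    return aln.leftmost + aln.ref_length - 1


def single_end_key(aln: Alignment) -> Tuple[str, int, str]:
    """Chromosome, start coordinate and orientation of the read."""
    return (aln.chrom, five_prime_position(aln), aln.strand)


def paired_end_key(read1: Alignment, read2: Alignment) -> Tuple[str, int, int, str]:
    """Chromosome, start of read 1, end of read 2, orientation taken from read 1."""
    return (read1.chrom, five_prime_position(read1),
            five_prime_position(read2), read1.strand)


# A reverse read covering 100..149, and the same read after 10 bases were
# trimmed from its 3' end (it now covers 110..149).
full = Alignment("chr1", 100, 50, "-")
trimmed = Alignment("chr1", 110, 40, "-")
same_key = single_end_key(full) == single_end_key(trimmed)
```

In this sketch `same_key` is true: 3' trimming moves the leftmost mapped position of a reverse read, but the coordinate used for deduplication stays at 149.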
Old 08-14-2014, 11:55 PM   #45
rzwu0721
Junior Member
 
Location: China Guangxi

Join Date: Aug 2014
Posts: 1

Hi, I am using CLC Genomics Workbench, and it can trim an adapter (e.g. the CTGTCTCTTATACACATCT sequence mentioned above) in just a few minutes. So I would recommend giving it a try.

Best Wishes!
Renzhi Woo,
Guangxi Academy of Sciences
Old 09-12-2014, 12:52 AM   #46
yasmin_friedmann
Junior Member
 
Location: wales

Join Date: Sep 2014
Posts: 1
trim_galore without adaptor trimming?

Hi All,

Here is my first question ever to this forum! :-)

I came across trim_galore when looking for a quality trimmer that would trim both paired-end reads together. My FastQ files are from Illumina 1.9. I ran the following command:

Quote:
trim_galore -q 20 --fastqc --gzip --paired filename1 filename3
I get the following error message:

Quote:
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Writing report to 'filename1_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: filename1
Trimming mode: paired-end
Trim Galore version: 0.3.7
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC'
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to filename1_trimmed.fq.gz


>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file filename1 <<<
Traceback (most recent call last):
  File "/Users/yasmin/cutadapt-1.4.2/bin//cutadapt", line 9, in <module>
    from cutadapt.scripts import cutadapt
  File "/Users/yasmin/cutadapt-1.4.2/cutadapt/scripts/cutadapt.py", line 69, in <module>
    from cutadapt.adapters import Adapter, ColorspaceAdapter, BACK, FRONT, PREFIX, ANYWHERE
  File "/Users/yasmin/cutadapt-1.4.2/cutadapt/adapters.py", line 4, in <module>
    from cutadapt import align, colorspace
  File "/Users/yasmin/cutadapt-1.4.2/cutadapt/align.py", line 225, in <module>
    from cutadapt._align import globalalign_locate, compare_prefixes
ImportError: dlopen(/Users/yasmin/cutadapt-1.4.2/cutadapt/_align.so, 2): no suitable image found. Did find:
    /Users/yasmin/cutadapt-1.4.2/cutadapt/_align.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00


Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...
If anybody has come across this and solved it, please let me know!

Many thanks!
Yasmin
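A note on the error message above: the eight bytes it reports begin with 0x7F 0x45 0x4C 0x46, which is the magic number of the ELF binary format, while the `dlopen` call and the `/Users/` path suggest the command was run on macOS, which loads Mach-O binaries. One plausible reading is therefore that `_align.so` was compiled on a different platform and Cutadapt would need to be rebuilt on this machine. The small Python sketch below only classifies the leading bytes; the function name is made up for this example.

```python
def describe_binary(first_bytes: bytes) -> str:
    """Guess a binary's format from its leading magic bytes."""
    if first_bytes.startswith(b"\x7fELF"):
        return "ELF (Linux and most other Unix systems)"
    mach_o_magics = {
        bytes.fromhex("feedface"), bytes.fromhex("feedfacf"),  # 32-bit / 64-bit
        bytes.fromhex("cefaedfe"), bytes.fromhex("cffaedfe"),  # byte-swapped forms
        bytes.fromhex("cafebabe"),  # universal binary (also used by Java class files)
    }
    if first_bytes[:4] in mach_o_magics:
        return "Mach-O (macOS)"
    return "unknown"


# The eight bytes quoted in the ImportError above.
reported = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
file_type = describe_binary(reported)
```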

Last edited by yasmin_friedmann; 09-12-2014 at 01:16 AM. Reason: added error message
Old 01-13-2015, 02:58 PM   #47
frozenlyse
Senior Member
 
Location: Australia

Join Date: Sep 2008
Posts: 136

Hi Felix - I'm writing the methods sections for a few WGBS papers where I've used trim_galore, is there a paper I can cite for it?
Old 01-14-2015, 12:59 AM   #48
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

If you want to, you could cite its URL; there is no publication as such (apart from the Cutadapt reference). Cheers, Felix
Old 03-30-2015, 04:23 AM   #49
MaximeG
Junior Member
 
Location: paris

Join Date: Jun 2014
Posts: 1

Hi all,
I have a question about the non-directional option of Trim Galore.
After a lot of reflection, we have determined that we made our RRBS library in a directional paired-end manner (R1 begins with C/TGG and R2 with CAA). However, the nd option allows the CA to be cut from R2.
Is it a better strategy to leave this CA in for Bismark and cut it afterwards?
We have run both:
With nd: 36.6% uniquely aligned pairs + 55.6% multiple pairs
Without nd: 37.8% uniquely aligned pairs + 55.2% multiple pairs
Thank you in advance for your response
Maxime
Old 04-21-2015, 03:28 AM   #50
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

Quote:
Originally Posted by gwilkie
I have also found that when using Nextera sample prep, you should trim at CTGTCTCTTATACACATCT instead of the usual AGATCGGAAGAGC.

Best wishes, Gavin
Is this still the case in 2015? I mean, is "CTGTCTCTTATACACATCT" universal to Nextera prepped samples?
Old 05-06-2015, 12:55 AM   #51
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574
Trim Galore v0.4.0 released: Adapter auto-detection

We have just released Trim Galore v0.4.0. This adds a few sanity checks and makes the specification of standard adapters more straightforward. In fact, we changed the default mode so that Trim Galore attempts to auto-detect which type of adapter was used in library construction, which results in a 'one command to trim them all' for standard ClusterFlow processing of a highly diverse full Illumina flowcell.

Here are the changes in more detail:

• Unless instructed otherwise, Trim Galore will now attempt to auto-detect the adapter that was used for library construction (choosing from the Illumina universal, Nextera transposase and Illumina small RNA adapters). For this, the first 1 million sequences of the first file specified are analysed. If no adapter can be detected within the first 1 million sequences, Trim Galore defaults to --illumina. The auto-detection behaviour can be overruled by specifying an adapter sequence or using --illumina, --nextera or --small_rna

• Added the new options '--illumina', '--nextera' and '--small_rna' to use different default sequences for trimming (instead of -a):
Universal Illumina: AGATCGGAAGAGC (TruSeq or Sanger iTag)
Small RNA: ATGGAATTCTCG
Nextera: CTGTCTCTTATA

• Added a sanity check to the start of a Trim Galore run to see if the (first) FastQ file in question does contain information at all or appears to be in SOLiD colorspace format, and bails if either is true. Trim Galore does not support colorspace trimming, but users wishing to do this are kindly referred to using Cutadapt as a standalone program

• Added a new option '--path_to_cutadapt /path/to/cutadapt'. Unless this option is specified, it is assumed that Cutadapt is in the PATH (equivalent to '--path_to_cutadapt cutadapt'). Also added a test to see if Cutadapt seems to be working before the actual trimming is launched

• Fixed an open command for a certain type of RRBS processing (was open() instead of open3())

Trim Galore is available from the Babraham Bioinformatics projects site.
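
As a rough illustration of the auto-detection idea described in the list above, here is a minimal Python sketch. It is not Trim Galore's actual code (Trim Galore is a Perl script): the function name, the plain substring counting and the tie-breaking are assumptions, and only the three adapter sequences, the 1 million read limit and the fallback to the Illumina adapter come from the release notes.

```python
from itertools import islice
from typing import Dict, Iterable, Tuple

# Default adapter sequences as listed in the release notes above.
ADAPTERS: Dict[str, str] = {
    "illumina": "AGATCGGAAGAGC",
    "small_rna": "ATGGAATTCTCG",
    "nextera": "CTGTCTCTTATA",
}


def detect_adapter(sequences: Iterable[str],
                   max_reads: int = 1_000_000) -> Tuple[str, str]:
    """Return (name, sequence) of the adapter seen most often in the first reads."""
    counts = {name: 0 for name in ADAPTERS}
    for seq in islice(sequences, max_reads):
        for name, adapter in ADAPTERS.items():
            if adapter in seq:  # assumption: exact substring match only
                counts[name] += 1
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        best = "illumina"  # nothing detected: fall back to the Illumina adapter
    return best, ADAPTERS[best]
```

For example, `detect_adapter(["ACGTCTGTCTCTTATAGG", "TTTT"])` would pick the Nextera sequence, because it is the only adapter found in those reads.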
Old 05-07-2015, 12:29 AM   #52
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

Quote:
Originally Posted by fkrueger
Trim Galore is available from the Babraham Bioinformatics projects site.
There's a small problem with the zip file.

Code:
unzip trim_galore_v0.4.0.zip 
Archive:  trim_galore_v0.4.0.zip
  inflating: Trim_Galore_User_Guide.pdf  
  inflating: trim_galore             
  inflating: RRBS_Guide.pdf          
warning:  skipped "../" path component(s) in ../Bismark/license.txt
  inflating: Bismark/license.txt
Old 05-07-2015, 12:34 AM   #53
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

Quote:
Originally Posted by rhinoceros
There's a small problem with the zip file.

Code:
unzip trim_galore_v0.4.0.zip 
Archive:  trim_galore_v0.4.0.zip
  inflating: Trim_Galore_User_Guide.pdf  
  inflating: trim_galore             
  inflating: RRBS_Guide.pdf          
warning:  skipped "../" path component(s) in ../Bismark/license.txt
  inflating: Bismark/license.txt
Oops... but it is only the license file. I have replaced the zip file now. Cheers, Felix
Old 05-07-2015, 04:26 AM   #54
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

That's great.

Do you have any hints on how to trim ScriptSeq-prepped samples? My PE reads clearly had TruSeq adapters, but after trim_galore, FastQC tells me that my R1 reads still contain a considerable amount of "TruSeq Adapter, Index 12 (100% over 58bp)" and some other "no hit" stuff, whereas my R2 reads apparently contain lots of "Illumina Single End PCR Primer 1 (100% over 52bp)" and "no hit" stuff. Both files have massive k-mer bias at the 5' ends even after trim_galore. The first 13 bp of the TruSeq and ScriptSeq adapters are identical, so I'm somewhat baffled as to how these adapters are still present in some R1 reads even after trimming. I presume the R2 stuff is related to 3'-terminal tagging and very short RNA molecules, so as a solution I could include the complete Illumina Paired End PCR Primer 1 sequence using the -a2 flag.

Last edited by rhinoceros; 05-07-2015 at 04:58 AM.
Old 05-07-2015, 05:12 AM   #55
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

It might help if you could send me the FastQC html report to take a look (email).

In more general terms, it is quite possible that you have fragments of TruSeq adapters, or especially PCR primers, left in the library after trimming, and that these are what FastQC warns you about. Quite often these are adapter or primer dimers that don't have the A (from A-tailing) at the start of the sequence. These sequences are not removed from the file, and they generally don't have to be if you are going to align the samples as the next step because they simply won't align.

The adapter contamination you do care about is read-through contamination at the 3' end, i.e. reads that start in a genomic sequence of interest and then continue into the adapter. It would appear that trimming got rid of these efficiently.
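
To make the read-through case concrete, here is a simplified Python sketch of 3' adapter removal. It is not how Cutadapt is implemented (Cutadapt uses error-tolerant alignment, whereas this sketch accepts exact matches only), and the function name is made up; `min_overlap` plays the role of the 'stringency' value shown in the trimming reports earlier in this thread.

```python
def trim_three_prime_adapter(read: str, adapter: str, min_overlap: int = 1) -> str:
    """Remove an adapter, or a prefix of it, from the 3' end of a read.

    Simplification: exact matches only, no error tolerance.
    """
    # Case 1: the complete adapter occurs in the read; cut at its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends part-way through the adapter. Look for the longest
    # adapter prefix (at least min_overlap bases) that is also a suffix of the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


adapter = "AGATCGGAAGAGC"
# Genomic sequence followed by the full adapter and some further bases.
full_read_through = trim_three_prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", adapter)
# Genomic sequence followed by only the first six adapter bases.
partial_read_through = trim_three_prime_adapter("ACGTACGTAGATCG", adapter)
```

Both calls leave only the genomic part, `ACGTACGT`. With a minimum overlap of 1, a single trailing `A` would also be removed, which is why a low stringency trims a few genuine bases now and then.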
Old 05-07-2015, 10:22 AM   #56
KJohnson
Junior Member
 
Location: Santa Barbara, Ca

Join Date: Oct 2013
Posts: 2

Hi all,

I am running Trim Galore on Illumina paired-end data and am trying to figure out what is going wrong. I have set the quality cutoff to a Phred score of 30, but when trimming is complete and I view the FastQC report, the box-plot whiskers under the Per base sequence quality tab go down to a Phred score of 13. Is there something I am doing wrong?

Thanks.

code:
trim_galore -fastqc -q 30 -paired -retain_unpaired Blue_trimmed_1.fq Blue_trimmed_2.fq
Old 05-07-2015, 10:26 AM   #57
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

Is the data Illumina 1.9 encoded (phred33) or the old 1.5 encoding by any chance? Would you mind attaching or sending me the FastQC report via email? Cheers, Felix
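
For context on why the encoding matters: Illumina 1.9 files store each quality as the ASCII character with code Phred + 33, while the older Illumina 1.5 files use Phred + 64. The Python sketch below (the function name is made up for illustration) shows how the same quality string decodes under each offset; it is not a claim about what caused the result reported above.

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert a FastQ quality string into Phred scores for a given ASCII offset."""
    return [ord(char) - offset for char in quality_string]


phred33_scores = decode_qualities("I5.")           # Illumina 1.9 style string
phred64_scores = decode_qualities("hT", offset=64)  # Illumina 1.5 style string
misread_scores = decode_qualities("hT", offset=33)  # same string, wrong offset
```

Reading Phred+64 data with an offset of 33 makes every score look 31 points higher, so a quality cutoff would then remove far less than intended.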
Old 05-07-2015, 10:29 AM   #58
KJohnson
Junior Member
 
Location: Santa Barbara, Ca

Join Date: Oct 2013
Posts: 2

The data is Illumina 1.9. Yes I can email you the report.

Thank you,
Kevin
Old 05-29-2015, 04:48 AM   #59
LindsayR
Junior Member
 
Location: Baltimore

Join Date: May 2015
Posts: 2
TrimGalore paired end issue

I'm trying to run Trim Galore! v0.4.0 and I have Cutadapt 1.8.1 installed using Python 2.7.6. I think that Trim Galore is not passing the paired option on to Cutadapt. I end up with an unequal number of reads in the read 1 vs. read 2 files, and Bismark will not align them. This is the summary of the trimming (I have marked the part that I think is wrong in the Cutadapt call in square brackets). Any ideas? Thanks so much! -Lindsay

SUMMARISING RUN PARAMETERS
==========================
Input filename: path/read1_R1_010.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.4.0
Cutadapt version: 1.8.1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed


This is cutadapt 1.8.1 with Python 2.7.6
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC [no -p argument is specified here] /path_R1_010.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 161.90 s (40 us/read; 1.48 M reads/minute).
Old 05-29-2015, 05:02 AM   #60
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 574

Hi Lindsay,

This is a little odd... The way Trim Galore handles paired-end files (when you specify --paired) is to run single-end trimming on read 1 and read 2 separately, and then run a 'validation' step that checks the length of each read in a sequence pair to decide whether to keep or discard the entire read pair. Since reads are not discarded in the (single-end) trimming step, even if they are trimmed to a length of 0 bp, they should then be either kept or discarded as an entire pair. Is there a chance that the FastQ files you fed in did not match up or were truncated?

So in a nutshell, the --paired option is not supposed to be fed through to Cutadapt (which only started supporting paired-end trimming recently), but is handled internally. If you keep having these problems could you please send me a few reads of your FastQ files and I can try to reproduce these errors on my side. Thanks, Felix
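
The validation step described above can be sketched in Python as follows. This is an illustration only, not Trim Galore's own code: the `Read` tuple layout and the function name are assumptions, and the default of 20 bp is taken from the trimming reports quoted in this thread.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # hypothetical layout: (name, sequence, qualities)


def validate_pairs(reads1: Iterable[Read], reads2: Iterable[Read],
                   min_length: int = 20) -> Iterator[Tuple[Read, Read]]:
    """Yield only the pairs in which both trimmed reads are long enough."""
    for r1, r2 in zip_longest(reads1, reads2):
        if r1 is None or r2 is None:
            # One input ran out early, e.g. a truncated or mismatched file.
            raise ValueError("input files contain different numbers of reads")
        if len(r1[1]) >= min_length and len(r2[1]) >= min_length:
            yield r1, r2


trimmed_r1 = [("a", "A" * 30, "I" * 30), ("b", "A" * 25, "I" * 25)]
trimmed_r2 = [("a", "C" * 30, "I" * 30), ("b", "", "")]  # read b trimmed to 0 bp
kept = list(validate_pairs(trimmed_r1, trimmed_r2))
```

Because a pair is kept or dropped as a whole, both output files end up with the same number of reads. Here only pair `a` survives, since read 2 of pair `b` was trimmed to 0 bp.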