SEQanswers

Old 02-20-2012, 10:51 PM   #1
nxtgenkid10
Member
 
Location: india

Join Date: Feb 2011
Posts: 16
Default Trim FastQ

Hi all, I would like to know how I can trim a FASTQ file and eliminate the reads with quality lower than QV30.
Old 02-20-2012, 11:40 PM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
Originally Posted by nxtgenkid10 View Post
Hi all, I would like to know how I can trim a FASTQ file and eliminate the reads with quality lower than QV30.

The FASTQ quality trimmer from the FASTX Toolkit will do this:

http://hannonlab.cshl.edu/fastx_toolkit/

If you want a more hand-holding approach, I suppose you could use Galaxy instead.
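
For context, QV30 is a Phred score of 30, i.e. an estimated error probability of 1 in 1000 for a base call. Below is a minimal Python sketch of how a quality string decodes and how one might drop reads whose mean quality is below 30. It assumes Phred+33 (Sanger / Illumina 1.8+) encoding, the function names are hypothetical, and it is not how fastq_quality_trimmer is implemented:

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_mean_quality(quality_string, min_mean=30):
    """True if the read's mean Phred score is at least min_mean."""
    scores = phred33_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_mean
```

Whether "reads lesser than QV30" should mean a mean-quality filter like this or per-base end trimming depends on the downstream analysis; the two are different operations.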
Old 12-20-2012, 04:15 AM   #3
giampe
Member
 
Location: Bari, Italy

Join Date: Aug 2009
Posts: 22
Default

Hi Gringer,
I have some questions about the FASTQ quality trimmer.
I have installed the FASTX Toolkit on my Linux machine, and I want to use it to trim bases with a quality score below 28 from the 3' and 5' ends.
I have paired-end data files with 101-base reads. Only the R2 file shows a drop in quality at the 3' and 5' ends; I have attached the corresponding quality box plots.
How should I set the parameters of fastq_quality_trimmer, listed below, to obtain a quality box plot as good as the one for R1 (attached)?

fastq_quality_trimmer -h
usage: fastq_quality_trimmer [-h] [-v] [-t N] [-l N] [-z] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13 by A. Gordon (gordon@cshl.edu)

   [-h]         = This helpful help screen.
   [-t N]       = Quality threshold - nucleotides with lower
                  quality will be trimmed (from the end of the sequence).
   [-l N]       = Minimum length - sequences shorter than this (after trimming)
                  will be discarded. Default = 0 = no minimum length.
   [-z]         = Compress output with GZIP.
   [-i INFILE]  = FASTQ input file. default is STDIN.
   [-o OUTFILE] = FASTQ output file. default is STDOUT.
   [-v]         = Verbose - report number of sequences.
                  If [-o] is specified, report will be printed to STDOUT.
                  If [-o] is not specified (and output goes to STDOUT),
                  report will be printed to STDERR.

I would appreciate your help.
Thanks
Attached Images
File Type: jpg ImmagineR1_R2.jpg (20.0 KB, 92 views)
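
A rough Python sketch of what the -t and -l options in the help text above describe: low-quality bases are removed from the end of the read, and reads that end up shorter than the minimum length are dropped. It assumes Phred+33 encoding and that "the end of the sequence" means the 3' end. The name trim_read is hypothetical and this is not the tool's actual code. The optional 5' pass is included purely for illustration; nothing in the help text says the tool performs it.

```python
def trim_read(seq, qual, threshold, min_length=0, trim_5prime=False):
    """Return (seq, qual) after end trimming, or None if the read is too short.

    threshold   -- like -t: bases with a lower score are cut from the end
    min_length  -- like -l: shorter reads are discarded (0 = keep everything)
    trim_5prime -- illustrative extra, not described in the help text
    """
    scores = [ord(ch) - 33 for ch in qual]  # assumes Phred+33 encoding

    # Walk inwards from the 3' end until a base meets the threshold.
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1

    # Optionally do the same from the 5' end.
    start = 0
    if trim_5prime:
        while start < end and scores[start] < threshold:
            start += 1

    if end - start < min_length:
        return None
    return seq[start:end], qual[start:end]
```

Using only the options listed in the help text, a starting point might look like `fastq_quality_trimmer -t 28 -l 50 -i R2.fastq -o R2_trimmed.fastq`, where the file names and the minimum length of 50 are placeholders rather than recommendations. If reads are discarded from R2 only, R1 and R2 will no longer contain the same reads in the same order.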
Old 05-27-2014, 05:54 PM   #4
giorgifm
Member
 
Location: Columbia University Medical Center

Join Date: Aug 2011
Posts: 35
Default

You can find a complete list of FASTQ trimmers, compared to each other, in this paper:

"An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis"
Old 05-27-2014, 06:09 PM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by giorgifm View Post
You can find a complete list of FASTQ trimmers, compared to each other, in this paper:

"An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis"

That's not complete; BBDuk isn't in it.

Granted, it was not publicly available when the paper was published, but it DID exist. Here's a comparison of mapping error rates after trimming with various trimmers:

http://seqanswers.com/forums/showpos...6&postcount=16
Old 05-27-2014, 06:19 PM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

FWIW, I'm now using Trimmomatic (which finally has a published paper out) because it has a pipeline-based interface that I find expressive enough for almost all of my potential uses. In particular, you can change the order of operations for different use cases.

http://bioinformatics.oxfordjournals...&pmid=24695404

It has fairly good statistics in the paper giorgifm linked, but you can probably find a better trimmer for each specific case. I notice that Brian Bushnell hasn't included Trimmomatic in his graph (the Trimmomatic paper suggests "Maximum Information" mode for the best statistics) -- any chance of adding that?
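
The "order of operations" point can be illustrated with a toy Python pipeline in which each step is a function applied in the order listed. The step factories below are hypothetical stand-ins that operate on a plain list of Phred scores; they are not Trimmomatic's implementation.

```python
def trailing(min_q):
    """Step factory: drop bases below min_q from the 3' end."""
    def step(scores):
        end = len(scores)
        while end > 0 and scores[end - 1] < min_q:
            end -= 1
        return scores[:end]
    return step


def crop(length):
    """Step factory: keep at most the first `length` bases."""
    def step(scores):
        return scores[:length]
    return step


def run_pipeline(scores, steps):
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        scores = step(scores)
    return scores


read = [35, 35, 10, 35, 5]
print(run_pipeline(read, [trailing(20), crop(3)]))  # [35, 35, 10]
print(run_pipeline(read, [crop(3), trailing(20)]))  # [35, 35]
```

The same two steps give different reads depending on their order, which is why being able to reorder them matters.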
Old 05-27-2014, 06:23 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

I did not include Trimmomatic because its settings are more complex. Most trimmers just take a quality cutoff; Trimmomatic requires multiple parameters, so it's hard to grade objectively and plot on a graph where the only variable is the quality cutoff.

That said, I would be happy to add it, if you can give me a typical command line.
Old 05-27-2014, 06:40 PM   #8
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Here are the trimming steps from a command line I've used:
Code:
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:1:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:28
I haven't tried optimising it at all (i.e. by adjusting the sliding window quality cutoff), and have yet to experiment with MAXINFO mode. Replacing the sliding window with maximum information would be something like the following:

Code:
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:1:true LEADING:3 TRAILING:3 MAXINFO:50:0.5
[optimisation of this would involve making sure the minimum length of 50bp is sufficient for a good mapping, then adjusting the score strictness to see how it changes mapping]
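
For readers unfamiliar with sliding-window trimming, here is a simplified Python sketch of the idea behind SLIDINGWINDOW:4:28: scan from the 5' end and cut the read at the first window of 4 bases whose average quality falls below 28. It works on a list of Phred scores and the function name is hypothetical. Trimmomatic's real implementation may differ in details such as the exact cut position and how reads shorter than one window are handled.

```python
def sliding_window_cut(scores, window=4, min_avg=28):
    """Return how many bases to keep from the 5' end.

    Scans windows of `window` consecutive scores from the 5' end and
    cuts at the start of the first window whose mean is below min_avg.
    Reads shorter than one window are left untouched in this sketch.
    """
    for start in range(0, len(scores) - window + 1):
        chunk = scores[start:start + window]
        if sum(chunk) / window < min_avg:
            return start
    return len(scores)


scores = [38, 38, 37, 36, 30, 22, 15, 8]
keep = sliding_window_cut(scores)
print(scores[:keep])  # [38, 38, 37]
```

Raising or lowering the cutoff (28 here) is the adjustment referred to above as optimising the sliding window quality cutoff.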
All times are GMT -8.