SEQanswers

03-07-2014, 01:17 PM   #1
balsampoplar
Member
Location: Maryland

fastx quality trimmer and gzipped fastq

Hello -

I am using the fastq_quality_trimmer tool from the Fastx toolkit to do some quality control on my gzipped FASTQ file. However, I keep getting a "wrong file type" error. Do the FASTQ files need to be gunzipped before they are used with this tool?

Edit 1: I figured out that I needed to use zcat to feed the data to this tool as follows:

$> zcat sequences_fastq.gz | fastq_quality_trimmer -t 30 -l 35 -z -o sequences_trimmed_fastq.gz

This results in the following error:
fastq_quality_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 4

I am not quite sure I understand what is causing this.

Last edited by balsampoplar; 03-07-2014 at 01:37 PM.

03-07-2014, 01:52 PM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

This is because fastx is obsolete and requires everything to use ASCII-64 qualities, but your data is modern and uses ASCII-33 qualities.
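The error in the first post follows directly from this offset. A quality character's Phred score is its ASCII code minus the encoding offset: '#' has code 35, so decoding it with the old offset of 64 gives 35 - 64 = -29, exactly the invalid value fastx reported, while the offset of 33 used by Sanger / Illumina 1.8+ data gives a valid (if low) Q2. The POSIX shell sketch below only illustrates that arithmetic; the function name `qual_value` is hypothetical and is not part of fastx or BBTools.

```sh
#!/bin/sh
# Hypothetical helper (not part of any toolkit): decode one FASTQ quality
# character into a Phred score, given the ASCII offset of the encoding
# (33 for Sanger / Illumina 1.8+, 64 for older Illumina pipelines).
qual_value() {
  ord=$(printf '%d' "'$1")   # a leading quote makes printf print the character's code
  echo $(( ord - $2 ))       # subtract the encoding offset
}

qual_value '#' 64   # prints -29, the value in the fastx error message
qual_value '#' 33   # prints 2, a valid Phred+33 score
```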

I suggest you use a different trimmer. For example, from the BBTools package:

reformat.sh in=sequences_fastq.gz out=trimmed.fq.gz qtrim=rl trimq=20 minlen=40

That will trim the left and right sides of reads to Q20 using the Phred method (superior to fastx's method), and discard resulting reads that are shorter than 40 after trimming (that parameter is optional).

I've attached a study I did recently on quality trimming using various trimmers. Note that each point on the graphs represents trimming with a different threshold, so the lower-left point for an algorithm is trimming to Q40, and the upper-right is trimming to Q0. Q30 is generally way too high a trimming threshold for any purpose, by the way.
Attached file: BB_QualityTrim.ppt (1.28 MB)
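For readers less familiar with the Q scale used for these thresholds: under the standard Phred definition, a quality score Q corresponds to an estimated per-base error probability P, so each trimming threshold maps to a target error rate:

```latex
\begin{aligned}
Q = -10\,\log_{10} P &\iff P = 10^{-Q/10} \\
Q = 20 &\implies P = 10^{-2} = 0.01 \\
Q = 30 &\implies P = 10^{-3} = 0.001 \\
Q = 40 &\implies P = 10^{-4} = 0.0001 \\
Q = 0 &\implies P = 10^{0} = 1
\end{aligned}
```

Each 10-point step in the threshold changes the tolerated error rate tenfold, and Q0 (P = 1) sits at the end of the scale where essentially nothing should be trimmed.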

Last edited by Brian Bushnell; 03-07-2014 at 01:55 PM.

03-08-2014, 06:05 AM   #3
SES
Senior Member
Location: Vancouver, BC

Quote:
Originally Posted by Brian Bushnell
This is because fastx is obsolete and requires everything to use ASCII-64 qualities.

This is not true; the FastX toolkit has had the '-Q 33' option for Sanger FASTQ data for maybe 4 years now. There is the eternal question of why these things are not documented, but it's pretty common knowledge now. You will see this discussed quite a bit on SEQanswers and Biostars.

03-08-2014, 08:06 AM   #4
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Quote:
Originally Posted by SES
This is not true; the FastX toolkit has had the '-Q 33' option for Sanger FASTQ data for maybe 4 years now. There is the eternal question of why these things are not documented, but it's pretty common knowledge now. You will see this discussed quite a bit on SEQanswers and Biostars.

Oh, thanks for the correction; that would have made my testing easier. It should really be the default; otherwise, anyone running modern data will get an immediate, mysterious crash.

03-10-2014, 06:53 AM   #5
balsampoplar
Member
Location: Maryland

Thank you Brian and SES. As it turns out, I had to use both the -Q and -t flags to run fastq_quality_trimmer successfully. Using only one or the other produced input formatting errors.

Unfortunately, the program's documentation makes no mention of the -Q option; it does not even show up in the help menu. Also, what is the rationale for having to use both flags? Don't they mean the same thing?
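On that last question: the two flags control different steps, so they are not interchangeable. -Q 33 tells fastx how to decode the quality characters (the ASCII offset), while -t is the quality threshold applied to the decoded scores; bases scoring below it are trimmed from the end of the read. The POSIX shell function below is a hypothetical sketch of that two-step logic for a single read, written only for illustration. It is not fastx's source code, the name `trim_3prime` is made up, and it ignores the -l minimum-length filter.

```sh
#!/bin/sh
# Hypothetical sketch, not fastx source code: trim low-quality bases from the
# 3' end of one read whose sequence and quality strings have equal length.
#   offset    plays the role of fastx's -Q (how quality characters are decoded)
#   threshold plays the role of fastx's -t (which decoded scores are too low)
trim_3prime() {
  seq=$1 qual=$2 offset=$3 threshold=$4
  while [ -n "$qual" ]; do
    last=${qual#"${qual%?}"}          # last remaining quality character
    ord=$(printf '%d' "'$last")       # its ASCII code
    # Stop at the first base, scanning from the 3' end, that meets the threshold.
    [ $(( ord - offset )) -ge "$threshold" ] && break
    qual=${qual%?}                    # drop that quality character...
    seq=${seq%?}                      # ...and the matching base
  done
  printf '%s %s\n' "$seq" "$qual"
}

# With offset 33, 'I' decodes to Q40 and '#' to Q2, so the three '#' bases go.
trim_3prime ACGTACGT 'IIIII###' 33 20   # prints: ACGTA IIIII
```

Applied to the command in the first post, the working invocation would presumably look like `zcat sequences_fastq.gz | fastq_quality_trimmer -Q 33 -t 30 -l 35 -z -o sequences_trimmed_fastq.gz`, although the thread does not show the final command.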

All times are GMT -8.