FastX: fastq_quality_filter problem

02-09-2011, 08:44 AM   #1
ElMichael (Member, UK)

Hi,
I'm trying to filter reads using the FASTX-Toolkit.
The command is the following:
fastq_quality_filter -Q33 -q 20 -p 100 -v -i filename_1 -o filename_2

However, I often get one of two error messages:
Segmentation fault (core dumped)
or
fastq_quality_filter: bug: got empty array at fastq_quality_filter.c:97

It seems to me that I get the segmentation fault for files with more than (roughly) 50-60M reads.
Is there any limit on the input size for fastq_quality_filter? Any ideas about this issue?
thanks.

02-09-2011, 02:35 PM   #2
ElMichael (Member, UK)

Well, I found the cause. It turned out I had a mix of FASTQ files, some with Sanger-format quality scores and some with Illumina 1.5+ quality scores. If you run the Sanger-format reads without the -Q33 parameter (i.e. with the tool's default Illumina offset), you get the error message "fastq_quality_filter: Invalid quality score value...", which clearly points to the pitfall.
But in the opposite case, when you wrongly use -Q33 for reads with Illumina-format quality scores, error messages like
Segmentation fault (core dumped)
or
fastq_quality_filter: bug: got empty array at fastq_quality_filter.c:97
give no clue as to what is going wrong.
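
A quick way to tell which offset a file uses is to look at the range of raw quality characters in it. This is only a sketch, assuming an uncompressed 4-line-per-record FASTQ file (reads.fastq is a placeholder name):

head -n 40000 reads.fastq | awk 'NR % 4 == 0 {
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (min == "" || c < min) min = c    # lowest quality character seen so far
        if (c > max) max = c                 # highest quality character seen so far
    }
} END { printf "quality characters range from %s to %s\n", min, max }'

Quality characters below ';' (ASCII 59) can only come from the phred+33 (Sanger) encoding, while a minimum around '@' or 'B' suggests phred+64 (Illumina 1.3-1.5).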

08-14-2012, 06:23 PM   #3
kerhard (Member, Oakland)

Just wanted to add to this, as I got the same error message from fastq_quality_filter, but I knew that I had the correct quality format because it worked on a differently processed version of the same library.

It turned out that I had reads in my library that were empty after removing adapter sequences with cutadapt. Once I got rid of the empty entries, it fixed the issue with fastq_quality_filter.
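
If the adapter trimming has already been done and you just need to drop the zero-length records from an existing file, something like the following awk one-liner can do it. This is only a sketch, assuming standard 4-line FASTQ records; trimmed.fastq and nonempty.fastq are placeholder names:

awk 'BEGIN { OFS = "\n" } {
    header = $0; getline seq; getline plus; getline qual    # read one 4-line FASTQ record
    if (length(seq) > 0) print header, seq, plus, qual      # keep only records with a non-empty sequence
}' trimmed.fastq > nonempty.fastq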

08-17-2012, 08:21 AM   #4
jwhite (Member, Boston)

Quote:
Originally Posted by kerhard
Just wanted to add to this, as I got the same error message from fastq_quality_filter, but I knew that I had the correct quality format because it worked on a differently processed version of the same library.

It turned out that I had reads in my library that were empty after removing adapter sequences with cutadapt. Once I got rid of the empty entries, it fixed the issue with fastq_quality_filter.

The fastq_quality_filter help screen does not list the -Q parameter. What is it for, and what does Q33 mean? The reason I ask is that I use -q to set the minimum quality, but without the -Q33 parameter I also get the errors you received.

Joe White

08-17-2012, 10:18 AM   #5
kerhard (Member, Oakland)

Yeah, I found out about the -Q parameter on SEQanswers; it's "undocumented" in the FASTX-Toolkit. If the quality scores for your libraries are in FASTQ Sanger format (ASCII, phred+33) rather than FASTQ Illumina format (ASCII, phred+64), you would use the -Q33 parameter. fastq_quality_filter assumes FASTQ Illumina quality scores by default. See here for the original explanation:

http://seqanswers.com/forums/showthread.php?t=6701
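
To make the two offsets concrete, here is a quick check on a single quality character (a sketch; 'h' is just an example character that is common in Illumina 1.3-1.5 files):

printf 'h' | od -An -t u1    # prints 104, the ASCII code of 'h'
# interpreted as phred+64 (Illumina): 104 - 64 = Q40, a plausible score
# interpreted as phred+33 (Sanger):   104 - 33 = Q71, far out of range, which is why the wrong -Q setting can make the tool misbehave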

08-17-2012, 10:37 AM   #6
jwhite (Member, Boston)

Quote:
Originally Posted by kerhard
Yeah, I found out about the -Q parameter on SEQanswers; it's "undocumented" in the FASTX-Toolkit. If the quality scores for your libraries are in FASTQ Sanger format (ASCII, phred+33) rather than FASTQ Illumina format (ASCII, phred+64), you would use the -Q33 parameter. fastq_quality_filter assumes FASTQ Illumina quality scores by default. See here for the original explanation:

http://seqanswers.com/forums/showthread.php?t=6701
Thanks! That should be documented.

08-22-2012, 05:29 AM   #7
Chirag (Member, Norway)

Hi all,
I would like to add a question here regarding fastq_quality_filter.

I used the command:

fastq_quality_filter -i R1_QC.fastq -o R1_QC_Filter.fastq -q 20 -p 80 -Q 33 -v
fastq_quality_filter -i R2_QC.fastq -o R2_QC_Filter.fastq -q 20 -p 80 -Q 33 -v

The result looks fine, but the number of reads left in each of the paired-end files is different.

When I do further trimming (or any other preprocessing) and eventually mapping, do both ends of each read pair have to be present?

Thank you in advance for your help!
cheers
CN

08-22-2012, 08:21 AM   #8
kmcarr (Senior Member, USA, Midwest)

Quote:
Originally Posted by Chirag
Hi all,
I would like to add a question here regarding fastq_quality_filter.

I used the command:

fastq_quality_filter -i R1_QC.fastq -o R1_QC_Filter.fastq -q 20 -p 80 -Q 33 -v
fastq_quality_filter -i R2_QC.fastq -o R2_QC_Filter.fastq -q 20 -p 80 -Q 33 -v

The result looks fine, but the number of reads left in each of the paired-end files is different.

When I do further trimming (or any other preprocessing) and eventually mapping, do both ends of each read pair have to be present?

Yes, they do! This is a constant problem when trimming paired-end data. I have switched to using Trimmomatic, which trims reads in a "pair-aware" manner; that is, it outputs four files: R1 and R2, which are still paired, plus an R1 singleton file and an R2 singleton file.
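
For reference, a paired-end Trimmomatic run looks roughly like this. This is only a sketch: the jar name, the file names and the SLIDINGWINDOW/MINLEN settings are illustrative and should be adapted to your data:

java -jar trimmomatic.jar PE \
    R1.fastq R2.fastq \
    R1_paired.fastq R1_unpaired.fastq \
    R2_paired.fastq R2_unpaired.fastq \
    SLIDINGWINDOW:4:20 MINLEN:36
# outputs: two files that are still properly paired (R1_paired/R2_paired),
# plus two singleton files for reads whose mate was discarded (R1_unpaired/R2_unpaired)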

08-22-2012, 09:08 AM   #9
Chirag (Member, Norway)

Thanks, kmcarr!
I will try that tool and see how it works.

In the meantime, I have posted a question about over-represented k-mers at
http://seqanswers.com/forums/showthr...2173#post82173
Could you please help if you have a better understanding of it?


regards
CN

11-07-2012, 01:17 AM   #10
monkey_SEQ (Junior Member, South Africa)

Quote:
Originally Posted by kerhard
Just wanted to add to this, as I got the same error message from fastq_quality_filter, but I knew that I had the correct quality format because it worked on a differently processed version of the same library.

It turned out that I had reads in my library that were empty after removing adapter sequences with cutadapt. Once I got rid of the empty entries, it fixed the issue with fastq_quality_filter.

Hi Kerhard, I have the same problem. Could you please tell me how you got rid of the empty entries?

11-08-2012, 06:34 PM   #11
kerhard (Member, Oakland)
Removing empty entries

Quote:
Originally Posted by monkey_SEQ
Hi Kerhard, I have the same problem. Could you please tell me how you got rid of the empty entries?
In cutadapt, the program I used to remove adapter sequences from the reads, there is a parameter (-m, --minimum-length) that allows you to remove reads that are too short after removing the adapter.

For example, -m 20 would only give you reads that are at least 20 bp long after the adapter is removed. This would, of course, exclude any adapter-only reads, i.e. the empty entries.
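
A minimal invocation along those lines might look like this (a sketch; the adapter sequence and file names are placeholders for your own data):

cutadapt -a AGATCGGAAGAGC -m 20 -o trimmed.fastq raw.fastq
# -a gives the 3' adapter to remove; -m 20 discards any read shorter than 20 bp after trimming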

11-08-2012, 11:19 PM   #12
monkey_SEQ (Junior Member, South Africa)

Quote:
Originally Posted by kerhard
In cutadapt, the program I used to remove adapter sequences from the reads, there is a parameter (-m, --minimum-length) that allows you to remove reads that are too short after removing the adapter.

For example, -m 20 would only give you reads that are at least 20 bp long after the adapter is removed. This would, of course, exclude any adapter-only reads, i.e. the empty entries.

Wow! Thanks, that worked perfectly! So simple.

I am also using cutadapt's quality parameter (-q, --quality-cutoff) to remove low-quality bases from the ends of the reads, but it seems that this parameter only trims bases from the 3' end of each read. What program can I use to also trim low-quality bases from the 5' end of a read, or even in the middle of a read?