SEQanswers

Old 09-20-2012, 12:20 PM   #1
all_your_base
Member
 
Location: USA

Join Date: Mar 2012
Posts: 40
Fastx giving error for Illumina data

Hi,

I'm trying to do some quality scoring/trimming on some single-end (SE) Illumina RNA-seq data, but I keep getting an unusual error message from fastx. The reads are 36 bp long, and I suspect they use the older Illumina quality encoding (Phred+64) instead of today's Phred+33 (ASCII offset 33), which may be the problem.


Here is the command and its output:

fastx_quality_stats -Q33 -i myInput.fastq -o myOutput.fastq.stats
fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 32868957. Is this a valid FASTQ file?


The line in question is the last line of the file. I tried trimming records off the end of the read file to make sure the last read wasn't truncated, but I still get the same error.


If I run the command without the "-Q33" I get the following:

fastx_quality_stats: Invalid quality score value (char '#' ord 35 quality value -29) on line 4


I'm eager to try suggestions, so please let me know! Thanks.
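
The second error message is itself a clue about the encoding: '#' has ASCII code 35, and 35 - 64 = -29, which is consistent with fastx assuming an offset of 64 when -Q33 is not given. Offset-64 quality strings should not contain any character below ASCII 59, so finding one suggests the file is Phred+33. A minimal sketch of that check, assuming a 4-line-per-record FASTQ file; the function name min_quality_ord is made up for illustration:

```shell
# Hypothetical helper: print the lowest ASCII code found on the quality
# lines (every 4th line) of a 4-line-per-record FASTQ file.
min_quality_ord() {
  awk '
    BEGIN {
      # Build a character -> ASCII code lookup for printable characters.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
      min = 127
    }
    NR % 4 == 0 {
      n = length($0)
      for (j = 1; j <= n; j++) {
        c = ord[substr($0, j, 1)]
        if (c != "" && c < min) min = c
      }
    }
    END { print min }
  ' "$1"
}
```

For example, min_quality_ord myInput.fastq printing a value below 59 would point to Phred+33, in which case -Q33 is the right flag.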
Old 09-20-2012, 01:09 PM   #2
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Is it just the lack of a space between the -Q and the 33?
Old 09-20-2012, 01:17 PM   #3
all_your_base
Member
 
Location: USA

Join Date: Mar 2012
Posts: 40

Thanks for the suggestion! Actually, -Q33 works just fine. It turns out each read file ended with an extra newline character (a blank final line), which was what confused fastx.

Solved!
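
For anyone hitting the same message, here is a rough way to test whether a file ends in a blank line. This is only a sketch; the function name last_line_is_blank is invented for illustration, and it treats a line holding only whitespace (including a stray carriage return) as blank:

```shell
# Hypothetical helper: succeed (exit 0) if the last line of the file is
# empty or contains only whitespace, fail (exit 1) otherwise.
last_line_is_blank() {
  [ -z "$(tail -n 1 "$1" | tr -d '[:space:]')" ]
}
```

It can be used as: last_line_is_blank myInput.fastq && echo "trailing blank line found"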
Old 08-06-2013, 10:32 AM   #4
Phoenix
Junior Member
 
Location: UK

Join Date: Aug 2013
Posts: 6

Hi all_your_base,
I'm extremely new to Linux and to working with NGS data, and I'm trying to get my feet wet! Like you, I'm getting the "expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?" error when using fastx. How did you solve the problem?
Old 08-06-2013, 10:46 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,075

This error may be related to the difference between Windows and Unix line endings.

Since Ubuntu is likely the most popular *nix people use, here is a link on how to do the conversion: http://ubuntugenius.wordpress.com/20...uxunix-format/

You may need to install one of the two programs the link references by doing

Code:
$ sudo apt-get install flip    # or install the package that provides fromdos instead
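
Before converting, it can help to confirm that the file really contains Windows (CRLF) line endings. The following is a sketch using only standard tools rather than flip or fromdos; the function names count_cr_lines and strip_cr are made up for illustration:

```shell
# Hypothetical helper: print how many lines end in a carriage return.
count_cr_lines() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: write a copy of the file to stdout with every
# carriage return removed (the input file is left untouched).
strip_cr() {
  tr -d '\r' < "$1"
}
```

A count of 0 means the line endings are already Unix style and the cause of the error lies elsewhere. Otherwise, strip_cr reads.fastq > reads.unix.fastq gives a converted copy (both file names are placeholders).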
Old 08-06-2013, 03:35 PM   #6
Phoenix
Junior Member
 
Location: UK

Join Date: Aug 2013
Posts: 6

Thanks for the response! I installed 'flip' and did the conversion with the following:
sudo apt-get install flip
flip -u coralbacteria.fastq

I then tried the following:
fastx_quality_stats -i coralbacteria.fastq -o coralbacteria_stats.txt

but I got:
fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?

Old 08-06-2013, 04:23 PM   #7
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

Hi Phoenix,

Since you're still getting the same error, it's most likely not a problem with Windows line endings in your fastq file.

What you might want to try is
Code:
awk "NR==62049{print;exit}" coralbacteria.fastq
This will print out line 62049, and you can check to see if it looks like a real fastq sequence header line. To compare, you can try
Code:
head -n 1 coralbacteria.fastq
which will print the first line of your file. That line is presumably a valid fastq sequence header, since fastx didn't raise an issue until line 62049.
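
If the suspect line looks empty or otherwise fine on screen, invisible characters may be the culprit. A small sketch that dumps one line character by character with od; the function name show_line_raw is made up for illustration:

```shell
# Hypothetical helper: show line $2 (1-based) of file $1 with every
# character made visible, e.g. \r for a carriage return, \n for newline.
show_line_raw() {
  sed -n "${2}p;${2}q" "$1" | od -c
}
```

For instance, show_line_raw coralbacteria.fastq 62049 would show only \n for a truly blank line, and \r \n for a blank line with a Windows line ending.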

Additionally, you might want to check how many lines you have in your file using
Code:
wc -l coralbacteria.fastq
Divide the number by 4 and it should equal the number of sequences that you have. If you don't already know how many sequences you have, try
Code:
grep -c '^@' coralbacteria.fastq
The only problem you might have with that last command is that a quality score line can also begin with an @. In Phred+33 encoding @ is ASCII 64, i.e. Q31, so it probably isn't that uncommon, and the grep count can come out higher than the true number of sequences.
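
Because a quality line may start with @, a position-based check tends to be more reliable than grep. Here is a minimal sketch, assuming strict 4-line records (no wrapped sequences); the function name check_fastq_structure is invented for illustration and this is not the script mentioned elsewhere in the thread:

```shell
# Hypothetical helper: verify that lines 1, 5, 9, ... start with '@' and
# lines 3, 7, 11, ... start with '+', then report the record count.
# Prints the first problem found and returns a non-zero status.
check_fastq_structure() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      print "line " NR ": expected @ header"; bad = 1; exit
    }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      print "line " NR ": expected + separator"; bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) {
        print "line count " NR " is not a multiple of 4"; exit 1
      }
      print NR / 4 " records"
    }
  ' "$1"
}
```

Run as check_fastq_structure coralbacteria.fastq. A trailing blank line shows up as a missing @ header on the line just past the last complete record.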
Old 08-06-2013, 05:01 PM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,075

One additional thing to try is to use a script posted by Simon Andrews (post #8) in this thread to do a basic check on your sequence file to see if there are any odd problems with it.
Old 08-06-2013, 06:53 PM   #9
Phoenix
Junior Member
 
Location: UK

Join Date: Aug 2013
Posts: 6

wc -l coralbacteria.fastq = 62049
62049 / 4 = 15512.25, which is not a whole number

This suggested there was an extra line.

awk "NR==62049{print;exit}" coralbacteria.fastq = "a blank line"

So, to remove line 62049 I used the following:
sed -i 62049d coralbacteria.fastq

and fastx now runs like a charm!!

Cheers mate!!!
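
Deleting by line number works when the position of the blank line is known. A more general sketch that drops any run of blank lines at the end of a file without hard-coding a number is below; the function name strip_trailing_blank_lines is made up, and note that it rewrites whitespace-only lines in the middle of the file as empty lines:

```shell
# Hypothetical helper: print the file to stdout without its trailing
# blank lines. Blank lines are held back and only emitted again if a
# non-blank line follows them.
strip_trailing_blank_lines() {
  awk '
    /^[ \t\r]*$/ { pending++; next }
    {
      while (pending > 0) { print ""; pending-- }
      print
    }
  ' "$1"
}
```

Usage would be strip_trailing_blank_lines in.fastq > out.fastq (placeholder file names), which leaves the original file untouched, unlike sed -i.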
Tags
ascii33, fastx, illumina, quality scores, rnaseq
