**Wallysb01** · 09-20-2012, 01:09 PM

Is it just the lack of a space between the -Q and the 33?

**all_your_base** · 09-20-2012, 01:17 PM

Thanks for the suggestion! Actually, -Q33 works just fine. Turns out each read file ended with a newline character which was responsible for confusing fastx.

Solved!

**Phoenix** · 08-06-2013, 10:32 AM

Hi all_your_base,
I'm extremely new to Linux and working with NGS data and I'm trying to get my hands wet! Like you I'm getting the "expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?" error when using fastx. How did you solve the problem?

**GenoMax** · 08-06-2013, 10:46 AM

This error may be related to the file format differences between Windows and Unix.

Since Ubuntu is likely the most popular *nix people use, here is a link on how to do the conversion: http://ubuntugenius.wordpress.com/20...uxunix-format/

You may need to install one of the two programs the link references (flip or fromdos), for example:

Code:

$ sudo apt-get install flip
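Before converting, it can help to confirm that the file really has Windows line endings. The following is a minimal sketch, not part of the original advice: `count_crlf_lines` is a made-up helper name, and the demo file is a throwaway sample rather than real data.

```shell
# Hypothetical helper: count lines that end in a carriage return (Windows CRLF).
count_crlf_lines() {
    # awk splits on "\n", so a CRLF file leaves "\r" as the last character of each line
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Demo on a throwaway one-record file written with CRLF line endings
demo=$(mktemp)
printf '@r1\r\nACGT\r\n+\r\nIIII\r\n' > "$demo"
count_crlf_lines "$demo"   # prints 4
rm -f "$demo"
```

A count of 0 suggests line endings are not the cause, and a conversion with flip or fromdos would not be expected to change anything.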

**Phoenix** · 08-06-2013, 03:35 PM

Thanks for the response! I installed 'flip' and did the conversion with the following:
sudo apt-get install flip
flip -u coralbacteria.fastq

I then tried the following:
fastx_quality_stats -i coralbacteria.fastq -o coralbacteria_stats.txt

but I got:
fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?

**mcnelson.phd** · 08-06-2013, 04:23 PM

Hi Phoenix,

Since you're still getting the same error, it's most likely not a problem with Windows line endings in your fastq file.

What you might want to try is

Code:

awk "NR==62049{print;exit}" coralbacteria.fastq

This will print out line 62049, and you can check to see if it looks like a real fastq sequence header line. To compare, you can try

Code:

head -n 1 coralbacteria.fastq

which will print the first line of your file. That line appears to be a valid fastq sequence header, since fastx didn't raise an issue until line 62049.

Additionally, you might want to check how many lines you have in your file using

Code:

wc -l coralbacteria.fastq

Divide the number by 4 and it should equal the number of sequences that you have. If you don't already know how many sequences you have, try

Code:

grep -c '^@' coralbacteria.fastq

The only problem you might have with that last command is if a quality score line begins with an @. In Phred+33 encoding, @ (ASCII 64) is Q31, so that probably isn't uncommon.
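One way around the @ ambiguity is to check lines by position rather than by content. The sketch below assumes the common four-line FASTQ layout (no wrapped sequence lines); `check_fastq_layout` is a hypothetical helper name, not a fastx tool.

```shell
# Hypothetical helper: validate the four-line FASTQ layout and report the first problem.
check_fastq_layout() {
    awk '
        # line 1 of every record must be a header, line 3 a separator
        NR % 4 == 1 && !/^@/  { printf "line %d: expected @ header\n", NR; bad = 1; exit }
        NR % 4 == 3 && !/^\+/ { printf "line %d: expected + separator\n", NR; bad = 1; exit }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) { printf "%d lines is not a multiple of 4\n", NR; exit 1 }
            printf "%d records look well-formed\n", NR / 4
        }
    ' "$1"
}

# Demo: one record whose quality line starts with @, followed by a stray blank line
demo=$(mktemp)
printf '@r1\nACGT\n+\n@III\n\n' > "$demo"
check_fastq_layout "$demo" || echo "layout check failed"
rm -f "$demo"
```

Because only every fourth line is tested for a leading @, a quality line that happens to start with @ is not miscounted as a header, and a stray blank line is reported with its line number.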

**GenoMax** · 08-06-2013, 05:01 PM

One additional thing to try is to use a script posted by Simon Andrews (post #8) in this thread to do a basic check on your sequence file to see if there are any odd problems with it.

**Phoenix** · 08-06-2013, 06:53 PM

wc -l coralbacteria.fastq = 62049
62049 / 4 = 15512.25

This suggested there was an extra line.

awk "NR==62049{print;exit}" coralbacteria.fastq = "a blank line"

So, to remove line 62049 I used the following:
sed -i 62049d coralbacteria.fastq

and fastx now runs like a charm!!

Cheers mate!!!
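For files where the number of trailing blank lines is not known in advance, a more general variant is possible. This is only a sketch under the assumption that the stray lines are completely empty and sit at the end of the file; `strip_trailing_blank_lines` is a made-up name, and the output goes to a new file so the original stays untouched.

```shell
# Hypothetical helper: print a file without its trailing empty lines.
strip_trailing_blank_lines() {
    awk '
        /^$/ { blanks++; next }                        # hold back empty lines
        { while (blanks > 0) { print ""; blanks-- }    # flush held lines once real content follows
          print }
    ' "$1"
}

# Demo: one record followed by two stray blank lines
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n\n\n' > "$demo"
strip_trailing_blank_lines "$demo" > "$demo.clean"
wc -l < "$demo.clean"   # counts the 4 remaining lines
rm -f "$demo" "$demo.clean"
```

Empty lines in the middle of the file are passed through unchanged, so a problem there would still show up in a later check instead of being silently hidden.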

Fastx giving error for Illumina data