  • Fastx giving error for Illumina data

    Hi,

    I'm trying to do some quality scoring/trimming on some single-end Illumina RNA-seq data, but I keep getting an unusual error message from Fastx. The reads are 36 bp long, and I suspect that they use the older Illumina quality scores instead of today's ASCII-33, which may be the problem.


    Here is the command and its output:

    fastx_quality_stats -Q33 -i myInput.fastq -o myOutput.fastq.stats
    fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 32868957. Is this a valid FASTQ file?


    The line in question is the last line of the file. I tried paring back the read file to make sure the last read wasn't truncated or something, but I still get the same error.


    If I run the command without the "-Q33" I get the following:

    fastx_quality_stats: Invalid quality score value (char '#' ord 35 quality value -29) on line 4


    I'm eager to try suggestions, so please let me know! Thanks.
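
The -29 in the second error message is consistent with the quality character being decoded with an ASCII offset of 64: '#' has ASCII code 35, and 35 - 64 = -29. With an offset of 33 the same character decodes to Q2. A minimal sketch of that arithmetic follows; `phred_score` is a hypothetical helper name used only for illustration, not a FASTX command.

```shell
# Decode one FASTQ quality character under a given ASCII offset.
# phred_score is a hypothetical helper, not part of the FASTX toolkit.
phred_score() {
    char=$1
    offset=$2
    # A leading single quote makes printf emit the character's numeric code.
    ord=$(printf '%d' "'$char")
    echo $((ord - offset))
}

phred_score '#' 64   # prints -29, matching the error message
phred_score '#' 33   # prints 2
```

A negative result under one offset and a small non-negative one under the other is a quick hint about which encoding a file uses.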

  • #2
    Is it just the lack of a space between the -Q and the 33?

    • #3
      Thanks for the suggestion! Actually, -Q33 works just fine. It turns out each read file ended with an extra newline character (a blank last line), which was confusing fastx.

      Solved!
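
One way to look for this kind of stray line is to print the line numbers of every blank line in the file. This is only a sketch: `list_blank_lines` is a hypothetical helper name, and it treats a line as blank when it is empty or contains only spaces and tabs.

```shell
# Print the line number of every blank line in a file.
# list_blank_lines is a hypothetical helper, not part of the FASTX toolkit.
list_blank_lines() {
    # NF == 0 is true for empty lines and for lines holding only spaces/tabs.
    awk 'NF == 0 { print NR }' "$1"
}

# Usage (file name taken from the first post):
#   list_blank_lines myInput.fastq
```

If the only number printed is the last line of the file, a trailing blank line is the likely culprit.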

      • #4
        Hi all_your_base,
        I'm extremely new to Linux and to working with NGS data, and I'm trying to get my feet wet! Like you, I'm getting the "expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?" error when using fastx. How did you solve the problem?

        • #5
          This error may be related to the difference in line endings between Windows and Unix text files.

          Since Ubuntu is likely the most popular *nix people use, here is a link on how to do the conversion: http://ubuntugenius.wordpress.com/20...uxunix-format/

          You may need to install one of the two programs the link references by doing

          Code:
          $ sudo apt-get install flip   # or fromdos
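
Before converting anything, it can help to check whether the file contains Windows line endings at all. The sketch below counts lines that end in a carriage return and, if needed, strips them with `tr` instead of `flip`; `count_cr_lines` and `strip_cr` are hypothetical helper names, and using `tr` here is a substitute technique, not what the linked article describes.

```shell
# Count lines that end with a carriage return (Windows-style CRLF endings).
# count_cr_lines and strip_cr are hypothetical helpers for illustration.
count_cr_lines() {
    cr=$(printf '\r')
    grep -c "${cr}\$" "$1"
}

# Write a copy of the file to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}

# Usage:
#   count_cr_lines coralbacteria.fastq
#   strip_cr coralbacteria.fastq > coralbacteria.unix.fastq
```

A count of 0 suggests line endings are not the problem and the conversion step can be skipped.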

          • #6
            Thanks for the response! I installed 'flip' and did the conversion with the following:
            sudo apt-get install flip
            flip -u coralbacteria.fastq

            I then tried the following:
            fastx_quality_stats -i coralbacteria.fastq -o coralbacteria_stats.txt

            but I got:
            fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?

            • #7
              Hi Phoenix,

              Since you're still getting the same error, it's most likely not a problem with Windows line endings in your fastq file.

              What you might want to try is
              Code:
              awk "NR==62049{print;exit}" coralbacteria.fastq
              This will print out line 62049, and you can check to see if it looks like a real fastq sequence header line. To compare, you can try
              Code:
              head -n 1 coralbacteria.fastq
              which will print the first line of your file. That line appears to be a valid fastq sequence header, since fastx didn't raise an issue until line 62049.

              Additionally, you might want to check how many lines you have in your file using
              Code:
              wc -l coralbacteria.fastq
              Divide the number by 4 and it should equal the number of sequences that you have. If you don't already know how many sequences you have, try
              Code:
              grep -c '^@' coralbacteria.fastq
              The only problem you might have with that last command is if a quality score line begins with an @. That character encodes Q31 in the ASCII-33 encoding, so it probably isn't that uncommon.
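
A position-based check avoids the quality-line pitfall of grepping for '@'. The sketch below assumes each record is exactly four lines (no wrapped sequence lines); `check_fastq_layout` is a hypothetical helper name. It reports any first-of-four line that doesn't start with '@', any third-of-four line that doesn't start with '+', and a total line count that isn't a multiple of 4.

```shell
# Check the 4-line record layout of a FASTQ file by line position.
# check_fastq_layout is a hypothetical helper; it assumes unwrapped records.
check_fastq_layout() {
    awk '
        NR % 4 == 1 && !/^@/  { printf "line %d: expected @ header\n", NR; bad = 1 }
        NR % 4 == 3 && !/^\+/ { printf "line %d: expected + separator\n", NR; bad = 1 }
        END {
            if (NR % 4 != 0) { printf "%d lines is not a multiple of 4\n", NR; bad = 1 }
            exit bad   # exit status 0 when nothing was reported, 1 otherwise
        }
    ' "$1"
}

# Usage:
#   check_fastq_layout coralbacteria.fastq
```

Because it only looks at line positions, a quality line that happens to start with '@' is never mistaken for a header.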

              • #8
                One additional thing to try is to use a script posted by Simon Andrews (post #8) in this thread to do a basic check on your sequence file to see if there are any odd problems with it.

                • #9
                  wc -l coralbacteria.fastq = 62049
                  62049 / 4 = 15512.25

                  Since that isn't a whole number, this suggested there was an extra line.

                  awk "NR==62049{print;exit}" coralbacteria.fastq = "a blank line"

                  So, to remove line 62049 I used the following:
                  sed -i 62049d coralbacteria.fastq

                  and fastx now runs like a charm!!

                  Cheers mate!!!
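
The `sed -i 62049d` fix depends on knowing the exact line number for this one file. A more general sketch, which drops only blank lines at the very end of a file and leaves everything else untouched, could look like this; `strip_trailing_blank_lines` is a hypothetical helper name, and it writes to stdout instead of editing in place.

```shell
# Print a file without its trailing blank lines.
# strip_trailing_blank_lines is a hypothetical helper for illustration.
strip_trailing_blank_lines() {
    awk '
        # Hold blank lines back until we know something follows them.
        NF == 0 { held = held $0 "\n"; next }
        # A non-blank line: emit any held blank lines, then the line itself.
        { printf "%s", held; held = ""; print }
    ' "$1"
}

# Usage:
#   strip_trailing_blank_lines coralbacteria.fastq > coralbacteria.clean.fastq
```

Blank lines in the middle of the file are kept on purpose, so a record with an empty sequence is not silently altered.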
