  • Fastx giving error for Illumina data

    Hi,

    I'm trying to do some quality scoring/trimming on some single-end Illumina RNA-seq data, but I keep getting an unusual error message from Fastx. The reads are 36 bp long, and I suspect that they use the older Illumina quality encoding instead of today's ASCII-33 offset, which may be the problem.


    Here is the command and its output:

    fastx_quality_stats -Q33 -i myInput.fastq -o myOutput.fastq.stats
    fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 32868957. Is this a valid FASTQ file?


    The line in question is the last line of the file. I tried trimming the read file back to make sure the last read wasn't truncated, but I still get the same error.


    If I run the command without the "-Q33" I get the following:

    fastx_quality_stats: Invalid quality score value (char '#' ord 35 quality value -29) on line 4
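
    For what it's worth, the -29 in that message looks like plain offset arithmetic: '#' is ASCII 35, and 35 - 64 = -29, which suggests fastx falls back to an offset of 64 when -Q33 is left out. A minimal sketch of that calculation (phred_score is just a hypothetical helper name, not part of fastx):

```shell
# Print the Phred score of one quality character under a given ASCII offset.
# phred_score is a hypothetical helper for illustration only.
phred_score() {
    char=$1
    offset=$2
    # POSIX printf: a leading quote yields the numeric code of the next character.
    ord=$(printf '%d' "'$char")
    echo $((ord - offset))
}

phred_score '#' 64   # offset implied by the error message; prints -29
phred_score '#' 33   # offset selected by -Q33; prints 2
```

    With an offset of 33 the same character decodes to a valid score of 2, so the quality values themselves look fine under -Q33.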


    I'm eager to try suggestions, so please let me know! Thanks.

  • #2
    Is it just the lack of a space between the -Q and the 33?



    • #3
      Thanks for the suggestion! Actually, -Q33 works just fine. It turns out each read file ended with an extra newline character (a blank last line), which was confusing fastx.

      Solved!



      • #4
        Hi all_your_base,
        I'm extremely new to Linux and to working with NGS data, and I'm trying to get my feet wet! Like you, I'm getting the "expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?" error when using fastx. How did you solve the problem?



        • #5
          This error may be related to the difference in line endings between Windows and Unix text files.

          Since Ubuntu is likely the most popular *nix people use, here is a link on how to do the conversion: http://ubuntugenius.wordpress.com/20...uxunix-format/

          You may need to install one of the two programs the link references by doing

          Code:
          $ sudo apt-get install flip    # or fromdos instead of flip
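
          Before converting anything, it may be worth checking whether the file has Windows line endings at all. A rough sketch, assuming a POSIX shell and grep (count_crlf_lines is a made-up helper name):

```shell
# Count the lines that end in a carriage return (Windows-style CRLF endings).
# count_crlf_lines is a hypothetical helper for illustration only.
count_crlf_lines() {
    # Build a literal carriage return without relying on bash-only $'\r' syntax.
    cr=$(printf '\r')
    # Match lines whose last character before the newline is a carriage return.
    grep -c "${cr}\$" "$1"
}
```

          Running it as count_crlf_lines reads.fastq (reads.fastq being a placeholder for your own file) prints the number of affected lines. A result of 0 would suggest that line endings are not the problem.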



          • #6
            Thanks for the response! I installed 'flip' and did the conversion with the following:
            sudo apt-get install flip
            flip -u coralbacteria.fastq

            I then tried the following:
            fastx_quality_stats -i coralbacteria.fastq -o coralbacteria_stats.txt

            but I got:
            fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 62049. Is this a valid FASTQ file?



            • #7
              Hi Phoenix,

              Since you're still getting the same error, it's most likely not a problem with Windows line endings in your FASTQ file.

              What you might want to try is
              Code:
              awk "NR==62049{print;exit}" coralbacteria.fastq
              This will print out line 62049, so you can check whether it looks like a real FASTQ sequence header line. To compare, you can try
              Code:
              head -n 1 coralbacteria.fastq
              which will print the first line of your file. That line appears to be a valid FASTQ sequence header, since fastx didn't raise an issue until line 62049.

              Additionally, you might want to check how many lines you have in your file using
              Code:
              wc -l coralbacteria.fastq
              Divide that number by 4 and it should equal the number of sequences you have, since each FASTQ record normally takes four lines. If you don't already know how many sequences you have, try
              Code:
              grep -c '^@' coralbacteria.fastq
              The only problem you might have with that last command is if a quality score line begins with an @. In Phred+33 encoding '@' (ASCII 64) is Q31, so that probably isn't uncommon, and the count can then come out higher than the true number of sequences.
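
              To sidestep that ambiguity, one option is to check the prefix character only on the lines where a header or separator is expected. Here is a rough sketch, assuming plain four-line FASTQ records (no wrapped sequences); check_fastq is a hypothetical helper, not a fastx tool:

```shell
# Report the first line that breaks the four-line FASTQ layout,
# or the record count if the layout looks consistent.
# check_fastq is a hypothetical helper for illustration only.
check_fastq() {
    awk '
        # Lines 1, 5, 9, ... must be headers starting with "@".
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: expected @ header\n", NR; bad = 1; exit
        }
        # Lines 3, 7, 11, ... must be separators starting with "+".
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: expected + separator\n", NR; bad = 1; exit
        }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) { printf "%d lines: not a multiple of 4\n", NR; exit 1 }
            printf "%d records\n", NR / 4
        }
    ' "$1"
}
```

              Because it looks at a line's position rather than just its first character, a quality line that starts with @ isn't miscounted, and a stray blank line at the end shows up as a missing header on that line.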



              • #8
                One additional thing to try is to use a script posted by Simon Andrews (post #8) in this thread to do a basic check on your sequence file to see if there are any odd problems with it.



                • #9
                  wc -l coralbacteria.fastq = 62049
                  62049 / 4 = 15512.25

                  Since that isn't a whole number, this suggested there was an extra line.

                  awk "NR==62049{print;exit}" coralbacteria.fastq = "a blank line"

                  So, to remove line 62049 I used the following:
                  sed -i 62049d coralbacteria.fastq

                  and fastx now runs like a charm!!

                  Cheers mate!!!
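
                  If you'd rather not hard-code the line number, something like the following should drop any blank lines at the end of the file while leaving the records alone. It's only a sketch (strip_trailing_blank_lines is a made-up name), and it writes to standard output rather than editing in place:

```shell
# Print a file without its trailing blank lines.
# strip_trailing_blank_lines is a hypothetical helper for illustration only.
strip_trailing_blank_lines() {
    awk '
        # Hold back blank lines until we know a non-blank line follows them.
        /^[ \t\r]*$/ { pending++; next }
        {
            # Blank lines in the middle of the file are re-emitted as empty lines.
            while (pending > 0) { print ""; pending-- }
            print
        }
    ' "$1"
}
```

                  For example, strip_trailing_blank_lines coralbacteria.fastq > cleaned.fastq (cleaned.fastq is a placeholder name) keeps the original file untouched, so you can compare line counts before replacing it.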

