  • feng
    Member
    • Oct 2010
    • 50

    how to convert general fastq to fastq int format?

    Is anyone using FASTX? It needs the "fastq int" format. Do you know how to convert a general FASTQ file into FASTQ int? Many thanks for any suggestions.
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    You're not talking about Bill Pearson's tool FASTX, see Pearson et al (1997). Comparison of DNA sequences with protein sequences.


    You probably aren't talking about the FASTX-Toolkit either, since that supports multiple FASTQ variants.


    What are you talking about?

    Comment

    • feng
      Member
      • Oct 2010
      • 50

      #3
      I mean the second one. I tried to use FASTX to trim reads in a FASTQ file. It seems this program needs FASTQ (int)? Is there another parameter for this?

      Thanks.

      Comment

      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        The FASTX-Toolkit can read standard FASTQ files with ASCII qualities.

        Could you tell us the command line you are trying to use that fails, and show us the first few reads of your FASTQ file (using the [ code ] and [ /code ] tags in the forum, or the # button on the advanced editor).

        Comment

        • feng
          Member
          • Oct 2010
          • 50

          #5
          Hi, I did

          $ ./fastq_quality_trimmer -t 20 -l 30 -i sra_data.fastq -o sra_data.fastq.quality.trimmed

          fastq_quality_trimmer: Invalid quality score value (char '*' ord 42 quality value -22) on line 12

          The first 12 lines of the reads file:

          @SRR001030.1.1 Hela.tar.gz:8:1:328:133.1 length=27
          TCGAGATTTCTACAGTCCTTCGATAAC
          +SRR001030.1.1 Hela.tar.gz:8:1:328:133.1 length=27
          IIIIIIIIIII8IIIIIIIIIIIII4I
          @SRR001030.2.1 Hela.tar.gz:8:1:96:66.1 length=27
          ATGTACGGTAAATGGAAAAAAAAAAAA
          +SRR001030.2.1 Hela.tar.gz:8:1:96:66.1 length=27
          IIIIIIIIIIIIIIIIIIIIIIIIIII
          @SRR001030.3.1 Hela.tar.gz:8:1:400:280.1 length=27
          TCGGATGCCTACTTCTGCTTGAAAACA
          +SRR001030.3.1 Hela.tar.gz:8:1:400:280.1 length=27
          IIIIIIIIIIIIIII*&IIIIII/III


          Any suggestion?

          Comment

          • simonandrews
            Simon Andrews
            • May 2009
            • 870

            #6
            Are you using an older version of the toolkit? The files you showed are valid FastQ and use the Sanger encoding method. The FastX download page shows that automatic encoding detection was introduced in v0.0.13 so if you're using a version which is older than that it might be assuming an Illumina encoding.

            Comment

            • maubp
              Peter (Biopython etc)
              • Jul 2009
              • 1544

              #7
              I think as Simon suggests your (old) version of FASTX-toolkit is probably assuming you have Solexa/Illumina FASTQ (which have a narrow range of allowed characters), but you have Sanger FASTQ. Try the FASTX command line option -Q 33 here.
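
              To see where the "-22" in the error message comes from, here is a minimal sketch in plain Python (not FASTX code; the helper name decode_qualities is made up for illustration). It decodes the quality line from line 12 of the file above with both offsets, assuming a quality value is simply the character's ASCII code minus the offset:

```python
def decode_qualities(quality_line, offset):
    """Convert a FASTQ quality string to integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality_line]


# Quality line of the third read in the file above (line 12).
line12 = "IIIIIIIIIIIIIII*&IIIIII/III"

as_illumina = decode_qualities(line12, 64)  # Phred+64, what the old default assumed
as_sanger = decode_qualities(line12, 33)    # Phred+33, what -Q 33 selects

# '*' is ASCII 42, so under Phred+64 it decodes to 42 - 64 = -22,
# the value reported in the error message.
print(as_illumina[line12.index("*")])

# Under Phred+33 every score is non-negative; the lowest is '&' (38 - 33 = 5).
print(min(as_sanger))
```

              With offset 64 the '*' decodes to a negative score, which is why the tool rejects the line; with offset 33 the whole line decodes to valid scores.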

              Comment

              • feng
                Member
                • Oct 2010
                • 50

                #8
                Thanks

                I used -Q 33. It works. Many thanks again.

                Comment

                • golharam
                  Member
                  • Dec 2009
                  • 55

                  #9
                  I'm running into the same problem. I just got reads off an Illumina GAIIx. Here are the first few lines:

                  @GEN-SEQ-ANA_0012:2:1:1562:1167#0/1
                  GAATACGTTCGCGTCACACAGTATCAACGGAAGCGGGTAAATGAAGGCGACACAGGGGATAAGCAGGGTTTCATGAAGTATCTTGGGCACGTGCCAGCGAG
                  +
                  A;-A4=B:?0?2?########################################################################################
                  @GEN-SEQ-ANA_0012:2:1:1626:1169#0/1
                  GAGGAAGGCGGTTTTGAAGGAGAGGGGAGGCTTTCGGACCAAGGGAAGGAAGGGAGGGTAAGAAAAGGAAAAAGAATTTGTGAGGGAGAAGGGTTTTTATC
                  +
                  D@EB:?DD;BF=EEEE>@BB4A;BAA;';;/??88AA################################################################
                  @GEN-SEQ-ANA_0012:2:1:1959:1166#0/1
                  GTGAGGGGATGTTCACTAGCTTGCCTACTTCGTCGAAGATCAGCTTGGCCTGGGTATTCGCGGTCCCTGCTGTTTTAAAGTTGGCGCCTGCTGCGTCCGCT
                  +
                  @6@)@B@<(?EBDBEBGBDB?B8BE?8,B0:A#####################################################################


                  When I try to run:

                  gunzip -dc s_2_reads_passed_filter.fastq.gz | fastx_quality_stats

                  I get the error

                  fastx_quality_stats: Invalid quality score value (char '-' ord 45 quality value -19) on line 4

                  I'm running the latest version:


                  [golharam@vail input]$ fastx_quality_stats -h
                  usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
                  Part of FASTX Toolkit 0.0.13 by A. Gordon ([email protected])

                  Comment

                  • maubp
                    Peter (Biopython etc)
                    • Jul 2009
                    • 1544

                    #10
                    Have you tried the solution discussed above? This tells FASTX to treat the qualities as Sanger FASTQ...

                    gunzip -dc s_2_reads_passed_filter.fastq.gz | fastx_quality_stats -Q 33

                    Are you sure your FASTQ files are in the original form from your Illumina GAIIx? Do you know what version of the pipeline it was?
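
                    If you are not sure which encoding a file uses, a rough check is to look at the lowest quality character in the first few thousand reads. The sketch below is only a heuristic, not part of FASTX: the function name guess_offset and the max_reads parameter are made up here, and it assumes plain four-line records with no wrapped sequence or quality lines. Any character below ';' (ASCII 59) cannot occur in Solexa/Illumina Phred+64 data, so its presence points to Phred+33.

```python
import io
import itertools


def guess_offset(handle, max_reads=10000):
    """Guess the ASCII offset (33 or 64) from the lowest quality character.

    Assumes four-line FASTQ records. Returns None if no quality data is found.
    """
    lowest = None
    # In a four-line record, the quality string is every 4th line starting at index 3.
    for line in itertools.islice(handle, 3, max_reads * 4, 4):
        qual = line.rstrip("\r\n")
        if qual:
            low = min(qual)
            lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    # Below ASCII 59 the file must be Phred+33; otherwise Phred+64 is likely,
    # although a high-quality Phred+33 file could in principle look the same.
    return 33 if ord(lowest) < 59 else 64


# Shortened version of the first read posted above (contains '-', ASCII 45).
sample = io.StringIO("@read1\nGAATACG\n+\nA;-A4=B\n")
print(guess_offset(sample))
```

                    For a gzipped file, the same function should work on a text-mode handle such as gzip.open("s_2_reads_passed_filter.fastq.gz", "rt").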

                    Comment

                    • kmcarr
                      Senior Member
                      • May 2008
                      • 1181

                      #11
                      The long strings of '#' are a giveaway that this FASTQ is encoded in the standard Sanger format (Phred + 33). '#' is ASCII 35, which is Phred 2 with an offset of 33; if this were still Illumina format (Phred + 64) these would be 'B' (ASCII 66), which is the telltale Illumina Quality Control Indicator.
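
                      The arithmetic can be checked with a few lines of Python. This is just an illustrative sketch (the function name reencode is made up), assuming a re-encoding is nothing more than shifting every character by the difference between the two offsets:

```python
def reencode(quality_line, from_offset=33, to_offset=64):
    """Shift every quality character from one ASCII offset to another."""
    return "".join(chr(ord(ch) - from_offset + to_offset) for ch in quality_line)


# '#' is ASCII 35, so under Phred+33 it encodes a score of 35 - 33 = 2.
print(ord("#") - 33)

# The same score written as Phred+64 is chr(2 + 64), i.e. 'B'.
print(reencode("####"))
```

                      So a tail of '#' in a Phred+33 file and a tail of 'B' in a Phred+64 file carry the same score of 2.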

                      Do as maubp and others have suggested and add the -Q33 option to your fastx command.

                      In feng and golharam's defense the -Q parameter to the fastx commands is not documented and does not appear in the help message. It is only discoverable by reading the source code (or the very helpful replies on SeqAnswers). The authors of the fastx toolkit could really help by documenting this option.

                      Comment

                      • maubp
                        Peter (Biopython etc)
                        • Jul 2009
                        • 1544

                        #12
                        Originally posted by kmcarr:
                        In feng and golharam's defense the -Q parameter to the fastx commands is not documented and does not appear in the help message. It is only discoverable by reading the source code (or the very helpful replies on SeqAnswers). The authors of the fastx toolkit could really help by documenting this option.
                        The -Q option is on http://hannonlab.cshl.edu/fastx_toolkit/ as part of a release announcement, but otherwise I agree with you.

                        Comment

                        • kmcarr
                          Senior Member
                          • May 2008
                          • 1181

                          #13
                          Originally posted by maubp:
                          The -Q option is on http://hannonlab.cshl.edu/fastx_toolkit/ as part of a release announcement, but otherwise I agree with you.
                          Thanks for the pointer, I had missed that.

                          Comment

                          • MQ-BCBB
                            Member
                            • May 2009
                            • 25

                            #14
                            Thanks, thanks for the -Q 33 tip. So helpful!

                            Comment

                            • golharam
                              Member
                              • Dec 2009
                              • 55

                              #15
                              Agreed. The toolkit should really add that as a documented parameter to come up when running from the shell.

                              Comment
