
  • converting CASAVA 1.8 qual scores

    Does anybody have a tested program to convert Illumina CASAVA 1.8 qual scores (Phred+33) to the previous version Illumina 1.5+ (Phred+64)?
    Last edited by lpn; 03-02-2012, 10:19 AM.

  • #2
    Given that you're asking for a "tested program", I presume that the Galaxy tool will be good enough:

    Galaxy is a community-driven web-based analysis platform for life science research.



    • #3
      What about something that I can run on the command line?



      • #4
        What about something that I can run on the command line?
        Why do you need a Phred+64 offset and/or the command line?

        Most programs should work with Phred+33 (e.g. append '-Q 33' to the fastx command line).

        Also, Biopython can read files as one format and write them as another:



        Here's a quick python conversion script, derived from an example on that page:

        Code:
        #!/usr/bin/python
        from Bio import SeqIO
        SeqIO.convert("input.fastq", "fastq-sanger", "output.fastq", "fastq-illumina")
        The hashbang isn't strictly needed, and the import is obvious, but it seemed too small with just a single line of code.
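For anyone without Biopython installed, the same re-encoding can be sketched by hand: subtract the Phred+33 offset from each quality character and add the Phred+64 offset. This is only a minimal sketch. The names `sanger_to_illumina` and `convert_fastq_lines` are hypothetical, it assumes strict four-line FASTQ records (no wrapped sequence lines), and capping scores at 62 so the output stays printable ASCII is my own assumption.

```python
SANGER_OFFSET = 33      # Phred+33 (Sanger, CASAVA 1.8)
ILLUMINA_OFFSET = 64    # Phred+64 (Illumina 1.3 to 1.7)
MAX_ILLUMINA_PHRED = 126 - ILLUMINA_OFFSET  # 62, keeps output at or below '~'


def sanger_to_illumina(quality_line):
    """Re-encode one Phred+33 quality string as Phred+64."""
    converted = []
    for char in quality_line:
        phred = ord(char) - SANGER_OFFSET
        if phred < 0:
            raise ValueError("not a Phred+33 quality character: %r" % char)
        # Assumption: cap very high scores instead of emitting non-printable bytes.
        converted.append(chr(min(phred, MAX_ILLUMINA_PHRED) + ILLUMINA_OFFSET))
    return "".join(converted)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every fourth line (the qualities) re-encoded.

    Assumes each record is exactly four lines long.
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index % 4 == 3:
            text = sanger_to_illumina(text)
        yield text + "\n"
```

It could be used as `output.writelines(convert_fastq_lines(input_handle))` with two open file handles. For real data, the tested converters mentioned in this thread are the safer choice.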



        • #5
          Gringer, thanks for the "-Q 33" heads up for FastX. You solved a major headache of mine. Now if only the Illumina mate-pairs we made were not 88% duplicates... As Homer would say, D'oh...



          • #6
            If you are still looking for a command line tool for the job, EMBOSS seqret can do this (and the reverse).



            • #7
              Originally posted by gringer
              Why do you need a Phred+64 offset and/or the command line?
              Because tophat doesn't seem to handle Phred+33 (CASAVA 1.8) well, but works with Phred+64 (CASAVA 1.5+).

              Originally posted by gringer
              Most programs should work with Phred+33 (e.g. append '-Q 33' to the fastx command line).



              Also, Biopython can read files as one format and write them as another:



              Here's a quick python conversion script, derived from an example on that page:

              Code:
              #!/usr/bin/python
              from Bio import SeqIO
              SeqIO.convert("input.fastq", "fastq-sanger", "output.fastq", "fastq-illumina")
              The hashbang isn't strictly needed, and the import is obvious, but it seemed too small with just a single line of code.
              Thanks a lot!



              • #8
                Originally posted by lpn
                Because tophat doesn't seem to handle Phred+33 (CASAVA 1.8) well, but works with Phred+64 (CASAVA 1.5+).
                This is interesting and doesn't match my experience with tophat on recent Illumina runs. Do you have any "--solexa1.3-quals" options on your command line? Removing that should stop bowtie from using Phred+64, and go back to the default Phred+33.

                There's also the Bowtie "--phred33-quals" option, which I guess you could add to tophat's bowtie call to force this:
                Code:
                nano $(which tophat)



                • #9
                  As gringer said, if you do NOT specify a quality flag, it will work fine. You are not the only person with HiSeq data, which finally encodes the quality values in the standard Sanger format that nearly all programs expect by default. For most people today, the flag is only needed for processing legacy datasets.

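If it is unclear which encoding a file uses, the ASCII range of its quality characters usually gives it away. The sketch below is a heuristic only. The function name `guess_phred_offset` is hypothetical, and the thresholds are assumptions based on the usual ranges: characters below ';' only occur with Phred+33, while characters above 'J' are typical of Phred+64 data.

```python
def guess_phred_offset(quality_lines):
    """Guess 33 or 64 from an iterable of quality strings.

    Returns None when there is no data or the range is ambiguous.
    """
    lowest = None
    highest = None
    for line in quality_lines:
        for char in line.strip():
            code = ord(char)
            if lowest is None or code < lowest:
                lowest = code
            if highest is None or code > highest:
                highest = code
    if lowest is None:
        return None   # no quality characters seen
    if lowest < 59:
        return 33     # below ';' is not possible with a +64 offset
    if highest > 74:
        return 64     # above 'J' is typical of Phred+64 data
    return None       # everything falls in the overlapping range
```

Running it over the quality lines of the first few thousand reads is normally enough, and a result of `None` means the file should be checked by hand.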


                  • #10
                    Originally posted by gringer
                    This is interesting and doesn't match my experience with tophat on recent Illumina runs. Do you have any "--solexa1.3-quals" options on your command line? Removing that should stop bowtie from using Phred+64, and go back to the default Phred+33.

                    There's also the Bowtie "--phred33-quals" option, which I guess you could add to tophat's bowtie call to force this:
                    Code:
                    nano $(which tophat)
                    That works, but subsequent analysis produces strange results.
                    Last edited by lpn; 03-03-2012, 07:53 AM.



                    • #11
                      Originally posted by lpn
                      That works, but subsequent analysis produces strange results.
                      What works? I suggested two options (not counting the python code). One was to remove --solexa1.3-quals from the tophat command line, and the other was to modify the bowtie parameters. I was deliberately vague about the second option because you need to know what you're doing before you do it (e.g. change the bowtie options everywhere bowtie is called, and change the tophat code that expects Phred+64 output).
                      Last edited by gringer; 03-03-2012, 11:57 AM.
