  • Base qualities for Illumina Sequencing

    Hi all,

    I have a question regarding the base qualities for Illumina Sequencing that I am trying to wrap my head around. I have qseq files and according to the qseq read me file it says "quality: the calibrated quality string (encoded in ASCII as 64+score)."

    I've been doing some research, and Wikipedia states that Illumina has its own way of encoding base qualities, but as of Illumina 1.3+ (assuming they mean the pipeline version) Illumina switched from Solexa/Illumina qualities to Phred qualities. As of Illumina 1.8+, the base qualities use the same encoding as Sanger, which is Phred+33.

    I was told my qseq files were generated with Illumina 1.8+, but the base qualities are encoded in ASCII as 64+score, which makes no sense to me, since Illumina 1.8+ should be Phred+33 and not Phred+64. Additionally, it is ambiguous, because the readme doesn't say whether they are Phred or Solexa/Illumina base qualities, and both are encoded in ASCII.

    Can anyone shed some light on the discrepancy?

    Thanks!

  • #2
    If your files were generated with Illumina pipeline v1.8.x, then the quality scores should be in the Sanger FASTQ format (Phred+33).

    It is possible that your sequencing facility switched to the new pipeline recently and the relevant help files have not been updated.

    Edited: Indeed. As HESmith indicates below, these data must have been produced with an earlier version of the pipeline if "qseq" files were produced. You should contact your sequencing facility and double-check.
    Last edited by GenoMax; 11-04-2011, 11:39 AM.


    • #3
      Actually, it's more complicated than GenoMax indicates for CASAVA v1.8. The fastq files are indeed Phred+33, but the export files are Phred+64. If the quality scores contain any lower case characters (i.e., >96) it's Phred+64.

      Also, CASAVA v1.8 doesn't produce qseq files; they've been replaced by gzipped fastq. It sounds like your data were generated with an earlier version.
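
      The lower-case check described in this post can be sketched in a few lines of Python. This is only a rough heuristic, not an Illumina tool: the function name and the thresholds are assumptions for illustration. It treats any character above ASCII 96 as evidence of +64, and any character below ASCII 59 (lower than anything the old Solexa+64 encoding could produce) as evidence of +33.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from an iterable of quality strings.

    Returns 33, 64, or None when the observed characters fit either encoding.
    Hypothetical helper for illustration; the thresholds are assumptions.
    """
    lowest, highest = 255, 0
    for quality in quality_strings:
        for ch in quality:
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)

    if lowest < 59:
        # Characters this low cannot come from a +64 encoding
        # (Phred+64 starts at '@' = 64, Solexa+64 at ';' = 59).
        return 33
    if highest > 96:
        # Lower-case characters would mean scores above 63 under +33,
        # which is not realistic, so assume +64.
        return 64
    return None  # ambiguous: look at more reads


if __name__ == "__main__":
    print(guess_phred_offset(["hhhhgggfffeeedd"]))
    print(guess_phred_offset(["IIII#5:<"]))
```

      Reads made up only of characters between ';' and '`' stay ambiguous, so it is worth feeding the function a few thousand quality strings rather than a handful.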
      Last edited by HESmith; 11-04-2011, 11:30 AM. Reason: corrected typos


      • #4
        Thanks for the replies. Now I am even more confused. I am not 100% sure this is CASAVA 1.8 to be exact. The response I got was:

        Illumina Pipeline version 1.8.0

        But I can't see how this can't be CASAVA 1.8. Yes, I remember reading that qseq files are no longer produced and that the output should be fastq, but I see no evidence of fastq files. I do have .bam files, which were aligned using bwa, but that aligner, to my understanding, does not use base qualities during the alignment.


        • #5
          Originally posted by fongchun View Post
          I was told my qseq files were generated with Illumina 1.8+, but the base qualities are encoded in ASCII as 64+score, which makes no sense to me, since Illumina 1.8+ should be Phred+33 and not Phred+64. Additionally, it is ambiguous, because the readme doesn't say whether they are Phred or Solexa/Illumina base qualities, and both are encoded in ASCII.
          Three or four different varieties of q-score encoding, multiple software pipeline names and separate version numbering schemes -- Illumina has left researchers with an inscrutable problem simply trying to figure out what format their data are in.

          First, there isn't really any software called Illumina 1.8. What is meant here is CASAVA 1.8. Besides the CASAVA pipeline, Illumina also has (or had) RTA (Real Time Analysis, which runs on the instrument computer) and OLB (OffLine Basecalling, no longer really used). You say you have QSEQ files, but Illumina has stopped using these and CASAVA 1.8 does not generate them, so it seems unlikely that your QSEQs were generated by CASAVA 1.8. It is possible that they were created by OLB 1.8 (the second-to-last version of OLB), which did generate QSEQs by default and did use ASCII+64 encoding of Phred-scale quality scores. The Solexa Q-scores (as opposed to Phred) haven't been used in a long time, so it is probably safe to assume your data are not in this format.

          If you haven't already seen it, look at the FASTQ article on Wikipedia; it contains a lot of useful information about the various quality encodings used by Illumina.
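
          For reference, the difference between the Solexa and Phred scales mentioned above can be sketched as follows. A Phred score is -10*log10(p), while a Solexa score is -10*log10(p / (1 - p)), where p is the error probability; the function names below are made up for illustration. The two scales nearly coincide for high-quality bases and only diverge at the low end.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the Phred scale.

    Follows from Q_phred = -10*log10(p) and Q_solexa = -10*log10(p / (1 - p)).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_error_probability(q_phred):
    """Return the base-call error probability for a Phred score."""
    return 10 ** (-q_phred / 10)


if __name__ == "__main__":
    for q in (-5, 0, 10, 20, 40):
        print(q, round(solexa_to_phred(q), 2))
```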


          • #6
            Thanks for the reply, kmcarr. That helped a bit. I dug a little further and it appears that it was not CASAVA 1.8, but actually:

            Illumina 1.10.0 RTA 1.10.36.0

            I don't understand the version number, but it appears to be RTA (Real Time Analysis), which may explain why I am getting qseq and not fastq files. Does anyone have any information on what "Illumina 1.10.0 RTA 1.10.36.0" is?


            • #7
              Hi fongchun,

              Do you know which version of SCS is running on the sequencer PC?
              Or how did you obtain the qseq files?
              I suppose the old versions wrote qseq files (so they must be Phred+64), while the new versions create bcl, stats, pos, and similar files, which can be converted with CASAVA.
              From CASAVA 1.8 onward the Phred offset is +33 (OLB is still +64).
              You can also look at the user guides for the respective tools.
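
              If the qseq qualities do turn out to be Phred+64, re-encoding them as Phred+33 for downstream tools is a simple character shift. A minimal sketch, assuming the scores are already on the Phred scale (not Solexa); the function name is hypothetical:

```python
def phred64_to_phred33(quality):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger).

    Raises ValueError if a character is too low to be Phred+64, which
    usually means the input was already +33 or is Solexa-scaled.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))
    return "".join(converted)


if __name__ == "__main__":
    print(phred64_to_phred33("hhh@B"))
```

              The explicit range check is there so that a file that is already +33 fails loudly instead of being shifted a second time.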
