  • #16
    Hi Colin

    Originally posted by sparks
    Hi Tony,
    Thanks for that. I'm sure you are right, though some Illumina documentation being sent out with export files still talks about -5 being a valid quality value, so you guys should check your documentation.
    OK, thanks for bringing that to my attention.

    Originally posted by sparks
    I've also noticed in the qseq files I have that the lowest code is a B, which translates to a Phred score of 2. This happens even for bases called as '.'. If Perr was 0.75 then Phred would be about 1.25, so it looks like you round up to 2. This might be of interest to people who are using qualities in alignment and in SNP calling. I did like the previous Solexa scale as it gave a finer resolution for higher Perr values.

    Thanks again, Colin
    Yes, Phred Q1 translates to just over 20% probability of the called base being correct. In the absence of further information, the natural assumption is that the three non-called bases are equiprobable, but that then means for a Q1 base the three non-called bases are each more likely than the called base - this can mess up your stats! It probably doesn't matter so much what the Q-value of an 'N' is set to, but I guess they are being set to Q2 for consistency.
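
The arithmetic behind this can be checked with a short sketch. It assumes the standard Phred relation p = 10^(-Q/10) and, as in the post, that the three non-called bases are equiprobable; the function names are illustrative only and do not come from any pipeline.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_probabilities(q):
    """Return (P(called base is correct), P(each non-called base)).

    Assumes the three non-called bases share the error probability equally.
    """
    p_err = phred_to_error_prob(q)
    return 1 - p_err, p_err / 3


for q in (1, 2, 5):
    p_called, p_other = base_probabilities(q)
    print(f"Q{q}: called={p_called:.3f}, each other base={p_other:.3f}")

# The called base stops being the single most likely base once
# 1 - p_err < p_err / 3, i.e. p_err > 0.75.
print(f"crossover Phred score: {-10 * math.log10(0.75):.2f}")
```

Under that assumption a Q1 call is less likely than each alternative, whereas a Q2 call is still the most likely single base.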

    Personally I've tended to find that if the error probability is high enough for the divergence of the scoring schemes to be an issue, then the base is probably best ignored for many purposes.

    There are certainly plusses and minuses to both scoring schemes. The original reason for going with the 'Solexa' log-odds scheme was that, unlike the Phred scheme, it naturally extends to a 4-values-per-base scoring scheme. We've ended up using only a single value per base, but I know some folks in the community remain keen on having more than one qv per base.
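
To see where the two schemes diverge, here is a minimal sketch. It assumes the usual definitions, Phred Q = -10*log10(p) and Solexa log-odds Q = -10*log10(p/(1-p)), with p the error probability; the function names are made up for illustration.

```python
import math


def phred_score(p_err):
    """Phred score: -10 * log10(p)."""
    return -10 * math.log10(p_err)


def solexa_score(p_err):
    """Solexa log-odds score: -10 * log10(p / (1 - p))."""
    return -10 * math.log10(p_err / (1 - p_err))


def solexa_to_phred(q_solexa):
    """Convert a Solexa log-odds score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print("p_err   phred  solexa")
for p in (0.75, 0.5, 0.1, 0.01, 0.001):
    print(f"{p:<6}  {phred_score(p):5.2f}  {solexa_score(p):6.2f}")
```

The scores agree closely for small error probabilities and separate as p approaches 0.5 and beyond, where the Solexa score goes negative while the Phred score stays positive.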

    Cheers

    Tony



    • #17
      Originally posted by coxtonyj
      Hi Colin

      You have it spot on, they are now in Phred format. Just to state it fully for the benefit of others: ASCII='@'+10*log10(1/p), p being the estimated probability of the base being in error. This change was made as of Pipeline 1.3.

      Cheers

      Tony
      Out of curiosity why did you stick with ASCII(Q+64) instead of the standard ASCII(Q+33)? It results in the minor annoyance of having to remember to convert before use in programs which are expecting Sanger FASTQ. It also means that there are now three types of FASTQ files floating about; standard Sanger FASTQ with quality scores expressed as ASCII(Qphred+33), Solexa FASTQ with ASCII(Qsolexa+64) and Solexa FASTQ with ASCII(Qphred+64).
      Last edited by kmcarr; 02-26-2009, 01:39 PM. Reason: Added thought
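
As a rough illustration of the three variants listed above, the sketch below decodes a quality string into error probabilities under each one. The variant labels (`sanger`, `solexa`, `illumina-1.3`) are names chosen here for convenience, not identifiers from any tool, and the Solexa branch assumes the log-odds definition Q = -10*log10(p/(1-p)).

```python
# (ASCII offset, score scale) for each FASTQ flavour; labels are illustrative.
VARIANTS = {
    "sanger": (33, "phred"),        # ASCII(Qphred + 33)
    "solexa": (64, "solexa"),       # ASCII(Qsolexa + 64)
    "illumina-1.3": (64, "phred"),  # ASCII(Qphred + 64), Pipeline 1.3 onwards
}


def error_probs(qual, variant):
    """Decode a quality string into per-base error probabilities."""
    offset, scale = VARIANTS[variant]
    probs = []
    for ch in qual:
        q = ord(ch) - offset
        if scale == "phred":
            probs.append(10 ** (-q / 10))
        else:
            # Invert the log-odds: p = 1 / (1 + 10^(Q/10))
            probs.append(1 / (1 + 10 ** (q / 10)))
    return probs


# The same character means different things in each flavour.
for variant in VARIANTS:
    print(variant, error_probs("h", variant))
```

The point of the example is only that the encoding has to be known in advance, since the quality string itself does not say which one was used.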



      • #18
        That is a fair point. The need to convert has always been present of course. We did give this some thought at the time and as I recall the rationale was that any code (ours or others) that was expecting Qsolexa+64 would probably still work if given Qphred+64, but that the conversion to Qphred+33 was at least now just a simple subtraction. But perhaps we should have bitten the bullet and gone with Qphred+33.
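
For reference, the "simple subtraction" amounts to shifting every quality character down by 31 (64 - 33). A minimal sketch, assuming the input really is Qphred+64 as written by Pipeline 1.3 or later; the function name is hypothetical.

```python
def phred64_to_phred33(qual):
    """Re-encode a Qphred+64 quality string as Sanger-style Qphred+33."""
    converted = []
    for ch in qual:
        q = ord(ch) - 64
        if q < 0:
            # Characters below '@' cannot be Phred+64; the input may be
            # Solexa-scaled (which allows scores down to -5) or already +33.
            raise ValueError(f"unexpected quality character {ch!r}")
        converted.append(chr(q + 33))
    return "".join(converted)


print(phred64_to_phred33("hhhB"))
```

This is only valid for Phred-scaled input. Older Qsolexa+64 files need the log-odds to Phred conversion as well as the offset change.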



        • #19
          sol2sanger

          Hi,

          Just want to be sure here:

          1 - Is the sol2sanger function of maq 0.7.1 not working for solexa pipeline 1.3?

          2 - If not, how can I convert the scores that I already computed (sol2sanger of maq 0.7.1 with solexa pipeline 1.3) to the sanger phred score system?


          Best regards,
          João



          • #20
            non unique sequences in sorted.txt file?

            When working in a tag-counting context (e.g. digital gene expression), there will be many instances of a given read sequence. I have noticed an odd behavior from the eland/GA pipeline from glancing at the s_N_sorted.txt files (SE reads). There are cases where eland reports different locations for a specific sequence, but the pipeline still includes it as part of the sorted.txt file. Could this be due to differences in base quality for different instances of the sequence, or perhaps even the way the genome was squashed? Has anyone else seen this?
