  • another solexa/phred question

    I am trying to identify what quality format my reads are in, and I can't seem to find a clear answer online. Thanks for the help:

    @HWI-EAS432:1:1:4:99#0/1
    TCTTATCAGTTTAATATCTGATACGTCATCTATTTGAGTACTATATATTAAATGGATTTT
    +HWI-EAS432:1:1:4:99#0/1
    B>;?A8-(:6=A873=A@/:AA18<7(<6:.6=.036&-=<4:2242:=5/<,&3/?16
    @HWI-EAS432:1:1:4:866#0/1
    GGTTTCGCTAGATAGTAGGTAGGGACAGTGGGAATCTCGTTCATCCATTCATGCGCGTCA
    +HWI-EAS432:1:1:4:866#0/1
    83?CCB@ABB@AC?91?9A6>?:9@B?1>7/-<>98;<2<=;B@B6BB?>);.7(7+5BB
    @HWI-EAS432:1:1:4:844#0/1
    TGCTACCCCTCTATTCTGCCATGGTTAGACCACACCTAGAGTATTGTGTCCAATTCTGGG
    +HWI-EAS432:1:1:4:844#0/1
    @1ABBBBACBBBCBAA=/@BBBA1:A@/@BBBB@<@B@4946B@8:>99A3<=??A%%%%

  • #2
    This FASTQ file uses standard Sanger quality encoding, which means you take the ASCII value of each character in the quality string and subtract 33 from it. The 'highest' character you have is 'C' == 67 and the lowest is '%' == 37. These translate to Q scores of 34 and 4, which is an expected range of Phred scores.
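The arithmetic above can be checked with a few lines of Python. This is only a sketch: the quality strings are copied from the three reads in the first post, and `decode_sanger` is a hypothetical helper name, not part of any library.

```python
# Quality strings copied from the three reads posted above.
quals = [
    "B>;?A8-(:6=A873=A@/:AA18<7(<6:.6=.036&-=<4:2242:=5/<,&3/?16",
    "83?CCB@ABB@AC?91?9A6>?:9@B?1>7/-<>98;<2<=;B@B6BB?>);.7(7+5BB",
    "@1ABBBBACBBBCBAA=/@BBBA1:A@/@BBBB@<@B@4946B@8:>99A3<=??A%%%%",
]


def decode_sanger(quality):
    """Return Phred scores assuming Sanger encoding (ASCII value minus 33)."""
    return [ord(ch) - 33 for ch in quality]


# Find the lowest and highest characters across all three reads.
all_chars = "".join(quals)
lowest, highest = min(all_chars), max(all_chars)

print(lowest, ord(lowest), ord(lowest) - 33)     # % 37 4
print(highest, ord(highest), ord(highest) - 33)  # C 67 34
```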

    The quickest way to distinguish Sanger Q-score encoding (ASCII-33) from Illumina (Solexa) Q-score encoding (ASCII-64) is to look for numerals [0-9] in the quality string. The numerals have ASCII values 48-57, so subtracting 64 would give scores of -16 to -7, which is below the Solexa minimum of -5 and therefore nonsensical. If there are numerals in your quality string, the Q-score encoding is Sanger.
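The same check can be automated. The sketch below generalises the numeral rule slightly: any character below ';' (ASCII 59, a Solexa score of -5) rules out the ASCII-64 encoding, and the numerals 0-9 all fall below that. The function name `guess_offset` is made up for illustration, and the function deliberately returns `None` when nothing rules out either encoding, since a string of only high characters is ambiguous.

```python
def guess_offset(quality_strings):
    """Return 33 if the data can only be Sanger-encoded, else None.

    Any character below ';' (ASCII 59) would give a score under -5 with
    an offset of 64, so its presence implies an offset of 33. If no such
    character is seen, the encoding cannot be decided by this test alone.
    """
    chars = "".join(quality_strings)
    if chars and min(chars) < ";":
        return 33
    return None


print(guess_offset(["B>;?A8-(:6=A873=A@/:AA18<7"]))  # contains numerals
print(guess_offset(["IIIIIIII@I"]))                  # no low characters
```

A fuller detector would also look at the highest character, but that needs assumptions about the maximum score a given pipeline emits.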
    Last edited by kmcarr; 12-02-2009, 02:23 PM.



    • #3
      Got it -- thanks for explaining it so clearly.





        • #5
          Hello,

          I have a related issue: I don't know which FASTQ format my reads are in.

          @XXX010005.1 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:1319:692 length=51
          ACGATGTGACGTACGCGTATGCTCGTATACACACGCATGACGAGCGACGAT
          +XXX010005.1 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:1319:692 length=51
          IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII@I
          @XXX10005.2 BI:080722_SL-XBE_0007_FC3061LAAXX:6:1:395:487 length=51
          TTTTTCGTGTCGGCGGCCCGTCGCCTCTCCACCCCACCACACCCCCCACCC



          • #6
            That would be Illumina/Solexa Fastq;
            But as I can't see the version of the pipeline, it's not possible to tell if this is the new linear fastq metric or the older log-score fastq metric.
            The difference is small, so it shouldn't matter.
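To put a number on "small": assuming the "log-score" metric means the original Solexa log-odds score, Q = -10 log10(p / (1 - p)), and the newer metric means the standard Phred score, Q = -10 log10(p), the two are related by Q_phred = 10 log10(10^(Q_solexa / 10) + 1). A short sketch of that conversion (the function name is hypothetical):

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa log-odds score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# Compare the two scales at a few representative Solexa scores.
for q in (-5, 0, 10, 20, 40):
    print(q, round(solexa_to_phred(q), 2))
```

The two scales diverge only for low scores (a Solexa score of 0 corresponds to a Phred score of about 3) and are practically identical above 20 or so.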

            Short question:
            Are all your reads of quality "IIIIII"?
            Strikes me as funny and mayhaps erroneous
            Best
            -Jonathan



            • #7
              That's a kind of old Solexa fastq format. (Old being about a year old with this application!) The characters in the quality line encode scores with ! being 0 and I being 40, i.e. an ASCII offset of 33; Solexa scores themselves ranged from -5 to 40.

              Fastq format looks like this:

              @read name
              sequence
              +read name again (or just + and nothing, to save file space)
              quality score for each letter in above read
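The four-line layout described above can be read with a small generator. This is a minimal sketch that assumes each sequence and quality string sits on a single line (no line wrapping); `read_fastq` is an illustrative name, not a library function.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = handle.readline().strip()
        plus = handle.readline().strip()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# A made-up single record, just to show the call.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, qual)
```

Records are split by position rather than by looking for '@', because '@' is also a valid quality character.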



              • #8
                Many thanks Jonathan and swbarnes2.

                Yes, frankly, most of my reads have quality values of IIII!

                I'm trying to use VAAL to assemble several bacterial genomes against a reference and detect SNPs. VAAL requires that I convert these FASTQ files into .fasta and .qual files.

                Do you know an easy way of doing that, given that I'm not an expert in bioinformatics?

                I tried this one:


                But it keeps giving me errors.

                Thanks a lot!



                • #9
                  Originally posted by MoBi
                  I tried this one:


                  But it keeps giving me errors.

                  Thanks a lot!
                  I'm pretty sure, from the format names etc., that this website uses Biopython internally to do the conversion. However, it looks like the website has a bug: quotes (which can occur in FASTQ quality strings) are being "escaped" with extra slash characters. As a result, the data given to Biopython is corrupted, and the conversion fails.

                  You would be better off using Biopython directly (especially for large files, it would be silly to try and upload/convert/download anything bigger than a few megabytes).
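With Biopython installed, `Bio.SeqIO.convert(in_file, "fastq", out_file, "fasta")`, plus a second call with `"qual"` as the output format, should cover this; for Solexa-encoded input the format name would be `"fastq-solexa"` instead. For anyone who wants to see what the conversion involves, here is a dependency-free sketch. It is not Biopython's implementation: `fastq_to_fasta_qual` is a made-up name, it assumes four lines per record, and it defaults to the Sanger offset of 33.

```python
def fastq_to_fasta_qual(fastq_text, offset=33):
    """Return (fasta_text, qual_text) for four-line-per-record FASTQ text."""
    lines = [ln.strip() for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4:
        raise ValueError("expected four lines per FASTQ record")
    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        # .qual files hold space-separated integer scores, one per base.
        scores = " ".join(str(ord(ch) - offset) for ch in quality)
        fasta.append(f">{header[1:]}\n{seq}\n")
        qual.append(f">{header[1:]}\n{scores}\n")
    return "".join(fasta), "".join(qual)


# A made-up record; real use would read the text from a file.
fasta_text, qual_text = fastq_to_fasta_qual("@read1\nACGT\n+\nII@I\n")
print(fasta_text)
print(qual_text)
```

This version holds everything in memory, so for large files a streaming approach (or Biopython itself) is the better choice.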
