  • May I ask a question about FASTQ files and the script fq_all2std?

    Hi Guys

    I am testing different converters from Solexa FASTQ to standard Sanger, but I have a couple of naive questions on points I am unclear about.

    I think that, since the Illumina 1.8+ encoding scheme encodes the Phred score by adding 33 to it, there is no need for conversion, as this is the Sanger encoding too.

    Can anyone tell me if I am correct??

    As for Illumina 1.5+ and 1.8+:

    I downloaded the fq_all2std script from Maq. Because in the source file the range of the for loop is -64 to 64, and the offset used everywhere else in the conversion is again 64, I think this script is for Illumina 1.3+.
    Am I right here??

    Additionally, if I want to use this script to convert from Illumina 1.5+ to Sanger, is it enough to just change 64 to 66 wherever it appears? Is that right? I mean, is it just as simple as it seems, or am I missing something in this last step?

    Thank you in advance.

  • #2
    Originally posted by cllorens:
    I think that, since the Illumina 1.8+ encoding scheme encodes the Phred score by adding 33 to it, there is no need for conversion, as this is the Sanger encoding too.

    Can anyone tell me if I am correct??
    Yes, that part is correct.

    Have you read http://dx.doi.org/10.1093/nar/gkp1137 and http://en.wikipedia.org/wiki/FASTQ_format yet?
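To make the offset-33 point concrete, here is a minimal Python sketch (not taken from Maq or any other tool in this thread; the function name `decode_phred33` is made up for illustration). It assumes the quality string is Sanger or Illumina 1.8+ encoded, so each PHRED score is just the character's ASCII code minus 33:

```python
def decode_phred33(quality: str) -> list:
    """Decode a Sanger / Illumina 1.8+ quality string into PHRED scores.

    Assumes an offset of 33 and that the trailing newline was already removed.
    """
    return [ord(ch) - 33 for ch in quality]


if __name__ == "__main__":
    # '!' is ASCII 33, '5' is ASCII 53, 'I' is ASCII 73
    print(decode_phred33("!5I"))
```

Because both formats use the same offset, the same function reads either one, which is why no conversion step is needed between them.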



    • #3
      Hi Peter,
      Yes, I saw the Wikipedia page you mention and some other information (not the NAR paper; thanks for the reference, I'll read it with pleasure), and I think I am clear about the differences. I asked the first question (the one you answered) because it is sometimes good to get confirmation of things you believe you understand but are not completely sure about. My two other doubts are more specifically about the fq_all2std script: whether I am right that it is oriented to Illumina 1.3+ (I mean its Illumina-to-Sanger conversion; the script has other functions too), and whether the few amendments to the code that I suggest are enough to adapt these Illumina-to-Sanger functions to 1.5+. Based on what I have read I think so, but again it would be good if someone could confirm whether I am right or wrong. Perhaps these two questions are better directed to the original developer of this script, which I took from Maq. I am not sure, but I think it is Nilshommer.
      Carlos



      • #4
        If you can read Perl it is fairly clear. I'm quoting from this copy: http://maq.sourceforge.net/fq_all2std.pl

        This bit does the FASTQ conversion:
        Code:
        sub sol2std {
          my $max = 0;
          while (<>) {
            if (/^@/) {
              print;
              $_ = <>; print; $_ = <>; $_ = <>;
              my @t = split('', $_);
              my $qual = '';
              $qual .= $conv_table[ord($_)] for (@t);
              print "+\n$qual\n";
            }
          }
        }
        And further up, here is the definition of the conversion table:
        Code:
        # Solexa->Sanger quality conversion table
        my @conv_table;
        for (-64..64) {
          $conv_table[$_+64] = chr(int(33 + 10*log(1+10**($_/10.0))/log(10)+.499));
        }
        All those logs are implementing the Solexa score to PHRED score conversion (see our NAR paper for the formula and citations, or the wikipedia page). That means this script converts the really old Solexa FASTQ encoding (which can have negative scores) into the Sanger PHRED encoding (which does not have negative scores).
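For readers who do not read Perl, the table above can be sketched in Python roughly as follows. This is an illustrative re-expression of the quoted loop, not code from Maq; the function names are made up. It assumes the standard Solexa-to-PHRED formula, Q_PHRED = 10 * log10(10^(Q_Solexa / 10) + 1), and reproduces the Perl rounding (add 0.499, then truncate):

```python
import math


def solexa_to_phred(q_solexa: int) -> int:
    """Map one Solexa score to a PHRED score, rounded as in the quoted Perl."""
    exact = 10 * math.log10(10 ** (q_solexa / 10.0) + 1)
    return int(exact + 0.499)


def solexa_qual_to_sanger(quality: str) -> str:
    """Convert a Solexa quality string (offset 64) to Sanger (PHRED, offset 33).

    Assumes the trailing newline has already been stripped from the line.
    """
    return "".join(chr(33 + solexa_to_phred(ord(ch) - 64)) for ch in quality)


if __name__ == "__main__":
    # ';' '@' 'J' 'h' are Solexa scores -5, 0, 10 and 40
    print(solexa_qual_to_sanger(";@Jh"))
```

Note how the two scales differ most at the low end (Solexa -5 becomes PHRED 1, Solexa 0 becomes PHRED 3) and agree for high scores, which is why a plain offset change cannot replace the log formula for genuine Solexa data.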

        Unless you are dealing with really really old data, you shouldn't be using this conversion.
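For Illumina 1.3+ and 1.5+ data, as described in the NAR paper and the Wikipedia page, the quality characters already hold PHRED scores and only the offset differs (64 instead of 33). The offset is 64 in both versions; in 1.5+ the lowest scores are simply used differently ('B', score 2, is a special indicator), so replacing 64 with 66 would shift every score by two. Under those assumptions, a conversion to Sanger needs no log table at all, just a fixed shift of 31. A minimal Python sketch (the function name is hypothetical, not part of fq_all2std):

```python
def illumina64_to_sanger(quality: str) -> str:
    """Shift a PHRED+64 quality string (Illumina 1.3+/1.5+) to PHRED+33.

    Assumes the trailing newline has already been stripped.
    """
    shifted = []
    for ch in quality:
        score = ord(ch) - 64  # PHRED score under the +64 offset
        if score < 0:
            # Characters below '@' cannot occur in Illumina 1.3+/1.5+ data
            raise ValueError(f"{ch!r} is below the +64 offset; wrong input format?")
        shifted.append(chr(score + 33))
    return "".join(shifted)


if __name__ == "__main__":
    # '@' 'B' 'h' are PHRED 0, 2 and 40 under the +64 offset
    print(illumina64_to_sanger("@Bh"))
```

The range check is a deliberate design choice: it fails loudly if the input turns out to be Solexa or Sanger data rather than silently producing nonsense qualities.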



        • #5
          That is partly the point of my question, Peter. I know that it is a converter for old data. However, I found the script interesting not for my current data but with the idea of having at hand (just in case I need it) a converter for several previous Illumina formats.

          So this script was written for the early Solexa format (ASCII characters in the range 59–126 with an offset of 64), wasn't it? But if a few amendments are made to change the parameters, the script should also be valid for converting 1.3+ and 1.5+ (again, if needed), as Sanger has not changed. That was the remainder of my question.

          Ah, and thank you for the reference (I read it); it is older but quite instructive.
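Since the ASCII ranges are what distinguish these formats, one common heuristic is to look at the lowest quality character in a file. The Python sketch below is only illustrative (the function name and the returned labels are made up), and it assumes the usual ranges: characters below 59 only occur with offset 33, characters 59 to 63 only occur in Solexa data (scores -5 to -1), and anything else is treated as PHRED+64. A file containing only high-quality reads can be misclassified, so treat the result as a guess.

```python
def guess_fastq_encoding(quality_lines) -> str:
    """Guess the quality encoding from the lowest character seen.

    `quality_lines` is any iterable of quality strings; at least one
    non-empty line is required (min() raises ValueError otherwise).
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line.strip())
    if lowest < 59:
        return "sanger"         # offset 33 (Sanger and Illumina 1.8+)
    if lowest < 64:
        return "solexa"         # offset 64, negative Solexa scores present
    return "illumina-1.3+"      # offset 64, PHRED scores (1.3+ or 1.5+)


if __name__ == "__main__":
    print(guess_fastq_encoding(["!5I\n"]))
    print(guess_fastq_encoding([";@Jh\n"]))
    print(guess_fastq_encoding(["BBhh\n"]))
```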
