  • I converted Illumina FASTQ into Sanger FASTQ, need advice

    Hello dear NGS community,
    I am new to this forum but have already read many threads, which helped me a lot.
    I found many ways in this forum to convert Illumina FASTQ quality scores into Sanger FASTQ Phred scores. My data comes from a sequencer that uses Illumina 1.5 encoding (thanks to FastQC for telling me). For my Diploma thesis (I am the last of my kind with a Diploma) I am writing a pipeline script in Ruby that uses bwa, samtools, GATK and Picard. My professor wants me to convert all FASTQ files to Sanger FASTQ. I read about BioRuby, maq and other tools, but concluded that I want to write the conversion myself, so that users of the script won't need to install even more tools or patch bwa. That's why I experimented with ASCII codes in Ruby, and I would like to double-check my results with your comments.

    My results:
    Here is an example read:
    "NACGTTATACTTGTTAGCACAATCCAAGCTAGGCTAAGAAGTTCAAACATGGTGGACGTACCCACTGATCTTTTG "

    Illumina 1.5 score
    "BIKKGQNMLL[[[[[Y[[[[_______________YYYYYYYYYY[[[[[[Y[[YY[[[[_____________QQ"
    (in numbers)
    66 73 75 75 71 81 78 77 76 76 91 91 91 91 91 89 91 91 91 91 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 89 89 89 89 89 89 89 89 89 89 91 91 91 91 91 91 89 91 91 89 89 91 91 91 91 95 95 95 95 95 95 95 95 95 95 95 95 95 81 81
    Sanger score
    "#*,,(2/.--<<<<<:<<<<@@@@@@@@@@@@@@@::::::::::<<<<<<:<<::<<<<@@@@@@@@@@@@@22"
    (in numbers)
    35 42 44 44 40 50 47 46 45 45 60 60 60 60 60 58 60 60 60 60 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 58 58 58 58 58 58 58 58 58 58 60 60 60 60 60 60 58 60 60 58 58 60 60 60 60 64 64 64 64 64 64 64 64 64 64 64 64 64 50 50
    I got the Sanger scores from a thread in this forum that used a command line to convert them in BAM files (I couldn't find the thread again):
    samtools view -h chrYvs48_2_1_KESC1_mymod_48_2_2_KESC1_mymod.bam | perl -lane '$"="\t"; if (/^@/) {print;} else {$F[10]=~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;print "@F"}' | samtools view -Sbh - > Phred_score.bam

    So my question is: can I simply subtract 31 from the numbers to get a Sanger quality score? There was also something about offsets, if I remember correctly... I would then convert these numbers back into ASCII and use them to replace the scores in the FASTQ file.
    Is this the correct way, or where did I make mistakes?
    Thank you in advance, Alex
    Last edited by Aicen; 11-09-2011, 02:32 AM.
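
    The subtraction asked about above can be sketched in a few lines. This is a minimal Python sketch, not the poster's Ruby script; the function name `illumina15_to_sanger` is made up for illustration. It assumes the input really is Phred+64 (Illumina 1.3 to 1.7), so that moving to Phred+33 (Sanger) means subtracting 64 - 33 = 31 from every ASCII code. The example uses the first ten quality characters of the read shown in the post.

    ```python
    ILLUMINA_OFFSET = 64  # Illumina 1.3-1.7: ASCII code = Phred + 64
    SANGER_OFFSET = 33    # Sanger: ASCII code = Phred + 33


    def illumina15_to_sanger(qual: str) -> str:
        """Shift a Phred+64 quality string to Phred+33 (hypothetical helper)."""
        shift = ILLUMINA_OFFSET - SANGER_OFFSET  # 31
        converted = []
        for ch in qual:
            code = ord(ch)
            if code < ILLUMINA_OFFSET:
                # A character below '@' cannot occur in Phred+64 data,
                # so the input is probably Sanger-encoded already.
                raise ValueError(f"{ch!r} is not a valid Phred+64 character")
            converted.append(chr(code - shift))
        return "".join(converted)


    # First ten quality characters of the example read in the post.
    print(illumina15_to_sanger("BIKKGQNMLL"))  # -> #*,,(2/.--
    ```

    The result matches the start of the Sanger string in the post, which suggests that the plain subtraction and the `tr` one-liner agree for this read.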

  • #2
    The current version of bwa has a '-I' option, which will read files with 1.5 encoding directly. You might want to discuss with your Prof. whether it would be more appropriate to use this rather than converting (will the fastq files be used for anything else that assumes Sanger encoding after you've run bwa?).

    • #3
      bwa

      My current bwa version is 5.9, so you're right. But the idea is also that users can choose to use my script just to convert FASTQ files into Sanger-formatted files.

      To be honest, I don't know whether the tools I use assume that FASTQ files are in Sanger format, but it seems to me that it is becoming the standard score format, so I thought it would be a good idea to add a function that converts FASTQ files accordingly.

      The tools I use, as described above, are samtools, GATK and Picard tools (MarkDuplicates).
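
      A standalone converter of the kind described here could look roughly like the following Python sketch (the poster's own script is in Ruby and is not shown in the thread). The function name `convert_fastq` is hypothetical. The sketch assumes plain four-line FASTQ records (no line-wrapped sequences) and Phred+64 input, and it refuses input that contains characters below ASCII 64 rather than silently producing garbage.

      ```python
      import io
      from typing import TextIO

      SHIFT = 64 - 33  # Phred+64 (Illumina 1.3-1.7) minus Phred+33 (Sanger)


      def convert_fastq(src: TextIO, dst: TextIO) -> int:
          """Copy four-line FASTQ records from src to dst, shifting qualities by 31.

          Returns the number of records written. Hypothetical helper.
          """
          records = 0
          while True:
              header = src.readline()
              if not header:            # end of input
                  break
              if not header.strip():    # tolerate blank lines between records
                  continue
              header = header.rstrip("\r\n")
              seq = src.readline().rstrip("\r\n")
              plus = src.readline().rstrip("\r\n")
              qual = src.readline().rstrip("\r\n")
              if not header.startswith("@") or not plus.startswith("+"):
                  raise ValueError(f"malformed FASTQ record near {header!r}")
              if len(seq) != len(qual):
                  raise ValueError(f"sequence/quality length mismatch in {header!r}")
              if any(ord(ch) < 64 for ch in qual):
                  raise ValueError(f"{header!r} does not look like Phred+64 data")
              new_qual = "".join(chr(ord(ch) - SHIFT) for ch in qual)
              dst.write(f"{header}\n{seq}\n{plus}\n{new_qual}\n")
              records += 1
          return records


      # In-memory demo; the read name "read1" is invented for illustration.
      demo_in = io.StringIO("@read1\nNACGTTATAC\n+\nBIKKGQNMLL\n")
      demo_out = io.StringIO()
      convert_fastq(demo_in, demo_out)
      print(demo_out.getvalue(), end="")
      ```

      With real data the two `StringIO` objects would be replaced by file handles opened with `open(...)`; reading record by record keeps memory use flat for large files.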


      • #4
        Originally posted by Aicen

        so my question is, can I simply add +31 to the numbers and get a Sanger quality score? There was also something about offsets, if I remember correctly... I would then convert these numbers back into ASCII and use them to replace the scores in the FASTQ file.
        Is this the correct way, or where did I make mistakes?
        Thank you in advance, Alex
        I guess you mean subtract 31?
        Anyway, you can do that, but Illumina 1.5 quality uses the "B" mark (I think it is ASCII 66) as a flag that tells you not to use that base for analysis. And AFAIK ASCII 67 is never used. So mere subtraction yields something like Sanger, but you have to bear that in mind. (Of course the B characters will get a very low Phred score, so most downstream programs will be aware of that.)

        Hope that helps,
        Peter
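
        To make the "B" caveat concrete: in Illumina 1.5 encoding "B" (ASCII 66, Phred 2) is a quality-control indicator rather than a real score, and after subtracting 31 it becomes "#" (ASCII 35). Below is a small Python sketch for spotting such positions before conversion. Both helper names are invented for illustration, and trimming a trailing run of "B" is only one possible way to handle it, not something the thread prescribes.

        ```python
        def flagged_positions(qual: str, flag: str = "B") -> list[int]:
            """Return the 0-based positions that carry the flag character."""
            return [i for i, ch in enumerate(qual) if ch == flag]


        def trim_flagged_tail(seq: str, qual: str, flag: str = "B") -> tuple[str, str]:
            """Drop a trailing run of flagged bases from a read (one option of many)."""
            keep = len(qual.rstrip(flag))
            return seq[:keep], qual[:keep]


        # First ten quality characters of the example read in post #1:
        # only the leading "N" base carries the flag.
        print(flagged_positions("BIKKGQNMLL"))  # -> [0]

        # Invented read with a flagged tail, for illustration only.
        print(trim_flagged_tail("NACGTTAT", "BIKKGBBB"))  # -> ('NACGT', 'BIKKG')
        ```

        Leaving the flagged bases in place is also reasonable, since they end up as Phred 2 after conversion and quality-aware tools will weight them accordingly.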


        • #5
          You're right

          Thanks, I will fix this in my post.


          • #6
            I am new to NGS. We have set up our own Galaxy instance, and to run BWA I need my dataset files converted from fastq to fastqsanger. I can't find a way to do the conversion; any help would be appreciated.
