
  • Working with input files with Q64 and Q33 (GATK IndelRealigner)

    Dear SEQanswers members,

    I have two input FASTQ files from exome-seq. One is encoded with Q64 (Phred+64) quality scores and the other with Q33 (Phred+33). I want to combine the two input files and run bwa+GATK.

    How do I combine them for IndelRealigner? I suppose that IndelRealigner needs all reads from both the Q64 and Q33 files. Can I run IndelRealigner on each separately and then merge the results? Will this cause problems?

    I have searched many posts but can't find an answer. Please help me.

    Thanks,
    Woody

  • #2
    Once they're in sam/bam format the original quality values get changed to ASCII-33 anyway. Just map them separately then combine the sam files.
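
The offset arithmetic behind this can be sketched in Python. The helper names below are illustrative only (they are not from any library); the sketch assumes nothing beyond the usual convention that a Phred score Q is stored as the character with code Q + 33 (ASCII-33) or Q + 64 (ASCII-64).

```python
def encode_quality(score: int, offset: int) -> str:
    """Encode one Phred score as a single ASCII character."""
    return chr(score + offset)


def decode_quality(char: str, offset: int) -> int:
    """Decode one ASCII quality character back to a Phred score."""
    return ord(char) - offset


# The same Phred score of 40 is written differently in each encoding.
print(encode_quality(40, 33))  # I
print(encode_quality(40, 64))  # h

# Reading an ASCII-64 character with the ASCII-33 offset inflates the score by 31.
print(decode_quality("h", 33))  # 71
```

This is why a tool has to know which offset the input uses: the characters alone do not say.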


    • #3
      Please elaborate on this issue so I can take a look. Brian Bushnell is right; please try his suggestion once: "Just map them separately then combine the sam files." Best of luck.
      Journal of Social Science


      • #4
        Originally posted by Brian Bushnell
        Once they're in sam/bam format the original quality values get changed to ASCII-33 anyway. Just map them separately then combine the sam files.
        Hi Brian,

        Thanks for the response!

        I combined the fastq files and ran BWA-MEM to generate a bam file. I then used GATK's RealignerTargetCreator to create target regions for IndelRealigner. However, when I added --fix_misencoded_quality_scores to the RealignerTargetCreator step, GATK returned an error message saying that it can't handle a mixture of records with Q64 and Q33. Are you suggesting that I don't need to fix the quality scores?

        I'm confused.

        Bests,
        Woody
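
A quick way to check which encoding a set of reads uses is to look at the range of quality characters. The Python sketch below is a heuristic only, not what GATK or bwa do internally: the function name and the cut-offs (59, 64, 74) are assumptions based on the commonly quoted ASCII ranges for each format, and it returns None when the characters seen are compatible with both.

```python
from typing import Iterable, Optional


def guess_quality_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Guess the ASCII offset (33 or 64) from observed quality characters.

    Returns None when there is no data or the range is ambiguous.
    """
    lowest = None
    highest = None
    for qual in quality_strings:
        for ch in qual:
            code = ord(ch)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)

    if lowest is None or highest is None:
        return None  # no quality characters were seen
    if lowest < 59:
        return 33  # characters this low cannot occur in the +64 formats
    if lowest >= 64 and highest > 74:
        return 64  # nothing low, and values above 'J' suggest ASCII-64
    return None  # compatible with both encodings


print(guess_quality_offset(["IIIIHHGG#"]))  # 33
print(guess_quality_offset(["hhhhgggfB"]))  # 64
print(guess_quality_offset(["FFFF"]))       # None
```

Running a check like this per input file, before the files are combined, would show whether a merged file ends up with mixed encodings.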


        • #5
          Hi Brian,

          I am sorry that I misunderstood your response. I will map them separately and merge the bam files for the subsequent GATK steps. I will report back if it works.

          Thanks,
          Woody


          • #6
            Hi Brian and elkjournals1,

            I found the solution for this after consulting with a GATK developer. Please refer to this post:

            http://gatkforums.broadinstitute.org...aligner#latest

            Bests,
            Woody


            • #7
              Thanks for posting that link. However, I disagree with the advice there. Let me clarify:

              You should align the datasets separately, then merge the sam files. But when aligning the ASCII-64 files with bwa, you need to use the -I flag to tell it the qualities are in ASCII-64, since bwa does not autodetect the quality encoding (as far as I know). Without that flag, the output will be incorrect, and simply fixing the quality values afterward will not solve the problem, since the mapping will already have been done according to incorrect quality values.

              As long as the mapper is aware of the input quality format, it will convert it to ASCII-33 in the sam output regardless of the input, as that is required by the sam specification and is not something that varies by mapper.
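
The conversion described here is a fixed shift of every quality character. As a rough Python sketch (not any mapper's actual code), the same shift can be applied to an ASCII-64 FASTQ file up front so that both inputs are ASCII-33 before mapping. The function names are hypothetical, and the sketch assumes true Phred+64 data (no negative Solexa-style scores) in strict 4-line FASTQ records.

```python
PHRED64_OFFSET = 64
PHRED33_OFFSET = 33


def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - PHRED64_OFFSET
        if score < 0:
            # A negative score means the input was not Phred+64 after all.
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        converted.append(chr(score + PHRED33_OFFSET))
    return "".join(converted)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every quality line re-encoded.

    Assumes strict 4-line records (header, sequence, '+', quality).
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index % 4 == 3:  # fourth line of each record holds the qualities
            text = phred64_to_phred33(text)
        yield text


record = ["@read1", "ACGTA", "+", "hhhhB"]
print(list(convert_fastq_lines(record)))
# ['@read1', 'ACGTA', '+', 'IIII#']
```

The ValueError acts as a safety net: if a file thought to be ASCII-64 is really ASCII-33, the conversion stops instead of silently producing wrong qualities.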
