  • BFAST and Variant Calling

    Hi,

    I have a question regarding variant calling with SOLiD data. For the mapping, I used BFAST and for variant calling I'm trying to use GATK.

    The problem occurs when I run the quality score recalibration and the Unified Genotyper. Whenever I run these, the output says that many of my reads were filtered because they have no mapping quality (the MappingQualityUnavailableFilter catches nearly 90% of my reads; also see here: http://www.broadinstitute.org/gsa/wi...TK_release_1.1).

    In other words, the mapping quality was 255, which in a SAM/BAM file means the quality is unavailable.

    Here (http://sourceforge.net/apps/mediawik...apping_Quality) it seems the mapping quality of 255 is actually very good.

    I'm confused as to how to work around this problem. Any help would be appreciated. Thanks.
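
To check how many reads would hit that filter before running GATK, the MAPQ column can be counted directly. This is only a minimal sketch, assuming the alignments are available as SAM text (for example from `samtools view -h`); the function name `count_unavailable_mapq` and the example records are made up for illustration.

```python
def count_unavailable_mapq(sam_lines):
    """Return (n_255, n_total) over the alignment records in SAM text lines."""
    n_255 = 0
    n_total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory fields)
        n_total += 1
        # MAPQ is the 5th mandatory field; 255 means "unavailable".
        if fields[4] == "255":
            n_255 += 1
    return n_255, n_total


# Hypothetical records, for illustration only.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\tchr1\t200\t37\t5M\t*\t0\t0\tACGTA\tIIIII",
]
print(count_unavailable_mapq(example))
```

A high ratio of the first number to the second would be consistent with the large share of reads reported by MappingQualityUnavailableFilter.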

  • #2
    I had the same problem; my solution was to change the mapping quality. This issue is supposed to be fixed in the latest version of BFAST, so a better option would be to realign the reads with mapQV 255. Or just use Samtools mpileup.


    • #3
      Thanks Chipper.

      If I may ask, what did you find was an appropriate replacement value for the mapping quality? Could it simply be changed to 254, or is it more complex than that? Thanks again.


      • #4
        As long as it is not 255 it is ok, but if the alignments with score 255 are unreliable it would be better to set it to a lower value.
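
A rough sketch of that replacement on SAM text, assuming the records come from something like `samtools view -h`. The function name `replace_mapq` and the default of 254 are placeholders, not a recommendation; a lower value may be more appropriate if the 255 alignments are unreliable.

```python
def replace_mapq(sam_line, new_mapq=254, old_mapq=255):
    """Return the SAM line with MAPQ changed from old_mapq to new_mapq.

    Header lines and records with any other MAPQ are returned unchanged.
    """
    if not 0 <= new_mapq <= 254:
        raise ValueError("new_mapq must be in [0, 254]")
    if sam_line.startswith("@"):
        return sam_line  # header line, nothing to change
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # MAPQ is the 5th mandatory field of a SAM record.
    if len(fields) >= 11 and fields[4] == str(old_mapq):
        fields[4] = str(new_mapq)
    return "\t".join(fields) + newline


# Hypothetical record, for illustration only.
record = "r1\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tIIIII"
print(replace_mapq(record))
```

One way to use it would be to pipe the text output of `samtools view -h` through this function line by line and convert the result back to BAM before running GATK.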


        • #5
          Try upgrading to the newest version as Chipper suggests, then report back.


          • #6
            I will try again using the new version (0.7.0a). Previously I used 0.6.5a. My only concern is that the manual for the new version, on page 39, also says "If a read has one alignment, then the mapping quality is set to 255." Is there another option I should specify to avoid getting a score of 255?


            • #7
              I would welcome feedback, but I think the calculation should produce a lot fewer 255s. If you find that it does, perhaps I should update the manual.


              • #8
                Thanks Chipper and nilshomer! I reran using the new version and now there are no reads failing this filter: MappingQualityUnavailableFilter

                It seems to be working much better now.

                Thanks again.


                • #9
                  Let me revive this old issue.

                  I am working with 1000 Genomes Project alignment data, and their SOLiD alignments were done with bfast 0.64e, so I have two options: redo the alignment myself with a newer version of bfast, as suggested above, or try to handle the 255 values before running GATK.

                  Handling the 255 mapping qualities requires replacing these values with something else in the interval [0, 254]. Any ideas on this?

                  I am trying to do it with the mapping quality capping option of samtools calmd.
