  • SNP quality score in Samtools pileup

    Hi,

    I was examining the Samtools pileup output at a particular base of interest:

    X 131016403 G G 103 0 60 53 T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF

    It looks like a clear heterozygous position with good coverage and decent base qualities; however, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?

    The data are 2x75 bp paired-end reads aligned with ELANDv2. Any help on this would be highly appreciated. Thanks!
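To see how balanced the alleles at such a site really are, the read-base column can be tallied directly. Below is a minimal Python sketch that assumes the 10-column layout of the old `samtools pileup -c` output (chromosome, position, reference base, consensus base, consensus quality, SNP quality, maximum mapping quality, depth, read bases, base qualities) and the usual pileup symbols: `.`/`,` for a reference match on the forward/reverse strand, upper/lower case for a mismatch on the forward/reverse strand, `^` plus one mapping-quality character at a read start, `$` at a read end, and `+N`/`-N` followed by N bases for an indel. `parse_pileup_line` and `count_alleles` are hypothetical helpers, not part of Samtools.

```python
import re

# Column names for the 10-column consensus output of the old `samtools pileup -c`.
FIELDS = ("chrom", "pos", "ref", "cons", "cons_q", "snp_q", "max_mapq", "depth")


def parse_pileup_line(line):
    """Split one `pileup -c` line into a dict; numeric columns become ints."""
    cols = line.split()
    rec = dict(zip(FIELDS, cols[:8]))
    for key in ("pos", "cons_q", "snp_q", "max_mapq", "depth"):
        rec[key] = int(rec[key])
    rec["bases"], rec["quals"] = cols[8], cols[9]
    return rec


def count_alleles(bases):
    """Count reference calls ('.' forward, ',' reverse) and mismatches per base and strand."""
    counts = {"ref_fwd": 0, "ref_rev": 0, "alt": {}}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read; the next character encodes its mapping quality
            i += 2
        elif c == "$":  # end of a read
            i += 1
        elif c in "+-":  # indel such as +2AG or -3TTT: skip the length and the sequence
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            if c == ".":
                counts["ref_fwd"] += 1
            elif c == ",":
                counts["ref_rev"] += 1
            elif c.upper() in "ACGTN":
                # upper case = forward strand, lower case = reverse strand
                key = c.upper() + ("+" if c.isupper() else "-")
                counts["alt"][key] = counts["alt"].get(key, 0) + 1
            # '*' (deleted base) and any other symbol are skipped
            i += 1
    return counts


line = ("X 131016403 G G 103 0 60 53 "
        "T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t "
        "BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF")
rec = parse_pileup_line(line)
print(rec["snp_q"], count_alleles(rec["bases"]))
```

On the line from the first post this counts 34 reference calls (18 forward, 16 reverse) and 19 T calls (12 forward, 7 reverse), so roughly 36% of the reads carry the T, and on both strands.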

  • #2
    Originally posted by wangzkai
    It looks like a clear heterozygous position with good coverage and decent base qualities; however, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?
    Maybe the mapping qualities for the variant reads are low?
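One way to test this explanation is to compare the average mapping quality of the reads supporting each allele. The sketch below assumes a pileup that also carries a per-read mapping-quality column, as `samtools mpileup -s` (`--output-MAPQ`) produces, with one Phred+33 character per read in the same order as the read bases. The function names are hypothetical and the example strings in the checks are made up for illustration.

```python
import re


def alleles_per_read(bases):
    """Yield one label per read: 'ref' for '.'/',', otherwise the upper-cased base."""
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # skip '^' and the mapping-quality character that follows it
            i += 2
        elif c == "$":  # end-of-read marker, not a base
            i += 1
        elif c in "+-":  # indel: skip the length digits and the inserted/deleted bases
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            yield "ref" if c in ".," else c.upper()
            i += 1


def mean_mapq_by_allele(bases, mapq_string):
    """Average mapping quality of the reads supporting each allele (MAPQ as Phred+33)."""
    sums, counts = {}, {}
    for allele, q in zip(alleles_per_read(bases), mapq_string):
        mq = ord(q) - 33
        sums[allele] = sums.get(allele, 0) + mq
        counts[allele] = counts.get(allele, 0) + 1
    return {a: sums[a] / counts[a] for a in sums}
```

If the non-reference reads come out with a much lower average than the reference reads, that would support the suggestion above.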



    • #3
      Originally posted by nilshomer
      Maybe the mapping qualities for the variant reads are low?
      This is exactly what I found when I came across the same puzzling situation. Converting the data to BAM format and visualizing it in IGV showed me that the apparently heterozygous SNP was getting all of its non-reference bases from the low-quality ends of the reads: the SNP never once showed up in the beginning or middle of a read, only within 7 nt of the end.

      Odd? Yes. But if it were a true heterozygous SNP, you'd expect to see it in about half of the reads, regardless of where it falls within the read.
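The end-of-read pattern described here can be quantified once each observation at the site is known as an (allele, offset in read, read length) triple. A minimal sketch, assuming 0-based offsets and reading "within 7 nt of the end" as the 7 outermost bases at either end of the read; the input layout and the function name are hypothetical.

```python
def end_proximity_summary(observations, window=7):
    """observations: (allele, offset_in_read, read_length) tuples with 0-based offsets.

    Returns, per allele, the fraction of observations lying in the `window`
    outermost bases at either end of the read.
    """
    near, total = {}, {}
    for allele, offset, length in observations:
        dist = min(offset, length - 1 - offset)  # bases between this one and the nearest read end
        total[allele] = total.get(allele, 0) + 1
        if dist < window:
            near[allele] = near.get(allele, 0) + 1
    return {a: near.get(a, 0) / total[a] for a in total}
```

For 75 nt reads, the 7 outermost positions on each side make up 14 of 75 positions (about 19%), so if read positions are spread evenly a real allele should land there only about that often, whereas the SNP described above would score close to 1.0.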



      • #4
        Yes, that is the problem with SAMtools: the majority of these variants fall in the second half of the read, so you get lots of false positives.
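Whether variant bases cluster in the second half of the read has to be judged by sequencing cycle, not by the raw offset into the SAM `SEQ` field: SAM stores reads mapped to the reverse strand (FLAG bit 0x10) reverse-complemented, so for those reads the first stored base is the last one sequenced. The sketch below makes that conversion, assuming no hard clipping; the tuple layout and function names are hypothetical.

```python
def sequencing_cycle(offset, read_length, flag):
    """0-based sequencing cycle for a 0-based offset into the SAM SEQ field.

    Reverse-strand reads (FLAG 0x10) are stored reverse-complemented, so the
    offset is mirrored. Assumes the read has no hard clipping.
    """
    return read_length - 1 - offset if flag & 0x10 else offset


def second_half_fraction(observations):
    """observations: (allele, offset, read_length, flag) tuples.

    Returns, per allele, the fraction of observations sequenced in the second
    half of the read.
    """
    late, total = {}, {}
    for allele, offset, length, flag in observations:
        total[allele] = total.get(allele, 0) + 1
        if sequencing_cycle(offset, length, flag) >= length / 2:
            late[allele] = late.get(allele, 0) + 1
    return {a: late.get(a, 0) / total[a] for a in total}
```

A large gap between the alternate and reference fractions at a site is the kind of signal described in this reply.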



        • #5
          Does anyone have code that can print out the position within each read at which a given SNP occurs?
          Last edited by christophpale; 07-21-2010, 03:11 AM.
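A minimal Python sketch that works from plain SAM text (for example, the output of `samtools view aln.bam X:131016403-131016403` on an indexed BAM, where `aln.bam` is a placeholder name): for each mapped read covering a 1-based reference position, it walks the CIGAR string and reports the base and its 1-based position within `SEQ`. The function names are hypothetical; reads with a deletion or skipped region at the site are left out.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_offset_at(pos, cigar, target):
    """0-based offset into SEQ aligned to 1-based reference coordinate `target`,
    or None if the read does not cover it (or has a deletion/skip there)."""
    ref = pos  # 1-based reference coordinate of the next reference-consuming operation
    q = 0      # 0-based offset into SEQ
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op in "M=X":  # consumes both reference and query
            if ref <= target < ref + n:
                return q + (target - ref)
            ref += n
            q += n
        elif op in "IS":  # insertion or soft clip: consumes query only
            q += n
        elif op in "DN":  # deletion or skipped region: consumes reference only
            if ref <= target < ref + n:
                return None
            ref += n
        # H and P consume neither
    return None


def snp_positions(sam_lines, chrom, target):
    """Yield (read name, base, 1-based position in SEQ, SEQ length, strand)
    for every mapped read whose alignment has a base at chrom:target."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar, seq = f[0], int(f[1]), f[2], int(f[3]), f[5], f[9]
        if rname != chrom or cigar == "*" or flag & 0x4:  # other contig or unmapped
            continue
        off = query_offset_at(pos, cigar, target)
        if off is not None:
            yield qname, seq[off], off + 1, len(seq), "-" if flag & 0x10 else "+"
```

Positions are reported in the orientation of the stored `SEQ`; for reads on the `-` strand, the sequencing cycle is the read length minus the position plus one.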



          • #6
            Is there any further explanation?
            I have found the following contrasting examples:
            Code:
            scaffold2410 23912 G S 6 6 37 123 c,,,,,,,,,,,,,,,,,,,,ccc,,cc,,,,,,,,,,,,,ccc,,cc,cccc,,cc,,,,,,,cc,,c,cc,c,,c,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, HHHHHHHHHHHHHHHHHFEHJCCHJEIHHHHHBHHHHHHFHHHHGHHHHHHHHHHCHHHHHHHJFFGJJJJHJHHHHHHHHHHHHHHHHHHHHH<HHHHHGHHHHGHHGHHHGHHGH<GJJIA
            and:
            Code:
            scaffold12030   25942   A       R       37      44      25      44      gggggg,gg,g,,,,,,,ggggggggggggggggggggggggg,    HHHHHHGHHGHEHGHHHHHFHHHHHHHHHHGHBFHHHHHHHHHJ
            Both examples have similar base qualities, and in both the reads are mapped to the reverse strand of the reference, yet they receive very different SNP qualities (6 versus 44). How are these scores produced?

            Any suggestions would be highly appreciated.

            We estimated heterozygosity from the calls that passed VarFilter, so we have evidently underestimated the level of heterozygosity.
            Last edited by pengchy; 09-21-2011, 06:37 AM.
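As for how such scores arise: `samtools pileup -c` uses the MAQ consensus model, which among other things models dependent sequencing errors, so the sketch below will not reproduce its numbers. It is a deliberately simplified textbook model, swapped in only to show how allele balance, depth and base quality combine into a Phred-scaled probability that the site matches the reference. Its assumptions: independent errors, each error equally likely to produce any of the other three bases, and a prior heterozygosity `theta` of 0.001 (an assumed value). The function names are hypothetical.

```python
import math


def log10_sum(values):
    """Numerically stable log10(sum(10**v for v in values))."""
    m = max(values)
    return m + math.log10(sum(10 ** (v - m) for v in values))


def genotype_posteriors(bases, quals, ref, alt, theta=0.001):
    """Log10 posteriors of the diploid genotypes RR, RA, AA under a naive model.

    Priors: P(RA) = theta, P(AA) = theta / 2, P(RR) = the remainder.
    """
    priors = {"RR": 1 - 1.5 * theta, "RA": theta, "AA": theta / 2}
    alleles = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    joint = {}
    for g, (a1, a2) in alleles.items():
        ll = 0.0
        for b, q in zip(bases, quals):
            e = 10 ** (-q / 10)  # error probability from the Phred base quality
            p1 = 1 - e if b == a1 else e / 3
            p2 = 1 - e if b == a2 else e / 3
            ll += math.log10(0.5 * p1 + 0.5 * p2)  # each chromosome equally likely sampled
        joint[g] = math.log10(priors[g]) + ll
    norm = log10_sum(list(joint.values()))
    return {g: v - norm for g, v in joint.items()}


def naive_snp_quality(bases, quals, ref, alt, theta=0.001):
    """Phred-scaled posterior probability that the genotype is homozygous reference."""
    return -10 * genotype_posteriors(bases, quals, ref, alt, theta)["RR"]
```

Under this model a hypothetical 34:19 reference/alternate split at a uniform Q30 is called heterozygous with a very large SNP quality, while 30 reference-only reads give a quality near 0. A score of 0 at a site like the one in the first post therefore presumably reflects inputs this model ignores, such as mapping quality or the read-level effects discussed in this thread.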

