  • SNP quality score in Samtools pileup

    Hi,

    I was examining the pileup by Samtools at a particular base of interest:

    X 131016403 G G 103 0 60 53 T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF

    It looks like a clear heterozygous position with good coverage and decent base qualities; however, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?

    The data are from 75x2 PE reads and alignment was done using ELANDv2. Any help on this will be highly appreciated. Thanks!
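
    To make the "clear heterozygous position" impression concrete, the read-base column of the pileup line above can be tallied per allele. This is only a sketch: the function name `tally_pileup_bases` is made up for illustration, and it assumes the usual pileup conventions (`.`/`,` for reference matches, `^` plus one character at read starts, `$` at read ends, `+n`/`-n` indel runs) and Phred+33 base qualities.

```python
import re
from collections import Counter, defaultdict

def tally_pileup_bases(ref, bases, quals, offset=33):
    """Return (allele counts, mean base quality per allele) for one pileup site.

    Hypothetical helper; assumes Phred+33 qualities and standard pileup symbols.
    """
    ref = ref.upper()
    # Drop read-start markers (^ followed by one mapping-quality char) and read-end markers ($).
    bases = re.sub(r"\^.", "", bases).replace("$", "")
    calls = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c in "+-":
            # Indel run: sign, length digits, then that many bases; it carries no base quality.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError("malformed indel in pileup string")
            i += 1 + len(m.group()) + int(m.group())
            continue
        # '.' and ',' mean "matches the reference" on the forward / reverse strand.
        calls.append(ref if c in ".," else c.upper())
        i += 1
    if len(calls) != len(quals):
        raise ValueError("base and quality columns have different lengths")
    counts = Counter(calls)
    qual_sums = defaultdict(int)
    for allele, q in zip(calls, quals):
        qual_sums[allele] += ord(q) - offset
    mean_quals = {a: qual_sums[a] / counts[a] for a in counts}
    return dict(counts), mean_quals

# The site from the post above (columns 3, 9 and 10 of the pileup line).
counts, mean_quals = tally_pileup_bases(
    "G",
    "T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t",
    "BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF",
)
print(counts, mean_quals)
```

    A tally like this only summarises bases and base qualities. It says nothing about mapping quality or where in each read the variant base falls, which is what the replies below turn to.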

  • #2
    Originally posted by wangzkai
    Hi,

    I was examining the pileup by Samtools at a particular base of interest:

    X 131016403 G G 103 0 60 53 T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF

    It looks like a clear heterozygous position with good coverage and decent base qualities; however, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?

    The data are from 75x2 PE reads and alignment was done using ELANDv2. Any help on this will be highly appreciated. Thanks!
    Maybe the mapping qualities for the variant reads are low?

    • #3
      Originally posted by nilshomer
      Maybe the mapping qualities for the variant reads are low?
      This is exactly what I found when I came across the same puzzling situation. Converting the data to BAM format and then visualizing it in IGV showed me that the apparently heterozygous SNP was getting all of its heterozygous bases from the low-quality ends of the reads - the SNP never once showed up in the beginning or middle of a read, only when it was within 7 nt of the end.

      Odd? Yes. But if it were a true SNP, you'd expect to find it in half of the reads regardless of position.
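
      The "regardless of position" argument can be put into rough numbers with a binomial tail. The sketch below assumes that, for a true SNP, the site falls uniformly at random along each covering read and that reads are independent. The read length (75), the 7-nt window at one end, and the count of 10 variant reads are hypothetical values chosen for illustration, not figures from this thread.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

read_len = 75   # hypothetical read length
window = 7      # "within 7 nt of the end" (one end only, by assumption)
n_alt = 10      # hypothetical number of reads carrying the variant base

# Chance that a uniformly placed site lands in the last `window` bases of a read.
p_end = window / read_len

# Probability that all n_alt variant reads have the site inside that window.
print(binom_upper_tail(n_alt, n_alt, p_end))
```

      Under these assumptions the probability is vanishingly small, which is why variant bases seen only near read ends point to an artifact rather than a real heterozygous site.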

      • #4
        Yes, that is the problem with SAMtools: the majority of variant calls fall in the second half of the read, so you get lots of false positives.

        • #5
          Does anyone have code that can print out the position within each read at which a given SNP occurs?
          Last edited by christophpale; 07-21-2010, 03:11 AM.

          • #6
            Is there any further explanation?
            I have found the following contrasting examples:
            Code:
            scaffold2410 23912 G S 6 6 37 123 c,,,,,,,,,,,,,,,,,,,,ccc,,cc,,,,,,,,,,,,,ccc,,cc,cccc,,cc,,,,,,,cc,,c,cc,c,,c,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, HHHHHHHHHHHHHHHHHFEHJCCHJEIHHHHHBHHHHHHFHHHHGHHHHHHHHHHCHHHHHHHJFFGJJJJHJHHHHHHHHHHHHHHHHHHHHH<HHHHHGHHHHGHHGHHHGHHGH<GJJIA
            and:
            Code:
            scaffold12030   25942   A       R       37      44      25      44      gggggg,gg,g,,,,,,,ggggggggggggggggggggggggg,    HHHHHHGHHGHEHGHHHHHFHHHHHHHHHHGHBFHHHHHHHHHJ
            Both examples have similar read qualities and are mapped to the reverse strand of the reference, yet they have different "SNP quality" values (6 versus 44). How are these results produced?

            Any suggestions would be highly appreciated.

            We estimated heterozygosity from the results filtered by VarFilter, so we have evidently underestimated the heterozygosity level.
            Last edited by pengchy; 09-21-2011, 06:37 AM.
