SEQanswers

05-15-2010, 12:38 AM   #1
wangzkai
Member
 
Location: Southern California, USA

Join Date: Feb 2010
Posts: 11
SNP quality score in Samtools pileup

Hi,

I was examining the pileup by Samtools at a particular base of interest:

X 131016403 G G 103 0 60 53 T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF

It looks like a clear heterozygous position with good coverage and decent base qualities. However, it got a SNP quality score of 0 and a homozygous genotype call. Is there any possible explanation for this?

The data are from 2x75 paired-end reads, and the alignment was done using ELANDv2. Any help on this will be highly appreciated. Thanks!
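
For anyone who wants to check the allele balance at a site like this without counting by eye, here is a minimal Python sketch that tallies the read-bases column of one pileup line. It assumes the ten-column consensus pileup layout shown above (chromosome, position, reference base, consensus base, consensus quality, SNP quality, a mapping-quality summary, depth, read bases, base qualities). The function names `read_symbols` and `summarize` are made up for illustration and are not part of SAMtools.

```python
from collections import Counter


def read_symbols(bases):
    """Yield one symbol per read from a pileup read-bases string."""
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # start of a read; the next character encodes its mapping quality
            i += 2
        elif c == "$":      # end-of-read marker, carries no base
            i += 1
        elif c in "+-":     # indel: sign, length digits, then that many inserted/deleted bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:               # '.', ',', a mismatching base, or '*' for a deleted base
            yield c
            i += 1


def summarize(line):
    """Count reference and non-reference observations per strand for one pileup line."""
    f = line.split()
    counts = Counter(read_symbols(f[8]))
    alt = {}
    for sym, n in counts.items():
        if sym.upper() in "ACGTN":
            fwd, rev = alt.get(sym.upper(), (0, 0))
            # upper case = forward strand, lower case = reverse strand
            alt[sym.upper()] = (fwd + n, rev) if sym.isupper() else (fwd, rev + n)
    return {
        "site": (f[0], int(f[1])),
        "ref": f[2],
        "consensus": f[3],
        "consensus_q": int(f[4]),
        "snp_q": int(f[5]),
        "depth": int(f[7]),
        "ref_fwd": counts["."],
        "ref_rev": counts[","],
        "alt": alt,
    }


LINE = ("X 131016403 G G 103 0 60 53 "
        "T$T,,.t.....T.TT,,,..,.t,,...,t,t,tTT,.TT..T..Tt,,T,,t "
        "BFGGAGCEFEEE<B-GGGAGGFGGGGGG?GGFGGGGGBEGFGGGEGEFDGGEF")
summary = summarize(LINE)
print(summary)
```

On the line above this tallies 34 reference observations and 19 T observations (12 forward, 7 reverse) out of 53 reads, which is why the call looks surprising from the pileup alone.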
05-15-2010, 11:14 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Maybe the mapping qualities for the variant reads are low?
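
One way to test this hypothesis is to group per-read mapping qualities by the allele each read supports. The sketch below is only an illustration under stated assumptions: it assumes you already have a string with one mapping-quality character per read, in the same order as the read-bases column and encoded as ASCII minus 33. I believe `samtools pileup -s` appends such a column, but treat that as an assumption and check your version. The function names and the example data are made up, not taken from the thread.

```python
from collections import defaultdict


def per_read_symbols(bases):
    """Return one symbol per read from a pileup read-bases string."""
    out, i = [], 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # read start marker plus one mapping-quality character
            i += 2
        elif c == "$":      # read end marker
            i += 1
        elif c in "+-":     # indel: sign, length digits, then the indel sequence
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            out.append(c)
            i += 1
    return out


def mapq_by_allele(bases, mapq_string, offset=33):
    """Group decoded mapping qualities by allele ('ref' or the upper-cased base)."""
    symbols = per_read_symbols(bases)
    if len(symbols) != len(mapq_string):
        raise ValueError("expected exactly one mapping-quality character per read")
    groups = defaultdict(list)
    for sym, q in zip(symbols, mapq_string):
        allele = "ref" if sym in ".," else sym.upper()
        groups[allele].append(ord(q) - offset)
    return dict(groups)


# Synthetic example (hypothetical data): ']' decodes to 60 and '!' decodes to 0.
groups = mapq_by_allele("..T,t", "]]!]!")
for allele, quals in groups.items():
    print(allele, len(quals), sum(quals) / len(quals))
```

If the variant-supporting reads show much lower mapping qualities than the reference-supporting reads, that would be consistent with the explanation suggested here.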
05-18-2010, 11:49 AM   #3
martian_bob
Member
 
Location: New York

Join Date: Feb 2010
Posts: 11

Quote:
Originally Posted by nilshomer View Post
Maybe the mapping qualities for the variant reads are low?
This is exactly what I found when I came across the same puzzling situation. Converting the data to BAM format and then visualizing it in IGV showed me that the apparently heterozygous SNP was getting all of its variant bases from the low-quality ends of the reads: the SNP never once showed up in the beginning or middle of a read, only when it was within 7 nt of the end.

Odd? Yes. But if it were a true SNP, you'd expect to find it in half of the reads regardless of position.
05-18-2010, 12:50 PM   #4
juan
Member
 
Location: New York City

Join Date: Aug 2009
Posts: 14

Yes, that is the problem with SAMtools. The majority of variants are in the second half of the read, hence you get lots of false positives.
07-03-2010, 11:08 AM   #5
christophpale
Member
 
Location: canada

Join Date: May 2010
Posts: 16

Does anyone have code that can print out the position within each read where a given SNP occurs?

Last edited by christophpale; 07-21-2010 at 04:11 AM.
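
A possible starting point, offered as a sketch rather than a tested tool: the Python below reads SAM-format text records (for example the output of `samtools view` on the region of interest) and walks each CIGAR string to find which base of the read sits on a given 1-based reference position. The names `query_offset` and `site_report` are made up for illustration. Note that hard-clipped bases are not stored in SEQ, so the reported cycle is relative to the stored sequence.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_offset(pos, cigar, target):
    """Return the 0-based offset in SEQ aligned to reference position `target`
    (1-based), or None if the read has a deletion/skip there or does not cover it."""
    ref, qry = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":                 # consumes reference and read
            if ref <= target < ref + n:
                return qry + (target - ref)
            ref += n
            qry += n
        elif op in "IS":                # consumes read only
            qry += n
        elif op in "DN":                # consumes reference only
            if ref <= target < ref + n:
                return None
            ref += n
        # H and P consume neither reference nor read
    return None


def site_report(sam_lines, chrom, target):
    """Return (read name, base, sequencing cycle, distance to nearest read end, MAPQ)
    for every read with a base aligned to chrom:target."""
    rows = []
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        name, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        mapq, cigar, seq = int(f[4]), f[5], f[9]
        if rname != chrom or flag & 0x4 or cigar == "*" or seq == "*":
            continue
        off = query_offset(pos, cigar, target)
        if off is None:
            continue
        # SEQ is stored in reference orientation, so reverse-strand reads
        # (flag 0x10) were sequenced from the other end.
        cycle = len(seq) - off if flag & 0x10 else off + 1
        rows.append((name, seq[off], cycle, min(off, len(seq) - 1 - off), mapq))
    return rows


# Synthetic records (hypothetical data, not from this thread).
EXAMPLE = [
    "r1\t0\tX\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tX\t98\t0\t2S8M\t*\t0\t0\tGGACGTACGT\tIIIIIIIIII",
    "r3\t0\tX\t101\t60\t3M2D5M\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
for row in site_report(EXAMPLE, "X", 105):
    print(*row, sep="\t")
```

Sorting the rows by the distance-to-end column makes it easy to see whether the variant bases cluster near read ends, as described earlier in the thread.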
09-21-2011, 07:32 AM   #6
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116

Is there any further explanation?
I have found the following contrasting examples:
Code:
scaffold2410 23912 G S 6 6 37 123 c,,,,,,,,,,,,,,,,,,,,ccc,,cc,,,,,,,,,,,,,ccc,,cc,cccc,,cc,,,,,,,cc,,c,cc,c,,c,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, HHHHHHHHHHHHHHHHHFEHJCCHJEIHHHHHBHHHHHHFHHHHGHHHHHHHHHHCHHHHHHHJFFGJJJJHJHHHHHHHHHHHHHHHHHHHHH<HHHHHGHHHHGHHGHHHGHHGH<GJJIA
and:
Code:
scaffold12030   25942   A       R       37      44      25      44      gggggg,gg,g,,,,,,,ggggggggggggggggggggggggg,    HHHHHHGHHGHEHGHHHHHFHHHHHHHHHHGHBFHHHHHHHHHJ
Both examples have similar read qualities and both are mapped to the reverse strand of the reference, yet they get different "SNP quality" values. How are these results produced?

Any suggestions would be highly appreciated.

We estimated heterozygosity from the results filtered by VarFilter, so we have obviously underestimated the heterozygosity level.

Last edited by pengchy; 09-21-2011 at 07:37 AM.