**Heisman** · 12-01-2011, 05:52 AM

Originally posted by fanping View Post

I got INFO in the raw.bcf as:
DP=118;AF1=1;CI95=1,1;DP4=0,0,1,42;MQ=20;FQ=-156
DP=154;AF1=1;CI95=1,1;DP4=0,1,1,42;MQ=20;FQ=-139;PV4=1,1,1,1

I am confused about why the first one has no PV4. I understand that PV4 holds the strand bias, baseQ bias, mapQ bias and tail distance bias. These four biases are obtained by an exact test of DP4, a t-test of baseQ, a t-test of mapQ and a t-test of tail distance, respectively. In the first case we do have DP4, so why can't we get the strand bias? If it is because there are no reads in the reference group, why can't we get the other three biases? Are they one-sample or two-sample t-tests? These biases are used to determine whether baseQ or mapQ tends toward a fixed number. Then how is this fixed number determined? Thanks

Anyway, when we have the PV4, how can we determine the SNP quality from these p-values? Obviously, we do not want large biases. I thought the lower the p-value, the more significant the bias. However, I am not sure that is right. Taking tail distance bias as an example, do we have to pool all the tail distances first and then do the t-test? If so, I think a lower p-value would correspond to a wide distribution of the tail distance and thus a lower bias. Again, this is just my thought. I am really confused about all these values.

I know that in vcfutils.pl varFilter all the filter options for PV4 are minimum values. Why? Does that mean a larger p-value is better? Does anyone have optimal values for these filter options?

Thank you very much,

fanping

These are good questions that I've been wondering about as well. Hopefully somebody (perhaps Heng himself) can give a good response.

**fanping** · 12-01-2011, 07:04 AM

Thanks for your reply. I hope someone can answer our question.

**swbarnes2** · 12-01-2011, 09:10 AM

Well, empirically, the PV4 scores only show up on mixed calls, not homozygous ones. So I think those values are about assessing whether there is a significant quality difference between the reads saying the alternate letter and the reads saying the reference letter. Suppose all the reads saying the reference letter are great quality, come from both directions, and have the reference letter at the start of some reads and in the middle of others, while all of the reads saying the alternate letter have crap mapQ, mostly come from one direction, and have the alternate letter in the last 4 bases of the read. Then you probably don't have a real mixed letter at all, because the reads saying you have an alternate letter are messed up.

So you can't have those stats for homozygous calls. There's nothing to compare to.
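As a rough illustration of the "nothing to compare to" point, here is a minimal Python sketch that reads the two INFO strings from the first post and counts reference and alternate reads from DP4. It assumes the usual DP4 order (ref-forward, ref-reverse, alt-forward, alt-reverse); the helper names `parse_info` and `ref_alt_counts` are made up for this example and are not part of samtools or bcftools.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict of key -> raw string value."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def ref_alt_counts(fields):
    """Return (ref_reads, alt_reads), assuming DP4 = ref-fwd, ref-rev, alt-fwd, alt-rev."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    return ref_fwd + ref_rev, alt_fwd + alt_rev


# The two INFO strings quoted in the first post of this thread.
records = [
    "DP=118;AF1=1;CI95=1,1;DP4=0,0,1,42;MQ=20;FQ=-156",
    "DP=154;AF1=1;CI95=1,1;DP4=0,1,1,42;MQ=20;FQ=-139;PV4=1,1,1,1",
]

for info in records:
    fields = parse_info(info)
    ref, alt = ref_alt_counts(fields)
    print(f"ref reads={ref} alt reads={alt} has PV4={'PV4' in fields}")
```

In these two records, the one without PV4 has zero reference reads and the one with PV4 has exactly one, which fits the idea that PV4 needs reads on both sides to compare.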

**fanping** · 12-01-2011, 10:24 AM

So does that mean PV4 is a property of genotype 0/1? I know lots of 1/1 genotypes also have PV4.

If PV4 is used to describe the homs and my data is from a haploid genome, which means I have to filter all the homs info, then PV4 will give me no information on the quality of the SNPs or INDELs. Is my understanding right? Thank you very much.

**swbarnes2** · 12-01-2011, 11:59 AM

Originally posted by fanping View Post

So does that mean PV4 is a property of genotype 0/1? I know lots of 1/1 genotypes also have PV4.

If PV4 is used to describe the homs and my data is from a haploid genome, which means I have to filter all the homs info, then PV4 will give me no information on the quality of the SNPs or INDELs. Is my understanding right? Thank you very much.

Just because you are doing a haploid genome doesn't mean that you can't have genuine mixed calls. Submitters don't always give clonal samples.

For the 1/1 calls with PV4 values, do the DP4s show at least one read for the reference allele? I bet they do.

So yes, it looks like the PV4 isn't going to help you evaluate homozygous calls. It doesn't look like it's supposed to. It looks like it's supposed to tell you on mixed calls whether the evidence supporting one of the alleles is suspect.

Use the DP4, and the GQ, and the PL to evaluate homozygous calls.
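To make the "minimum value" question from the first post concrete, here is a hedged Python sketch of a filter that keeps a site only if every PV4 p-value is at or above a floor. A small p-value means the reference and alternate reads differ significantly on that measure, so it is the small values that get rejected. The function name `pv4_passes`, the thresholds, and the second INFO string are invented for illustration; they are not recommended settings and not the actual vcfutils.pl code.

```python
def pv4_passes(info, min_p):
    """Return True if every PV4 p-value is >= its minimum in min_p.

    min_p is a 4-tuple in PV4 order: strand, baseQ, mapQ, tail distance.
    Sites without PV4 pass, since there is no ref-vs-alt comparison to fail.
    """
    pv4 = None
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "PV4":
            pv4 = [float(x) for x in value.split(",")]
    if pv4 is None:
        return True
    return all(p >= floor for p, floor in zip(pv4, min_p))


# Illustrative floors only (assumption, not tuned values).
floors = (1e-4, 1e-4, 1e-4, 1e-4)

# Real record from the first post: all four p-values are 1, so no bias is detected.
print(pv4_passes("DP=154;AF1=1;CI95=1,1;DP4=0,1,1,42;MQ=20;FQ=-139;PV4=1,1,1,1", floors))

# Made-up record with a very small strand-bias p-value: rejected.
print(pv4_passes("DP4=20,18,1,15;PV4=0.00002,0.4,0.3,0.6", floors))
```

Under these assumptions the first call returns True and the second returns False, so a larger p-value is the "better" direction for this kind of filter.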

**fanping** · 12-01-2011, 12:16 PM

Thanks for your concise and useful reply. You are right that the 1/1 calls with PV4 do have some reference-allele reads.

I just remembered that the strand bias in PV4 (i.e. the 1st value in PV4) can be calculated from DP4 using an exact test. (P.S. I calculated it and it works.) If DP4 is not just information for 0/1 calls, why is PV4 only for 0/1?

I would appreciate it if you could also help me understand this. Thanks.
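For anyone who wants to repeat that check, below is a minimal sketch of a two-sided Fisher's exact test on the 2x2 table formed by DP4 (rows: ref/alt, columns: forward/reverse). It is a textbook implementation written for this post, assuming Python 3.8+ for `math.comb`; the exact test variant used inside samtools/bcftools may differ in detail, and the function name is my own.

```python
from math import comb


def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact p-value for the DP4 strand table."""
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt
    denom = comb(total, col_fwd)

    def prob(a):
        # Hypergeometric probability that `a` of the forward reads are ref reads.
        return comb(row_ref, a) * comb(row_alt, col_fwd - a) / denom

    lowest = max(0, col_fwd - row_alt)
    highest = min(row_ref, col_fwd)
    p_observed = prob(ref_fwd)
    # Sum every table that is no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(
        p for p in (prob(a) for a in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# DP4=0,1,1,42 from the second record in the first post.
print(fisher_exact_two_sided(0, 1, 1, 42))
```

For DP4=0,1,1,42 this comes out at about 1, in line with the first PV4 value reported for that record. For DP4=0,0,1,42 the reference row is empty, only one table is possible and the p-value is trivially 1, so the test carries no information there.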

**clarissaboschi** · 10-05-2012, 05:59 AM

I would like to know if anyone has optimal values for PV4, especially for filtering indels.

Thanks

**skomal** · 01-03-2013, 06:23 AM

Computing PV4 for multi-base variants

Hi!

Does anyone know how PV4 is computed for multi-base variants (MBVs)? More specifically:
1) How is tail distance computed for an MBV? Is it the shortest distance from the start of the MBV to either end of the read?
2) How is base quality bias computed for an MBV?

Thanks

mpileup for SNP (PV4)