I got the following INFO fields for two records in the raw.bcf:
DP=118;AF1=1;CI95=1,1;DP4=0,0,1,42;MQ=20;FQ=-156
DP=154;AF1=1;CI95=1,1;DP4=0,1,1,42;MQ=20;FQ=-139;PV4=1,1,1,1
I am confused about why the first record has no PV4. As I understand it, PV4 holds four p-values: strand bias, baseQ bias, mapQ bias and tail distance bias. They are obtained from an exact test on DP4, a t-test on baseQ, a t-test on mapQ and a t-test on tail distance, respectively. In the first record we do have DP4, so why can't we get the strand bias? If it is because there are no reads in the reference group (the first two DP4 counts are both 0), why can't we still get the other three biases? Are they one-sample or two-sample t-tests? If these tests check whether baseQ or mapQ deviates from some fixed value, how is that fixed value determined? Thanks
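To make the first question concrete, here is a rough Python sketch of how I picture the strand bias value: a two-sided Fisher exact test on the 2x2 table built from DP4 (ref-forward, ref-reverse, alt-forward, alt-reverse). The function names `parse_info` and `fisher_two_sided` are my own, and this is only my reading of the INFO description, not the actual samtools/bcftools code.

```python
from math import comb

def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

records = [
    "DP=118;AF1=1;CI95=1,1;DP4=0,0,1,42;MQ=20;FQ=-156",
    "DP=154;AF1=1;CI95=1,1;DP4=0,1,1,42;MQ=20;FQ=-139;PV4=1,1,1,1",
]
for info in records:
    fields = parse_info(info)
    ref_fwd, ref_rev, alt_fwd, alt_rev = map(int, fields["DP4"].split(","))
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    # With zero reference reads only one table fits the margins, so p is
    # trivially 1 and the test says nothing about strand bias.
    print(fields["DP4"], "ref reads:", ref_fwd + ref_rev, "strand p:", p)
```

With no reference reads the table has an empty row, so the test carries no information. My guess (unconfirmed) is that PV4 is simply left out in that situation, which would also suggest the three t-tests are two-sample tests comparing reference reads against alternate reads.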
Anyway, once we have PV4, how do we judge SNP quality from these p-values? Obviously, we do not want large biases. I thought that the lower the p-value, the more significant the bias, but I am not sure that is right. Taking tail distance bias as an example, do we pool all the tail distances first and then do the t-test? If so, I think a lower p-value would correspond to a wider distribution of tail distances, and thus to lower bias. Again, this is just my guess. I am really confused about all these values.
I know that in vcfutils.pl varFilter all the filter options for PV4 are minimum values. Why? Does that mean a larger p-value is better? Does anyone have recommended values for these filter options?
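This is how I currently read the "minimum p-value" options, written as a small Python sketch: a record is kept only if each of its four PV4 values is at least the corresponding threshold, so a small p-value (significant bias) gets the record filtered out. The function name `passes_pv4` and the threshold values are hypothetical placeholders, not the script's defaults, and letting records without PV4 pass is my assumption.

```python
def passes_pv4(info, min_p=(1e-4, 1e-4, 1e-4, 1e-4)):
    """Return True if every PV4 p-value is at or above its minimum.

    min_p order: strand bias, baseQ bias, mapQ bias, tail distance bias.
    The thresholds here are illustrative placeholders only.
    """
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "PV4":
            p_values = [float(x) for x in value.split(",")]
            return all(p >= m for p, m in zip(p_values, min_p))
    # Assumption: with no PV4 there is nothing to test, so keep the record.
    return True

print(passes_pv4("DP=154;AF1=1;CI95=1,1;DP4=0,1,1,42;MQ=20;FQ=-139;PV4=1,1,1,1"))
print(passes_pv4("DP=118;AF1=1;CI95=1,1;DP4=0,0,1,42;MQ=20;FQ=-156"))
```

If this reading is right, a larger p-value does mean "less evidence of bias", and the open question is only how strict the minimums should be.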
Thank you very much,
fanping