Hello all,
In a single-sample resequencing project we are trying to extract SNPs from a BAM file. We are also interested in heterozygous SNPs where both alleles differ from the reference allele. Say the reference shows a T: in theory we can sequence C/G in an individual (or S in IUPAC coding).
I used mpileup as described on the samtools page, but supplied only a single sample (the normal pileup command is deprecated). When I see this profile in the pileup: ggggggCccggg
This is what the VCF file reports:
chrY position . T G,C 29.3 . DP=12;AF1=1;CI95=0.5,1;DP4=0,0,1,7;MQ=41 PL 75,28,64,13,0,64
Yet the VCF documentation states: "ALT comma separated list of alternate non-reference alleles called on at least one of the samples.."
So the first question: How can you still discriminate between samples if you would want to do so?
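For reference, here is a minimal Python sketch of how I read the PL values on the line above. It assumes the genotype ordering from the VCF specification (genotype j/k with j <= k sits at index k*(k+1)/2 + j) and that the lowest PL marks the most likely genotype. The function name genotype_from_pl is just something I made up for illustration; it is not part of samtools or any library. If this reading is right, each sample would get its own column of such values, so samples might be told apart that way.

```python
def genotype_from_pl(ref, alts, pl):
    """Return the most likely genotype (lowest PL) as 'A/B'.

    Hypothetical helper for illustration only. Assumes a diploid call and
    the VCF ordering of genotype likelihoods: j/k (j <= k) is stored at
    index k*(k+1)/2 + j.
    """
    alleles = [ref] + alts
    # Enumerate genotypes in VCF order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
    genotypes = [(j, k) for k in range(len(alleles)) for j in range(k + 1)]
    if len(genotypes) != len(pl):
        raise ValueError("PL length does not match the number of genotypes")
    best = min(range(len(pl)), key=lambda i: pl[i])
    j, k = genotypes[best]
    return f"{alleles[j]}/{alleles[k]}"


# The record from the post, split on whitespace:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
line = ("chrY position . T G,C 29.3 . "
        "DP=12;AF1=1;CI95=0.5,1;DP4=0,0,1,7;MQ=41 PL 75,28,64,13,0,64")
fields = line.split()
ref, alts = fields[3], fields[4].split(",")
format_keys = fields[8].split(":")
sample_values = fields[9].split(":")
pl = [int(x) for x in sample_values[format_keys.index("PL")].split(",")]

print(genotype_from_pl(ref, alts, pl))  # G/C
```

So under these assumptions the record would be a G/C heterozygote (S in IUPAC coding), with neither allele matching the reference T.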
Second: The output format is not accepted by the SeattleSeq SNP annotation, which we would like to use. How can we fix this, or are there other methods we can try?
I feel the VCF format may be a helpful addition to a standard format. Yet I get the impression there are still some inconsistencies, which make pipelining these data a frustrating job. Hopefully this thread is welcome in the bioinformatics forum. Any comments are welcome, thanks!