Seqanswers Leaderboard Ad

**Heisman** · 12-14-2011, 07:27 PM

With mpileup if there are two alternate alleles I don't believe there is a way to tell how many reads go to each (from the mpileup output). You can open up the bam file in IGV and look that way, I think.

From the mpileup output, assuming you have run it on a single individual, you can look at the "AC1=" part of the info field. If it's 1, it's heterozygous. If it's 2, it's homozygous. You can also look at the genotype field. If it's "0/1" it's heterozygous, if it's "1/1" it's homozygous.

**meher** · 12-15-2011, 01:39 AM

Thanks for your answer.

I have a bam file which is pooled with 10 individuals, so in this case can we still look at the value 'AC' and decide its zygosity status or is GT the only field from where we can infer the zygosity status.

**Heisman** · 12-15-2011, 06:23 AM

I've yet to analyze pooled samples so I'm not sure. Could you paste the header lines and a couple of example lines of the called variants?

**Alex Renwick** · 12-15-2011, 06:57 AM

Originally posted by Heisman View Post

With mpileup if there are two alternate alleles I don't believe there is a way to tell how many reads go to each (from the mpileup output). ...

Actually, you can get this information from mpileup. It's in the fifth column of the pileup format output. This column indicates the state of all reads at a given position. You can tell how many reads have the reference allele, how many have the alternate, and how many have something else. It's a little ugly to parse, but the information is there.

**Heisman** · 12-15-2011, 07:03 AM

Originally posted by Alex Renwick View Post

Actually, you can get this information from mpileup. It's in the fifth column of the pileup format output. This column indicates the state of all reads at a given position. You can tell how many reads have the reference allele, how many have the alternate, and how many have something else. It's a little ugly to parse, but the information is there.

When there are two alternate alleles, I've noticed that with DP4 reads for both alleles are lumped together, ie, there are still 4 columns instead of 6.

**Alex Renwick** · 12-15-2011, 07:14 AM

Originally posted by Heisman View Post

When there are two alternate alleles, I've noticed that with DP4 reads for both alleles are lumped together, ie, there are still 4 columns instead of 6.

Hmm...I think we must be talking about different things. The default pileup output looks like (from samtools):

Code:

seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23  ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23  ,.$....,,.,.,...,,,.,...    7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23  ,$....,,.,.,...,,,.,...^l.  <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22  ...T,,.,.,...,,,.,....  33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22  ....,,.,.,.C.,,,.,..G.  +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23  ....,,.,.,...,,,.,....^k.   %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23  A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<

This shows, for example, that position 279 of seq1 has reference allele C, and that 21 reads agree with the reference while one read has A and one read has T.

**Heisman** · 12-15-2011, 07:22 AM

I don't believe there is any way to get pileup outuput when using the newer version of mpileup, correct? Or am I mistaken? That's an interesting idea though to use pileup for that purpose (assuming I'm not mistaken).

**zee** · 12-15-2011, 07:23 AM

VCF version 4 contains this information if you call your variants with mpileup-bcftools or the GATK UnifiedGenotyper. Have a look at the docs but note that both programs do not produce the exact same information by default. I find the GATK VCF more informative for counting allele frequency and depth.

**Alex Renwick** · 12-15-2011, 09:39 AM

Originally posted by Heisman View Post

I don't believe there is any way to get pileup outuput when using the newer version of mpileup, correct? Or am I mistaken? That's an interesting idea though to use pileup for that purpose (assuming I'm not mistaken).

Call mpileup without computing genotype (-g or -u option) and it produces pileup format. E.g., "samtools mpileup -f ref.fasta sample.bam > sample.pileup".

**meher** · 12-15-2011, 12:14 PM

That's true, even if there are two alternate alleles it gives only four values in DP4. So we cannot decide the heterozygosity in this case. Is there a way to find this? will be very interesting for me if there is a solution!!

**meher** · 12-15-2011, 01:48 PM

Hi, here are the few lines of the output;

##fileformat=VCFv4.1
##samtoolsVersion=0.1.16 (r963:234)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the site allele frequency of the first ALT allele">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CI95,Number=2,Type=Float,Description="Equal-tail Bayesian credible interval of the site allele frequency at the 95% level">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT alignment-sorted.bam
chr1 3002525 . C A 34 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=27;FQ=-36 GT:PL:GQ
1/1:66,9,0:16
chr1 3009311 . A G 63 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=42;FQ=-36 GT:PL:GQ 1/1:95,9,0:16
chr1 3010971 . G A 67 . DP=7;AF1=0.5;CI95=0.5,0.5;DP4=1,1,2,3;MQ=33;FQ=28;PV4=1,0.18,0.1,1 GT:PL:GQ 0/1:97,0,55:58
chr1 3011005 . T C 10.4 . DP=5;AF1=0.5;CI95=0.5,0.5;DP4=2,1,0,2;MQ=33;FQ=13.2;PV4=0.4,0.14,0.42,1 GT:PL:GQ 0/1:40,0,77:42
chr1 3011166 . T G 11.3 . DP=4;AF1=0.5001;CI95=0.5,0.5;DP4=1,1,2,0;MQ=26;FQ=6.48;PV4=1,0.21,1,1 GT:PL:GQ 0/1:41,0,33:36
chr1 3011231 . T A 37 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,1,2;MQ=28;FQ=-36 GT:PL:GQ 1/1:69,9,0:16
chr1 3030035 . T C 39 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=42;FQ=-36 GT:PL:GQ 1/1:71,9,0:16
chr1 3031485 . G T 48 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=42;FQ=-36 GT:PL:GQ 1/1:80,9,0:16
chr1 3035555 . A G 70 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=58;FQ=-36 GT:PL:GQ 1/1:102,9,0:16
chr1 3044119 . G C 25.3 . DP=5;AF1=0.5336;CI95=0.5,1;DP4=2,0,2,1;MQ=22;FQ=-18.1;PV4=1,1,1,0.26 GT:PL:GQ 0/1:55,0,9:12
chr1 3044145 . A G 75.8 . DP=6;AF1=1;CI95=0.5,1;DP4=1,0,2,3;MQ=25;FQ=-37;PV4=1,1,1,1 GT:PL:GQ 1/1:108,10,0:17
chr1 3044148 . A G 81.3 . DP=5;AF1=1;CI95=0.5,1;DP4=0,0,2,3;MQ=28;FQ=-42 GT:PL:GQ 1/1:114,15,0:27
chr1 3047201 . C T 11.1 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=33;FQ=-33 GT:PL:GQ 1/1:42,6,0:9

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 21 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

How to generate allele depth at a specific position

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News