SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
RNAseq: Pipeline to detect allele specific expression dariober Bioinformatics 9 07-17-2015 12:46 PM
Quantifying allele-specific expression imbalance using NGS jastop Bioinformatics 2 08-29-2012 07:44 AM
Allele-specific alignments of NGS datasets using ASAP fkrueger Bioinformatics 2 01-09-2012 07:53 AM
allele-specific expression baohua100 Bioinformatics 0 05-10-2011 11:32 PM
PubMed: Allele-specific expression assays using Solexa. Newsbot! Literature Watch 0 09-11-2009 02:01 AM

Reply
 
Thread Tools
Old 12-14-2011, 11:50 AM   #1
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default How to generate allele depth at a specific position

Hi all,

I have two questions, first one is regarding allele depth which exactly is as follows,
How can we find, the number of the reads at each variant site have a particular allele, i.e. if you have ref = C and ALT = A,G, how many of the reads have A, G and C? Is there a tool to generate this?

The second question is,
How can we infer a SNP to be homozygous or heterozygous from the mpileup output, vcf file?

Could someone help me with this!!!
meher is offline   Reply With Quote
Old 12-14-2011, 06:27 PM   #2
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Default

With mpileup if there are two alternate alleles I don't believe there is a way to tell how many reads go to each (from the mpileup output). You can open up the bam file in IGV and look that way, I think.

From the mpileup output, assuming you have run it on a single individual, you can look at the "AC1=" part of the info field. If it's 1, it's heterozygous. If it's 2, it's homozygous. You can also look at the genotype field. If it's "0/1" it's heterozygous, if it's "1/1" it's homozygous.
Heisman is offline   Reply With Quote
Old 12-15-2011, 12:39 AM   #3
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

Thanks for your answer.

I have a bam file which is pooled with 10 individuals, so in this case can we still look at the value 'AC' and decide its zygosity status or is GT the only field from where we can infer the zygosity status.
meher is offline   Reply With Quote
Old 12-15-2011, 05:23 AM   #4
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Default

I've yet to analyze pooled samples so I'm not sure. Could you paste the header lines and a couple of example lines of the called variants?
Heisman is offline   Reply With Quote
Old 12-15-2011, 05:57 AM   #5
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44
Default

Quote:
Originally Posted by Heisman View Post
With mpileup if there are two alternate alleles I don't believe there is a way to tell how many reads go to each (from the mpileup output). ...
Actually, you can get this information from mpileup. It's in the fifth column of the pileup format output. This column indicates the state of all reads at a given position. You can tell how many reads have the reference allele, how many have the alternate, and how many have something else. It's a little ugly to parse, but the information is there.
Alex Renwick is offline   Reply With Quote
Old 12-15-2011, 06:03 AM   #6
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Default

Quote:
Originally Posted by Alex Renwick View Post
Actually, you can get this information from mpileup. It's in the fifth column of the pileup format output. This column indicates the state of all reads at a given position. You can tell how many reads have the reference allele, how many have the alternate, and how many have something else. It's a little ugly to parse, but the information is there.
When there are two alternate alleles, I've noticed that with DP4 reads for both alleles are lumped together, ie, there are still 4 columns instead of 6.
Heisman is offline   Reply With Quote
Old 12-15-2011, 06:14 AM   #7
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44
Default

Quote:
Originally Posted by Heisman View Post
When there are two alternate alleles, I've noticed that with DP4 reads for both alleles are lumped together, ie, there are still 4 columns instead of 6.
Hmm...I think we must be talking about different things. The default pileup output looks like (from samtools):

Code:
seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23  ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23  ,.$....,,.,.,...,,,.,...    7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23  ,$....,,.,.,...,,,.,...^l.  <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22  ...T,,.,.,...,,,.,....  33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22  ....,,.,.,.C.,,,.,..G.  +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23  ....,,.,.,...,,,.,....^k.   %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23  A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
This shows, for example, that position 279 of seq1 has reference allele C, and that 21 reads agree with the reference while one read has A and one read has T.
Alex Renwick is offline   Reply With Quote
Old 12-15-2011, 06:22 AM   #8
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Default

I don't believe there is any way to get pileup outuput when using the newer version of mpileup, correct? Or am I mistaken? That's an interesting idea though to use pileup for that purpose (assuming I'm not mistaken).
Heisman is offline   Reply With Quote
Old 12-15-2011, 06:23 AM   #9
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

VCF version 4 contains this information if you call your variants with mpileup-bcftools or the GATK UnifiedGenotyper. Have a look at the docs but note that both programs do not produce the exact same information by default. I find the GATK VCF more informative for counting allele frequency and depth.
zee is offline   Reply With Quote
Old 12-15-2011, 08:39 AM   #10
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44
Default

Quote:
Originally Posted by Heisman View Post
I don't believe there is any way to get pileup outuput when using the newer version of mpileup, correct? Or am I mistaken? That's an interesting idea though to use pileup for that purpose (assuming I'm not mistaken).
Call mpileup without computing genotype (-g or -u option) and it produces pileup format. E.g., "samtools mpileup -f ref.fasta sample.bam > sample.pileup".
Alex Renwick is offline   Reply With Quote
Old 12-15-2011, 11:14 AM   #11
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

That's true, even if there are two alternate alleles it gives only four values in DP4. So we cannot decide the heterozygosity in this case. Is there a way to find this? will be very interesting for me if there is a solution!!
meher is offline   Reply With Quote
Old 12-15-2011, 12:48 PM   #12
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

Hi, here are the few lines of the output;


##fileformat=VCFv4.1
##samtoolsVersion=0.1.16 (r963:234)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the site allele frequency of the first ALT allele">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CI95,Number=2,Type=Float,Description="Equal-tail Bayesian credible interval of the site allele frequency at the 95% level">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT alignment-sorted.bam
chr1 3002525 . C A 34 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=27;FQ=-36 GT:PL:GQ
1/1:66,9,0:16
chr1 3009311 . A G 63 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=42;FQ=-36 GT:PL:GQ 1/1:95,9,0:16
chr1 3010971 . G A 67 . DP=7;AF1=0.5;CI95=0.5,0.5;DP4=1,1,2,3;MQ=33;FQ=28;PV4=1,0.18,0.1,1 GT:PL:GQ 0/1:97,0,55:58
chr1 3011005 . T C 10.4 . DP=5;AF1=0.5;CI95=0.5,0.5;DP4=2,1,0,2;MQ=33;FQ=13.2;PV4=0.4,0.14,0.42,1 GT:PL:GQ 0/1:40,0,77:42
chr1 3011166 . T G 11.3 . DP=4;AF1=0.5001;CI95=0.5,0.5;DP4=1,1,2,0;MQ=26;FQ=6.48;PV4=1,0.21,1,1 GT:PL:GQ 0/1:41,0,33:36
chr1 3011231 . T A 37 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,1,2;MQ=28;FQ=-36 GT:PL:GQ 1/1:69,9,0:16
chr1 3030035 . T C 39 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=42;FQ=-36 GT:PL:GQ 1/1:71,9,0:16
chr1 3031485 . G T 48 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=42;FQ=-36 GT:PL:GQ 1/1:80,9,0:16
chr1 3035555 . A G 70 . DP=3;AF1=1;CI95=0.5,1;DP4=0,0,2,1;MQ=58;FQ=-36 GT:PL:GQ 1/1:102,9,0:16
chr1 3044119 . G C 25.3 . DP=5;AF1=0.5336;CI95=0.5,1;DP4=2,0,2,1;MQ=22;FQ=-18.1;PV4=1,1,1,0.26 GT:PL:GQ 0/1:55,0,9:12
chr1 3044145 . A G 75.8 . DP=6;AF1=1;CI95=0.5,1;DP4=1,0,2,3;MQ=25;FQ=-37;PV4=1,1,1,1 GT:PL:GQ 1/1:108,10,0:17
chr1 3044148 . A G 81.3 . DP=5;AF1=1;CI95=0.5,1;DP4=0,0,2,3;MQ=28;FQ=-42 GT:PL:GQ 1/1:114,15,0:27
chr1 3047201 . C T 11.1 . DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=33;FQ=-33 GT:PL:GQ 1/1:42,6,0:9
meher is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:06 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO