SEQanswers

Old 04-22-2013, 12:34 PM   #1
jebowers
Member
 
Location: Athens, GA, USA

Join Date: Apr 2013
Posts: 19
Vcf genotypes for RILs

I was wondering if there is any way to change the assumptions used for genotype calling in .vcf files produced by mpileup in samtools. I am working with a diploid organism, but the individuals are mostly homozygous recombinant inbred lines (or highly inbred lines) with only about 1% residual heterozygosity. The problem is that when a SNP is called at low coverage (1-3 reads) and only one allele is observed in a sample, the caller often assumes the individual is heterozygous if the observed allele is the less common allele.

The underlying issue is that the caller assumes all the loci in my individuals are in Hardy-Weinberg (HW) equilibrium, when in fact, due to the experimental design, they are nowhere close to HW equilibrium and most loci are going to be homozygous. Filtering on genotype call quality reduces the problem but also discards much of the data.

Of course, sequencing to a high depth would solve this problem with the existing tools, but when I expect >99% homozygous individuals at each locus that should not be necessary, as one or two "A" reads should be enough to predict an AA genotype.
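
To make the argument above concrete, here is a minimal sketch of the Bayesian reasoning, not the actual samtools/bcftools model. It assumes a simple per-read error model, a made-up alternate allele frequency of 0.1, and a 1% base error rate; the function names are hypothetical. It compares a Hardy-Weinberg prior with a prior that fixes heterozygosity at about 1%.

```python
def hwe_prior(alt_freq):
    """Genotype prior under Hardy-Weinberg equilibrium (R = ref, A = alt)."""
    ref_freq = 1.0 - alt_freq
    return {"RR": ref_freq ** 2, "RA": 2 * ref_freq * alt_freq, "AA": alt_freq ** 2}


def inbred_prior(alt_freq, het_rate=0.01):
    """Prior for inbred lines: a fixed small heterozygosity (assumed 1%),
    with the rest split between the homozygotes by allele frequency."""
    hom = 1.0 - het_rate
    return {"RR": hom * (1.0 - alt_freq), "RA": het_rate, "AA": hom * alt_freq}


def genotype_posteriors(n_alt_reads, prior, error_rate=0.01):
    """Posterior over RR/RA/AA when all n reads show the alt allele A."""
    # Probability that a single read shows A under each genotype.
    p_alt = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    unnorm = {g: prior[g] * p_alt[g] ** n_alt_reads for g in prior}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def best_call(posteriors):
    """Genotype with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)


if __name__ == "__main__":
    for depth in (1, 2, 3):
        hwe = best_call(genotype_posteriors(depth, hwe_prior(0.1)))
        inbred = best_call(genotype_posteriors(depth, inbred_prior(0.1)))
        print(depth, "alt reads -> HWE prior:", hwe, "| inbred prior:", inbred)
```

Under these assumed numbers the HW prior favours the heterozygote at 1-3 reads because it gives RA far more prior weight than AA for a rare allele, while the inbred prior calls AA from a single read.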
Old 04-22-2013, 01:58 PM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Could you post what arguments you are using? This is a question I am very interested in knowing the answer to.
Old 04-22-2013, 02:30 PM   #3
jebowers
Member
 
Location: Athens, GA, USA

Join Date: Apr 2013
Posts: 19

I used bowtie2 for the alignments, followed by samtools to create sorted bam files.

samtools mpileup -BuDf Refseq.fa differentsorted.bam(100 separate files) | bcftools view -bvcg - > out.bcf

bcftools view -N output.bcf > output.vcf

I have also used vcftools option --geno-depth on the .vcf file but the results are all -1 (missing data).

I have also tried various permutations of these options, with similar results.
Old 04-22-2013, 02:34 PM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

That's what I figured... I wonder what would happen if you didn't use the -c argument when you run bcftools. According to the manual, -c automatically invokes the -e option, which does the test for Hardy-Weinberg equilibrium:

Consensus/Variant Calling Options:
-c Call variants using Bayesian inference. This option automatically invokes option -e.

-d FLOAT When -v is in use, skip loci where the fraction of samples covered by reads is below FLOAT. [0]

-e Perform max-likelihood inference only, including estimating the site allele frequency, testing Hardy-Weinberg equilibrium and testing associations with LRT.


http://samtools.sourceforge.net/samtools.shtml#4

Maybe try this instead:
Code:
samtools mpileup -BuDf Refseq.fa differentsorted.bam(100 separate files) | bcftools view -bvg - > out.bcf
I'd be interested to know how this affects the results. I have never run bcftools without the -c argument.

PS. I see you're in Athens, GA.....if you wouldn't mind I'd like to ask you a few questions. I am starting a post-doc at UGA in Aug.

Last edited by chadn737; 04-22-2013 at 02:38 PM.
Old 04-22-2013, 02:38 PM   #5
jebowers
Member
 
Location: Athens, GA, USA

Join Date: Apr 2013
Posts: 19

Thanks so much. I guess I have to re-run that 2-week mpileup.
Old 04-22-2013, 02:40 PM   #6
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Could you run it on one or two files first and test it? Two weeks is a long time to spend trying something new if you don't know what the result will be.
Old 04-22-2013, 03:17 PM   #7
jebowers
Member
 
Location: Athens, GA, USA

Join Date: Apr 2013
Posts: 19

Yes, I'm planning on doing so. But right now our computer cluster is having disk issues, so I don't expect quick results.

I think that I will have to use the -b option only (not -bvg) with bcftools view, as -v and -g both invoke -c.
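
If the output produced without -c still carries per-sample PL values (phred-scaled genotype likelihoods, ordered 0/0, 0/1, 1/1 at a biallelic site), genotypes could in principle be called afterwards with a prior that suits inbred lines. The following is only a rough sketch under that assumption: the helper names, the example FORMAT/sample strings, and the prior values are all made up for illustration, and it is not how bcftools itself calls genotypes.

```python
GENOTYPES = ("0/0", "0/1", "1/1")


def sample_pl(format_field, sample_field):
    """Extract the PL triple from one VCF sample column (biallelic site)."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return tuple(int(x) for x in values[keys.index("PL")].split(","))


def call_from_pl(pl, prior):
    """Return (genotype, posterior) for the most probable genotype.

    pl    -- phred-scaled likelihoods for 0/0, 0/1, 1/1
    prior -- prior probabilities in the same order
    """
    likelihoods = [10 ** (-p / 10.0) for p in pl]
    unnorm = [lk * pr for lk, pr in zip(likelihoods, prior)]
    total = sum(unnorm)
    posteriors = [u / total for u in unnorm]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return GENOTYPES[best], posteriors[best]


if __name__ == "__main__":
    # Hypothetical sample column, roughly what one alt read might look like.
    pl = sample_pl("GT:PL:DP", "0/1:30,3,0:1")
    hwe = (0.81, 0.18, 0.01)        # HW prior, assumed alt frequency 0.1
    inbred = (0.891, 0.01, 0.099)   # assumed ~1% heterozygosity
    print("HW prior:    ", call_from_pl(pl, hwe))
    print("inbred prior:", call_from_pl(pl, inbred))
```

With these illustrative numbers the same PL triple is called 0/1 under the HW prior and 1/1 under the inbred prior, so the raw likelihoods are kept and only the prior changes.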

Last edited by jebowers; 04-22-2013 at 03:22 PM. Reason: x
Old 04-22-2013, 05:41 PM   #8
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

You're right, my bad, I should have read that a bit more closely.
Old 05-15-2013, 05:11 PM   #9
mxr1895
Junior Member
 
Location: new zealand

Join Date: Feb 2012
Posts: 6

Hi,
I got around a similar problem (I'm working with the yeast equivalent of RI lines) by using Freebayes, which has an option for ploidy. This allows you to genotype your RI samples as if they were haploids.
However, in my experience, low coverage will still result in poor genotype calls.
Cheers,

Miguel
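
As a rough illustration of both points above (a haploid model has no heterozygous state, but confidence still depends on depth), here is a toy haploid caller. It is a sketch under a simple assumed error model (1% per-base error, flat prior) with hypothetical names, and it is not Freebayes' actual algorithm.

```python
import math


def haploid_call(n_ref, n_alt, error_rate=0.01):
    """Call a haploid genotype ("R" or "A") and a phred-scaled quality.

    Assumes independent reads, a fixed per-base error rate, and a flat
    prior over the two haploid genotypes.
    """
    like_ref = (1 - error_rate) ** n_ref * error_rate ** n_alt
    like_alt = error_rate ** n_ref * (1 - error_rate) ** n_alt
    total = like_ref + like_alt
    if like_alt >= like_ref:
        call, p_wrong = "A", like_ref / total
    else:
        call, p_wrong = "R", like_alt / total
    # Genotype quality: phred-scaled probability that the call is wrong.
    quality = -10 * math.log10(p_wrong) if p_wrong > 0 else float("inf")
    return call, quality


if __name__ == "__main__":
    for n_ref, n_alt in [(0, 1), (0, 3), (1, 1)]:
        print(n_ref, "ref /", n_alt, "alt ->", haploid_call(n_ref, n_alt))
```

In this toy model a single alt read already gives an "A" call, but one conflicting read at depth 2 leaves the call essentially ambiguous, which is one way low coverage can hurt even with ploidy set to 1.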
Tags
.vcf, mpileup, rils, samtools
