SEQanswers




03-19-2015, 09:06 AM   #1
mrw3288
Junior Member
 
Location: S.Paulo

Join Date: Jun 2013
Posts: 4
Haplotype frequencies

I am interested in looking at some very short regions (~200 bp) in the human genome, each containing ~15 SNPs. Taking, for instance, the Phase 3 phased haplotype data (VCF) from the 1000 Genomes Project (~2,500 individuals), I would like to identify all the distinct haplotypes and count them, which would give me the haplotype frequencies in this sample of individuals.

I have tried using 'vcfgeno2haplo' and 'vcf2tsv', which are part of vcflib, on Galaxy, but I cannot get the first one to accept my data. Can someone suggest how I might do this, or where I should look?

There's an interesting tool described at
http://www.biomedcentral.com/1471-2105/15/200
for visualizing this information, but it does not let one extract the statistical data that I need.

Thanks.
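The counting step asked about in this first post can be sketched in a few lines of Python. This is a minimal sketch, not the R script attached later in the thread: it assumes the VCF has already been sliced to the ~200 bp region of interest, that every genotype is phased and written as 'a|b', and that allele indices are single digits, so each haplotype can be spelled as a string such as '0110'. The function name 'count_haplotypes' and the toy records are hypothetical.

```python
from collections import Counter


def count_haplotypes(vcf_lines):
    """Count phased haplotypes across the VCF records in vcf_lines.

    Assumes (hypothetically) that the records are already restricted to
    the region of interest, every genotype is phased ('0|1'), and allele
    indices are single digits, so a haplotype can be spelled as '0110'.
    """
    haplotypes = None  # one list of allele indices per haplotype
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
        alleles = []
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            if "|" not in gt:
                raise ValueError(f"unphased or haploid call {gt!r} at {fields[0]}:{fields[1]}")
            alleles.extend(gt.split("|"))  # two haplotypes per diploid sample
        if haplotypes is None:
            haplotypes = [[] for _ in alleles]
        for hap, allele in zip(haplotypes, alleles):
            hap.append(allele)
    return Counter("".join(hap) for hap in haplotypes or [])


# Toy example: two samples (four haplotypes) at two SNPs.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "1\t150\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t1|0",
]
counts = count_haplotypes(records)
total = sum(counts.values())
for hap, n in counts.most_common():
    print(hap, n, round(n / total, 3), sep="\t")  # haplotype, count, frequency
```

On a real slice, the function can read a bgzipped file directly, for example count_haplotypes(gzip.open("region.vcf.gz", "rt")) after importing gzip; the file name is only a placeholder. Haploid or unphased calls raise an error instead of being guessed at.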
01-10-2016, 12:34 PM   #2
cowman
Member
 
Location: North west UK

Join Date: Jan 2011
Posts: 13

I am trying to answer the same question, but on a genome scale: what are the haplotype frequencies at each locus, across all loci?
PLINK --blocks will generate lists of haplotype blocks, but there is only one block per locus, which implies a maximum of two haplotypes per locus in a population; that cannot always be true.
IMPUTE and SHAPEIT will phase genotype data, but I have not found any sets of haplotype blocks associated with them, so even if their models allow more than two haplotypes at a locus, it is not possible to extract this information from the phased data.
Any ideas about how I could find the number and frequencies of haplotypes at each locus in the 1000 Genomes data?
01-11-2016, 08:25 AM   #3
mrw3288
Junior Member
 
Location: S.Paulo

Join Date: Jun 2013
Posts: 4

A colleague of mine should supply the code. But I don't understand what you mean by "at each locus across all loci"; what is your "locus" or what sort of "marker" are you considering?
01-11-2016, 08:50 AM   #4
cowman
Member
 
Location: North west UK

Join Date: Jan 2011
Posts: 13

Hi Mrw3288, thanks for your reply.
I mean SNP loci. I will be working with 1000 Genomes data. I want to know the number of haplotypes at each SNP locus in the genome, in each population in the data set.
However, "locus" can be a flexible concept, and I could work with windows of, say, 10 kb.
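If "locus" is taken to mean a fixed window, one way to tabulate haplotypes genome-wide is to bin the phased SNPs into non-overlapping windows (10 kb here) and count the distinct haplotypes inside each bin. The sketch below assumes the phased alleles have already been read into (position, alleles) pairs, where alleles is a string holding one single-character allele per haplotype. The function name 'haplotypes_per_window' and the choice of window boundaries (multiples of the window size) are illustrative assumptions, not taken from any tool mentioned in the thread.

```python
from collections import Counter, defaultdict


def haplotypes_per_window(variants, window=10_000):
    """Count distinct phased haplotypes in fixed, non-overlapping windows.

    variants: iterable of (pos, alleles) where alleles[i] is the allele
    carried by haplotype i at that position (hypothetical input layout).
    Returns {window_start: Counter of haplotype strings}.
    """
    by_window = defaultdict(list)
    for pos, alleles in sorted(variants):
        by_window[pos // window].append(alleles)  # windows start at multiples of `window`
    result = {}
    for idx, columns in sorted(by_window.items()):
        # zip(*columns) transposes SNP columns into one tuple per haplotype
        result[idx * window] = Counter("".join(hap) for hap in zip(*columns))
    return result


# Toy example: four haplotypes, two SNPs in the first window, one in the second.
variants = [(1_200, "0011"), (4_800, "0101"), (15_000, "1100")]
result = haplotypes_per_window(variants)
for start, counts in result.items():
    print(start, start + 10_000, len(counts), sep="\t")  # window, number of haplotypes
```

A window holding a single biallelic SNP can show at most two haplotypes, so windows that are too small will understate haplotype diversity.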
01-11-2016, 08:59 AM   #5
mrw3288
Junior Member
 
Location: S.Paulo

Join Date: Jun 2013
Posts: 4

Next, what do you mean by "at each SNP locus"? You need at least two adjacent SNPs along a chromosome to start making a haplotype.
Anyway, if I solve my question (200 bp window) I should be able to solve yours (10 kb window), and I'll let you know if/when that happens.
01-11-2016, 10:47 AM   #6
cowman
Member
 
Location: North west UK

Join Date: Jan 2011
Posts: 13

True, but haplotype boundaries fall at SNPs, so the number of haplotypes associated with one SNP may differ from the number at the adjacent SNP. The resolution I work at will depend partly on speed and practicality.
I will look forward to hearing of your solution.
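The point that the haplotype count can change from one SNP to the next can be made concrete with one possible, purely hypothetical, per-SNP definition: assign each SNP the number of distinct haplotypes observed over that SNP plus up to 'flank' neighbouring SNPs on each side. The sketch assumes the phased alleles are held as one string per SNP (one character per haplotype); the function name and the flank parameter are illustrative, not from any tool in this thread.

```python
def haplotype_count_per_snp(positions, columns, flank=1):
    """Distinct haplotypes over each SNP plus up to `flank` SNPs per side.

    positions: SNP positions in chromosome order.
    columns: columns[j][i] is the allele of haplotype i at SNP j.
    This per-SNP definition is an illustrative assumption.
    """
    n = len(columns)
    result = []
    for j in range(n):
        lo, hi = max(0, j - flank), min(n, j + flank + 1)  # clip at the ends
        distinct = set(zip(*columns[lo:hi]))  # one allele tuple per haplotype
        result.append((positions[j], len(distinct)))
    return result


# Toy example: four haplotypes at three SNPs.
per_snp = haplotype_count_per_snp([100, 200, 300], ["0011", "0101", "0000"])
print(per_snp)  # [(100, 4), (200, 4), (300, 2)]
```

Here the last SNP gets a lower count than its neighbour because its flanking window only spans SNPs that separate the haplotypes into two groups.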
03-01-2016, 08:35 AM   #7
mrw3288
Junior Member
 
Location: S.Paulo

Join Date: Jun 2013
Posts: 4

Here (attached) is a partial solution: an R script written by my colleague Jaqueline Wang. Ignore the comments in Portuguese. Put the script and your *.vcf.gz file (e.g. sliced from 1000 Genomes) in the same directory and run the script. The output is a *.tsv file. You will have to slice the VCF according to your needs.
Attached Files
File Type: zip CalcFreqHaplVCF.zip (1.7 KB, 22 views)
03-03-2016, 04:18 AM   #8
cowman
Member
 
Location: North west UK

Join Date: Jan 2011
Posts: 13

Thanks a lot for that. I will let you know how I get on.

Tags
haplotypes, phased, vcf file





All times are GMT -8.

