
**gringer** · 02-16-2012, 02:51 AM

Grey usually means that the alleles are not polymorphic. You can't calculate LD without polymorphism. Are you sure both the markers have variation in your sample?

**sservice2003** · 02-16-2012, 12:44 PM

Hi - thanks for the reply!

The "1" allele at both markers is very rare, but both markers are polymorphic. As an example, here are the counts of PERSONS with various genotypic configurations at the two markers:

| Marker 1 | Marker 2 | # People |
|----------|----------|----------|
| 1/2 | 2/2 | 87 |
| 2/2 | 1/2 | 2 |
| 2/2 | 2/2 | 6028 |

Since no persons are doubly heterozygous, we can estimate haplotype frequencies as (where the first # is the allele at marker 1, and the second # is the allele at marker 2):
1-2: 0.007
2-1: 0.0002
2-2: 0.9927

With just 3 haplotypes, abs(D') = 1, and in this example r2 is effectively 0. These are the same data I am feeding into Haploview, yet the square comparing these 2 markers is greyed out. All squares related to marker 2 are greyed out, while LD data are presented for marker 1 in combination with other markers. The alternative allele frequency of marker 2 is 0.00016, and I had changed the minimum minor allele frequency on the Check Markers tab to 0.0001 to force its inclusion. I guess I should just consider all grey squares for rare alleles in my data set to have r2 = 0, but it's not very pretty.

In the meantime I did find an alternative program, snp.plotter, that runs in R, and will also plot association results above the LD info - and it deals with these rare alleles w/o problem, so I'll just go with that program I guess.

thanks again for your reply!

**gringer** · 02-17-2012, 12:20 AM

> Originally posted by sservice2003
>
> The "1" allele at both markers is very rare, but both markers are polymorphic. As an example, here are the counts of PERSONS with various genotypic configurations at the two markers:
>
> | Marker 1 | Marker 2 | # People |
> |----------|----------|----------|
> | 1/2 | 2/2 | 87 |
> | 2/2 | 1/2 | 2 |
> | 2/2 | 2/2 | 6028 |
>
> ...
>
> In the meantime I did find an alternative program, snp.plotter, that runs in R, and will also plot association results above the LD info - and it deals with these rare alleles w/o problem, so I'll just go with that program I guess.

I think it's a bit of a stretch to call a SNP polymorphic if there are only 2 counts out of 6117. That's easily within the range of sequencing error. Yes, you can calculate these values, but it's unlikely to be a useful calculation. If this R program gives probability values associated with the D' value then it might be okay to use, but it would be much better to estimate your recombination statistics for that region using nearby SNPs with a higher polymorphic fraction.

> With just 3 haplotypes abs(D') =1

What calculation are you using to determine this? I'm a little rusty on this, but here's my working:

Code:

D = f(1-1) * f(2-2) - f(1-2) * f(2-1) = 0 - 0.007 * 0.002 = -.000014
[where 1 is minor allele, 2 is major allele in both cases]
f(1-)*f(-2) = 87 * (87+6028) / (6028 + 2 + 87)^2 = .014218
f(2-)*f(-1) = (2+6028) * (2) / (6028 + 2 + 87)^2 = .0003223 [minimum value]
D' = -.000014 / .0003223 = -.0434

is this right?

**sservice2003** · 02-17-2012, 11:29 AM

> Originally posted by gringer
>
> I think it's a bit of a stretch to call a SNP polymorphic if there are only 2 counts out of 6117. That's easily within the range of sequencing error. Yes, you can calculate these values, but it's unlikely to be a useful calculation. If this R program gives probability values associated with the D' value then it might be okay to use, but it would be much better to estimate your recombination statistics for that region using nearby SNPs with a higher polymorphic fraction.
>
> What calculation are you using to determine this? I'm a little rusty on this, but here's my working:
>
> Code:
>
> D = f(1-1) * f(2-2) - f(1-2) * f(2-1) = 0 - 0.007 * 0.002 = -.000014
> [where 1 is minor allele, 2 is major allele in both cases]
> f(1-)*f(-2) = 87 * (87+6028) / (6028 + 2 + 87)^2 = .014218
> f(2-)*f(-1) = (2+6028) * (2) / (6028 + 2 + 87)^2 = .0003223 [minimum value]
> D' = -.000014 / .0003223 = -.0434
>
> is this right?

Hi - yes you're certainly right that 2 observations could be sequencing error.

Whenever one of the four haplotypes is missing, abs(D') is always 1. In your calculations above, the frequency of the 2-1 haplotype should be 0.00016 (lost a zero?), so that the estimate of D is -1.18x10^-6. When D < 0, the largest magnitude D can attain is the minimum of f(1-)*f(-1) and f(2-)*f(-2); the formula you have above is for when D > 0.
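For anyone who wants to check the numbers, here is a minimal Python sketch of the calculation described above. The function names are hypothetical, and the haplotype counting assumes (as in this data set) that nobody is doubly heterozygous, so phase is unambiguous. It is an illustration of the standard D, D' and r2 definitions, not what Haploview does internally.

```python
def haplotype_counts(n_het_m1, n_het_m2, n_hom_22):
    """Count haplotypes when no person is doubly heterozygous.

    n_het_m1: persons who are 1/2 at marker 1 and 2/2 at marker 2
    n_het_m2: persons who are 2/2 at marker 1 and 1/2 at marker 2
    n_hom_22: persons who are 2/2 at both markers
    """
    return {
        "1-1": 0,                                   # never observed here
        "1-2": n_het_m1,                            # one per marker-1 heterozygote
        "2-1": n_het_m2,                            # one per marker-2 heterozygote
        "2-2": n_het_m1 + n_het_m2 + 2 * n_hom_22,  # all remaining haplotypes
    }


def ld_stats(counts):
    """Return (D, D', r2) from a dict of the four haplotype counts."""
    total = sum(counts.values())
    f = {hap: n / total for hap, n in counts.items()}
    p1 = f["1-1"] + f["1-2"]  # frequency of allele 1 at marker 1
    q1 = f["1-1"] + f["2-1"]  # frequency of allele 1 at marker 2
    d = f["1-1"] * f["2-2"] - f["1-2"] * f["2-1"]
    if d < 0:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    else:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    # Both ratios are undefined if either marker is monomorphic.
    d_prime = d / d_max if d_max > 0 else float("nan")
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d_prime, r2


if __name__ == "__main__":
    counts = haplotype_counts(87, 2, 6028)
    d, d_prime, r2 = ld_stats(counts)
    print(f"D = {d:.3e}, D' = {d_prime:.3f}, r2 = {r2:.3e}")
```

With the counts from this thread, D is slightly negative, D' comes out as -1 (so abs(D') is 1), and r2 is close to zero.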

Thanks for your reply!

**xiangfeiloulan** · 02-20-2012, 03:10 AM

haploview software

Hi, I want to ask a question about Haploview.
When I run Haploview, I get the following error:
Too many loci in a single block (> 500 non-redundant)

My command line is below:
java -Xmx40000m -jar /panfs/CD/zhangfan/bin/Haploview4.1.tar/Haploview4.1/Haploview.jar -n -log chr9.log -haps chr9.haps -info chr9.info -dprime -blockoutput ALL -maxDistance 100 -minMAF 0.01 -pairwiseTagging

I want to get an LD file and select tag SNPs. Please help me, thank you!

**gringer** · 02-20-2012, 03:18 AM

> Originally posted by xiangfeiloulan
>
> Too many loci in a single block (> 500 non-redundant)
>
> My command line is below:
> java -Xmx40000m -jar /panfs/CD/zhangfan/bin/Haploview4.1.tar/Haploview4.1/Haploview.jar -n -log chr9.log -haps chr9.haps -info chr9.info -dprime -blockoutput ALL -maxDistance 100 -minMAF 0.01 -pairwiseTagging
>
> I want to get an LD file and select tag SNPs. Please help me, thank you!

Haploview is complaining because it's not able to handle that many loci. Try increasing your minimum minor allele frequency threshold (minMAF) so that loci are only considered for tag SNP selection if they are more heterozygous.

You could probably remove the 500-locus limit by editing the code, but my guess is that the limit is there so that processing can be done in a tractable amount of time.

**xiangfeiloulan** · 02-20-2012, 07:12 PM

Thank you for your reply, I will try that. I also want to know what other software I can use to get LD and Frq files, because Haploview needs too much memory and runs very slowly. Please give me some advice; I now have phased haplotype data. Thank you!

**gringer** · 02-20-2012, 11:58 PM

Haploview is nice for visualisation, but not so great on a genomic scale. You could try Plink, which has been specifically designed for processing data sets with many more SNPs:

> There are no fixed limits to the size of the data file; it uses currently 1 byte for 4 SNP genotypes and some overhead per SNP and per individual. This means that you should be able to get datasets of, say, 1 million SNPs and up to 5000 individuals, in a machine with 2GB RAM without causing too much stress/swapping, etc.

[from http://pngu.mgh.harvard.edu/~purcell...aq.shtml#faq5]
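As a rough sanity check on that figure, the arithmetic can be sketched in Python. The function name is hypothetical, the per-SNP rounding is an assumption, and the "some overhead per SNP and per individual" mentioned in the quote is ignored, so treat the result as a lower bound rather than PLINK's actual memory use.

```python
import math


def packed_genotype_bytes(n_snps, n_individuals):
    """Approximate storage at 1 byte per 4 SNP genotypes.

    Assumes each SNP's genotypes are rounded up to a whole number of
    bytes (an assumption for this sketch); all other overhead is ignored.
    """
    bytes_per_snp = math.ceil(n_individuals / 4)
    return n_snps * bytes_per_snp


if __name__ == "__main__":
    n_bytes = packed_genotype_bytes(1_000_000, 5000)
    print(f"{n_bytes / 1e9:.2f} GB")
```

For 1 million SNPs and 5000 individuals this gives 1.25 billion bytes, which is consistent with the quoted claim that such a data set fits on a machine with 2GB of RAM.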

There's a pruning method that can be used to generate a set of SNPs with low pairwise LD:

http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#prune

And a tagger / LD calculator / block estimator:

http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml

**DanFrost** · 02-21-2012, 10:47 AM

Try the SNP & Variation Suite software. Though it is not free to buy, it is free to try. If you want a quick look at how LD can be calculated and displayed across the whole genome without running into memory limitations, this may be a place to start.

Full disclosure: I work for this company. I am simply encouraging you to download the free trial to see whether there is a quick solution (within the free trial period) that may help you out.

**[email protected]** · 01-29-2013, 06:32 PM

Hi,
I need help with Haploview. I want to plot LD blocks for DArT marker data. I have R^2 values, P-values, marker positions, and D' and D values. Is there any method or example file showing how I can import such information into Haploview?
Kind Regards

**gringer** · 01-29-2013, 06:45 PM

It sounds like you're trying to use the wrong hammer. If you've already got the necessary statistics, you'd be better off using something like R's LDheatmap package to plot the data. From the looks of it (I haven't used it), you can give a matrix of LD values, as well as marker positions/names, and it will produce a plot of the data.

Haploview does have a few different methods for defining block boundaries, but they are based on the assumption that blocks are discrete, with no holes, sub-blocks, or overlaps (all of which I've observed in dense Human SNP data).

**[email protected]** · 01-29-2013, 07:47 PM

Hi Gringer,
Thanks for the reply.
I have tried LDheatmap, but its graphical presentation is not as nice as Haploview's. I have seen some papers where DArT marker data were used to create LD blocks.
My question is how I can create an input file for Haploview using DArT marker data (1, 0).
If you can help I will be grateful.

Kind Regards

**gringer** · 01-29-2013, 08:03 PM

Do you have genotype data as well (it wasn't specified in your original statement)? If so, and it is discrete 0/1 data for each marker, it should be fine to use for Haploview, which I think expects dimorphic SNPs. You would then be using Haploview to generate LD estimates, rather than using your own statistics (hence my 'hammer' comment).

You need to convert your data into something that Haploview can understand. See here.

If you have a small number of markers, the standard Linkage format should be fine (one line per individual in the PED file, list of marker locations in the MAP file). Assuming your genotype data is {0,1}, just add 1 to each value (giving {1,2}), because 0 will be treated as missing. If you don't have any pedigree information, assign everyone to a different family, and don't give them any known father/mother IDs (e.g. 'FAM01 ID01 0 0 0 0 <genotypes>').
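A minimal Python sketch of that recoding is below. The function name and input layout are hypothetical. It also assumes that each 0/1 score is written as a homozygous pair of alleles and that a missing score (`None`) becomes `0 0`; neither detail is stated above, so check the output against your own data before loading it into Haploview.

```python
def dart_to_ped_lines(scores):
    """Build Linkage-format PED lines from 0/1 marker scores.

    scores: dict mapping sample ID -> list of scores (0, 1, or None for
    missing), with markers in the same order for every sample.
    """
    lines = []
    for index, (sample_id, calls) in enumerate(scores.items(), start=1):
        alleles = []
        for call in calls:
            if call is None:
                alleles += ["0", "0"]    # 0 is treated as missing
            else:
                allele = str(call + 1)   # recode {0,1} as {1,2}
                alleles += [allele, allele]  # assumed homozygous
        # Every sample gets its own family and no known parents,
        # following the 'FAM01 ID01 0 0 0 0 <genotypes>' pattern.
        fields = [f"FAM{index:02d}", sample_id, "0", "0", "0", "0"] + alleles
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    example = {"ID01": [0, 1, None], "ID02": [1, 1, 0]}
    for line in dart_to_ped_lines(example):
        print(line)
```

The marker locations still need to go in a separate file, one entry per marker in the same order as the genotype columns.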

If you have a lot of markers (or just prefer a rotated format), you can try the HapMap project data dump format. This format has one line per marker (similar to PLINK tped format), with the pedigree stored in the header of the file (lines beginning with #@).

**[email protected]** · 01-29-2013, 11:26 PM

Dear Gringer,
Wonderful, you helped me make it work. I just tried it and it worked. I am grateful for your help.

Kind Regards


help with Haploview
