  • help with Haploview

    Hi, I'm using Haploview and am having trouble getting LD information to appear for rare alleles. I have changed the allele frequency cut-off to the minimum allele frequency in my sample, and 100% of my sample has genotype data. The "rating" box on the Check Markers tab says the markers are all included, yet on the LD tab no LD information is presented for these markers - the portion of the LD plot for these markers is gray, like the background.

    Has anyone experienced this before? Is there other LD plotting software that is better suited to data with rare alleles? By rare I mean frequencies in the range of 10^-3 to 10^-5; I have data on >5000 persons.

    Thanks in advance for your help!

  • #2
    Grey usually means that the alleles are not polymorphic. You can't calculate LD without polymorphism. Are you sure both the markers have variation in your sample?



    • #3
      Hi - thanks for the reply!

      The "1" allele at both markers is very rare, but both markers are polymorphic. As an example, here are the counts of PERSONS with various genotypic configurations at the two markers:

      Marker 1   Marker 2   # People
      1/2        2/2        87
      2/2        1/2        2
      2/2        2/2        6028

      Since no persons are doubly heterozygous, we can estimate haplotype frequencies as (where the first # is the allele at marker 1, and the second # is the allele at marker 2):
      1-2: 0.007
      2-1: 0.0002
      2-2: 0.9927

      With just 3 haplotypes, abs(D') = 1, and in this example r2 is essentially 0. These are the same data I am feeding into Haploview, yet the square comparing these 2 markers is greyed out. All squares related to marker 2 are greyed out, while LD data are presented for marker 1 in combination with other markers. The alternative allele frequency of marker 2 is 0.00016, and I had changed the minimum minor allele frequency on the Check Markers tab to 0.0001 to force inclusion. I guess I should just consider all grey squares for rare alleles in my data set to have r2 = 0, but it's not very pretty.
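
      The haplotype frequencies above can be reproduced by direct counting. This is only a minimal Python sketch: the function name `haplotype_freqs` is made up for illustration, and the counting shortcut is valid only because nobody is heterozygous at both markers, so phase is never ambiguous.

```python
# Genotype counts from the table above (persons, not chromosomes).
counts = {
    ("1/2", "2/2"): 87,
    ("2/2", "1/2"): 2,
    ("2/2", "2/2"): 6028,
}


def haplotype_freqs(genotype_counts):
    """Count haplotypes directly from two-marker genotype counts.

    Hypothetical helper: works only when no person is heterozygous at
    both markers, because then each person's two haplotypes are known.
    """
    hap_counts = {}
    for (g1, g2), n_people in genotype_counts.items():
        a1 = g1.split("/")
        a2 = g2.split("/")
        if a1[0] != a1[1] and a2[0] != a2[1]:
            raise ValueError("double heterozygote: phase is ambiguous")
        # Each person carries two haplotypes (marker1 allele - marker2 allele).
        for allele1, allele2 in zip(a1, a2):
            key = f"{allele1}-{allele2}"
            hap_counts[key] = hap_counts.get(key, 0) + n_people
    total = sum(hap_counts.values())  # number of chromosomes = 2 x persons
    return {hap: c / total for hap, c in hap_counts.items()}


freqs = haplotype_freqs(counts)
for hap in sorted(freqs):
    print(hap, round(freqs[hap], 5))
```

      With 6117 persons there are 12234 chromosomes, so the 1-2, 2-1 and 2-2 haplotypes are counted 87, 2 and 12145 times respectively.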

      In the meantime I did find an alternative program, snp.plotter, that runs in R, and will also plot association results above the LD info - and it deals with these rare alleles w/o problem, so I'll just go with that program I guess.

      thanks again for your reply!



      • #4
        Originally posted by sservice2003
        The "1" allele at both markers is very rare, but both markers are polymorphic. As an example, here are the counts of PERSONS with various genotypic configurations at the two markers:

        Marker 1   Marker 2   # People
        1/2        2/2        87
        2/2        1/2        2
        2/2        2/2        6028

        ...

        In the meantime I did find an alternative program, snp.plotter, that runs in R, and will also plot association results above the LD info - and it deals with these rare alleles w/o problem, so I'll just go with that program I guess.
        I think it's a bit of a stretch to call a SNP polymorphic if there are only 2 counts out of 6117. That's easily within the range of sequencing error. Yes, you can calculate these values, but it's unlikely to be a useful calculation. If this R program gives probability values associated with the D' value then it might be okay to use, but it would be much better to estimate your recombination statistics for that region using nearby SNPs with a higher polymorphic fraction.

        With just 3 haplotypes abs(D') =1
        What calculation are you using to determine this? I'm a little rusty on this, but here's my working:
        Code:
        D = f(1-1) * f(2-2) - f(1-2) * f(2-1) = 0 - 0.007 * 0.002 = -.000014
        [where 1 is minor allele, 2 is major allele in both cases]
        f(1-)*f(-2) = 87 * (87+6028) / (6028 + 2 + 87)^2 = .014218
        f(2-)*f(-1) = (2+6028) * (2) / (6028 + 2 + 87)^2 = .0003223 [minimum value]
        D' = -.000014 / .0003223 = -.0434
        is this right?



        • #5
          Originally posted by gringer
          I think it's a bit of a stretch to call a SNP polymorphic if there are only 2 counts out of 6117. That's easily within the range of sequencing error. Yes, you can calculate these values, but it's unlikely to be a useful calculation. If this R program gives probability values associated with the D' value then it might be okay to use, but it would be much better to estimate your recombination statistics for that region using nearby SNPs with a higher polymorphic fraction.





          What calculation are you using to determine this? I'm a little rusty on this, but here's my working:
          Code:
          D = f(1-1) * f(2-2) - f(1-2) * f(2-1) = 0 - 0.007 * 0.002 = -.000014
          [where 1 is minor allele, 2 is major allele in both cases]
          f(1-)*f(-2) = 87 * (87+6028) / (6028 + 2 + 87)^2 = .014218
          f(2-)*f(-1) = (2+6028) * (2) / (6028 + 2 + 87)^2 = .0003223 [minimum value]
          D' = -.000014 / .0003223 = -.0434
          is this right?
          Hi - yes you're certainly right that 2 observations could be sequencing error.

          Whenever one of the four haplotypes is missing, abs(D') is always 1. In your calculations above, the frequency of the 2-1 haplotype should be 0.00016 (lost a zero?), so that the estimate of D is -1.18x10^-6. When D < 0, the maximum magnitude D can attain is the minimum of f(1-)*f(-1) and f(2-)*f(-2); the formula you have above applies when D > 0.
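
          The sign-dependent denominator is easy to get wrong, so here is a minimal Python sketch of the calculation. The function name `ld_stats` is made up for illustration; the haplotype frequencies are the chromosome counts from this thread (87, 2 and 12145 out of 12234), with the 1-1 haplotype absent.

```python
def ld_stats(f11, f12, f21, f22):
    """Return (D, D', r^2) from the four two-marker haplotype frequencies.

    fXY is the frequency of the haplotype carrying allele X at marker 1
    and allele Y at marker 2. Hypothetical helper for illustration.
    """
    p1 = f11 + f12  # frequency of allele 1 at marker 1, i.e. f(1-)
    q1 = f11 + f21  # frequency of allele 1 at marker 2, i.e. f(-1)
    d = f11 * f22 - f12 * f21
    if d < 0:
        # Negative D is bounded by min(f(1-)*f(-1), f(2-)*f(-2)).
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    else:
        # Positive D is bounded by min(f(1-)*f(-2), f(2-)*f(-1)).
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    d_prime = d / d_max if d_max > 0 else float("nan")
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d_prime, r2


n_chrom = 2 * (87 + 2 + 6028)  # 12234 chromosomes
d, d_prime, r2 = ld_stats(0.0, 87 / n_chrom, 2 / n_chrom, 12145 / n_chrom)
print(d, d_prime, r2)
```

          For these data D is slightly negative (on the order of -1.2x10^-6), D' comes out as -1, and r2 is of the order of 10^-6, i.e. effectively zero.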

          Thanks for your reply!



          • #6
            haploview software

            Hi, I want to ask a question about Haploview.
            When I use Haploview, I run into this problem:
            Too many loci in a single block (> 500 non-redundant)

            My command line is below:
            java -Xmx40000m -jar /panfs/CD/zhangfan/bin/Haploview4.1.tar/Haploview4.1/Haploview.jar -n -log chr9.log -haps chr9.haps -info chr9.info -dprime -blockoutput ALL -maxDistance 100 -minMAF 0.01 -pairwiseTagging

            I want to get an LD file and select tag SNPs. Please help me, thank you!



            • #7
              Originally posted by xiangfeiloulan
              Too many loci in a single block (> 500 non-redundant)

              My command line is below:
              java -Xmx40000m -jar /panfs/CD/zhangfan/bin/Haploview4.1.tar/Haploview4.1/Haploview.jar -n -log chr9.log -haps chr9.haps -info chr9.info -dprime -blockoutput ALL -maxDistance 100 -minMAF 0.01 -pairwiseTagging

              I want to get an LD file and select tag SNPs. Please help me, thank you!
              Haploview is complaining because it's not able to handle that many loci. Try increasing your minor allele frequency (minMAF) so that loci are only considered for tagging SNP selection if they are more heterozygous.

              You could probably remove the 500-locus limit by editing the code, but my guess is that the limit is there so that processing can be done in a tractable amount of time.
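
              To get a feel for how many loci a given threshold would drop, the frequency filter can be tried outside Haploview first. This is a minimal Python sketch under stated assumptions: the in-memory layout (one list of allele codes per phased haplotype, "0" meaning missing) and both function names are hypothetical, and this is not Haploview's own filtering code.

```python
from collections import Counter


def minor_allele_freq(alleles, missing="0"):
    """Frequency of the second most common allele among non-missing calls."""
    observed = [a for a in alleles if a != missing]
    ranked = Counter(observed).most_common()
    if len(ranked) < 2:
        return 0.0  # monomorphic or entirely missing
    return ranked[1][1] / len(observed)


def loci_passing_maf(haplotypes, min_maf):
    """Return the indices of loci whose minor allele frequency is >= min_maf.

    haplotypes: list of equal-length lists, one per phased chromosome.
    """
    n_loci = len(haplotypes[0])
    keep = []
    for j in range(n_loci):
        column = [hap[j] for hap in haplotypes]
        if minor_allele_freq(column) >= min_maf:
            keep.append(j)
    return keep


# Toy data: 4 chromosomes, 3 loci; only the middle locus is polymorphic.
toy = [
    ["1", "1", "2"],
    ["1", "2", "2"],
    ["1", "2", "2"],
    ["1", "1", "2"],
]
print(loci_passing_maf(toy, 0.05))
```

              Raising the threshold and re-counting shows how quickly the number of loci per block falls.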



              • #8
                Thank you for your reply, I will try that. I would also like to know what other software I can use to get LD and Frq files, because Haploview needs too much memory and runs very slowly. Please give me some advice; I now have phased haplotype data. Thank you!



                • #9
                  Haploview is nice for visualisation, but not so great on a genomic scale. You could try PLINK, which has been specifically designed for processing data sets with many more SNPs:

                  There are no fixed limits to the size of the data file; it uses currently 1 byte for 4 SNP genotypes and some overhead per SNP and per individual. This means that you should be able to get datasets of, say, 1 million SNPs and up to 5000 individuals, in a machine with 2GB RAM without causing too much stress/swapping, etc.
                  [from http://pngu.mgh.harvard.edu/~purcell...aq.shtml#faq5]
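
                  The figure in that quote is easy to check with a little arithmetic. This is a minimal Python sketch, assuming only the "1 byte for 4 SNP genotypes" packing stated in the quote and ignoring the per-SNP and per-individual overhead it also mentions; the function name is made up for illustration.

```python
def packed_genotype_bytes(n_snps, n_individuals, genotypes_per_byte=4):
    """Bytes needed to hold all genotypes at 4 genotypes per byte.

    Rough estimate only: overhead per SNP and per individual is ignored.
    """
    total_genotypes = n_snps * n_individuals
    # Ceiling division so a partly filled last byte is still counted.
    return -(-total_genotypes // genotypes_per_byte)


size = packed_genotype_bytes(1_000_000, 5000)
print(size / 1e9, "GB")  # 1.25 GB
```

                  So 1 million SNPs on 5000 individuals need about 1.25 GB for the genotypes themselves, which is consistent with fitting in 2GB of RAM.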

                  There's a pruning method that can be used to generate a set of SNPs with low pairwise LD:



                  And a tagger / LD calculator / block estimator:

                  Last edited by gringer; 02-21-2012, 12:00 AM.



                  • #10
                    Try out the SNP & Variation Suite software. Though it is not free to buy, it is free to try. And if you want to get a quick look at how LD can be calculated and displayed across the whole genome without running into memory limitations, this may be a place to start.

                    Full disclosure: I work for this company. I am simply encouraging downloading the free trial to see if there is a quick solution (inside the free trial time period) that may help you out.



                    • #11
                      Hi,
                      I need help with Haploview. I want to plot LD blocks for DArT marker data. I have R^2 values, P-values, marker positions, and D' and D values. Is there a method or example file I can use to import such information into Haploview?
                      Kind Regards



                      • #12
                        It sounds like you're trying to use the wrong hammer. If you've already got the necessary statistics, you'd be better off using something like R's LDheatmap package to plot the data. From the looks of it (I haven't used it), you can give a matrix of LD values, as well as marker positions/names, and it will produce a plot of the data.
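
                        If the statistics are stored as one row per marker pair, they first need to be arranged as a square matrix ordered by marker position. This is a minimal Python sketch of that step only: the function name, marker names, positions and r2 values are all hypothetical, and the exact input LDheatmap expects should be checked against its own documentation.

```python
def ld_matrix(positions, pairwise_r2):
    """Arrange pairwise LD values into a symmetric matrix.

    positions:   dict marker name -> map position
    pairwise_r2: dict (marker_a, marker_b) -> r^2, one entry per pair
    Returns (markers sorted by position, matrix as a list of lists).
    Pairs without a value are left as None.
    """
    markers = sorted(positions, key=positions.get)
    index = {name: i for i, name in enumerate(markers)}
    n = len(markers)
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a marker is in complete LD with itself
    for (a, b), r2 in pairwise_r2.items():
        i, j = index[a], index[b]
        matrix[i][j] = r2
        matrix[j][i] = r2
    return markers, matrix


# Hypothetical example data.
positions = {"m2": 200, "m1": 100, "m3": 300}
pairwise = {("m1", "m2"): 0.8, ("m3", "m1"): 0.1}
markers, matrix = ld_matrix(positions, pairwise)
for name, row in zip(markers, matrix):
    print(name, row)
```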

                        Haploview does have a few different methods for defining block boundaries, but they are based on the assumption that blocks are discrete, with no holes, sub-blocks, or overlaps (all of which I've observed in dense Human SNP data).



                        • #13
                          Hi Gringer,
                          Thanks for the reply.
                          I have tried LDheatmap, but its graphical presentation is not as smart as Haploview's. I have seen some papers where DArT marker data were used to create LD blocks.
                          My question is how I can create an input file for Haploview using DArT marker data (1, 0).
                          If you can help, I will be grateful.

                          Kind Regards



                          • #14
                            Do you have genotype data as well (it wasn't specified in your original statement)? If so, and it is discrete 0/1 data for each marker, it should be fine to use for Haploview, which I think expects dimorphic SNPs. You would then be using Haploview to generate LD estimates, rather than using your own statistics (hence my 'hammer' comment).

                            You need to convert your data into something that Haploview can understand. See here.

                            If you have a small number of markers, the standard Linkage format should be fine (one line per individual in the PED file, list of marker locations in the MAP file). Assuming your genotype data is {0,1}, just add 1 to each value (giving {1,2}), because 0 will be treated as missing. If you don't have any pedigree information, assign everyone to a different family, and don't give them any known father/mother IDs (e.g. 'FAM01 ID01 0 0 0 0 <genotypes>').
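
                            The recoding described above can be scripted. This is a minimal Python sketch under stated assumptions: the function name and the input layout (sample ID mapped to a list of 0/1 scores, `None` for a missing score) are hypothetical, and writing each score as two identical alleles (i.e. treating every line as homozygous) is an assumption about the material, not something Haploview requires.

```python
def dart_to_ped_lines(scores):
    """Build Linkage-style PED lines from 0/1 marker scores.

    scores: dict sample ID -> list of 0, 1 or None (missing), all the
    same length and in map order. Each sample gets its own family, and
    father, mother, sex and phenotype are all set to 0 (unknown).
    """
    lines = []
    for n, (sample, calls) in enumerate(scores.items(), start=1):
        alleles = []
        for call in calls:
            # Shift {0,1} to {1,2}; 0 is reserved for missing data.
            code = "0" if call is None else str(call + 1)
            # Assumption: each score is written as a homozygous genotype.
            alleles.extend([code, code])
        fields = [f"FAM{n:02d}", sample, "0", "0", "0", "0"] + alleles
        lines.append(" ".join(fields))
    return lines


# Hypothetical example: two samples, three markers, one missing score.
example = {"ID01": [0, 1, None], "ID02": [1, 1, 0]}
for line in dart_to_ped_lines(example):
    print(line)
```

                            The marker names and positions still need to go into a separate file in the same order as the genotype columns.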

                            If you have a lot of markers (or just prefer a rotated format), you can try the HapMap project data dump format. This format has one line per marker (similar to PLINK tped format), with the pedigree stored in the header of the file (lines beginning with #@).



                            • #15
                              Dear Gringer,
                              Wonderful, you helped me make it work. I just tried it and it worked. I am grateful for your help.

                              Kind Regards

