  • Linkage analysis for WGS data

    Hi all,

    We have performed whole-genome sequencing (WGS) on 6 affected individuals and 2 parents to identify the causal variant for a particular phenotype. The disease is known to follow a recessive mode of inheritance.

    For all the samples we have the SNPs identified by WGS analysis in VCF format. Could someone suggest a method or tool to perform linkage analysis and map a specific region or gene to the phenotype?

    Any suggestions are valuable!!

  • #2
    You might give PLINK a try (you may have to convert the VCF file to .ped and .map first, but there are scripts for that).
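
For anyone wondering what such a conversion involves, here is a minimal sketch in Python. It is not one of the existing scripts mentioned above. The function name `vcf_to_ped_map` and the default family ID are made up for illustration. It keeps only biallelic SNPs, writes 0 for the unknown genetic distance, and leaves parents, sex and phenotype as unknown, so the pedigree columns would still need to be filled in by hand before any linkage analysis.

```python
def vcf_to_ped_map(vcf_lines, family_id="FAM1"):
    """Convert VCF text lines to PLINK-style PED and MAP rows (biallelic SNPs only)."""
    samples = []
    map_rows = []
    alleles = {}  # sample name -> list of "A1 A2" genotype strings
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            alleles = {s: [] for s in samples}
            continue
        chrom, pos, snp_id, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # skip indels, multiallelic sites and non-variant rows
        if snp_id == ".":
            snp_id = f"{chrom}:{pos}"  # hypothetical ID for unnamed markers
        # MAP columns: chromosome, marker ID, genetic distance (0 = unknown), bp position
        map_rows.append(f"{chrom}\t{snp_id}\t0\t{pos}")
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            if len(gt) != 2 or "." in gt:
                alleles[sample].append("0 0")  # missing genotype
            else:
                alleles[sample].append(" ".join(ref if a == "0" else alt for a in gt))
    # PED columns: family, individual, father, mother, sex, phenotype, then genotypes
    ped_rows = [
        " ".join([family_id, s, "0", "0", "0", "-9"] + alleles[s]) for s in samples
    ]
    return ped_rows, map_rows
```

Calling `vcf_to_ped_map(open("my.vcf"))` would return the two lists of rows to write out as `.ped` and `.map` files. Chromosome names such as `chr1` are passed through unchanged and may need recoding depending on the downstream tool.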



    • #3
      You might like to try LINKDATAGEN (http://bioinf.wehi.edu.au/software/linkdatagen/#mps) to convert your VCF file into input files for linkage programs such as MERLIN (http://www.sph.umich.edu/csg/abecasis/merlin/index.html) and MORGAN (http://www.stat.washington.edu/thomp...N/Morgan.shtml).

      This approach has been successful in identifying causative variants by linkage analysis of exome-sequencing data (Smith KR, Bromhead CJ, Hildebrand MS, Shearer AE, Lockhart PJ, Najmabadi H, Leventer RJ, McGillivray G, Amor DJ, Smith RJ, Bahlo M (2011). Reducing the exome search space for Mendelian diseases using genetic linkage analysis of exome genotypes. Genome Biology 12:R85). It should work even better for whole-genome sequencing data since your variants are not enriched for exonic regions.

      Do these individuals form a single pedigree or are there unrelated individuals as well?



      • #4
        Thanks for the suggestion. I have looked at LINKDATAGEN and tried to convert the VCF to MERLIN input format, but I ran into an error at the first step:

        vcf2linkdatagen.pl -annotfile annotHapMap2.txt -pop CEU -mindepth 10 -missingness 0 -idlist MyVCFlist.txt > MySNPs.brlmm

        Here my annotHapMap2.txt file is of the format:

        chr1 14548 BICF2G630707759 0
        chr1 80040 BICF2P1173580 0
        chr1 82626 BICF2G630707846 0
        chr1 212740 BICF2P1383091 0

        and the standard error shown is:

        Use of uninitialized value in pattern match (m//) at ./vcf2linkdatagen.pl line 311, <ANNOT> line 1
        .
        .
        .
        .
        Use of uninitialized value in pattern match (m//) at ./vcf2linkdatagen.pl line 311, <ANNOT> line 172387.
        # of SNPs in the annotation file = 172387
        # of SNPs with allele frequency data for population CEU in annotation file = 0
        -----------------------------------------------------------------
        Reading in the idlist file...
        BC102/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
        BC103/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
        BC104/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf

        Reading in VCF file
        BC102/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
        Reading in VCF file
        BC103/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
        Reading in VCF file
        BC104/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf

        -----------------------------------------------------------------
        In the recoding hash subroutine
        key_cnt =7059769 no of SNPs in annotation file.
        7059769 SNPs from the annotation file do not have any genotypes called in the vcf files in
        0 SNPs that have both annotation and genotype data
        -----------------------------------------------------------------
        Recoding genotypes to BRLMM format
        -----------------------------------------------------------------
        Missingness threshold set to 0. Any SNP with more than 0% missing calls will be discarded
        0 SNPs with called genotypes prior to filtering for missingness
        0 SNPs removed
        0 SNPs remaining
        -----------------------------------------------------------------
        Writing brlmm file

        Finished at 14:43:38



        Could someone help me fix this? Is it due to the annotation file?

        Any help is appreciated.



        • #5
          Have you read and followed the instructions in the Quick Start Guide and Documentation? Note in particular:
          1. The VCF files used as input to the program are not the same VCFs that you originally generated from your WGS data.
          2. You must create new VCF files where each sample is genotyped at the HapMap positions provided in the annotation files.
          3. The annotation files assume that you've mapped to the hg19 reference.


          If you've followed the guide and are still having problems then posting a small subset of your VCF that reproduces that error will help in resolving your issue.

          Pete
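
One way to check point 2 above before running vcf2linkdatagen.pl is to count how many annotation positions also appear in the VCF. The sketch below is only a rough diagnostic, not part of LINKDATAGEN. It assumes both files have the chromosome in the first column and the position in the second, as in the annotation excerpt posted earlier, and the function names are made up for illustration. A shared count of 0 would match the "0 SNPs that have both annotation and genotype data" message in the log. One possible cause is different chromosome naming (`chr1` versus `1`).

```python
def load_positions(lines, chrom_col=0, pos_col=1):
    """Collect (chromosome, position) pairs from whitespace-delimited text lines."""
    positions = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split()
        try:
            positions.add((fields[chrom_col], int(fields[pos_col])))
        except (IndexError, ValueError):
            continue  # skip column-header rows or malformed lines
    return positions


def overlap_report(annot_lines, vcf_lines):
    """Count positions in each file and the positions present in both."""
    annot = load_positions(annot_lines)
    vcf = load_positions(vcf_lines)
    return {
        "annotation": len(annot),
        "vcf": len(vcf),
        "shared": len(annot & vcf),
    }
```

For example, `overlap_report(open("annotfile.txt"), open("sample.vcf"))` returns the three counts as a dictionary. The file names here are placeholders.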



          • #6
            Okay, but I should mention that my genome is not human. The annotation file contains the genotyped positions for the dog genome, in the format shown in my previous post.

            So what is meant by a new VCF file? Is it a subset containing only the genotypes at the positions in the annotation file?

            In my case the annotation file and the VCF file use the same genome version, so I believe that will not be an issue.




            • #7
              Ah, as the data are non-human this may require some extra work. Posting a small subset of your annotation and VCF files, along with the code used to generate them, will make it easier for people to help you.

              I've brought this thread to the attention of my colleagues who wrote LINKDATAGEN. They should be able to provide more specific advice.
              Pete



              • #8
                Here are a few lines of the annotation file:

                chr1 14548 BICF2G630707759 0
                chr1 80040 BICF2P1173580 0
                chr1 82626 BICF2G630707846 0
                chr1 212740 BICF2P1383091 0

                The VCF file was generated with samtools, and below is the command I ran:

                vcf2linkdatagen.pl -annotfile annotfile.txt -pop CEU -mindepth 10 -missingness 0 -idlist MyVCFlist.txt > MySNPs.brlmm

                Happy to provide further information if required.




                • #9
                  Dear Meher,

                  As Pete suggested, using LINKDATAGEN will require a bit more work, as you will have to "trick" it into using your data. However, it should be doable if you reformat your VCF file into the same format as our example VCF file. If that doesn't work, you could script something that converts your VCF data into a BRLMM-style genotype call file, which is what the script vcf2linkdatagen.pl does.
                  The other problem is the lack of a genetic map file. You could use the physical map positions and either apply the human rule of thumb of 1 Mb = 1 cM to create a pseudo genetic map, or look at some dog cross papers to identify a more suitable rule of thumb (for mouse, for example, it is 6 Mb = 1 cM, as mice have a lower recombination rate). You would also need allele frequency information for the Lander-Green algorithm, so I would suggest just using equal allele frequencies. The density of your data will undo some of the "damage" of these assumptions.
                  You will need to create an annotation file using these ideas and then give it the name of one of the HapMap annotation files used by LINKDATAGEN. That way you can trick it into analysing the dog genome data, or rather into doing a lot of QC and getting your files ready for all sorts of analyses, including multipoint mapping with MERLIN.

                  Good luck with it & I hope this helps.

                  mbahlo
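
The pseudo genetic map and equal allele frequencies suggested above could be sketched roughly as follows. This is only an illustration: it reads the first three columns (chromosome, position, marker ID) of the annotation excerpt posted earlier in the thread, and the returned tuple layout is hypothetical. It is not the actual LINKDATAGEN annotation format, which should be taken from the tool's documentation. The `mb_per_cm` parameter is an assumption to be replaced with whatever rule of thumb suits the species.

```python
def build_pseudo_annotation(marker_lines, mb_per_cm=1.0, allele_freq=0.5):
    """Attach a pseudo genetic position (cM) and a flat allele frequency to each marker.

    marker_lines: whitespace-delimited lines starting with chromosome, position, marker ID.
    mb_per_cm:    assumed number of megabases per centimorgan (1.0 = "1 Mb = 1 cM").
    allele_freq:  assumed frequency used for every marker (0.5 = equal frequencies).
    """
    rows = []
    for line in marker_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        chrom, pos, marker_id = fields[0], int(fields[1]), fields[2]
        # Physical position in Mb divided by the assumed Mb-per-cM ratio
        cm = round(pos / 1_000_000 / mb_per_cm, 6)
        rows.append((chrom, pos, marker_id, cm, allele_freq))
    return rows
```

With the default settings, a marker at position 212740 gets a pseudo genetic position of about 0.21 cM. Passing a different `mb_per_cm` rescales every marker by the same factor.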



                  • #10
                    Hey, I am currently trying to use LINKDATAGEN and am facing the same problem. How did you solve it? My sample set is human, though, so I can't find any reason why it is showing that error.

                    Any suggestion would be helpful

                    Thanks a lot



                    • #11
                      Hey, I am currently trying to use LINKDATAGEN and am facing a problem that many users have mentioned. I emailed you about it too but didn't receive a response. My sample set is human, and I used the following commands:

                      samtools mpileup -d10000 -q13 -Q13 -gf hg19.fa -l annotHapMap2U.txt samplex.bam | bcftools view -cg -t0.5 - > samplex.HM.vcf

                      perl vcf2linkdatagen.pl -variantCaller mpileup -annotfile annotHapMap2U.txt -pop CEU -mindepth 10 -missingness 0 samplex.vcf > samplex.brlmm

                      The error is:
                      Use of uninitialized value $chr in concatenation (.) or string at vcf2linkdatagen.pl line 487, <IN> line 1.... to line 63964968

                      How can I correct it? The same BAM file was used to generate a VCF with GATK, so it can't be an issue with the BAM. It just isn't working with this tool.

                      Any suggestion would be appreciated.

                      Thanks a lot
