SEQanswers

Old 10-09-2013, 06:40 AM   #1
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Linkage analysis for WGS data

Hi all,

We have performed whole-genome sequencing (WGS) on 6 affected individuals and 2 parent samples to identify the causal variant for a particular phenotype. The disease is known to follow a recessive mode of inheritance.

For all the samples we have the SNPs identified by WGS analysis in VCF format. Could someone suggest a method or tool for linkage analysis that would map the phenotype to a specific region or gene?

Any suggestions are valuable!!
Old 10-09-2013, 07:33 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

You might give PLINK a try (you might have to convert the vcf file to .ped and .map, but there are scripts for that).
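
For readers wondering what such a conversion script involves, here is a minimal sketch, not an official PLINK or LINKDATAGEN script. It assumes a multi-sample text VCF with a GT field and keeps only biallelic SNPs. The function name `vcf_to_ped_map` and the placeholder pedigree columns (parents, sex, phenotype) are illustrative assumptions that would have to be filled in for a real pedigree.

```python
def vcf_to_ped_map(vcf_lines):
    """Convert biallelic SNP records of a multi-sample VCF into PED rows and MAP rows."""
    samples, map_rows, genotypes = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            genotypes = {s: [] for s in samples}
            continue
        chrom, pos, vid, ref, alt, fmt = (fields[0], fields[1], fields[2],
                                          fields[3], fields[4], fields[8])
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # keep only biallelic SNPs
        if vid == ".":
            vid = f"{chrom}:{pos}"  # make up an ID when the VCF has none
        # MAP columns: chromosome, variant ID, genetic distance, bp position.
        map_rows.append((chrom, vid, "0", pos))  # genetic distance unknown -> 0
        gt_index = fmt.split(":").index("GT")
        alleles = {"0": ref, "1": alt}
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            pair = [alleles.get(a, "0") for a in gt]  # '.' becomes missing ('0')
            if len(pair) != 2 or "0" in pair:
                pair = ["0", "0"]  # treat partial or non-diploid calls as missing
            genotypes[sample].extend(pair)
    # PED columns: FID, IID, father, mother, sex, phenotype, then two alleles per SNP.
    ped_rows = [[s, s, "0", "0", "0", "-9"] + genotypes[s] for s in samples]
    return ped_rows, map_rows


# Made-up two-sample example.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:12\t./.:0",
]
ped, mp = vcf_to_ped_map(example)
for row in ped:
    print(" ".join(row))
for row in mp:
    print("\t".join(row))
```

Chromosome names may need recoding to whatever your PLINK version accepts, and the family, parent, sex and phenotype columns must be edited to describe the actual pedigree before any linkage analysis.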
Old 10-09-2013, 01:26 PM   #3
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64

You might like to try LINKDATAGEN (http://bioinf.wehi.edu.au/software/linkdatagen/#mps) to convert your VCF file into input files for linkage programs such as MERLIN (http://www.sph.umich.edu/csg/abecasis/merlin/index.html) and MORGAN (http://www.stat.washington.edu/thomp...N/Morgan.shtml).

This approach has been successful in identifying causative variants by linkage analysis of exome-sequencing data (Smith KR, Bromhead CJ, Hildebrand MS, Shearer AE, Lockhart PJ, Najmabadi H, Leventer RJ, McGillivray G, Amor DJ, Smith RJ, Bahlo M (2011). Reducing the exome search space for Mendelian diseases using genetic linkage analysis of exome genotypes. Genome Biology 12:R85). It should work even better for whole-genome sequencing data since your variants are not enriched for exonic regions.

Do these individuals form a single pedigree or are there unrelated individuals as well?
Old 10-11-2013, 04:31 AM   #4
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54

Thanks for the suggestion. I have looked at LINKDATAGEN and tried to convert the VCF to MERLIN input format, but I ran into an error at the first step:

vcf2linkdatagen.pl -annotfile annotHapMap2.txt -pop CEU -mindepth 10 -missingness 0 -idlist MyVCFlist.txt > MySNPs.brlmm

My annotHapMap2.txt file has the following format:

chr1 14548 BICF2G630707759 0
chr1 80040 BICF2P1173580 0
chr1 82626 BICF2G630707846 0
chr1 212740 BICF2P1383091 0

and the standard error shown is:

Use of uninitialized value in pattern match (m//) at ./vcf2linkdatagen.pl line 311, <ANNOT> line 1
.
.
.
.
Use of uninitialized value in pattern match (m//) at ./vcf2linkdatagen.pl line 311, <ANNOT> line 172387.
# of SNPs in the annotation file = 172387
# of SNPs with allele frequency data for population CEU in annotation file = 0
-----------------------------------------------------------------
Reading in the idlist file...
BC102/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
BC103/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
BC104/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf

Reading in VCF file
BC102/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
Reading in VCF file
BC103/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf
Reading in VCF file
BC104/VCF/alignment_FixMate_rawSAM_filtered_SNP.vcf

-----------------------------------------------------------------
In the recoding hash subroutine
key_cnt =7059769 no of SNPs in annotation file.
7059769 SNPs from the annotation file do not have any genotypes called in the vcf files in
0 SNPs that have both annotation and genotype data
-----------------------------------------------------------------
Recoding genotypes to BRLMM format
-----------------------------------------------------------------
Missingness threshold set to 0. Any SNP with more than 0% missing calls will be discarded
0 SNPs with called genotypes prior to filtering for missingness
0 SNPs removed
0 SNPs remaining
-----------------------------------------------------------------
Writing brlmm file

Finished at 14:43:38



Could someone help me fix this? Is it due to the annotfile?

Any help is appreciated.
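
One quick way to test whether the annotation file is the culprit is to count the whitespace-separated fields on each line. The sketch below is only a generic diagnostic: the function name `field_count_histogram` is invented for illustration, and the number of columns vcf2linkdatagen.pl actually expects is not stated in this thread and should be checked against the LINKDATAGEN documentation. The sample input is the four annotation lines shown above.

```python
import io
from collections import Counter


def field_count_histogram(lines):
    """Count how many whitespace-separated fields each non-empty line has."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        counts[len(line.split())] += 1
    return dict(counts)


# The four annotation lines from the post, used as sample input.
sample = io.StringIO(
    "chr1 14548 BICF2G630707759 0\n"
    "chr1 80040 BICF2P1173580 0\n"
    "chr1 82626 BICF2G630707846 0\n"
    "chr1 212740 BICF2P1383091 0\n"
)
print(field_count_histogram(sample))  # {4: 4}
```

If every line has only four fields and none of them is an allele frequency column for the requested population, that would be consistent with the log line "# of SNPs with allele frequency data for population CEU in annotation file = 0".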

Old 10-11-2013, 03:19 PM   #5
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64

Have you read and followed the instructions in the Quick Start Guide and Documentation? Note in particular:
  1. The VCF files used as input to the program are not the same VCFs that you originally generated from your WGS data.
  2. You must create new VCF files where each sample is genotyped at the HapMap positions provided in the annotation files.
  3. The annotation files assume that you've mapped to the hg19 reference.

If you've followed the guide and are still having problems then posting a small subset of your VCF that reproduces that error will help in resolving your issue.

Pete
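
A quick way to see why the log reports "0 SNPs that have both annotation and genotype data" is to count how many (chromosome, position) pairs the annotation file and a VCF have in common. The following is a minimal sketch, not part of LINKDATAGEN. The function names are invented for illustration, the annotation layout (chromosome and position in the first two columns) is assumed from the annotation lines posted earlier in the thread, and the VCF records are made-up sample data.

```python
def annotation_positions(lines):
    """Collect (chromosome, position) pairs from 'chr pos id ...' annotation lines."""
    positions = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            positions.add((fields[0], int(fields[1])))
    return positions


def vcf_positions(lines):
    """Collect (CHROM, POS) pairs from VCF data lines, skipping '#' header lines."""
    positions = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].isdigit():
            positions.add((fields[0], int(fields[1])))
    return positions


annot = ["chr1 14548 BICF2G630707759 0", "chr1 80040 BICF2P1173580 0"]
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t14548\t.\tA\tG\t50\t.\t.",
    "1\t80040\t.\tC\tT\t50\t.\t.",  # same site, but named '1' instead of 'chr1'
]
shared = annotation_positions(annot) & vcf_positions(vcf)
print(len(shared), "shared positions")
```

The second made-up record shows how a chromosome naming mismatch ("1" versus "chr1") hides a site that is present in both files. A shared count of zero on real data points to a naming, coordinate or genome-build mismatch, or to a VCF that was not genotyped at the annotation positions.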
Old 10-11-2013, 10:22 PM   #6
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54

Okay. I should mention, though, that my genome is not human. The annotation file lists the genotyped positions for the dog genome, in the format shown in my previous post.

So what is meant by a new VCF file? Is it a subset containing only the genotypes at the positions in the annotation file?

In my case the annotation file and the VCF file use the same genome version, so I believe this will not be an issue for model organisms.


Old 10-11-2013, 11:15 PM   #7
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64

Ah, as the data are non-human this may require some extra work. Posting a small subset of your annotation and VCF file, along with the code used to generate it, will likely assist people to help you.

I've brought this thread to the attention of my colleagues who wrote LINKDATAGEN. They should be able to provide more specific advice.
Pete
Old 10-12-2013, 07:02 AM   #8
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54

Here are a few lines of the annotation file:

chr1 14548 BICF2G630707759 0
chr1 80040 BICF2P1173580 0
chr1 82626 BICF2G630707846 0
chr1 212740 BICF2P1383091 0

The VCF files were generated with samtools. Below is the vcf2linkdatagen.pl command I used:

vcf2linkdatagen.pl -annotfile annotfile.txt -pop CEU -mindepth 10 -missingness 0 -idlist MyVCFlist.txt > MySNPs.brlmm

Happy to provide further information if required.


Old 10-17-2013, 04:45 PM   #9
mbahlo
Junior Member
 
Location: Melbourne

Join Date: Oct 2013
Posts: 1

Dear Meher,

As Pete suggested, using LINKDATAGEN will require a bit more work, as you will have to "trick" it into using your data. However, it should be doable if you reformat your VCF file into the same format as our example VCF file. If that doesn't work, you could script something that converts your VCF data into a BRLMM-style genotype call file; this is what the script vcf_to_linkdatagen does.
The other problem you have is, of course, the lack of a genetic map file. You could use the physical map locations and either apply the human rule of thumb of 1 Mb = 1 cM to create pseudo genetic maps, or look at some dog cross papers, where you may be able to identify a more suitable rule of thumb (for example, for mouse it is 6 Mb = 1 cM, as they have a lower recombination rate). You would also need allele frequency information for the Lander-Green algorithm, so I would suggest just using equal allele frequencies. The density of your data will undo some of the "damage" of these assumptions.
You will need to create an annotation file using these ideas and then give it the name of one of the HapMap annotation files used by LINKDATAGEN. That way you can trick it into analysing the dog genome data, or rather into doing a lot of QC and getting your files ready for all sorts of analyses, including multipoint mapping with MERLIN.
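
The pseudo-map idea can be sketched as below, assuming nothing beyond what is described above: physical positions are scaled to centimorgans with a rule of thumb, and every marker gets the same allele frequency. The function name `pseudo_genetic_map`, its parameters and the output column order are hypothetical. The real LINKDATAGEN annotation layout has to be copied from one of its HapMap annotation files.

```python
def pseudo_genetic_map(markers, mb_per_cm=1.0, allele_freq=0.5):
    """Attach a pseudo genetic position (cM) and a flat allele frequency to each marker.

    markers: iterable of (chromosome, position_bp, marker_id) tuples.
    mb_per_cm: assumed megabases per centimorgan (1.0 is the human rule of thumb).
    allele_freq: the single frequency given to every marker (equal frequencies).
    """
    rows = []
    for chrom, pos_bp, marker_id in markers:
        cm = pos_bp / 1_000_000 / mb_per_cm  # bp -> Mb -> cM
        rows.append((chrom, marker_id, pos_bp, round(cm, 6), allele_freq))
    return rows


# Marker positions taken from the dog annotation lines posted earlier in the thread.
markers = [
    ("chr1", 14548, "BICF2G630707759"),
    ("chr1", 80040, "BICF2P1173580"),
    ("chr1", 82626, "BICF2G630707846"),
    ("chr1", 212740, "BICF2P1383091"),
]
for row in pseudo_genetic_map(markers):
    print(*row, sep="\t")
```

Changing `mb_per_cm` is all it takes to try a species-specific rule of thumb once a suitable value has been found in the literature.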

Good luck with it & I hope this helps.

mbahlo
Old 03-14-2016, 11:02 AM   #10
Samarpana
Member
 
Location: INDIA

Join Date: Dec 2013
Posts: 16

Hey.. I am currently trying to use LINKDATAGEN and facing the same problem. How did you solve it? My sample set is human, though, so I can't find any reason why it is showing that error.

Any suggestion would be helpful

Thanks a lot
Old 03-14-2016, 11:09 AM   #11
Samarpana
Member
 
Location: INDIA

Join Date: Dec 2013
Posts: 16

Hey.. I am currently trying to use LINKDATAGEN and facing a frequent problem mentioned by many users. I emailed you about it too but didn't receive any response. My sample set is human, and I used the following commands:

samtools mpileup -d10000 -q13 -Q13 -gf hg19.fa -l annotHapMap2U.txt samplex.bam | bcftools view -cg -t0.5 - > samplex.HM.vcf

perl vcf2linkdatagen.pl -variantCaller mpileup -annotfile annotHapMap2U.txt -pop CEU -mindepth 10 -missingness 0 samplex.vcf > samplex.brlmm

The error is:
Use of uninitialized value $chr in concatenation (.) or string at vcf2linkdatagen.pl line 487, <IN> line 1.... to line 63964968

How can I correct it? The same BAM file was used to generate a VCF with GATK, so the BAM itself can't be the issue. It's just not working with this tool.

Any suggestion would be appreciated.

Thanks a lot
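
Before digging into the Perl script, it may be worth checking the VCF that is actually passed to it. Note that the two commands as posted name different files: the first writes samplex.HM.vcf and the second reads samplex.vcf. The sketch below is a generic sanity check, not part of LINKDATAGEN. The function name `summarize_vcf` is invented and the sample records are made up. It reports whether a `##fileformat` line is present, how many tab-separated columns the records have, and which chromosome names appear.

```python
def summarize_vcf(lines, max_records=1000):
    """Report header presence, column counts and chromosome names for the first records."""
    summary = {"has_fileformat": False, "column_counts": set(),
               "chroms": set(), "records": 0}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##fileformat="):
            summary["has_fileformat"] = True
        if not line or line.startswith("#"):
            continue  # header and blank lines carry no records
        fields = line.split("\t")
        summary["column_counts"].add(len(fields))
        summary["chroms"].add(fields[0])
        summary["records"] += 1
        if summary["records"] >= max_records:
            break  # a sample of the file is enough for a sanity check
    return summary


# Made-up records: the second one is space-separated, so it parses as one column.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsamplex",
    "chr1\t14548\t.\tA\tG\t50\t.\tDP=12\tGT:PL\t0/1:30,0,40",
    "1 80040 . C T 50 . DP=9 GT:PL 0/1:20,0,35",
]
print(summarize_vcf(example))
```

On a real file, pass an open file handle, for example `summarize_vcf(open("samplex.HM.vcf"))`. Records with an unexpected column count, or chromosome names that differ from those in the annotation file, are the first things to rule out.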

Tags
linkage, ngs
