SEQanswers

Old 12-10-2011, 11:06 PM   #1
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default Haplotype Softwares

Hi,

I need to know about software for constructing haplotypes from genotype data. I have come across several programs, such as Haploview, PHASE, and SNPHAP; which one is the most accurate to use?

Thanks in advance.
Old 12-11-2011, 12:15 PM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

A few more questions need to be answered before you can get a useful answer. Here are some I can think of: Do you have data for a full genome, or only a single gene region? Do you have relationship information available (e.g. trios, or a full genealogy)? Is the data restricted to dimorphic variants (e.g. SNPs)? Are you able to split the data into males and females (they have different recombination rates)?

I've used Haploview for graphical demonstrations because it reports both r^2 and D', but I was working on single genes, and did the block structure analysis manually. I don't think Haploview was able to manage whole chromosomes at once a few years ago.

Just as a heads-up, there are problems (e.g. overlapping blocks, recombination holes, and sub-blocks) which mean you can't always define particular regions as distinct blocks.
Old 12-11-2011, 09:18 PM   #3
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Hi,

Thanks for your reply!

The data is for a single gene region. We will be predicting haplotypes from the genotype data. The data consists of SNPs only, and gender is not recorded.

I tried to install Haploview, but it's giving an error message saying that it is too big to fit in memory. Our main purpose is TO CONSTRUCT HAPLOTYPES.

Thank You,
Old 12-12-2011, 01:53 AM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Looking for articles on Google Scholar / PubMed, it doesn't look like there's been much more interest in haplotype blocks since ~2005 (i.e. after SNP chips came out), so I'd just go with Haploview. But given that that doesn't work....

Your error suggests either running Haploview in an unexpected way, or a machine mismatch (e.g. Haploview expects a more modern OS). What are the specifications of the computer you are using? What version of Haploview are you using? If running Linux, the output of the following commands would be useful:
Code:
uname -a
java -version
java -jar <Haploview location>/haploview.jar
Old 12-12-2011, 03:01 AM   #5
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Yes, thank you so much.

It is working now. How can we create the input files for it? I tried using the SNPs add-in for Excel 2007, but it is now giving some problems. I want to supply the files in phased haplotypes format. How should I prepare the input files?

Thanks!
Old 12-12-2011, 03:30 AM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

File formats accepted for Haploview can be found here:

http://www.broadinstitute.org/scienc...file-formats-0

I found the Linkage format to be the easiest to create from arbitrary SNP data formats. There's a bit more about the format here (PED is the same as Linkage):

http://pngu.mgh.harvard.edu/~purcell...data.shtml#ped

If using Linkage/Ped files, you'll also need an associated MAP file for the markers:

http://pngu.mgh.harvard.edu/~purcell...data.shtml#map

Plink is able to convert between PED files (one line per individual), transposed PED files (one line per marker), and long-format files (one line per individual/genotype pair), which should cover most of the types of text-based input data you get:

http://pngu.mgh.harvard.edu/~purcell.../data.shtml#tr
http://pngu.mgh.harvard.edu/~purcell...ata.shtml#long

Some pre-processing may be necessary to get your data into these formats, but it's not too difficult. However, if you're using Excel to create these files, you're probably going to end up getting tied in knots sometime down the track (your "giving some problems" comment is a pointer to this). I'd recommend you convert your .xls files to .csv, and spend a bit of time with R, Perl, or Python to do this.
Old 12-12-2011, 04:14 AM   #7
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Informative!

Just one question: PED files can easily be edited in WordPad. Can PED files in the Linkage format only be created through Python or Perl, or is there an alternative way?

Thx
Old 12-12-2011, 04:16 AM   #8
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

You can create them in wordpad if you're a particularly masochistic person. If you post a few lines of your input data, I can probably whip up a Perl script to do the right thing (or some R code, if you would prefer that).
Old 12-12-2011, 04:33 AM   #9
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Actually, I am still learning Perl and R. If I write the scripts in WordPad and just save the files with the extensions .ped and .info, will it work?

I will also send you some lines of data so that you can send me the Perl code and I can construct the haplotypes.
Old 12-12-2011, 04:40 AM   #10
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
If I write the scripts in WordPad and just save the files with the extensions .ped and .info
This will work, as long as you keep with the conventions (i.e. space or tab-separated columns). The important thing to make sure about is that the first column is unique for each individual (assuming they're unrelated), and the next five columns are all numeric. If you don't have any other information about individuals, you can set the Individual ID to 1, and the remainder to 0. I prefer setting the phenotype column to 1, because it makes it easier to distinguish where the genotypes start if there are any unknowns:
Code:
IND001 1 0 0 0 1  A A  G G  A C
IND002 1 0 0 0 1  0 0  A G  A A
Haploview doesn't care about the file extensions, but using .ped and .map for your PED file and MAP file respectively is a good idea.
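For anyone who would rather script this than type it by hand, here is a minimal Python sketch of the conversion. It assumes a CSV export with a header row, one row per individual, a `sample` column, and one column per SNP holding a two-letter genotype such as `AG` (blank when missing). The column names, the helper names `rows_to_ped` and `map_lines`, the marker names, and the positions are all made up for illustration. The MAP layout shown is the four-column PLINK one (chromosome, marker ID, genetic distance, base-pair position); Haploview's own marker information file may expect a different layout, so check the file-format page linked earlier in the thread.

```python
import csv
import io

def rows_to_ped(rows, snp_names, id_column="sample"):
    """Turn dict rows (one per individual) into PED-format lines."""
    lines = []
    for row in rows:
        # Family ID is the (unique) sample name; individual ID 1,
        # father 0, mother 0, sex 0 (unknown), phenotype 1.
        fields = [row[id_column], "1", "0", "0", "0", "1"]
        for snp in snp_names:
            genotype = row[snp].strip().upper()
            if len(genotype) == 2:
                fields.extend(genotype)      # "AG" becomes "A", "G"
            else:
                fields.extend(["0", "0"])    # missing genotype
        lines.append(" ".join(fields))
    return lines

def map_lines(snp_positions, chromosome="1"):
    """Four-column MAP lines: chromosome, marker, cM distance, bp position."""
    return [f"{chromosome} {name} 0 {pos}" for name, pos in snp_positions]

# Made-up example data standing in for a real CSV export.
example_csv = "sample,rs1,rs2,rs3\nIND001,AA,GG,AC\nIND002,,AG,AA\n"
reader = csv.DictReader(io.StringIO(example_csv))
snps = [name for name in reader.fieldnames if name != "sample"]
ped = rows_to_ped(reader, snps)

print("\n".join(ped))
print("\n".join(map_lines([("rs1", 1000), ("rs2", 2000), ("rs3", 3000)])))
```

With a real file, `io.StringIO(example_csv)` would be replaced by an open file handle, and the printed lines written to the .ped and .map files instead.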
Old 12-12-2011, 05:00 AM   #11
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Perfect!

I will start my work now, and if I encounter any problems I will be looking forward to your help once again.

Thank you so much.....
Old 12-21-2011, 12:59 AM   #12
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Hi again!

I am done with the large data set, but now I have a very small data set from which Haploview is unable to draw the haplotypes and LD plots. I have also tried the Haplotype Inference technique, but it failed.

I have also worked through the algorithm described at the following link:
http://www.biorecipes.com/Haplotypes/code.html

I just want to be sure: even if there are 60 samples or more, if only two values are present, like 0 and 1, the possible haplotypes can only be (0,0),(0,1),(1,0),(1,1), in simple terms. Am I on the right track? *-)
Old 12-21-2011, 01:12 AM   #13
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
I just want to be sure: even if there are 60 samples or more, if only two values are present, like 0 and 1, the possible haplotypes can only be (0,0),(0,1),(1,0),(1,1), in simple terms. Am I on the right track? *-)
If there's a single dimorphic variant on a chromosome, there are only two possible haplotypes. Any haplotyping program will only consider a region interesting if [substantially] fewer haplotypes are observed (or inferred) than would be expected.

You can force Haploview to display all common haplotypes within a region using a custom block by dragging (in the LD view) across the markers of interest. The definition of common can be adjusted in the haplotype parameters (something like minimum haplotype frequency).

Last edited by gringer; 12-21-2011 at 03:40 AM. Reason: Whoops, only 2 possible haplotypes at a single dimorphic locus (not 4)
Old 12-21-2011, 01:29 AM   #14
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Sounds fine. My data looks something like this (only one pair):

Sample1 11
Sample2 00
Sample3 01

The custom dragging in the LD Plot is not possible as the LD plot is not created (data set is too small). I have done this custom dragging with other samples available.
Old 12-21-2011, 01:34 AM   #15
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
The custom dragging in the LD Plot is not possible as the LD plot is not created (data set is too small). I have done this custom dragging with other samples available.
Then you probably have markers that have been flagged to be ignored for some reason. Go to the marker tab and see if there are any red fields for those markers (this will explain why the marker was excluded). Check the 'rating' box for your markers, and they should be included and appear in the LD plot.
Old 12-21-2011, 02:58 AM   #16
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

The rating box is already checked. There is only one marker (because there is only one pair/column). Maybe that's the reason for not showing LD, but it's just a guess.
Old 12-21-2011, 02:59 AM   #17
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Yes, of course. For some reason I'd forgotten that you had only one marker -- no other markers to compare with, so no LD checks are possible (LD is a pairwise measure).
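
To illustrate why a second marker is needed, here is a small Python sketch of the usual pairwise LD statistics. It assumes phased two-marker haplotypes with alleles coded 0/1; the function name `pairwise_ld` and the example haplotypes are made up, and D' is returned as an absolute value.

```python
from collections import Counter

def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for a list of phased two-marker haplotypes.

    Each haplotype is a tuple (a, b) with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at marker A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at marker B
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / denom

# Two markers that always travel together: complete LD.
linked = [(1, 1)] * 4 + [(0, 0)] * 4
# All four haplotypes equally common: no LD.
unlinked = [(1, 1)] * 2 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 2

print(pairwise_ld(linked))
print(pairwise_ld(unlinked))
```

Every term in the calculation involves allele frequencies at two markers, so with a single marker there is nothing to compute.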
Old 12-21-2011, 03:26 AM   #18
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Thanks!

Hmmm... So the haplotype possibilities will be (0,0),(0,1),(1,0),(1,1). Right?
Old 12-21-2011, 03:38 AM   #19
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
Hmmm... So the haplotype possibilities will be (0,0),(0,1),(1,0),(1,1). Right?
Those are the genotype possibilities for a single polymorphism at a diploid locus, with alleles coded as 1/0.

Upon further reflection, my previous answer of four was incorrect (I've now changed it to reduce confusion for other readers)... there are only two possible haplotypes (ways of describing the sequence on a single chromosome) at such a locus: 1 and 0. You need multiple markers to get non-trivial haplotypes.

Last edited by gringer; 12-21-2011 at 03:41 AM.
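
The haplotype/genotype distinction above can be spelled out in a few lines of Python. This is only a sketch for dimorphic markers coded 0/1, and the function names are made up for illustration.

```python
from itertools import product

def possible_haplotypes(n_markers, alleles=(0, 1)):
    """All allele sequences that a single chromosome could carry."""
    return list(product(alleles, repeat=n_markers))

def diploid_genotypes(alleles=(0, 1)):
    """Ordered allele pairs at one marker for a diploid individual."""
    return list(product(alleles, repeat=2))

print(possible_haplotypes(1))        # [(0,), (1,)]
print(diploid_genotypes())           # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(len(possible_haplotypes(3)))   # 8
```

With one marker there are two haplotypes and four ordered genotype pairs; the number of possible haplotypes only grows (as 2^n) once more markers are added.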
Old 12-28-2011, 08:48 AM   #20
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
Default

Hi gringer,

I tried to use Haploview as well, in Linkage format. I loaded the PED file as the Data File and the MAP file as the Locus Information File, and it gave this error message:

"File Error: Individual 6 in family L3 appears more than once".

I have 4 sisters, all in the same family L3, and their first 6 columns are the same. I don't know if that is what caused the error?
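
The wording of the error suggests that the same family ID / individual ID pair occurs on more than one line, which is what identical leading columns would produce. A quick way to check is sketched below in Python; the function name and the example lines are made up, and whether giving each sister her own individual ID resolves the error is a reading of the message, not something confirmed in this thread.

```python
from collections import Counter

def duplicate_individuals(ped_lines):
    """Return (family ID, individual ID) pairs that occur on more than one line."""
    keys = []
    for line in ped_lines:
        fields = line.split()
        if len(fields) >= 2:
            keys.append((fields[0], fields[1]))
    return [key for key, count in Counter(keys).items() if count > 1]

# Made-up lines: two rows share family L3 and individual ID 6.
example = [
    "L3 6 0 0 2 1  A A  G G",
    "L3 6 0 0 2 1  A C  G G",
    "L3 7 0 0 2 1  A A  A G",
]
print(duplicate_individuals(example))   # [('L3', '6')]
```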

Thanks for the help.