**chrchang** · 07-04-2014, 03:06 AM

Most PLINK operations, including the r^2 computation, don't actually care much about pedigree information. Just fill in "0" for the paternal and maternal IDs and you'll be fine.

**Marius** · 07-04-2014, 03:43 AM

Great to know! So it seems I will have to create 2 input files for PLINK (a .ped and a .map file), right?

My current SNP data table looks like this:

Ind chrI_1673 chrI_1686 chrI_1733
ind.1_a G A C
ind.1_b G G G
ind.2_a G A C
ind.2_b G A C

So basically, each individual is represented by 2 rows, and each SNP is represented by a column. Importantly, the columns are not phased among each other!

I have found an example of a .ped-file:

HCB181 1 0 0 1 1 2 2 2 2
HCB182 1 0 0 1 1 2 2 1 2
HCB183 1 0 0 1 2 2 2 1 2
etc.

What I don't understand here are the columns 2-4? For the rest, I would guess that my sample from above would then look like this:

ind.1 ? ? ? G G A G C G
ind.2 ? ? ? G G A A C C

So my question: what are the columns 2-4? Also, I'm still struggling with interpreting the .map file, for which I found an example here:

1 rs6681049 0 1
1 rs4074137 0 2
1 rs7540009 0 3
1 rs1891905 0 4

Thank you for your help!!

**Marius** · 07-04-2014, 06:10 AM

I did some more investigating and found the following:

.map-file
1 rs6681049 0 1
1 rs4074137 0 2
1 rs7540009 0 3
1 rs1891905 0 4

1st Column = chromosome
2nd Column = marker ID
3rd column = genetic distance
4th column = physical position

- What do I do if I don't have the genetic distance information? Do I just add a zero everywhere? I guess this information is not needed for R^2 calculation, right?
- I guess the physical position is not continuous across the genome, so it starts from 0 again on each chromosome?
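
For illustration, here is a minimal Python sketch of how one .map line per marker could be built from marker names like `chrI_1673` in the SNP table above. It rests on several assumptions that are not confirmed in this thread: that the marker names follow the pattern `chr<Roman numeral>_<position>`, that the number after the underscore is the base-pair position on that chromosome, that the chromosome should be written as an integer, and that the unknown genetic distance is left at 0 as a placeholder. The helper names `roman_to_int` and `map_line` are made up for this example.

```python
# Roman-numeral values; enough for chromosome labels such as chrI ... chrXXI.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'IV' to an integer."""
    total = 0
    for i, char in enumerate(numeral):
        value = ROMAN_VALUES[char]
        next_value = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        # A smaller value in front of a larger one is subtracted (IV = 4).
        total += -value if value < next_value else value
    return total


def map_line(marker_id):
    """Build one .map line: chromosome, marker ID, genetic distance, position."""
    chrom_label, position = marker_id.rsplit("_", 1)
    if not chrom_label.startswith("chr"):
        raise ValueError(f"unexpected marker name: {marker_id}")
    chromosome = roman_to_int(chrom_label[len("chr"):])
    # Genetic distance is unknown here, so 0 is written as a placeholder.
    return f"{chromosome} {marker_id} 0 {int(position)}"


if __name__ == "__main__":
    for marker in ["chrI_1673", "chrI_1686", "chrI_1733"]:
        print(map_line(marker))
```

The marker order in the .map file has to match the column order of the genotypes in the .ped file, so the lines should be generated in the same order as the SNP columns of the table.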

.ped file

HCB181 1 0 0 1 1 2 2 2 2
HCB182 1 0 0 1 1 2 2 1 2
HCB183 1 0 0 1 2 2 2 1 2
etc.

1st column: Sample ID
2nd column: Paternal ID
3rd column: Maternal ID
4th column: Sex (1=male; 2=female; other=unknown)
5th column: Genotypes (space or tab separated, 2 for each marker. 0=missing)

What do I do if I am missing the information for e.g. columns 2, 3, 4? Do I just fill these columns with zeros, or can I skip them? It seems that sometimes there are additional columns with information at the beginning of that file, such as 'affected'/'unaffected'. But people seem to leave these out if they are not used.

**chrchang** · 07-04-2014, 07:47 AM

* It's safe to set all values in the .map "genetic distance" column to zero. Almost no commands actually use this information.
* As for the .ped, the first six columns are normally as follows:
  1. Family ID
  2. Individual ID
  3. Paternal ID (safe to set to '0' if unknown)
  4. Maternal ID (safe to set to '0' if unknown)
  5. Sex (1 = male, 2 = female, 0 = unknown)
  6. Phenotype (-9 if unknown)

  (columns 7+ have genotype info)

You can just set both the family and individual IDs to the sample ID.
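
Putting this together, here is a minimal Python sketch of how the table from earlier in the thread (two rows per individual, one column per SNP) could be turned into .ped lines. It is only an illustration under stated assumptions: the row labels end in `_a` / `_b`, every individual has exactly two rows, alleles are written as letters, and sex and phenotype are unknown (0 and -9). The function name `table_to_ped_lines` is made up for this example.

```python
TABLE = """\
Ind chrI_1673 chrI_1686 chrI_1733
ind.1_a G A C
ind.1_b G G G
ind.2_a G A C
ind.2_b G A C
"""


def table_to_ped_lines(table_text):
    """Convert a two-rows-per-individual SNP table into .ped lines."""
    rows = [line.split() for line in table_text.strip().splitlines()]
    data_rows = rows[1:]  # rows[0] is the header with the marker names

    # Group the allele rows by sample ID ("ind.1_a" and "ind.1_b" -> "ind.1").
    samples = {}
    for row in data_rows:
        sample_id = row[0].rsplit("_", 1)[0]
        samples.setdefault(sample_id, []).append(row[1:])

    ped_lines = []
    for sample_id, allele_rows in samples.items():
        if len(allele_rows) != 2:
            raise ValueError(f"{sample_id}: expected 2 rows, got {len(allele_rows)}")
        first, second = allele_rows
        # Interleave the two rows so each marker gets its pair of alleles.
        genotypes = []
        for allele_a, allele_b in zip(first, second):
            genotypes.extend([allele_a, allele_b])
        # Family ID and individual ID are both set to the sample ID;
        # paternal ID, maternal ID and sex are 0, phenotype is -9 (all unknown).
        fields = [sample_id, sample_id, "0", "0", "0", "-9"] + genotypes
        ped_lines.append(" ".join(fields))
    return ped_lines


if __name__ == "__main__":
    for line in table_to_ped_lines(TABLE):
        print(line)
```

The genotype part of each line matches the rows Marius guessed above (`G G A G C G` and `G G A A C C`). The order of the two alleles within a pair is arbitrary here, which fits the fact that the data are unphased. Written to `mydata.ped` next to a matching `mydata.map`, the pair could then be loaded with `plink --file mydata`.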


LD along genome, R or PLINK?
