ootunaoo 01-07-2016 08:48 AM

Questions regarding how to compare 2 genomes
Hi Seqanswers,

I would like to ask several questions about how to compare two genomes in order to find differences. Assume I have two sequencing datasets from two plants of the same species (e.g. Arabidopsis): one plant has a normal phenotype, the other has a disease phenotype. In theory, the disease phenotype is known to be controlled by a single gene, so the two plants should have similar genomes except for the region that is responsible for the phenotype. I would like to compare the two genomes to find the differences between the two plants (in order to find the disease gene).

I'm a newbie in bioinformatics (and also new to Seqanswers), and I do not know where to start. Would you mind giving me some guidance on how to approach my problem? For example: projects or publications with a similar objective; documents, websites, or books that I should read; a suggested pipeline would also be great.

Thank you for reading.

P.S. I am sorry if my English is horrible.

GenoMax 01-07-2016 08:58 AM

The following tool may or may not work in your case but it may be worth taking a look: There is a paper linked at the site as well.

gsgs 01-09-2016 11:23 PM

Mathematically speaking,

suppose you have the two mappings a:{1,..,n}-->{A,C,G,T} and b:{1,..,m}-->{A,C,G,T}
representing the two genomes.

Pick L (e.g. L=16) and compute
f(x)=1 if there exists y such that a(x+i)=b(y+i) for all i=0..L-1, and f(x)=0 otherwise.

This can be computed quickly by marking every length-L substring of b in a table with 4^L entries, then looking up each length-L substring of a.
[You may add the reverse complement of b() to the table here.]
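
A minimal Python sketch of the indicator f, assuming both genomes fit in memory as plain strings. It swaps the literal 4^L table for a Python set of L-mers, which is simpler to write but uses more memory per entry. The names reverse_complement, kmer_set, and match_indicator are made up for illustration, and positions are 0-based rather than 1-based.

```python
# Complement table for the four bases; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq, L):
    """Collect every length-L substring of seq (stands in for the 4^L table)."""
    return {seq[i:i + L] for i in range(len(seq) - L + 1)}


def match_indicator(a, b, L=16, both_strands=True):
    """f[x] = 1 if the L-mer of a starting at x occurs anywhere in b, else 0.

    With both_strands=True, L-mers from the reverse complement of b
    are added to the table as well.
    """
    table = kmer_set(b, L)
    if both_strands:
        table |= kmer_set(reverse_complement(b), L)
    return [1 if a[x:x + L] in table else 0 for x in range(len(a) - L + 1)]
```

For example, match_indicator("AAAACCCC", "AAAAGGGG", L=4) gives [1, 0, 0, 0, 1]: the first window matches b directly, and the last one (CCCC) matches the reverse complement of GGGG.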

Then plot a moving average of f, with a window size approximately equal
to the length of the expected gene.

This gives an overview of the match quality along the genome.

You should see a "valley" in a non-matching region.
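
A rough Python sketch of the moving-average step, assuming f is already available as a list of 0/1 values. The function names moving_average and deepest_valley are hypothetical. Instead of plotting, this version just reports the window with the lowest average, which is where the "valley" would show up in a plot.

```python
from itertools import accumulate


def moving_average(f, window):
    """Average of f over each run of `window` consecutive positions."""
    if window <= 0 or window > len(f):
        raise ValueError("window must be between 1 and len(f)")
    # Prefix sums let each window sum be computed in constant time.
    prefix = [0] + list(accumulate(f))
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(f) - window + 1)]


def deepest_valley(f, window):
    """Return (start, average) of the window with the lowest match rate."""
    avg = moving_average(f, window)
    start = min(range(len(avg)), key=avg.__getitem__)
    return start, avg[start]
```

For example, deepest_valley([1, 1, 1, 0, 0, 0, 1, 1, 1], 3) returns (3, 0.0), i.e. the three non-matching positions in the middle. On real data the list returned by moving_average can be passed to any plotting tool to look at the whole profile.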

[Is there a name for this function?]

