Post #3 by gsgs (Senior Member, Germany), 01-09-2016, 11:23 PM

Mathematically speaking, suppose the two genomes are represented by the maps
a: {1,...,n} → {A,C,G,T} and b: {1,...,m} → {A,C,G,T}.

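Pick a word length L (e.g. L = 16) and compute
f: {1,...,n-L+1} → {0,1}
with
f(x) = 1 iff there exists y in {1,...,m-L+1} such that a(x+i) = b(y+i) for all i = 0,...,L-1, and f(x) = 0 otherwise.
In words: f(x) = 1 exactly when the length-L word of a starting at position x also occurs somewhere in b.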
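A direct transcription of this definition, as a minimal Python sketch. It assumes the genomes are upper-case strings and uses 0-based indices, so x runs over 0..len(a)-L; the function name `match_indicator` is hypothetical. A Python set of b's length-L words stands in for the lookup table: simple, but far more memory per word than a bit table.

```python
from typing import List


def match_indicator(a: str, b: str, L: int) -> List[int]:
    """Return f with f[x] = 1 iff a[x:x+L] occurs somewhere in b (0-based x)."""
    # Every length-L word of b; the set plays the role of the lookup table.
    words_b = {b[y:y + L] for y in range(len(b) - L + 1)}
    # One entry per length-L word of a, i.e. len(a) - L + 1 entries.
    return [1 if a[x:x + L] in words_b else 0 for x in range(len(a) - L + 1)]
```

For example, `match_indicator("ACGTACGT", "TTACGTT", 4)` returns `[1, 0, 0, 1, 1]`: the words ACGT and TACG occur in b, while CGTA and GTAC do not.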

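This can be computed quickly: read each length-L word of b as a base-4 number (a 2L-bit integer) and mark it in a table with 4^L entries; f(x) is then a single lookup of the word of a starting at x. With one bit per entry and L = 16, the table takes 4^16 bits = 512 MiB.
[You may also mark the words of the reverse complement of b here, so that matches on the opposite strand count too.]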
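A sketch of the 4^L table approach, under stated assumptions: each base gets a 2-bit code (A=0, C=1, G=2, T=3, an arbitrary choice), the table is a `bytearray` with one bit per possible word, words containing anything other than upper-case A/C/G/T (e.g. N) are never marked and get f = 0, and `both_strands` optionally adds the reverse complement of b. The names `kmer_codes` and `match_indicator_table` are hypothetical. The word code is updated incrementally (shift in 2 bits, mask to 2L bits), so each position costs O(1).

```python
# 2-bit code per base (an arbitrary but fixed choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_codes(s, L):
    """Yield (x, code) for every L-mer of s starting at 0-based position x.

    code is the L-mer read as a base-4 number (a 2L-bit integer).
    L-mers containing a character outside A/C/G/T (e.g. N) are skipped.
    """
    mask = (1 << (2 * L)) - 1
    code = 0
    run = 0  # number of consecutive valid bases ending at position i
    for i, ch in enumerate(s):
        v = CODE.get(ch)
        if v is None:
            run = 0
            code = 0
            continue
        code = ((code << 2) | v) & mask  # drop the oldest base, append v
        run += 1
        if run >= L:
            yield i - L + 1, code


def match_indicator_table(a, b, L, both_strands=True):
    """f[x] = 1 iff the L-mer of a at x occurs in b (or its reverse complement)."""
    table = bytearray((4 ** L + 7) // 8)  # one bit per possible L-mer
    sources = [b]
    if both_strands:
        sources.append(b.translate(COMPLEMENT)[::-1])  # reverse complement of b
    for t in sources:
        for _, c in kmer_codes(t, L):
            table[c >> 3] |= 1 << (c & 7)  # mark this L-mer
    f = [0] * max(len(a) - L + 1, 0)
    for x, c in kmer_codes(a, L):
        if (table[c >> 3] >> (c & 7)) & 1:  # look up a's L-mer
            f[x] = 1
    return f
```

With the reverse complement included, `match_indicator_table("ACGTACGT", "TTACGTT", 4)` returns `[1, 1, 0, 1, 1]`, since CGTA occurs in AACGTAA, the reverse complement of b. For L = 16 the `bytearray` is 512 MiB; each step down in L shrinks it by a factor of 4 but makes chance matches more likely.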

Then plot a moving average of f, using a window roughly as long as the expected gene.
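A moving average with window w can be kept as a running sum, so the whole curve costs O(len(f)). This minimal sketch (the name `moving_average` is hypothetical) returns one value per full window, len(f) - w + 1 values in total; following the post, w would be set near the expected gene length.

```python
def moving_average(f, w):
    """Mean of f over each run of w consecutive entries (len(f) - w + 1 values)."""
    if w <= 0 or w > len(f):
        return []
    s = sum(f[:w])  # running sum of the current window
    out = [s / w]
    for i in range(w, len(f)):
        s += f[i] - f[i - w]  # slide the window one step to the right
        out.append(s / w)
    return out
```

The result can be plotted directly, for example with matplotlib's `plt.plot(moving_average(f, w))`.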

This gives an overview of the matching quality along the genome, region by region.

You should see a "valley" (the average dropping toward 0) wherever a region of a has no match in b.
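To locate the valleys automatically rather than by eye, one option (not part of the original post) is to report maximal runs where the moving average falls below a threshold. Both the function name `find_valleys` and the `threshold` parameter are hypothetical; a suitable threshold depends on the data.

```python
def find_valleys(avg, threshold):
    """Return (start, end) pairs, end exclusive, of maximal runs with avg < threshold."""
    valleys = []
    start = None  # index where the current below-threshold run began
    for i, v in enumerate(avg):
        if v < threshold and start is None:
            start = i
        elif v >= threshold and start is not None:
            valleys.append((start, i))
            start = None
    if start is not None:  # run extends to the end of the curve
        valleys.append((start, len(avg)))
    return valleys
```

Index i of the averaged curve covers f[i..i+w-1], and f[x] covers a[x..x+L-1], so the windows in a valley (start, end) together cover positions start through end + w + L - 3 of a (0-based, inclusive).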


[Is there a name for this function?]