Hi everyone!
I'm working on a CNV project for the first time and need your help.
I mapped 44 Illumina read sets, from 44 different individuals, onto a reference genome. I extracted the number of reads mapped to each genic feature and am now trying to normalize all of this.
I applied the following formula, found in a publication:
NormalizedData = (Number of reads in feature * genome size) / (Total number of reads in sample * read length)
They then give these thresholds:
< 0.05 = gene is absent
< 0.5 = deletion
> 1.5 = duplication
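For what it's worth, here is a minimal sketch of that formula and the threshold calls in Python, just to make the computation concrete. The function and variable names, and the example numbers (100 Mb genome, 10 M reads of 150 bp), are my own illustration, not from the publication:

```python
# Hypothetical sketch of the normalization formula quoted above.
# Inputs: reads mapped to one feature, genome size (bp), total mapped
# reads in the sample, and the read length (bp).

def normalized_coverage(reads_in_feature, genome_size, total_reads, read_length):
    """Ratio that should land near 1.0 for a single-copy gene under this formula."""
    return (reads_in_feature * genome_size) / (total_reads * read_length)

def classify(value):
    """Apply the published thresholds in order."""
    if value < 0.05:
        return "absent"
    elif value < 0.5:
        return "deletion"
    elif value > 1.5:
        return "duplication"
    return "normal"

# Made-up example: 15 reads on a gene, 100 Mb genome,
# 10 million mapped reads of 150 bp each.
cov = normalized_coverage(15, 100_000_000, 10_000_000, 150)
print(cov, classify(cov))  # -> 1.0 normal
```

One thing that strikes me about the formula: it has no term for the feature's own length, so longer genes will accumulate more reads than shorter ones at the same copy number; it may be worth checking whether the publication normalizes by feature length somewhere else.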
What do you think of this technique? I've read about many possibilities online, but what do you usually do with this kind of data? I've heard a bit about LDA and tried it in R with the MASS package, but I ran into some issues, and I'm not very good at statistics.
I want to maximize the separation and group my values around the thresholds, but I think my current approach is weak.
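As one alternative to fixed cutoffs, you could let the data itself suggest where the copy-number groups sit, e.g. with a simple 1-D clustering. This is only an illustration, not your published method; the starting centres 0.0, 1.0, 2.0 (meaning absent, single copy, duplicated) and the example ratios are assumptions of mine:

```python
# Tiny 1-D k-means (pure Python) that groups normalized ratios around
# data-driven centres instead of fixed thresholds.

def kmeans_1d(values, centres=(0.0, 1.0, 2.0), iters=50):
    centres = list(centres)
    for _ in range(iters):
        groups = [[] for _ in centres]
        for v in values:
            # assign each value to its nearest centre
            i = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[i].append(v)
        # move each centre to the mean of its group (keep it if the group is empty)
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

# Made-up normalized ratios for a handful of genes
ratios = [0.02, 0.04, 0.9, 1.0, 1.1, 0.95, 1.9, 2.1, 1.05]
centres, groups = kmeans_1d(ratios)
print(centres)  # the three fitted group centres
```

A Gaussian mixture model (e.g. R's mclust, or scikit-learn's GaussianMixture) would be the more principled version of the same idea, and may behave better than LDA here since you don't have labelled classes to train on.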
Thanks a lot