Hi everyone,
I have been wondering about something that I would like to discuss with experts, or at least with people who know more about this area than I do as a newbie. I was not quite sure which category this thread belongs in, so I opened it here...
So, here are my thoughts:
We use a single reference genome for our genome analyses, e.g. the one used by the 1000 Genomes Project. This reference was assembled from only a few individuals. When I think about all the differences between populations, e.g. between different regions of the world or even between families, I get the feeling that a single reference genome is not sufficient for analysis, because it does not represent these population-specific differences. Couldn't it be that we miss important genetic variants, or report variants that are actually not variants within that population?
On the other hand, we already have access to population-specific variant data, e.g. from the 1000 Genomes Project. So my idea is: why not use that data to construct a "customized" reference genome and use it for genome data analysis instead? That is, take the general reference genome as a basis and adapt it with the genetic variants found in the majority of individuals of a specific population. We could even create family-specific reference genomes, e.g. using the parents' DNA to customize the reference genome when analyzing a child's DNA.
I am wondering whether that would be beneficial for genome data analysis at all, and if so, what the theoretical challenges of combining these data would be. I suspect it cannot be done simply by taking the genetic variants of a population and applying them to the reference genome at the corresponding positions...?
It would be great if we could discuss this.
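To make the naive version of my idea concrete, here is a minimal sketch in Python on toy data. Everything in it is my own assumption: the `Variant` record, the helper names `build_major_allele_reference` and `lift_position`, and the rule "use the ALT allele if its population frequency is above 0.5". It is not a real pipeline. Even this toy already shows two problems I suspect: insertions and deletions shift all downstream coordinates, so existing annotations no longer line up with the custom sequence, and overlapping variants conflict with each other.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int         # 1-based position on the reference (VCF-style)
    ref: str         # reference allele
    alt: str         # alternative allele
    alt_freq: float  # ALT allele frequency in the population (assumed input)


def build_major_allele_reference(ref_seq: str, variants: list[Variant],
                                 min_freq: float = 0.5) -> tuple[str, list[Variant]]:
    """Hypothetical helper: replace REF with ALT wherever ALT is the majority allele."""
    chosen = sorted((v for v in variants if v.alt_freq > min_freq), key=lambda v: v.pos)

    # Skip variants that overlap an already accepted one; a real tool would
    # need a proper rule for resolving such conflicts.
    applied: list[Variant] = []
    last_end = 0
    for v in chosen:
        start = v.pos - 1
        end = start + len(v.ref)
        if start < last_end:
            continue
        if ref_seq[start:end] != v.ref:
            raise ValueError(f"REF mismatch at {v.pos}: expected {v.ref!r}, "
                             f"found {ref_seq[start:end]!r}")
        applied.append(v)
        last_end = end

    # Apply right to left so that indels do not shift the positions
    # of variants that have not been applied yet.
    seq = ref_seq
    for v in reversed(applied):
        start = v.pos - 1
        seq = seq[:start] + v.alt + seq[start + len(v.ref):]
    return seq, applied


def lift_position(pos: int, applied: list[Variant]) -> int:
    """Map a 1-based reference position (outside any applied variant) to the custom sequence."""
    shift = sum(len(v.alt) - len(v.ref)
                for v in applied if v.pos + len(v.ref) - 1 < pos)
    return pos + shift


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    population_variants = [
        Variant(2, "C", "T", 0.8),    # common SNV, gets applied
        Variant(5, "A", "AGG", 0.6),  # common insertion, shifts everything after it
        Variant(8, "GT", "G", 0.3),   # rare deletion, stays as in the reference
    ]
    custom, used = build_major_allele_reference(reference, population_variants)
    print(custom)                    # ATGTAGGCGTAC
    print(lift_position(9, used))    # 11: reference position 9 moved by +2
```

So even in this tiny example, a gene annotated at reference position 9 sits at position 11 in the "population reference", and every downstream tool would need such a coordinate mapping.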
Best,
Cindy