This has been puzzling me for a while, and I haven't been able to find a good answer. On the Ensembl website (ensembl.org/) you can find reference genomes for various species. What confuses me is that they report both a "base pairs" assembly statistic and a "golden path length" statistic.
After reading the FAQ/glossary, I understand that the "base pairs" statistic covers the whole assembly, including redundant regions and haplotypes, and that "golden path length" is the reference without these regions. But I've found that the "base pairs" statistic is often much shorter than the "golden path length". For example, for the Cod assembly the "base pairs" stat is 608 Mb and the "golden path length" is 832 Mb. How is that possible? Is anyone familiar with this database? Thanks much.