**westerman** · 08-01-2014, 07:53 AM

I presume you are talking about this FAQ/Help:

(Species Home Page) Base Pairs (whole assembly)

The total number of base pairs for the entire assembly is the sum of all sequences in the dna table of the core database. It is available from the species-specific home page. This includes redundant regions such as haplotypic sequences and the pseudo-autosomal region (PAR) of the Y chromosome in human, and gaps in Drosophila melanogaster. See the assembly details of each species for more information.

(Species Home Page) Golden Path

The "golden path" is the length of the reference assembly. It consists of the sum of all top-level sequences in the seq_region table, omitting any redundant regions such as haplotypes and PARs.

Note that the information is coming from two different places -- the 'dna table' for base pairs, the 'seq_region table' for the golden path.

Golden paths are usually built up by layering sequence information onto a physical map. They can also be created by combining scaffolds together into a "best guess". In either case they are an approximation of what the real genome looks like -- which is what the 'base pairs' assembly tries to reflect. Even the 'base pairs' assembly can be wildly wrong if not enough of the genome has been sequenced. For a poorly sequenced genome I would not find it surprising for 'base pairs' to be less than 'golden path', since the golden path will have a lot of gaps between the known sequences while 'base pairs' is a simple sum of base pair counts.
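
To make the gap effect concrete, here is a minimal Python sketch. It is a toy model, not how Ensembl actually computes either statistic: the scaffold names and sequences are made up, and I am assuming gaps are represented as runs of `N`. The total scaffold length stands in for the golden path and the count of non-`N` bases stands in for 'base pairs'.

```python
def assembly_lengths(scaffolds):
    """Return (total_length, known_bases) for a dict of name -> sequence.

    total_length counts every position, including gap characters (N).
    known_bases counts only positions that are not N.
    """
    total_length = sum(len(seq) for seq in scaffolds.values())
    known_bases = sum(
        sum(1 for base in seq.upper() if base != "N")
        for seq in scaffolds.values()
    )
    return total_length, known_bases


# Hypothetical scaffolds: contigs joined by gaps of estimated size.
toy_scaffolds = {
    "scaffold_1": "ACGT" + "N" * 10 + "GGCA",
    "scaffold_2": "TTGACC" + "N" * 5 + "AT",
}

total_length, known_bases = assembly_lengths(toy_scaffolds)
print("length including gaps:", total_length)
print("known bases only:", known_bases)
```

The more fragmented the assembly, the larger the share of `N` positions, and the further the two numbers drift apart.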

In the cod case Ensembl says that the genome is 0.9 Gb while the golden path is 0.83 Gb and the base pairs are 0.61 Gb. So at least one of those numbers is incorrect.
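
For a rough sense of scale, the three cod figures quoted above can be compared directly. This is only arithmetic on those numbers, in gigabases; the variable names are mine, and reading the difference between golden path and base pairs as "gap sequence" is an assumption based on the FAQ wording.

```python
# Figures quoted above for Atlantic cod, in gigabases.
genome_size_gb = 0.90   # stated genome size
golden_path_gb = 0.83   # golden path length
base_pairs_gb = 0.61    # 'base pairs' (whole assembly)

# Assumed interpretation: the difference is mostly gaps within scaffolds.
gap_gb = golden_path_gb - base_pairs_gb

known_fraction_of_golden_path = base_pairs_gb / golden_path_gb
golden_path_fraction_of_genome = golden_path_gb / genome_size_gb
known_fraction_of_genome = base_pairs_gb / genome_size_gb

print(f"gap sequence: {gap_gb:.2f} Gb")
print(f"known bases / golden path: {known_fraction_of_golden_path:.1%}")
print(f"golden path / genome size: {golden_path_fraction_of_genome:.1%}")
print(f"known bases / genome size: {known_fraction_of_genome:.1%}")
```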

Given that cod has only been sequenced to a depth of 25x I suspect that there is a lot of sequencing yet to be done and eventually that 'base pairs' number will be raised. Ensembl goes on to say "... Owing to the fragmentary nature of the Atlantic cod assembly ..." which just reinforces the fact that not all of the base pairs are known.

Hope this helps.

**jwag** · 08-01-2014, 09:01 AM

Thanks, that really helps. I've recently de novo assembled a fish genome, so I'm trying to compare how close others have come to assembling a complete genome (with respect to the theoretical genome size based on C value). I didn't realize that the "base pair" statistic doesn't include gaps from scaffolds, so it makes sense that (especially in a highly repetitive genome such as in fish) it would be much lower than the "golden path".

Thanks for your help!

Whole assembly vs Golden path length in Ensembl?