I'm looking for downloadable data to make lists/tables/graphs
of mutations, to better understand evolution.
Usually I download the big files from the GenBank FTP and then filter
for what I need.
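A minimal sketch of that "download, then filter" step, assuming the downloaded file is gzip-compressed FASTA and that the wanted records can be picked out by a keyword in the header line. The function names and the keyword are illustrative assumptions, not part of any GenBank tool:

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(handle, keyword):
    """Keep records whose header contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [(h, s) for h, s in read_fasta(handle) if keyword in h.lower()]


def filter_gzipped_fasta(path, keyword):
    """Same filter, reading a .gz file as text (path is hypothetical)."""
    with gzip.open(path, "rt") as handle:
        return filter_fasta(handle, keyword)


if __name__ == "__main__":
    # Tiny in-memory example instead of a real download.
    demo = io.StringIO(
        ">seq1 Homo sapiens mitochondrion\nACGT\nAC\n"
        ">seq2 Pan troglodytes mitochondrion\nGGGG\n"
    )
    for header, seq in filter_fasta(demo, "homo sapiens"):
        print(header, len(seq))
```

For a real file, `filter_gzipped_fasta("some_download.fasta.gz", "Homo sapiens")` would be the entry point; the file name here is only a placeholder.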
Currently I'm looking for human Y chromosomes,
but these are large files, so I want to download a database
of compressed human whole Y chromosomes and a utility that recreates
the chromosomes from the list in FASTA format
(alternatively a new compressed format, but it should be
well documented, simple, and clearly and reasonably defined).
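To make the idea concrete, here is a rough sketch of one possible "compressed" representation: each sample is stored only as its differences from a shared reference sequence, and a small utility rebuilds the full FASTA record. This is an assumed toy format, not an existing standard; it handles single-base substitutions only (no insertions or deletions) and assumes every sample is already aligned to the same reference. All names are hypothetical:

```python
def diff_substitutions(reference, sample):
    """Encode sample as a list of (1-based position, base) substitutions."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only supports equal-length sequences")
    return [
        (i + 1, b)
        for i, (a, b) in enumerate(zip(reference, sample))
        if a != b
    ]


def apply_substitutions(reference, variants):
    """Rebuild the full sequence from the reference plus substitutions."""
    seq = list(reference)
    for pos, base in variants:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} outside reference")
        seq[pos - 1] = base
    return "".join(seq)


def to_fasta(name, sequence, width=60):
    """Format one record as FASTA text, wrapped at width characters."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    reference = "ACGTACGT"          # stand-in for a real reference
    sample = "ATGTACGA"             # stand-in for one individual
    variants = diff_substitutions(reference, sample)
    rebuilt = apply_substitutions(reference, variants)
    print(variants)
    print(to_fasta("sample1", rebuilt, width=4), end="")
```

For whole chromosomes, only the variant lists would need to be distributed alongside one copy of the reference. Real data would also need indels and uncalled regions, which established variant formats already cover.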
I already did this for human (+ partially primate) mtDNA,
where they have ~16000 full mtDNAs at GenBank,
which I can send if someone is interested.
But I can't find good Y-chr data.
I found this paper:
Online databases for mtDNA and Y chromosome polymorphisms in human populations
Alessandra Congiu, Paolo Anagnostou, Nicola Milia, Marco
Capocasa, Francesco Montinaro & Giovanni Destro Bisol
Excerpts from the paper:
GenBank, European Nucleotide Archive and DNA Data Bank of Japan,
usually referred to as “primary databases”
PopSet makes downloading of population data easier
14 mtDNA online databases, three of which also contain Y chromosome data
(DNA-Fingerprint, Family Tree DNA and SMGF)
Of the 12 databases for which this information was obtainable, only six have been
updated in the course of 2012. A reference paper and online help is available only
for 10 databases. We were able to list 7 Y chromosome databases.
Only three databases were found to have been updated in 2012.
Family Tree DNA is the largest archive for mtDNA sequences (mainly unpublished) both at
low (HVR-1 and II) and high resolution (complete mtDNA or coding region) (see Appendix
1A). Phylotree and mtDNA Community provide the largest wealth of published whole genome
sequences, with figures (14508 and 13492, respectively) not far from GenBank (16414).
The largest number of Y chromosome STR haplotypes is available in
Family Tree DNA (236302), Ysearch (112513) and YHRD databases (101055).
The former is also the greatest source of SNP/STR combined haplotypes
(62795). Data from scientific literature are used in YHRD. By contrast, US Y-STR database
seems to contain most, if not only, haplotypes submitted from forensic laboratories and
institutions. It is noteworthy that, unlike with mtDNA, GenBank does not give access to Y chromosome
population data in the haplotypic form.
What databases make it possible to retrieve/sha
Unrestricted downloading is possible from 9 mtDNA databases, whereas three of them (Family
Tree DNA, DNA-Fingerprint and mtDNA manager) make it possible to retrieve only a part of the data.
Data can be downloaded from only one Y chromosome database (Ysearch), whereas
another two allow a partial retrieval (Family Tree DNA and DNA-Fingerprint).
Phylotree contains the largest number of complete mtDNA genomes,
A slightly lower number of mtDNA genomes is available in the recently published mtDNA
Community (679 not available in GenBank),
The number of sequences available in GenBank outnumbers these databases.
Unfortunately, retrieving data from the relevant papers or (for unpublished data) obtaining them
from corresponding authors is not always an easy task.
YHRD contains a large number of high quality data for both STR and SNP loci. However, it
cannot be directly accessed.
GenBank was found to include a total 16,414 complete DNA sequences (Database accessed
on 20/09/2012).