NCBI vs UCSC


rudi283
02-28-2011, 11:25 AM
Could someone point me in the right direction to find out whether hg19 (UCSC) is exactly the same as GRCh37 (NCBI)? I thought they were the same, but I just found that dbSNP Build 132 is for GRCh37 and dbSNP Build 131 is for hg19.
I used coordinates from hg19 to design probes to capture my genes, but as a reference I was going to use an already-annotated RefSeq sequence from NCBI. My worry is that there may be some discrepancies between the two.

laura
03-01-2011, 03:24 AM
hg19 is the same as GRCh37. However, GRCh37 is getting assembly patches from the GRC, so while the main chromosomes should not change, some of the alternative haplotypes might not always be identical.

Annotations laid on top of the assembly may not always be identical, depending on the method used to place the annotation on the assembly.

rudi283
03-01-2011, 04:07 AM
I'm even more confused now :(
Does that mean that the chromosome coordinates are not stable between these two (hg19 and GRCh37)?
As for the annotations, I only need to see the genes/exons, and probably SNPs if possible.
'Annotations laid on top of the assembly may not always be identical': so the genes could be, let's say, shifted?

laura
03-01-2011, 05:10 AM
The chromosomal coordinates should be exactly the same; hg19 is just UCSC's name for GRCh37.

It means two different mapping programs may not give the same position for the same piece of DNA. That being said, if both sites get their annotation from a central source, e.g. dbSNP or CCDS, both should show the same coordinates as that central source.
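
Because hg19 and GRCh37 share chromosome coordinates, a SNP position taken from either one can be placed on a genomic sequence downloaded from the other with simple offset arithmetic. A minimal sketch, assuming the downloaded sequence is an exact slice of the chromosome between 1-based inclusive coordinates `region_start` and `region_end` (this does not hold for a spliced RefSeq mRNA such as an NM_ record); `genomic_to_offset` is a hypothetical helper name.

```python
def genomic_to_offset(pos, region_start, region_end, strand="+"):
    """Map a 1-based genomic position to a 0-based index in a region slice.

    The slice is assumed to cover region_start..region_end (1-based, inclusive)
    of the same chromosome. For strand "-" the slice is assumed to be the
    reverse complement, so indices count down from region_end.
    """
    if not region_start <= pos <= region_end:
        raise ValueError(f"position {pos} is outside {region_start}-{region_end}")
    if strand == "+":
        return pos - region_start
    if strand == "-":
        return region_end - pos
    raise ValueError(f"unknown strand {strand!r}")


# Hypothetical example: a 10 bp plus-strand slice covering positions 1001-1010.
region_seq = "ACGTACGTAC"
i = genomic_to_offset(1003, 1001, 1010)
print(i, region_seq[i])  # 2 G
```

On a minus-strand slice, the REF/ALT alleles from the SNP record also have to be complemented before comparing them with the sequence.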

rudi283
03-01-2011, 05:33 AM
Thank you very much for the answer!
So could I use the RefSeq sequence from NCBI and annotate it with SNPs (build 131) from UCSC, since the coordinates are the same anyway?

laura
03-01-2011, 05:37 AM
You should be able to do that but it might not be the best way.

What are you actually trying to do?

If you are looking for which cDNAs overlap your SNPs, you might be better off using a tool like the Ensembl Variant Effect Predictor (http://www.ensembl.org/tools.html).

If you are looking for which SNPs overlap your cDNAs of interest, you are probably better off using BioMart (http://www.ensembl.org/biomart/martview/) or the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?org=Human&db=hg19&hgsid=187697957&hgta_doMainPage=1).
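
If you would rather do the overlap locally, for example after downloading exon intervals from one of those tools, the check itself is simple. A minimal sketch, assuming SNPs and intervals use the same chromosome naming and 1-based inclusive coordinates (UCSC table downloads store starts 0-based, so those would need 1 added first); `snps_in_intervals` is a hypothetical helper, and the sample data is made up.

```python
import bisect
from collections import defaultdict


def snps_in_intervals(snps, intervals):
    """Return {interval_name: [snp_id, ...]} for SNPs inside each interval.

    snps:      iterable of (chrom, pos, snp_id), 1-based positions
    intervals: iterable of (chrom, start, end, name), 1-based inclusive
    """
    # Group SNPs by chromosome, sorted by position, for binary search.
    grouped = defaultdict(list)
    for chrom, pos, snp_id in snps:
        grouped[chrom].append((pos, snp_id))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        index[chrom] = ([p for p, _ in entries], [s for _, s in entries])

    hits = {}
    for chrom, start, end, name in intervals:
        positions, ids = index.get(chrom, ([], []))
        lo = bisect.bisect_left(positions, start)  # first SNP with pos >= start
        hi = bisect.bisect_right(positions, end)   # first SNP with pos > end
        hits[name] = ids[lo:hi]
    return hits


snps = [("1", 150, "rs1"), ("1", 300, "rs2"), ("2", 150, "rs3")]
exons = [("1", 100, 200, "exon1"), ("1", 250, 300, "exon2"), ("2", 10, 20, "exon3")]
print(snps_in_intervals(snps, exons))
# {'exon1': ['rs1'], 'exon2': ['rs2'], 'exon3': []}
```

Sorting once per chromosome and binary-searching keeps each interval lookup cheap even with a genome-wide SNP list.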

rudi283
03-01-2011, 06:04 AM
I'm looking for nucleotide changes in my samples, in the genes I'm interested in, and I would like to compare the results with SNPs that are already in databases.
The coordinates used to design the probes were taken from hg19, but because the sequences from NCBI are already annotated, I thought I could use them.
I would like to download a file with SNPs not only for the coding part but for the introns as well, but it doesn't seem to be straightforward.

laura
03-01-2011, 06:09 AM
You might be better off looking at NCBI's dbSNP VCF dumps to find all the SNPs of interest in a particular region, then using something like the Ensembl Variant Effect Predictor to annotate their consequences:

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/v4.0/
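
Pulling the SNPs for one region out of such a VCF dump only needs the CHROM and POS columns. A minimal sketch, assuming a plain or gzip-compressed VCF; chromosome names are compared with any leading "chr" stripped, since different sources use "1" or "chr1". The file name and region in the usage comment are hypothetical.

```python
import gzip


def _strip_chr(chrom):
    # Treat "chr1" and "1" as the same chromosome; other naming differences
    # (e.g. "chrM" vs "MT") are not handled here.
    return chrom[3:] if chrom.startswith("chr") else chrom


def snps_in_region(lines, chrom, start, end):
    """Yield (CHROM, POS, ID, REF, ALT) for VCF records on `chrom`
    with start <= POS <= end (1-based, inclusive)."""
    target = _strip_chr(chrom)
    for line in lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if _strip_chr(fields[0]) != target:
            continue
        pos = int(fields[1])
        if start <= pos <= end:
            yield fields[0], pos, fields[2], fields[3], fields[4]


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Usage with a hypothetical local file and region:
# with open_vcf("dbsnp_download.vcf.gz") as fh:
#     for record in snps_in_region(fh, "chr7", 1000000, 1100000):
#         print(record)
```

This reads the whole file line by line; for repeated queries on a genome-wide dump, an indexed lookup would be faster.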

rudi283
03-01-2011, 07:13 AM
I downloaded the VCF file and it looks like there are more SNPs than I got from UCSC, which is great.
I tried to compare the information for one of the genes, to check how big the differences between the databases are. I'm quite confused, because according to the 1000 Genomes Project data there seem to be almost 200 more SNPs for that gene than in the VCF file :( But it's from the previous genome build, so I'm not sure how, or whether, I could use it.

laura
03-01-2011, 07:27 AM
Which 1000 genomes vcf files are you looking at?

The main-project 1000 Genomes variants have not yet been submitted to dbSNP, so not all of those 20100804 SNPs will be in dbSNP.
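
When comparing two variant lists such as a dbSNP dump and a 1000 Genomes release, it helps to key each variant on position and alleles rather than rs ID, since variants not yet submitted to dbSNP have no rs ID. A minimal sketch, assuming both inputs are already on the same assembly and parsed into (CHROM, POS, REF, ALT) tuples; `compare_variant_sets` is hypothetical and the sample lists are made up. Multi-allelic ALT fields and differing indel representations are not normalized, so the counts are only approximate.

```python
def _key(chrom, pos, ref, alt):
    # Normalize the chromosome name and allele case so "chr1"/"1" and "a"/"A" match.
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    return (chrom, int(pos), ref.upper(), alt.upper())


def compare_variant_sets(a, b):
    """Return (only_in_a, only_in_b, shared) as sets of normalized keys.

    a, b: iterables of (CHROM, POS, REF, ALT) tuples on the same assembly.
    """
    keys_a = {_key(*v) for v in a}
    keys_b = {_key(*v) for v in b}
    return keys_a - keys_b, keys_b - keys_a, keys_a & keys_b


# Hypothetical variant lists for one gene.
dbsnp = [("1", 100, "A", "G"), ("1", 200, "C", "T")]
thousand_genomes = [("chr1", 100, "A", "G"), ("1", 300, "G", "A")]
only_dbsnp, only_1kg, shared = compare_variant_sets(dbsnp, thousand_genomes)
print(len(only_dbsnp), len(only_1kg), len(shared))  # 1 1 1
```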

rudi283
03-01-2011, 10:25 AM
You're right, I was looking at the wrong thing.
So I guess it will be OK if I annotate the RefSeq sequence from NCBI with the SNPs from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/v4.0/ ?
Thank you very much for your help!

laura
03-01-2011, 10:33 AM
That should be fine.

I do recommend looking at the Ensembl Variant Effect Predictor. It links effects to Ensembl IDs, which can very easily be linked to RefSeq IDs when desired using BioMart or the Ensembl API.
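
Once consequences are reported against Ensembl transcript IDs, a two-column table exported from BioMart can be used to attach RefSeq IDs. A minimal sketch, assuming a tab-separated export saved with one header row, whose first column is the Ensembl transcript ID and second column the RefSeq mRNA ID (one Ensembl ID may map to several RefSeq IDs, or to none); `load_ensembl_to_refseq` is hypothetical and the IDs below are placeholders, not real records.

```python
import csv
from collections import defaultdict


def load_ensembl_to_refseq(lines):
    """Build {ensembl_transcript_id: [refseq_id, ...]} from a BioMart TSV export.

    Assumes one header row, then column 1 = Ensembl transcript ID and
    column 2 = RefSeq mRNA ID. Rows with an empty RefSeq column are skipped.
    """
    mapping = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) >= 2 and row[1] and row[1] not in mapping[row[0]]:
            mapping[row[0]].append(row[1])
    return dict(mapping)


# Placeholder export content.
export = [
    "Ensembl Transcript ID\tRefSeq mRNA ID\n",
    "ENST_EXAMPLE1\tNM_EXAMPLE1\n",
    "ENST_EXAMPLE1\tNM_EXAMPLE2\n",
    "ENST_EXAMPLE2\t\n",
]
mapping = load_ensembl_to_refseq(export)
print(mapping.get("ENST_EXAMPLE1", []))  # ['NM_EXAMPLE1', 'NM_EXAMPLE2']
print(mapping.get("ENST_EXAMPLE2", []))  # []
```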

rudi283
03-02-2011, 05:53 AM
I'll try.
I was wondering if you know how I could convert the VCF file to GFF/GTF format. I need the SNPs in this format to be able to annotate them on the RefSeq sequence.

laura
03-02-2011, 05:56 AM
I am sure converters exist, but I don't know of any myself. I would suggest searching Google for "vcf to gff" and seeing what comes out. You should only need the first 8 columns, so it should be a fairly easy Perl/Python/awk script to write.
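
Such a script can be quite short, since CHROM, POS, ID, REF and ALT carry everything a basic GFF3 feature needs. A minimal sketch, assuming attribute names borrowed from the GVF convention (`Reference_seq`, `Variant_seq`), Sequence Ontology types `SNV` / `sequence_alteration`, and a hypothetical source label `dbSNP`; adjust these to whatever your annotation tool expects. GTF would need a different attribute syntax, and GFF3 requires `ID` values to be unique, so rs IDs that map to more than one position would need deduplicating.

```python
def vcf_line_to_gff3(line, source="dbSNP"):
    """Convert one VCF data line to a GFF3 feature line; return None for headers."""
    if line.startswith("#"):
        return None
    chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    start = int(pos)            # VCF POS and GFF3 start are both 1-based
    end = start + len(ref) - 1  # feature spans the whole REF allele
    alts = alt.split(",")
    is_snv = len(ref) == 1 and all(len(a) == 1 for a in alts)
    feature_type = "SNV" if is_snv else "sequence_alteration"
    attrs = [] if vid == "." else [f"ID={vid}"]
    attrs += [f"Reference_seq={ref}", f"Variant_seq={','.join(alts)}"]
    return "\t".join([chrom, source, feature_type, str(start), str(end),
                      ".", "+", ".", ";".join(attrs)])


def vcf_to_gff3(lines, source="dbSNP"):
    """Yield a GFF3 header followed by one feature line per VCF record."""
    yield "##gff-version 3"
    for line in lines:
        feature = vcf_line_to_gff3(line, source)
        if feature is not None:
            yield feature
```

For example, the record `1  100  rs123  A  G,T` becomes an `SNV` feature spanning 100-100 with `Variant_seq=G,T`.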

m_two
04-01-2011, 03:46 PM
There are a few minor differences between GRCh37 and hg19.

The random contig sequences are the same but the names are different.
Depending on the source of the sequence or annotation, "1" may need to be converted to "chr1", and the PAR on chrY may or may not be masked. In addition, UCSC hg19 is currently using the old mitochondrial sequence, but NCBI and Ensembl have transitioned to NC_012920, the rCRS.

> http://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=187301261&clade=mammal&org=Human&db=hg19
>
> Note on chrM
> Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as "chrM" in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome Browser. We plan to use the Revised Cambridge Reference Sequence (rCRS) in the next human assembly release.
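
The "1" vs "chr1" difference can be handled with a small lookup when moving annotation between the two. A minimal sketch, assuming GRCh37-style names "1".."22", "X", "Y", "MT" on one side and hg19-style "chr1".."chrM" on the other; the random/unplaced contigs are deliberately not covered and would need their own table. Renaming MT to chrM does not make them the same sequence: per the chrM note, hg19 uses NC_001807 while NCBI and Ensembl use the rCRS (NC_012920), so mitochondrial positions should not be transferred by name alone.

```python
# GRCh37/Ensembl-style names for the primary assembled chromosomes.
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]

GRCH37_TO_HG19 = {name: "chr" + name for name in PRIMARY}
# Name mapping only: hg19 chrM is NC_001807, not the rCRS (NC_012920).
GRCH37_TO_HG19["MT"] = "chrM"
HG19_TO_GRCH37 = {ucsc: ncbi for ncbi, ucsc in GRCH37_TO_HG19.items()}


def rename(name, table):
    """Translate a chromosome name using one of the tables above.

    Random/unplaced contigs are not covered; they raise ValueError.
    """
    if name not in table:
        raise ValueError(f"no mapping for {name!r}")
    return table[name]


print(rename("1", GRCH37_TO_HG19))     # chr1
print(rename("chrX", HG19_TO_GRCH37))  # X
```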