SEQanswers

Old 02-28-2011, 10:25 AM   #1
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
NCBI vs UCSC

Could someone point me in the right direction to find out whether hg19 (UCSC) is exactly the same as GRCh37 (NCBI)? I thought they were the same, but I just found that dbSNP Build 132 is for GRCh37 and dbSNP Build 131 is for hg19.
I used coordinates from hg19 to design probes to capture my genes, but as a refseq I was going to use the already annotated sequence from NCBI. My worry is that there may be some discrepancies between the two.
Old 03-01-2011, 02:24 AM   #2
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

hg19 is the same as GRCh37. However, GRCh37 is getting assembly patches from the GRC, so while the main chromosomes should not change, some of the alternative haplotypes might not always be identical.

Annotations laid on top of the assembly may not always be identical either, depending on the method used to place the annotation on the assembly.
Old 03-01-2011, 03:07 AM   #3
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
Default

I'm even more confused now.
Does that mean that the chromosome coordinates are not stable between these two (hg19 and GRCh37)?
As for the annotations, I only need to see the genes/exons and probably SNPs, if that is possible.
'Annotations laid on top of the assembly may not always be identical': so the genes can be, let's say, shifted?
Old 03-01-2011, 04:10 AM   #4
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

The chromosomal coordinates should be exactly the same; hg19 is just UCSC's name for GRCh37.

It means that two different mapping programs may not give the same position for the same piece of DNA. That being said, if they are getting their annotation from a central source, e.g. dbSNP or CCDS, both sites should show coordinates that are the same as the central source.
Old 03-01-2011, 04:33 AM   #5
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
Default

Thank you very much for the answer!
So could I use refseq from NCBI and annotate it with SNPs (131) from UCSC if the coordinates are the same anyway?
Old 03-01-2011, 04:37 AM   #6
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

You should be able to do that but it might not be the best way.

What are you actually trying to do?

If you are looking for which cDNAs overlap your SNPs, you might be better off looking at a tool like the Ensembl Variant Effect Predictor: http://www.ensembl.org/tools.html

If you are looking for which SNPs overlap your cDNAs of interest, you are probably better off using http://www.ensembl.org/biomart/martview/ or the UCSC Table Browser http://genome.ucsc.edu/cgi-bin/hgTab...a_doMainPage=1
Old 03-01-2011, 05:04 AM   #7
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
Default

I'm looking for nucleotide changes in my samples, in the genes I'm interested in, and I would like to compare the results with SNPs that are already in databases.
The coordinates used to design the probes were taken from hg19, but because the sequences from NCBI are already annotated I thought I could use them.
I would like to download a file with SNPs not only for the coding part but for the introns as well, but it doesn't seem to be straightforward.
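
For what it's worth, the comparison described here (sample changes against SNPs already in a database) can be sketched as a lookup keyed on chromosome, position, reference allele and alternate allele. This is only a rough sketch: the function name `split_known_novel`, the tuple layout and all of the example data, including the rs ID, are made up for illustration and do not come from any real file or tool.

```python
def split_known_novel(sample_variants, known_snps):
    """Split sample variants into those found in known_snps and those not found.

    sample_variants: list of (chrom, pos, ref, alt) tuples.
    known_snps: dict mapping (chrom, pos, ref, alt) -> database ID.
    """
    known, novel = [], []
    for variant in sample_variants:
        rs_id = known_snps.get(variant)
        if rs_id is None:
            novel.append(variant)
        else:
            known.append((variant, rs_id))
    return known, novel


# Invented example data (not real dbSNP records).
known_snps = {("chr1", 1000, "A", "G"): "rs0000001"}
sample_variants = [("chr1", 1000, "A", "G"), ("chr1", 1500, "C", "T")]

known, novel = split_known_novel(sample_variants, known_snps)
for variant, rs_id in known:
    print("known", variant, rs_id)
for variant in novel:
    print("novel", variant)
```

An exact match on all four fields only works if both sides use the same assembly coordinates and the same chromosome naming, which is the point of the question above.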
Old 03-01-2011, 05:09 AM   #8
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

You might be better off looking at NCBI's dbSNP VCF dumps to find all the SNPs of interest in a particular region, then using something like the Ensembl Variant Effect Predictor to annotate their consequences.

ftp://ftp.ncbi.nih.gov/snp/organisms...9606/VCF/v4.0/
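
As a rough illustration of pulling the SNPs for one region out of a VCF dump, here is a minimal sketch that reads the tab-separated records and keeps those inside a 1-based, inclusive interval. The function name `snps_in_region` and the example records (including the rs IDs) are invented for illustration; it assumes an uncompressed, plain VCF layout with CHROM, POS, ID, REF and ALT as the first five columns.

```python
def snps_in_region(vcf_lines, chrom, start, end):
    """Yield (chrom, pos, id, ref, alt) for records within chrom:start-end.

    Coordinates are 1-based and inclusive, as in VCF.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        rec_chrom, pos, rec_id, ref, alt = fields[:5]
        if rec_chrom == chrom and start <= int(pos) <= end:
            yield rec_chrom, int(pos), rec_id, ref, alt


# Invented example records (not real dbSNP entries).
example_vcf = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\trs0000001\tA\tG\t.\t.\t.\n",
    "1\t2000\trs0000002\tC\tT\t.\t.\t.\n",
    "2\t1200\trs0000003\tG\tA\t.\t.\t.\n",
]

for record in snps_in_region(example_vcf, "1", 900, 1500):
    print(record)
```

For a gzipped download, the same function should work on a file object opened with `gzip.open(path, "rt")`, where `path` is wherever the file was saved.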
Old 03-01-2011, 06:13 AM   #9
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
Default

I downloaded the VCF file and it looks like there are more SNPs than I got from UCSC, which is great.
I tried to compare the information for one of the genes, to check how big the differences between the databases are. I'm quite confused, as according to the data in the 1000 Genomes Project it looks like there are almost 200 more SNPs for that gene than in the VCF file. But it's from the previous genome build, so I'm not sure how/if I could use it.
Old 03-01-2011, 06:27 AM   #10
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

Which 1000 Genomes VCF files are you looking at?

The main project 1000 Genomes variants have not yet been submitted to dbSNP, so not all of those 20100804 SNPs will be in dbSNP.
Old 03-01-2011, 09:25 AM   #11
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
Default

You're right, I was looking at the wrong thing.
So I guess it will be OK if I annotate the refseq from NCBI with the SNPs from ftp://ftp.ncbi.nih.gov/snp/organisms...9606/VCF/v4.0/ ?
Thank you very much for your help!
Old 03-01-2011, 09:33 AM   #12
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

That should be fine.

I do recommend looking at the Ensembl Variant Effect Predictor. It links effects to Ensembl IDs, which can very easily be linked to RefSeq IDs when desired using BioMart or the Ensembl API.
Old 03-02-2011, 04:53 AM   #13
rudi283
Member
 
Location: europe

Join Date: Sep 2010
Posts: 27
Default

I'll try.
I was wondering if you know how I could convert the VCF file to GFF/GTF format. I need to have the SNPs in this format to be able to annotate them on the refseq.
Old 03-02-2011, 04:56 AM   #14
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

I am sure converters exist, but I don't know of any myself. I would suggest putting "vcf to gff" into Google and seeing what comes out. You should only need the first 8 columns, so it should be a fairly easy Perl/Python/awk script to write.
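
A minimal sketch of such a script in Python, under some assumptions: it uses only CHROM, POS, ID, REF and ALT from the fixed VCF columns, writes one GFF3-style line per record, and computes the end coordinate from the length of REF. The function names, the `source` and `feature_type` defaults, and the choice of the GVF-style `Reference_seq` / `Variant_seq` attributes are illustrative choices, not something prescribed in this thread; whatever tool reads the output may expect different values.

```python
def vcf_line_to_gff(line, source="dbSNP", feature_type="SNV"):
    """Convert one VCF data line into one tab-separated GFF3-style line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rec_id, ref, alt = fields[:5]
    start = int(pos)
    end = start + len(ref) - 1  # a single-base REF gives start == end
    attributes = []
    if rec_id != ".":
        attributes.append("ID=" + rec_id)
    attributes.append("Reference_seq=" + ref)
    attributes.append("Variant_seq=" + alt)
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([chrom, source, feature_type, str(start), str(end),
                      ".", "+", ".", ";".join(attributes)])


def vcf_to_gff(vcf_lines):
    """Yield GFF3-style lines for every data line in an iterable of VCF lines."""
    yield "##gff-version 3"
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF headers and blank lines
        yield vcf_line_to_gff(line)


# Invented example record (not a real dbSNP entry).
example_vcf = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\trs0000001\tA\tG\t.\t.\t.\n",
]

for gff_line in vcf_to_gff(example_vcf):
    print(gff_line)
```

Note that `feature_type` is applied to every record, so a file that also contains indels would need the type chosen per record rather than fixed to one value.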
Old 04-01-2011, 02:46 PM   #15
m_two
Member
 
Location: USA

Join Date: Mar 2010
Posts: 50
Default

There are a few minor differences between GRCh37 and hg19.

The random contig sequences are the same but the names are different.
Depending on the source of the sequence or annotation, "1" may need to be converted to "chr1", and the PAR on chrY may or may not be masked. In addition, UCSC hg19 is currently using the old mitochondrial sequence, but NCBI and Ensembl have transitioned to NC_012920, the rCRS.

> http://genome.ucsc.edu/cgi-bin/hgGat...=Human&db=hg19
>
> Note on chrM
> Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as "chrM" in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome Browser. We plan to use the Revised Cambridge Reference Sequence (rCRS) in the next human assembly release.
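
To illustrate the naming point above, here is a small sketch that converts "1"-style names to "chr1"-style names for the main chromosomes only. The function name `to_ucsc_name` is made up, and "MT" as the non-UCSC mitochondrial name is an assumption (Ensembl-style naming). Because the note above says hg19's chrM is a different sequence from NC_012920, the sketch refuses to map the mitochondrion rather than silently renaming it, and it does not attempt the random contigs, whose names would need a lookup table.

```python
MAIN_CHROMOSOMES = {str(i) for i in range(1, 23)} | {"X", "Y"}


def to_ucsc_name(name):
    """Return the UCSC-style name ("chr1") for a main chromosome ("1")."""
    if name.startswith("chr"):
        return name  # already UCSC style
    if name == "MT":
        # hg19 chrM (NC_001807) is not the same sequence as the rCRS
        # (NC_012920), so the coordinates are not interchangeable.
        raise ValueError("mitochondrial sequence differs between hg19 and GRCh37")
    if name in MAIN_CHROMOSOMES:
        return "chr" + name
    raise ValueError("no simple name mapping for " + repr(name))


print(to_ucsc_name("1"))
print(to_ucsc_name("X"))
```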