  • rudi283
    Member
    • Sep 2010
    • 27

    NCBI vs UCSC

    Could someone point me in the right direction to find out whether hg19 (UCSC) is exactly the same as GRCh37 (NCBI)? I thought they were the same, but I just found that dbSNP Build 132 is for GRCh37 and dbSNP Build 131 is for hg19.
    I used coordinates from hg19 to design probes to capture my genes, but as the reference I was going to use an already annotated RefSeq sequence from NCBI. My worry is that there may be some discrepancies between the two.
  • laura
    Senior Member
    • Sep 2008
    • 151

    #2
    hg19 is the same as GRCh37. However, GRCh37 is getting assembly patches from the GRC, so while the main chromosomes should not change, some of the alternative haplotypes may not always be identical.

    Annotations laid on top of the assembly may not always be identical, depending on the method used to place the annotation on the assembly.

    • rudi283
      Member
      • Sep 2010
      • 27

      #3
      I'm even more confused now.
      Does that mean that the chromosome coordinates are not stable between these two (hg19 and GRCh37)?
      As for the annotations, I only need to see the genes/exons and, if possible, SNPs.
      'Annotations laid on top of the assembly may not always be identical': so the genes could be, let's say, shifted?

      • laura
        Senior Member
        • Sep 2008
        • 151

        #4
        The chromosomal coordinates should be exactly the same; hg19 is just UCSC's name for GRCh37.

        It means two different mapping programs may not give the same position for the same piece of DNA. That said, if both sites get their annotation from a central source, e.g. dbSNP or CCDS, both should show the same coordinates as that central source.

        • rudi283
          Member
          • Sep 2010
          • 27

          #5
          Thank you very much for the answer!
          So could I use the RefSeq sequence from NCBI and annotate it with SNPs (build 131) from UCSC, since the coordinates are the same anyway?

          • laura
            Senior Member
            • Sep 2008
            • 151

            #6
            You should be able to do that, but it might not be the best way.

            What are you actually trying to do?

            If you are looking for which cDNAs overlap your SNPs, you might be better off using a tool like the Ensembl Variant Effect Predictor: http://www.ensembl.org/tools.html.

            If you are looking for which SNPs overlap your cDNAs of interest, you are probably better off using BioMart (http://www.ensembl.org/biomart/martview/) or the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTab...a_doMainPage=1).
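For the second case (which SNPs fall inside your regions of interest), the overlap test itself is simple once both sets of coordinates are on the same assembly. A minimal Python sketch, assuming 1-based inclusive coordinates on a single chromosome; `snps_in_intervals` is a hypothetical helper (not part of BioMart or the Table Browser), and the exon intervals and SNP positions are made up for illustration:

```python
from bisect import bisect_left, bisect_right

def snps_in_intervals(intervals, snp_positions):
    """Return {(start, end): [positions]} for SNPs inside each closed interval.

    Assumes 1-based, inclusive coordinates, and that all intervals and
    positions are on the same chromosome of the same assembly.
    """
    positions = sorted(snp_positions)
    hits = {}
    for start, end in intervals:
        lo = bisect_left(positions, start)   # index of first position >= start
        hi = bisect_right(positions, end)    # index of first position > end
        hits[(start, end)] = positions[lo:hi]
    return hits

# Hypothetical exon intervals and SNP positions, for illustration only.
exons = [(100, 200), (500, 650)]
snps = [150, 99, 200, 201, 600, 700]
print(snps_in_intervals(exons, snps))
```

Sorting the positions once and using binary search keeps each interval lookup logarithmic, which matters when the SNP list covers a whole chromosome.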

            • rudi283
              Member
              • Sep 2010
              • 27

              #7
              I'm looking for nucleotide changes in my samples, in the genes I'm interested in, and I would like to compare the results with SNPs that are already in databases.
              The coordinates used to design the probes were taken from hg19, but because the sequences from NCBI are already annotated, I thought I could use them.
              I would like to download a file with SNPs not only for the coding parts but for the introns as well, but it doesn't seem to be straightforward.

              • laura
                Senior Member
                • Sep 2008
                • 151

                #8
                You might be better off looking at NCBI's dbSNP VCF dumps to find all the SNPs of interest in a particular region, then using something like the Ensembl Variant Effect Predictor to annotate their consequences:

                ftp://ftp.ncbi.nih.gov/snp/organisms...9606/VCF/v4.0/
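Since the dbSNP dumps are whole-genome files, pulling out one region is a simple line filter. A minimal sketch, assuming a plain-text VCF (for a compressed download, `gzip.open(path, "rt")` gives a line iterator that can be passed in the same way) and that `chrom` is spelled exactly as the file spells it; `vcf_records_in_region` is a hypothetical helper and the records are toy data with placeholder IDs:

```python
import io

def vcf_records_in_region(handle, chrom, start, end):
    """Yield (CHROM, POS, ID, REF, ALT) for VCF data lines with start <= POS <= end.

    VCF POS is 1-based; start and end are treated as 1-based and inclusive.
    """
    for line in handle:
        if line.startswith("#"):              # skip ## meta lines and the #CHROM header
            continue
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos = fields[0], int(fields[1])
        if rec_chrom == chrom and start <= pos <= end:
            yield rec_chrom, pos, fields[2], fields[3], fields[4]

# Toy VCF content with placeholder IDs (not real dbSNP records).
vcf_text = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\trs_example1\tA\tG\t.\t.\t.\n"
    "1\t5000\trs_example2\tC\tT\t.\t.\t.\n"
    "2\t1500\trs_example3\tG\tA\t.\t.\t.\n"
)
hits = list(vcf_records_in_region(io.StringIO(vcf_text), "1", 900, 2000))
print(hits)
```

Because the filter uses genomic positions only, it keeps intronic as well as coding SNPs, as long as the region spans the whole gene.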

                • rudi283
                  Member
                  • Sep 2010
                  • 27

                  #9
                  I downloaded the VCF file, and it looks like there are more SNPs than I got from UCSC, which is great.
                  I tried to compare the information for one of the genes to check how big the differences between the databases are. I'm quite confused, because according to the 1000 Genomes Project data there are almost 200 more SNPs for that gene than in the VCF file. But that data is from the previous genome build, so I'm not sure how, or whether, I could use it.

                  • laura
                    Senior Member
                    • Sep 2008
                    • 151

                    #10
                    Which 1000 genomes vcf files are you looking at?

                    The main project 1000 Genomes variants have not yet been submitted to dbSNP, so not all of the 20100804 SNPs will be in dbSNP.
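When comparing two call sets like this, matching on (CHROM, POS, REF, ALT) rather than on rs IDs avoids missing variants that have not been assigned an ID yet. A minimal sketch under that assumption; both inputs must be on the same assembly, since positions from an earlier build are not comparable without first converting them. Multi-allelic or differently normalised records (for example, indels written with different padding) may fail to match even when they describe the same change. `variant_keys` and `compare_call_sets` are hypothetical helpers and the VCF lines are toy data:

```python
import io

def variant_keys(handle):
    """Collect (CHROM, POS, REF, ALT) tuples from the data lines of a VCF."""
    keys = set()
    for line in handle:
        if line.startswith("#"):                      # skip meta and header lines
            continue
        chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys

def compare_call_sets(a, b):
    """Split two key sets into shared and set-specific variants."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

# Toy records only; both inputs are assumed to be on the same assembly.
dbsnp_vcf = "1\t1000\trs_example1\tA\tG\t.\t.\t.\n1\t5000\trs_example2\tC\tT\t.\t.\t.\n"
other_vcf = "1\t1000\t.\tA\tG\t.\t.\t.\n1\t7000\t.\tG\tC\t.\t.\t.\n"
result = compare_call_sets(variant_keys(io.StringIO(dbsnp_vcf)),
                           variant_keys(io.StringIO(other_vcf)))
print({k: len(v) for k, v in result.items()})
```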

                    • rudi283
                      Member
                      • Sep 2010
                      • 27

                      #11
                      You're right, I was looking at the wrong thing.
                      So I guess it will be OK if I annotate the RefSeq sequence from NCBI with the SNPs from ftp://ftp.ncbi.nih.gov/snp/organisms...9606/VCF/v4.0/?
                      Thank you very much for your help!

                      • laura
                        Senior Member
                        • Sep 2008
                        • 151

                        #12
                        That should be fine.

                        I do recommend looking at the Ensembl Variant Effect Predictor. It links effects to Ensembl IDs, which can very easily be linked to RefSeq IDs when desired, using BioMart or the Ensembl API.
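Once the effects are keyed by Ensembl IDs, attaching RefSeq IDs is a table join. A minimal sketch, assuming an ID table exported from BioMart as tab-separated text with a header row, the Ensembl ID in the first column and the RefSeq ID in the second (empty when there is none). The column layout, the helper `load_id_map`, the effect tuples, and all IDs below are hypothetical placeholders, not real records or the actual VEP output format:

```python
import csv
import io
from collections import defaultdict

def load_id_map(handle):
    """Build {ensembl_id: {refseq_id, ...}} from a tab-separated two-column export.

    Assumed layout (hypothetical): header row, then Ensembl ID, RefSeq ID.
    Rows with an empty RefSeq column are skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)                      # skip the header row
    mapping = defaultdict(set)
    for row in reader:
        if len(row) >= 2 and row[1]:
            mapping[row[0]].add(row[1])
    return mapping

# Placeholder IDs only, not real Ensembl or RefSeq records.
export = "Ensembl ID\tRefSeq ID\nENST_A\tNM_A1\nENST_A\tNM_A2\nENST_B\t\n"
mapping = load_id_map(io.StringIO(export))

# Hypothetical (variant, Ensembl transcript, consequence) effects to annotate.
effects = [("var1", "ENST_A", "missense_variant"), ("var2", "ENST_B", "intron_variant")]
annotated = [(v, tx, csq, sorted(mapping.get(tx, ()))) for v, tx, csq in effects]
for row in annotated:
    print(row)
```

The mapping is kept one-to-many because a single Ensembl transcript can correspond to more than one RefSeq record, so each effect carries a list of RefSeq IDs (empty when there is no match).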

                        • rudi283
                          Member
                          • Sep 2010
                          • 27

                          #13
                          I'll try.
                          I was wondering if you might know how I could convert the VCF file to GFF/GTF format. I need the SNPs in this format to be able to annotate them on the RefSeq sequence.

                          • laura
                            Senior Member
                            • Sep 2008
                            • 151

                            #14
                            I am sure converters do exist, but I don't know of any myself. I would suggest searching Google for "vcf to gff" and seeing what comes out. You should only need the first 8 columns, so it should be a fairly easy Perl/Python/awk script to write.
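A minimal sketch of such a script in Python, using only the 8 fixed VCF columns and writing GFF3. The choices here are assumptions rather than a standard mapping: the feature type is the Sequence Ontology term `SNV` for single-base changes and `sequence_alteration` otherwise, the rs ID goes into `Name`, and `ref`/`alt` are ad hoc lowercase attributes. The seqid is copied from CHROM unchanged, so it may need renaming to match the sequence being annotated, and GTF would additionally need `gene_id`/`transcript_id` attributes. `vcf_to_gff3` is a hypothetical helper:

```python
import io

def vcf_to_gff3(vcf_lines, source="dbSNP"):
    """Yield GFF3 lines built from the 8 fixed VCF columns (a sketch, not a full converter)."""
    yield "##gff-version 3"
    for line in vcf_lines:
        if line.startswith("#"):                       # skip VCF meta and header lines
            continue
        chrom, pos, vid, ref, alt, qual, _flt, _info = line.rstrip("\n").split("\t")[:8]
        start = int(pos)                               # VCF and GFF3 are both 1-based
        end = start + len(ref) - 1                     # GFF3 end is inclusive
        alts = alt.split(",")
        is_snv = len(ref) == 1 and all(len(a) == 1 for a in alts)
        ftype = "SNV" if is_snv else "sequence_alteration"
        attrs = [f"ref={ref}", f"alt={alt}"]
        if vid != ".":
            attrs.insert(0, f"Name={vid}")
        yield "\t".join([chrom, source, ftype, str(start), str(end),
                         qual, ".", ".", ";".join(attrs)])

# Toy input with placeholder IDs.
vcf_text = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\trs_example1\tA\tG\t.\t.\t.\n"
    "1\t2000\t.\tAT\tA\t50\tPASS\t.\n"
)
gff_lines = list(vcf_to_gff3(io.StringIO(vcf_text)))
print("\n".join(gff_lines))
```

Multi-allelic ALT values keep their commas, which GFF3 reads as a list of attribute values.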

                            • m_two
                              Member
                              • Mar 2010
                              • 50

                              #15
                              There are a few minor differences between GRCh37 and hg19.

                              The random contig sequences are the same, but the names are different.
                              Depending on the source of the sequence or annotation, "1" may need to be converted to "chr1", and the PAR on chrY may or may not be masked. In addition, UCSC hg19 is currently using the old mitochondrial sequence, but NCBI and Ensembl have transitioned to NC_012920, the rCRS.

                              > http://genome.ucsc.edu/cgi-bin/hgGat...=Human&db=hg19
                              >
                              > Note on chrM
                              > Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as "chrM" in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome Browser. We plan to use the Revised Cambridge Reference Sequence (rCRS) in the next human assembly release.
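The renaming part is mechanical for the primary chromosomes. A minimal sketch, assuming Ensembl-style input names ('1'..'22', 'X', 'Y', 'MT'); `grch37_to_hg19_name` is a hypothetical helper. It deliberately refuses to rename 'MT', because, as the note above says, hg19 chrM is a different sequence, and it does not attempt the random contigs, whose UCSC names would need an explicit lookup table:

```python
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}

def grch37_to_hg19_name(name):
    """Rename an Ensembl-style GRCh37 chromosome ('1', 'X', ...) to UCSC hg19 style ('chr1', 'chrX', ...)."""
    if name in PRIMARY:
        return "chr" + name
    if name == "MT":
        # Not a pure rename: hg19 chrM is NC_001807, while NCBI/Ensembl MT is
        # NC_012920 (rCRS), so positions on one cannot be used directly on the other.
        raise ValueError("hg19 chrM is a different sequence from GRCh37 MT")
    # Random/unplaced contigs need an explicit lookup table, not just a prefix.
    raise KeyError(f"no simple rename for {name!r}")

print([grch37_to_hg19_name(c) for c in ["1", "22", "X", "Y"]])
```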
