
  • Transcript location?

    Hello,
    Sorry for the newbie question, but I don't understand how to locate the transcripts of a gene (found e.g. in NCBI GenBank) on the reference genome.

    Here for instance the description of a transcript for IDH2:


    The IDH2 gene is 25499 bases long on chr15. The transcript is 1818 bases long. So how/where do I find the location(s) of this transcript in the gene?

    Thanks a lot for your help.

  • #2
    You can go to the "gene" record for this transcript which can be found in "related information" --> "gene" section (right column). http://www.ncbi.nlm.nih.gov/gene?Lin..._uid=588282795

    From there you can select to see the "Genbank" format entry where you will find the location of the transcript in the gene (excerpted below from Genbank record).
    Code:
         gene            2775..21273
                         /gene="IDH2"
                         /gene_synonym="D2HGA2; ICD-M; IDH; IDHM; IDP; IDPM;
                         mNADP-IDH"
                         /note="isocitrate dehydrogenase 2 (NADP+), mitochondrial;
                         Derived by automated computational analysis using gene
                         prediction method: BestRefSeq."
                         /db_xref="GeneID:3418"
                         /db_xref="HGNC:5383"
                         /db_xref="HPRD:00973"
                         /db_xref="MIM:147650"
         mRNA            join(2775..2975,13607..13698,14607..14772,16504..16664,
                         16749..16892,17676..17812,17988..18139,19864..19976,
                         20153..20250,20343..20435,20898..21273)
                         /gene="IDH2"
                         /gene_synonym="D2HGA2; ICD-M; IDH; IDHM; IDP; IDPM;
                         mNADP-IDH"
                         /product="isocitrate dehydrogenase 2 (NADP+),
                         mitochondrial"
                         /note="Derived by automated computational analysis using
                         gene prediction method: BestRefSeq."
                         /transcript_id="NM_002168.2"
                         /db_xref="GI:28178831"
                         /db_xref="GeneID:3418"
                         /db_xref="HGNC:5383"
                         /db_xref="HPRD:00973"
                         /db_xref="MIM:147650"
         CDS             join(2861..2975,13607..13698,14607..14772,16504..16664,
                         16749..16892,17676..17812,17988..18139,19864..19976,
                         20153..20250,20343..20435,20898..20985)
                         /gene="IDH2"
    To see the gene in genomic context, use the linkout to UCSC (found under "link to other resources" in gene record): http://genome.ucsc.edu/cgi-bin/hgTra...27211-90645786 or Ensembl: http://www.ensembl.org/Homo_sapiens/...26277-90645736
    Last edited by GenoMax; 03-18-2014, 05:10 PM.
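
The `join(...)` locations in the record excerpt above list each exon as a 1-based, inclusive `start..end` pair relative to the genomic record. As a minimal sketch (the helper names `parse_location` and `total_length` are made up for illustration, and the parser assumes plain `start..end` segments only, with no `complement(...)`, `<`/`>` partial markers, or single-base positions), the exon list and spliced lengths can be read off like this:

```python
import re

def parse_location(location):
    """Return (start, end) pairs, 1-based inclusive, from a simple GenBank location.

    Assumption: the location contains only plain `start..end` segments.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def total_length(segments):
    """Sum of inclusive segment lengths (end - start + 1 for each segment)."""
    return sum(end - start + 1 for start, end in segments)

# Locations copied from the mRNA and CDS features in the excerpt above.
MRNA = ("join(2775..2975,13607..13698,14607..14772,16504..16664,"
        "16749..16892,17676..17812,17988..18139,19864..19976,"
        "20153..20250,20343..20435,20898..21273)")
CDS = ("join(2861..2975,13607..13698,14607..14772,16504..16664,"
       "16749..16892,17676..17812,17988..18139,19864..19976,"
       "20153..20250,20343..20435,20898..20985)")

mrna_exons = parse_location(MRNA)
cds_segments = parse_location(CDS)

print(len(mrna_exons), "exons, spliced mRNA length:", total_length(mrna_exons))
print("CDS length:", total_length(cds_segments))
```

For this excerpt the sketch reports 11 exons, a spliced mRNA length of 1733 bases, and a CDS of 1359 bases. These numbers describe the record version shown here and may differ from the length listed for another version of the transcript.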



    • #3
      Hi GenoMax, thanks a lot for your reply. A few questions though:

      Originally posted by GenoMax
      You can go to the "gene" record for this transcript which can be found in "related information" --> "gene" section (right column). http://www.ncbi.nlm.nih.gov/gene?Lin..._uid=588282795

      From there you can select to see the "Genbank" format entry where you will find the location of the transcript in the gene (excerpted below from Genbank record).
      Where is the link to GenBank? When I click on the link under the section "Genomic regions, transcripts, and products" it doesn't show the same genomic region for the gene as the one you pasted.

      Originally posted by GenoMax
      Code:
           gene            2775..21273
                           /gene="IDH2"
                           /gene_synonym="D2HGA2; ICD-M; IDH; IDHM; IDP; IDPM;
                           mNADP-IDH"
                           /note="isocitrate dehydrogenase 2 (NADP+), mitochondrial;
                           Derived by automated computational analysis using gene
                           prediction method: BestRefSeq."
                           /db_xref="GeneID:3418"
                           /db_xref="HGNC:5383"
                           /db_xref="HPRD:00973"
                           /db_xref="MIM:147650"
           mRNA            join(2775..2975,13607..13698,14607..14772,16504..16664,
                           16749..16892,17676..17812,17988..18139,19864..19976,
                           20153..20250,20343..20435,20898..21273)
                           /gene="IDH2"
                           /gene_synonym="D2HGA2; ICD-M; IDH; IDHM; IDP; IDPM;
                           mNADP-IDH"
                           /product="isocitrate dehydrogenase 2 (NADP+),
                           mitochondrial"
                           /note="Derived by automated computational analysis using
                           gene prediction method: BestRefSeq."
                           /transcript_id="NM_002168.2"
                           /db_xref="GI:28178831"
                           /db_xref="GeneID:3418"
                           /db_xref="HGNC:5383"
                           /db_xref="HPRD:00973"
                           /db_xref="MIM:147650"
           CDS             join(2861..2975,13607..13698,14607..14772,16504..16664,
                           16749..16892,17676..17812,17988..18139,19864..19976,
                           20153..20250,20343..20435,20898..20985)
                           /gene="IDH2"
      OK, so the length of the gene here is 21273-2775+1=18499 bp. The positions of the gene (2775..21273) and of the transcripts are relative to the region NG_023302, right? I am now looking for the mapping to an absolute position of the form chr15:N1-N2.

      Originally posted by GenoMax
      To see the gene in genomic context, use the linkout to UCSC (found under "link to other resources" in gene record): http://genome.ucsc.edu/cgi-bin/hgTra...27211-90645786
      Cool, but I see a region which is 90645786-90627211+1=18576 bp long.

      Originally posted by GenoMax
      or Ensembl: http://www.ensembl.org/Homo_sapiens/...26277-90645736
      Here the region is 90645736-90626277+1=19460 bp long.

      So how do I know exactly where (2775..21273) maps? Can't I get the absolute position chr15:N1-N2, with the correct length, directly?
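
The length arithmetic in this post uses inclusive coordinates, so each span is `end - start + 1`. A small sketch that reproduces the three figures (the labels and the `span` helper are illustrative only):

```python
def span(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1

# Coordinates quoted in this post.
regions = {
    "GenBank gene feature (relative to the record)": (2775, 21273),
    "UCSC link": (90627211, 90645786),
    "Ensembl link": (90626277, 90645736),
}

for label, (start, end) in regions.items():
    print(f"{label}: {span(start, end)} bp")
```

The three lengths differ (18499, 18576 and 19460 bp). One possible reading, offered as an assumption rather than something confirmed in the thread, is that each resource displays a region based on its own gene annotation, so the browser coordinates need not coincide exactly with the RefSeq gene feature.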



      • #4
        Originally posted by neofit
        Where is the link to GenBank? When I click on the link under the section "Genomic regions, transcripts, and products" it doesn't show the same genomic region for the gene as the one you pasted.
        You need a couple of steps to get to the GenBank entry.

        You will have to scroll down the page a ways to get to the relevant sections. I am attaching screenshots for both steps; click on the highlighted links.


        • #5
          Originally posted by neofit
          OK, so the length of the gene here is 21273-2775+1=18499 bp. The positions of the gene (2775..21273) and of the transcripts are relative to the region NG_023302, right? I am now looking for the mapping to an absolute position of the form chr15:N1-N2.
          Based on the gene record, the location of this gene in the GRCh38 assembly is Chromosome 15 - NC_000015.10 (90083978..90102476, complement).

          That said, there is a warning about the RefSeqGene records in the attached image.

          The latest RefSeqGene record version is NG_023302.1 (note the .1), whereas the one in the genome build is NG_023302.
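
To turn record-relative positions such as 2775..21273 into chr15 coordinates, the gene span on the record can be lined up with the chromosome span given above. The sketch below does that under a loud assumption: the RefSeqGene record and NC_000015.10 are taken to align base-for-base, without gaps, across the gene, and the gene sits on the complement strand, so record positions count down along the chromosome. Given the version warning above, treat the results as approximate. The function names are hypothetical.

```python
# Gene span on the RefSeqGene record (from the GenBank excerpt earlier in the thread).
GENE_ON_RECORD = (2775, 21273)
# Gene span on chr15, GRCh38 (NC_000015.10), reported as "complement".
GENE_ON_CHROM = (90083978, 90102476)

def record_to_chrom(pos):
    """Map one record-relative position to a chr15 position.

    Assumption: ungapped, base-for-base alignment over the gene, complement strand.
    """
    rec_start, rec_end = GENE_ON_RECORD
    _, chrom_end = GENE_ON_CHROM
    if not rec_start <= pos <= rec_end:
        raise ValueError(f"position {pos} lies outside the gene span on the record")
    # On the complement strand, the first base of the gene is the highest chr coordinate.
    return chrom_end - (pos - rec_start)

def segment_to_chrom(start, end):
    """Map a record-relative segment to (low, high) chromosome coordinates."""
    a, b = record_to_chrom(start), record_to_chrom(end)
    return (min(a, b), max(a, b))

first_exon = segment_to_chrom(2775, 2975)
last_exon = segment_to_chrom(20898, 21273)
print(f"first exon: chr15:{first_exon[0]}-{first_exon[1]}")
print(f"last exon:  chr15:{last_exon[0]}-{last_exon[1]}")
```

Under these assumptions the first exon lands at chr15:90102276-90102476 and the last exon at chr15:90083978-90084353, i.e. the transcript runs from high to low chromosome coordinates.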


          • #6
            Thanks, this is much clearer now. What I still don't get, though, is why the following sequence:
            ggcatgaggtagtaattggagtctccagtaagggtttgtttttccccagg....
            http://genome.ucsc.edu/cgi-bin/das/h...3389,208257979

            does not correspond to the one shown under "FASTA" in NCBI (see screenshot below), although the locations seem to be the same.

            I guess it has something to do with complement/reverse strand (?), but I can't figure out how to convert one sequence to the other. Any hints?
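
If strand really is the cause, one sequence should be the reverse complement of the other. A minimal sketch of that conversion in plain Python (the `reverse_complement` helper is illustrative; it handles only A/C/G/T/N in either case and leaves any other character unchanged):

```python
# Translation table for complementing bases; case is preserved.
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# Start of the UCSC sequence quoted in this post.
fragment = "ggcatgaggtagtaattggagtctccagtaagggtttgtttttccccagg"
print(reverse_complement(fragment))
```

Applying the function twice returns the original sequence, which is a quick sanity check. If the reverse complement still does not match the NCBI FASTA, strand is probably not the explanation.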


            • #7
              Originally posted by neofit
              Thanks, this is much clearer now. What I still don't get, though, is why the following sequence:
              ggcatgaggtagtaattggagtctccagtaagggtttgtttttccccagg....
              http://genome.ucsc.edu/cgi-bin/das/h...3389,208257979

              does not correspond to the one shown under "FASTA" in NCBI (see screenshot below), although the locations seem to be the same.

              I guess it has something to do with complement/reverse strand (?), but I can't figure out how to convert one sequence to the other. Any hints?
              Are you sure this is the same genome build at UCSC and NCBI?

              If I take the sequence from the UCSC record link you included (I randomly selected a few hundred bases from the start), it seems to hit the following location in GRCh38 at NCBI using BLAST:

              Code:
              Range 1: 207368665 to 207371264
              Alignment statistics for match #1
              Score            Expect  Identities       Gaps        Strand
              4802 bits(2600)  0.0     2600/2600(100%)  0/2600(0%)  Plus/Plus
              
              Query  1          GGCATGAGGTAGTAATTGGAGTCTCCAGTAAGGGTTTGTTTttccccagggctgttataa  60
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207368665  GGCATGAGGTAGTAATTGGAGTCTCCAGTAAGGGTTTGTTTTTCCCCAGGGCTGTTATAA  207368724
              
              Query  61         ccaattatctcaaatttggtggcttaaaataacagaaatgtattctcttgcagttctgga  120
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207368725  CCAATTATCTCAAATTTGGTGGCTTAAAATAACAGAAATGTATTCTCTTGCAGTTCTGGA  207368784
              
              Query  121        tattgtggaggaaaagtctgaaatcaaggtgttggcgtggccacactttctccaaaagtt  180
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207368785  TATTGTGGAGGAAAAGTCTGAAATCAAGGTGTTGGCGTGGCCACACTTTCTCCAAAAGTT  207368844
              
              Query  181        ccaggggaatattcttccttcactcttttggcttctagtggctccagctgcttcttggct  240
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207368845  CCAGGGGAATATTCTTCCTTCACTCTTTTGGCTTCTAGTGGCTCCAGCTGCTTCTTGGCT  207368904
              
              Query  241        tatggcagcctgactccaatgtctgcctctgtcttcacgtggccttctcgcagtgtagct  300
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207368905  TATGGCAGCCTGACTCCAATGTCTGCCTCTGTCTTCACGTGGCCTTCTCGCAGTGTAGCT  207368964
              
              Query  301        ctgtgtctcaaatctctttctcttttctcttataagaacaccagtcattggattaaggat  360
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207368965  CTGTGTCTCAAATCTCTTTCTCTTTTCTCTTATAAGAACACCAGTCATTGGATTAAGGAT  207369024
              
              Query  361        ctgccctaaatccacgatgacctcatcttgaaatcattaacttaattacatctattaaga  420
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207369025  CTGCCCTAAATCCACGATGACCTCATCTTGAAATCATTAACTTAATTACATCTATTAAGA  207369084
              
              Query  421        ccctctttccaaataagatcacattcacaggtaccagaggttaggacttagatgtatttt  480
                                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Sbjct  207369085  CCCTCTTTCCAAATAAGATCACATTCACAGGTACCAGAGGTTAGGACTTAGATGTATTTT  207369144
              But a BLAT search against GRCh37 at UCSC seems to point to the location marked in your screenshot, even though the FASTA header in that screenshot says GRCh38 assembly.

              Code:
                 ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
              ---------------------------------------------------------------------------------------------------
              browser details YourSeq         2500     1  2500  2500 100.0%     2   +  208233389 208235888   2500
              browser details YourSeq          148    70   511  2500  79.1%    20   -   38193407  38193817    411
              A BLAT search against GRCh38 at UCSC matches the BLAST results at NCBI.

              Code:
                 ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
              ---------------------------------------------------------------------------------------------------
              browser details YourSeq         2500     1  2500  2500 100.0%     2   +  207368665 207371164   2500
              browser details YourSeq          139   111   507  2500  74.8%    12   +    6320367   6320758    392
              Something is odd here.
              Last edited by GenoMax; 03-19-2014, 03:19 PM.
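
The two top BLAT hits above can be compared directly to see how far apart the same sequence sits in the two builds. This is only a sketch using the numbers from the tables in this post; the `BlatHit` type is made up for illustration, and the offset it prints applies to this locus only, not to chr2 as a whole.

```python
from typing import NamedTuple

class BlatHit(NamedTuple):
    """Top-scoring BLAT hit, copied by hand from the tables above."""
    build: str
    chrom: str
    strand: str
    start: int
    end: int

grch37 = BlatHit("GRCh37", "2", "+", 208233389, 208235888)
grch38 = BlatHit("GRCh38", "2", "+", 207368665, 207371164)

for hit in (grch37, grch38):
    length = hit.end - hit.start + 1  # inclusive span
    print(f"{hit.build}: chr{hit.chrom}:{hit.start}-{hit.end} ({hit.strand}), {length} bp")

# Shift of this locus between the two builds.
offset = grch37.start - grch38.start
print("GRCh37 minus GRCh38 start:", offset)
```

Both hits span 2500 bp on the plus strand, and the locus is shifted by 864724 bases between builds. Coordinates from one assembly therefore cannot be looked up in the other without conversion, for which a dedicated tool such as UCSC liftOver is the usual route.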
