
  • UCSC RefSeq genes and hg19 coordinates

    Hello,
    I have a list of mRNA NM_ accession numbers.
    From the UCSC hg19 refGene table, I can get the exon and CDS coordinates for every NM_.

    However, when I pull out a subsequence from hg19 based on the refGene coordinates, the result seems to be incorrect for genes on the reverse strand. Taking the reverse complement of the extracted exons doesn't work either.

    -------
    Example:
    I have NM_012345.3.
    From UCSC I know that for NM_012345 the first CDS is between 50000 and 50100, on chr1, strand "-".
    Then I use:
    Code:
    samtools faidx /path/hg19.fa chr1:50000-50100
    The result doesn't start with ATG (and it should).


    Where is the problem? I know that UCSC doesn't use the version suffix (NM_012345 instead of NM_012345.3), but that shouldn't matter.

    (hg19 is downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)
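A separate pitfall worth ruling out alongside the strand question: UCSC tables such as refGene store 0-based, half-open intervals, while a samtools faidx region is 1-based and inclusive. Converting therefore means adding 1 to the start and leaving the end unchanged. A minimal Python sketch; refgene_to_region is a hypothetical helper name, and it only converts coordinates (it does nothing about strand):

```python
def refgene_to_region(chrom: str, start0: int, end0: int) -> str:
    """Convert a UCSC table interval (0-based, half-open [start, end))
    into a 1-based, inclusive region string for `samtools faidx`."""
    if end0 <= start0:
        raise ValueError("empty or inverted interval")
    # 0-based start -> 1-based start: add 1. A half-open end already equals
    # the 1-based inclusive end, so it is left unchanged.
    return f"{chrom}:{start0 + 1}-{end0}"


if __name__ == "__main__":
    # Hypothetical values, in the spirit of the abstract example above.
    print(refgene_to_region("chr1", 50000, 50100))  # chr1:50001-50100
```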

  • #2
    NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.



    • #3
      Originally posted by dpryan
      NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.
      Heh, it was an abstract example: 012345 is like abcdef.
      I tested ~3000 genes this way. About 1600 work fine, and they are on the "+" strand.
      The ~1400 on the "-" strand don't: when I use samtools faidx, I can't get the correct mRNA or CDS.



      • #4
        Ah. In the future, always give working examples.

        Remember that anything on the "-" strand should end in ATG (actually, CAT), rather than start with it.
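The reasoning can be sketched in a few lines of Python: hg19.fa holds the "+" strand, so a "-" strand feature has to be reverse-complemented to read it in transcript direction, and an ATG start codon therefore shows up as CAT at the high-coordinate end of the genomic sequence. reverse_complement is an illustrative helper, not a library function; lower-case letters are preserved because the UCSC hg19.fa is soft-masked.

```python
# Complement table for upper- and lower-case DNA; N maps to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATG"))  # CAT
```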



        • #5
          OK, a real example:
          I have gene IL10, NM_000572.2.
          Based on NM_000572 from UCSC I get:

          name: NM_000572
          chrom: chr1
          strand: -
          txStart: 206940947
          txEnd: 206945839
          cdsStart: 206941980
          cdsEnd: 206945780
          exonStarts: 206940947,206943173,206944251,206944700,206945615,
          exonEnds: 206942073,206943239,206944404,206944760,206945839,
          name2: IL10

          So the first CDS is from 206941980 to 206942073.

          Then I use:
          Code:
          samtools faidx hg19.fa chr1:206941978-206942075
          (I widened the range by 2 on each side because UCSC is 0-based and hg19 is 1-based.)
          The output:
          GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAACTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT

          no ATG, and TAC in here;/
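From the refGene fields quoted above, the coding part of each exon is that exon clipped to [cdsStart, cdsEnd). exonStarts/exonEnds are listed in ascending genomic order on both strands, so for a "-" strand transcript the list has to be reversed to follow the transcript from its 5' end. A minimal Python sketch; parse_coord_list and cds_exons are hypothetical helper names:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def parse_coord_list(field: str) -> List[int]:
    """Parse a UCSC comma-separated list such as '100,200,' (trailing comma)."""
    return [int(x) for x in field.split(",") if x]


def cds_exons(strand: str, cds_start: int, cds_end: int,
              exon_starts: List[int], exon_ends: List[int]) -> List[Interval]:
    """Coding part of each exon, listed in transcript (5' to 3') order."""
    parts: List[Interval] = []
    for s, e in zip(exon_starts, exon_ends):
        # Clip the exon to [cds_start, cds_end); purely UTR exons drop out.
        cs, ce = max(s, cds_start), min(e, cds_end)
        if cs < ce:
            parts.append((cs, ce))
    # Exons are stored in ascending genomic order on both strands,
    # so a "-" strand transcript is read from the last exon backwards.
    if strand == "-":
        parts.reverse()
    return parts


if __name__ == "__main__":
    # refGene fields for NM_000572 (IL10) as quoted in the post.
    starts = parse_coord_list("206940947,206943173,206944251,206944700,206945615,")
    ends = parse_coord_list("206942073,206943239,206944404,206944760,206945839,")
    parts = cds_exons("-", 206941980, 206945780, starts, ends)
    print(parts[0])                      # (206945615, 206945780)
    print(sum(e - s for s, e in parts))  # 537
```

For NM_000572 this puts 206945615-206945780 (0-based, half-open) first in transcript order, and the clipped pieces add up to 537 nt, a whole number of codons.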



          • #6
            It's on the '-' strand, so you're grabbing the end, rather than the beginning
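This can be checked against the sequence posted in #5 alone. For a "-" strand gene, cdsStart is the 3' end of the CDS, so the three bases starting at cdsStart, reverse-complemented, should read as a stop codon. A small Python sketch using only the region, cdsStart and sequence from the thread; the helper names are illustrative:

```python
# Sequence posted for `samtools faidx hg19.fa chr1:206941978-206942075`.
SEQ = ("GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAA"
       "CTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT")
REGION_START = 206941978  # 1-based position of SEQ[0]
CDS_START = 206941980     # refGene cdsStart for NM_000572 (0-based)

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def codon_at(seq: str, region_start_1based: int, pos_0based: int) -> str:
    """Three bases of seq starting at a 0-based genomic position."""
    # 0-based genomic -> 1-based genomic -> index into seq.
    offset = (pos_0based + 1) - region_start_1based
    return seq[offset:offset + 3]


if __name__ == "__main__":
    codon = codon_at(SEQ, REGION_START, CDS_START)
    print(codon, reverse_complement(codon))  # TCA TGA
```

The bases at cdsStart are TCA, which reverse-complement to TGA, a stop codon.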



            • #7
              Originally posted by dpryan
              It's on the '-' strand, so you're grabbing the end, rather than the beginning
              Heh, yes, I've just realised it.
              If the strand is "-", the start codon is at cdsEnd and the stop codon is at cdsStart! Very confusing!
              +1 to experience
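The rule can be written down as a small Python helper that returns samtools-style 1-based regions for both codons; codon_regions is a hypothetical name. It assumes the refGene CDS interval includes the stop codon and that neither codon is split across an exon boundary (the helper does not check this). The extracted bases for a "-" strand gene still need reverse-complementing; recent samtools releases offer faidx -i/--reverse-complement for that, but check the usage text of your version.

```python
from typing import Tuple


def codon_regions(chrom: str, strand: str, cds_start: int,
                  cds_end: int) -> Tuple[str, str]:
    """(start codon, stop codon) as 1-based inclusive samtools regions,
    given a refGene CDS as 0-based half-open [cds_start, cds_end)."""
    low = f"{chrom}:{cds_start + 1}-{cds_start + 3}"  # lowest 3 CDS bases
    high = f"{chrom}:{cds_end - 2}-{cds_end}"         # highest 3 CDS bases
    if strand == "+":
        return low, high
    if strand == "-":
        # On "-" the start codon is at cdsEnd and the stop codon at cdsStart.
        return high, low
    raise ValueError(f"unexpected strand: {strand!r}")


if __name__ == "__main__":
    print(codon_regions("chr1", "-", 206941980, 206945780))
```

For NM_000572 this gives chr1:206945778-206945780 for the start codon and chr1:206941981-206941983 for the stop codon.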
