  • thedamian
    Member
    • Feb 2012
    • 50

    UCSC refSeq Gene and hg19 coordinate

    Hello,
    I have a list of mRNA NM_ accession numbers.
    In the UCSC hg19 refGene table, I can get the exon and CDS coordinates for every NM_ entry.

    However, when I pull out a subsequence from hg19 based on the refGene coordinates, the result seems to be incorrect for the reverse strand. Taking the reverse complement of the extracted exons doesn't work either.

    -------
    Example:
    I have NM_012345.3.
    From UCSC I know that for NM_012345 the first CDS segment is between 50000 and 50100, strand "-", on chr1.
    Then I use:
    Code:
    samtools faidx /path/hg19.fa chr1:50000-50100
    The result doesn't start with ATG (and it should).


    Where is the problem? I know that UCSC doesn't use the version suffix (NM_012345 instead of NM_012345.3), but it should still work.

    (hg19 is downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.


    • thedamian
      Member
      • Feb 2012
      • 50

      #3
      Originally posted by dpryan
      NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.
      Heh, it was an abstract example; 012345 is like abcdef.
      I tested ~3000 genes this way. About 1600 work fine; they are on the "+" strand.
      The other ~1400 are on the "-" strand, and when I use samtools faidx, I can't get the correct mRNA or CDS.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
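        Ah, in the future, always give working examples.

        Remember that for anything on the "-" strand, the sequence you extract from the genome should end with the start codon (which shows up as CAT, the reverse complement of ATG), rather than start with it.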


        • thedamian
          Member
          • Feb 2012
          • 50

          #5
          OK, a real example:
          I have the gene IL10, NM_000572.2.
          For NM_000572, UCSC gives me:

          name: NM_000572
          chrom: chr1
          strand: -
          txStart: 206940947
          txEnd: 206945839
          cdsStart: 206941980
          cdsEnd: 206945780
          exonStarts: 206940947,206943173,206944251,206944700,206945615,
          exonEnds: 206942073,206943239,206944404,206944760,206945839,
          name2: IL10

          So the first CDS segment runs from 206941980 to 206942073.

          Then I use:
          Code:
          samtools faidx hg19.fa chr1:206941978-206942075
          (I padded 2 bases on each side because UCSC coordinates are 0-based and samtools faidx is 1-based.)
          The output:
          GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAACTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT

          no ATG, and TAC in here;/
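
A side note on the padding in the command above: refGene intervals are 0-based and half-open (start included, end excluded), while samtools faidx regions are 1-based and inclusive. Under that convention only the start needs +1 and the end is used unchanged, so no extra padding is required. A minimal sketch of the conversion (the helper name `ucsc_to_samtools_region` is made up for illustration):

```python
def ucsc_to_samtools_region(chrom: str, start0: int, end0: int) -> str:
    """Convert a UCSC 0-based, half-open interval to a 1-based, inclusive region string."""
    if not 0 <= start0 < end0:
        raise ValueError("expected 0 <= start < end")
    # Only the start shifts by one; the half-open end already equals
    # the position of the last base in 1-based numbering.
    return f"{chrom}:{start0 + 1}-{end0}"


# cdsStart and the first exonEnd from the IL10 row above.
region = ucsc_to_samtools_region("chr1", 206941980, 206942073)
print(region)  # chr1:206941981-206942073
```

The resulting string can be passed to samtools faidx as the region argument in place of the padded one.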


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            It's on the '-' strand, so you're grabbing the end, rather than the beginning
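
This can be checked against the sequence posted in #5. The sketch below assumes that the posted output is exactly the requested window chr1:206941978-206942075 and that refGene coordinates are 0-based, half-open. It trims the padding, reverse-complements the block, and looks at the last codon. The names `reverse_complement`, `posted`, and `block` are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Output posted in #5 for chr1:206941978-206942075 (1-based, inclusive).
posted = (
    "GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAACTCACTCATG"
    "GCTTTGTAGATGCCTTTCTCTTGGAGCT"
)

query_start, query_end = 206941978, 206942075   # requested window, 1-based
block_start0, block_end0 = 206941980, 206942073  # cdsStart and first exonEnd

# Number of padded bases to drop on each side of the window.
left_trim = (block_start0 + 1) - query_start
right_trim = query_end - block_end0

block = posted[left_trim:len(posted) - right_trim]
mrna_sense = reverse_complement(block)

print(len(block))       # 93
print(mrna_sense[-3:])  # TGA
```

After reverse-complementing, the block ends in the stop codon TGA, which is consistent with this being the last coding segment of the transcript rather than the first.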


            • thedamian
              Member
              • Feb 2012
              • 50

              #7
              Originally posted by dpryan
              It's on the '-' strand, so you're grabbing the end, rather than the beginning
              Heh, yes, I've just realised it.
              If the strand is "-", the start codon is at cdsEnd and the stop codon is at cdsStart! Very confusing!
              +1 to experience
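
To make the strand rule concrete, here is a rough sketch that clips the exons of a refGene row to the CDS and returns the coding blocks in transcript order, using the IL10 row from #5. It assumes exons are listed in ascending genomic order and that coordinates are 0-based, half-open. The function names are hypothetical and not part of any UCSC tool.

```python
from typing import List, Tuple


def parse_ints(field: str) -> List[int]:
    """Parse a refGene comma-separated field such as '1,2,3,' into integers."""
    return [int(x) for x in field.split(",") if x]


def cds_blocks(exon_starts: str, exon_ends: str,
               cds_start: int, cds_end: int, strand: str) -> List[Tuple[int, int]]:
    """Return CDS blocks (0-based, half-open) in 5'->3' transcript order."""
    blocks = []
    for start, end in zip(parse_ints(exon_starts), parse_ints(exon_ends)):
        # Clip each exon to the coding region; drop exons that are pure UTR.
        start, end = max(start, cds_start), min(end, cds_end)
        if start < end:
            blocks.append((start, end))
    if strand == "-":
        # On the minus strand the transcript runs from high to low coordinates.
        blocks.reverse()
    return blocks


il10 = cds_blocks(
    "206940947,206943173,206944251,206944700,206945615,",
    "206942073,206943239,206944404,206944760,206945839,",
    206941980, 206945780, "-",
)

first_start0, first_end0 = il10[0]
print(f"chr1:{first_start0 + 1}-{first_end0}")  # chr1:206945616-206945780
print(sum(end - start for start, end in il10))  # 537
```

For a "-" strand gene, each block still has to be reverse-complemented after extraction before the blocks are joined in this order. The block containing the start codon is the one ending at cdsEnd, so the plus-strand sequence of that region should end in CAT.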
