  • UCSC refSeq Gene and hg19 coordinate

    Hello,
    I have a list of mRNA NM_ accession numbers.
    In UCSC, from the hg19 refGene table, I can get the exon and CDS coordinates for every NM_.

    However, when I pull out a subsequence from hg19 based on the refGene coordinates, the result does not look correct for the reverse strand. Taking the reverse complement of the extracted exons doesn't work either.

    -------
    example:
    I have NM_012345.3.
    From UCSC I know that for NM_012345 the first CDS block is between 50000 and 50100, strand "-", chr1.
    Then I use:
    Code:
    samtools faidx /path/hg19.fa chr1:50000-50100
    The result doesn't start with ATG (and it should).


    Where is the problem? I know that UCSC doesn't use the version (NM_012345 instead of NM_012345.3) but it should work.

    (hg19 is downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)

  • #2
    NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.



    • #3
      Originally posted by dpryan
      NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.
      Heh, it was an abstract example; 012345 is like abcdef.
      I tested ~3000 genes this way. About 1600 work fine; they are on the "+" strand.
      About 1400 are on the "-" strand, and when I use samtools faidx on them I can't get the correct mRNA or CDS.



      • #4
        Ah. In the future, always give working examples.

        Remember that on the "-" strand the sequence you extract should end with the start codon (as CAT, the reverse complement of ATG) rather than start with ATG.
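
To make the point about CAT concrete, here is a minimal Python sketch of a reverse complement. The helper name `reverse_complement` is made up for illustration and is not part of samtools or any UCSC tool; it only handles A, C, G, T and N.

```python
# Complement table for the four bases plus N, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


# A start codon on the "-" strand shows up as CAT at the right-hand end
# of the forward-strand sequence that a region fetch returns.
print(reverse_complement("ATG"))     # CAT
print(reverse_complement("GGGCAT"))  # ATGCCC
```

So a forward-strand fetch of a "-" strand CDS block only reads as coding sequence after the whole fetched string has been reverse-complemented.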



        • #5
          OK, a real example:
          I have gene IL10, NM_000572.2.
          Based on NM_000572 from UCSC I get:

          name: NM_000572
          chrom: chr1
          strand: -
          txStart: 206940947
          txEnd: 206945839
          cdsStart: 206941980
          cdsEnd: 206945780
          exonStarts: 206940947,206943173,206944251,206944700,206945615,
          exonEnds: 206942073,206943239,206944404,206944760,206945839,
          name2: IL10

          So the first CDS block is from 206941980 to 206942073.

          then I use:
          Code:
          samtools faidx hg19.fa chr1:206941978-206942075
          (I padded each side by 2 because UCSC coordinates are 0-based and the hg19 FASTA is indexed 1-based.)
          The output:
          GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAACTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT

          no ATG, and TAC in here;/
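
For reference, UCSC table coordinates are 0-based with an exclusive end, while `samtools faidx` regions are 1-based and inclusive, so the conversion is start + 1 with the end left unchanged; no padding is needed. A minimal Python sketch, assuming the refGene values quoted in this post (the helper name `ucsc_to_region` is made up for illustration). It also trims the padded output above down to the exact block:

```python
def ucsc_to_region(chrom: str, start0: int, end0: int) -> str:
    """Convert a UCSC 0-based, half-open interval to a 1-based, inclusive
    region string of the form chrom:start-end (hypothetical helper)."""
    if not 0 <= start0 < end0:
        raise ValueError("expected 0 <= start < end")
    return f"{chrom}:{start0 + 1}-{end0}"


# CDS part of the leftmost exon: cdsStart .. first exonEnd.
region = ucsc_to_region("chr1", 206941980, 206942073)
print(region)  # chr1:206941981-206942073

# Output posted above for chr1:206941978-206942075 (1-based, inclusive).
posted = (
    "GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGA"
    "AGATGTCAAACTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT"
)
posted_start = 206941978

# Slice the posted string down to 206941981-206942073.
lo = 206941981 - posted_start        # 3 padded bases on the left
hi = 206942073 - posted_start + 1    # drops 2 padded bases on the right
block = posted[lo:hi]
print(len(block))   # 93
print(block[:3])    # TCA
```

TCA is the reverse complement of the stop codon TGA, which fits this block being the end of the CDS on the "-" strand rather than its beginning.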



          • #6
            It's on the '-' strand, so you're grabbing the end, rather than the beginning



            • #7
              Originally posted by dpryan
              It's on the '-' strand, so you're grabbing the end, rather than the beginning
              Heh, yes, I've just realised it.
              If the strand is "-", the start codon is at cdsEnd and the stop codon is at cdsStart! Very confusing!
              +1 to experience.
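
The conclusion of the thread can be sketched in Python as follows, assuming the IL10 refGene record quoted earlier. The function names are made up for illustration. The sketch clips each exon to the CDS and then picks the block that holds the start codon: the leftmost block on "+", the rightmost on "-".

```python
from typing import List, Tuple

Block = Tuple[int, int]  # 0-based, half-open (start, end), as in refGene


def parse_coords(field: str) -> List[int]:
    """Parse a refGene list such as '10,20,' (note the trailing comma)."""
    return [int(x) for x in field.split(",") if x]


def cds_blocks(exon_starts: List[int], exon_ends: List[int],
               cds_start: int, cds_end: int) -> List[Block]:
    """Clip every exon to [cds_start, cds_end); keep genomic order."""
    blocks = []
    for s, e in zip(exon_starts, exon_ends):
        lo, hi = max(s, cds_start), min(e, cds_end)
        if lo < hi:  # skip exons that are entirely UTR
            blocks.append((lo, hi))
    return blocks


def first_coding_block(blocks: List[Block], strand: str) -> Block:
    """Block containing the start codon for the given strand."""
    return blocks[0] if strand == "+" else blocks[-1]


exon_starts = parse_coords("206940947,206943173,206944251,206944700,206945615,")
exon_ends = parse_coords("206942073,206943239,206944404,206944760,206945839,")
blocks = cds_blocks(exon_starts, exon_ends, 206941980, 206945780)

start0, end0 = first_coding_block(blocks, "-")
# Convert to a 1-based, inclusive region string for a FASTA fetch.
print(f"chr1:{start0 + 1}-{end0}")  # chr1:206945616-206945780

# Total coding length across all blocks; a multiple of 3 is a quick sanity check.
print(sum(e - s for s, e in blocks))  # 537
```

Fetching that region and reverse-complementing the whole result should give a sequence that begins with ATG. To build the full CDS for a "-" strand transcript, one would concatenate all blocks in genomic order and reverse-complement the concatenation once.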
