Seqanswers

  • GenBank files. Pairing mRNA with CDS.

    Hi,

    I have a GenBank file that contains several mRNA and CDS features.
    I'd like to extract matching mRNA/CDS pairs from it.

    For example:
    Take the .gb file for the NF1 gene:
    http://www.ncbi.nlm.nih.gov/nuccore/...report=genbank
    I know that the mRNA with ID NM_000267.3 corresponds to the CDS with ID NP_000258.1.

    I know this because the mRNA's /product qualifier, /product="neurofibromin 1, transcript variant 2",
    matches the CDS's /product qualifier, /product="neurofibromin isoform 2".

    I use BioPerl's Bio::SeqIO to parse .gb files. I can extract all the primary features, such as mRNA and CDS, but I don't see any way to combine them into pairs.

    Thanks for any suggestions.

  • #2
    The pairing is implicit in GenBank format by virtue of the order of the features. There may be a BioPerl option to deduce this, since IIRC BioPerl can do GenBank to GFF3 conversion.

    P.S. If you used GFF3, the parent/child relationship would be explicit. NCBI have fixed their GFF3 since I wrote this: http://blastedbio.blogspot.co.uk/201...ll-broken.html
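A minimal sketch of following that explicit link in a GFF3 file with awk. It assumes an NCBI-style GFF3 in which each mRNA row carries ID= and transcript_id= attributes and each CDS row carries Parent= (pointing at its mRNA's ID) and protein_id=; check your file, since attribute names are an assumption here. The helper name pair_from_gff3 is hypothetical, and CDS rows with several comma-separated parents, or whose parent is not an mRNA, are simply skipped.

```shell
# pair_from_gff3 FILE  (hypothetical helper)
# Print unique "mRNA transcript_id <TAB> CDS protein_id" pairs by following
# each CDS row Parent= attribute back to the ID= of an mRNA row.
# Assumes mRNA rows carry transcript_id= and CDS rows carry protein_id=.
pair_from_gff3() {
  awk -F'\t' '
    # Return the value of key in a GFF3 column-9 string ("k1=v1;k2=v2"), or "".
    function attr(s, key,    n, i, parts) {
      n = split(s, parts, ";")
      for (i = 1; i <= n; i++)
        if (index(parts[i], key "=") == 1)
          return substr(parts[i], length(key) + 2)
      return ""
    }
    /^#/ { next }                                  # skip directives and comments
    $3 == "mRNA" { tx[attr($9, "ID")] = attr($9, "transcript_id") }
    # One CDS row per exon: remember each (Parent, protein_id) combination once.
    $3 == "CDS" { cds[attr($9, "Parent") SUBSEP attr($9, "protein_id")] = 1 }
    END {
      for (k in cds) {
        split(k, p, SUBSEP)
        if (p[1] in tx) print tx[p[1]] "\t" p[2]   # only CDSs whose parent is an mRNA
      }
    }
  ' "$1" | sort -u
}

# Usage (file name is only an example):
#   pair_from_gff3 annotation.gff3
```

Reading the whole file before printing (in END) means mRNA rows do not have to precede their CDS rows.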



    • #3
      Heh, I don't see this order,
      for example here:
      http://www.ncbi.nlm.nih.gov/nuccore/NG_017013.1

      The mRNA with /transcript_id="NM_001126118.1" is the 5th mRNA from the top,
      but the corresponding CDS is the 9th from the top.

      Where is the order here?
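To check the feature order yourself, here is a minimal awk sketch that walks the FEATURES table of one GenBank record and numbers each mRNA /transcript_id and each CDS /protein_id in file order. It assumes the standard flat-file layout (feature keys indented 5 spaces, qualifiers 21) and a single record per file; the helper name list_ids is hypothetical. The efetch URL in the usage comment is the NCBI E-utilities endpoint, shown as one possible way to save the record locally.

```shell
# list_ids FILE  (hypothetical helper)
# Walk the FEATURES table of a single GenBank record and print, in file order,
# every mRNA /transcript_id and every CDS /protein_id, numbered per feature type.
list_ids() {
  awk '
    # Value of a /qualifier="value" line, with the quotes stripped.
    function qval(line) {
      sub(/^[^"]*"/, "", line)
      sub(/".*$/, "", line)
      return line
    }
    /^ORIGIN/ { exit }                  # sequence starts; feature table is over
    /^     [^ ]/ { key = $1; next }     # feature key line: 5 spaces, then the key
    key == "mRNA" && /\/transcript_id="/ { print "mRNA", ++n["mRNA"], qval($0) }
    key == "CDS"  && /\/protein_id="/    { print "CDS",  ++n["CDS"],  qval($0) }
  ' "$1"
}

# Usage (needs network access for the download step):
#   curl -s 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NG_017013.1&rettype=gb&retmode=text' > NG_017013.1.gb
#   list_ids NG_017013.1.gb
```

This only makes the ordinal positions visible; it does not by itself pair transcripts with proteins when a record holds multiple transcripts.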



      • #4
        Originally posted by thedamian
        heh, I don't see this order
        Oh, you've got multiple transcripts. That does complicate life.



        • #5
          I'm not sure if this will help, but...

          The gene2accession file from NCBI contains both the mRNA and the protein accessions. You can fetch it with wget and then gunzip it:
          wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
          gunzip gene2accession.gz

          For example:
          grep NP_000258 gene2accession | grep GRCh37
          9606 4763 REVIEWED NM_000267.3 270132515 NP_000258.1 4557793 NC_000017.10 224589808 29421944 29704694 + Reference GRCh37.p9 Primary Assembly
          9606 4763 REVIEWED NM_000267.3 270132515 NP_000258.1 4557793 NT_010799.15 224514948 4158938 4441688 + Reference GRCh37.p9 Primary Assembly
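Building on that, a minimal sketch that lists every mRNA/protein pair for one gene rather than grepping for a single protein. It assumes the tab-separated column layout visible in the output above (GeneID in column 2, RNA accession in column 4, protein accession in column 6, "-" for an empty field); check the current README on the NCBI FTP site, as columns may have been added since. The helper name pair_accessions is hypothetical.

```shell
# pair_accessions GENE_ID FILE  (hypothetical helper)
# Print unique "RNA accession <TAB> protein accession" pairs for one GeneID
# from NCBI gene2accession, assuming the tab-separated layout shown above:
# column 2 = GeneID, column 4 = RNA accession.version,
# column 6 = protein accession.version, "-" = empty field.
# The same pair repeats once per genomic placement, hence sort -u.
pair_accessions() {
  awk -F'\t' -v id="$1" '$2 == id && $4 != "-" && $6 != "-" { print $4 "\t" $6 }' "$2" | sort -u
}

# Usage (4763 is the NF1 GeneID in the output above). A FILE of "-" makes awk
# read stdin, so the compressed file can be streamed without unpacking it:
#   pair_accessions 4763 gene2accession
#   gzip -dc gene2accession.gz | pair_accessions 4763 -
```

Filtering on GeneID instead of a protein accession gives all transcript variants of the gene at once, which is the multiple-transcript case raised earlier in the thread.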



          • #6
            Thank you, Richard!
            This seems to be exactly what I need!

