
  • How to get exon (cds) annotation of a genome?

    Hi,

    I need an exon-only annotation (coding sequence region information) of the chicken genome (Gallus gallus).
    What is the simplest way to get it? I've looked through the tables provided by the UCSC browser without much success (db=galGal4&hgta_group=allTables&hgta_track=galGal4&hgta_table=cds&hgta_regionType=genome&position=chr5%3A55031036-55105194&hgta_outputType=primaryTable&).

    UCSC provides a table named cds. Does it hold the coding sequence information? I can't make sense of the table, though: there is no extra information such as which chromosome each line refers to.

    I'd be grateful for any suggestions!

    Thanks.
    Inuk

  • #2
    What form do you want it in, exactly? Are you thinking of a multi-FASTA file of CDS sequences, or do you want a GFF3/GTF of just the CDS regions? Both are possible and not that difficult. For a FASTA file, you could just download it from Ensembl. The CDS-only annotation file (GFF3/GTF) would take a little manipulation, but it wouldn't be too hard.
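    The "little manipulation" for a CDS-only GTF could look roughly like the sketch below. It assumes a standard tab-separated GTF/GFF3 where column 3 holds the feature type; the function name `filter_cds` and the example records (gene IDs, coordinates) are made up for illustration and are not taken from a real annotation.

```python
import io


def filter_cds(gtf_lines):
    """Yield only the records whose feature type (column 3) is 'CDS'."""
    for line in gtf_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF and GFF3 records both have 9 tab-separated columns.
        if len(fields) >= 9 and fields[2] == "CDS":
            yield line.rstrip("\n")


# Hypothetical example records, for illustration only.
example = "\n".join([
    "#!genome-build example",
    "chr1\tsource\tgene\t100\t900\t.\t+\t.\tgene_id \"G1\";",
    "chr1\tsource\texon\t100\t300\t.\t+\t.\tgene_id \"G1\";",
    "chr1\tsource\tCDS\t150\t300\t.\t+\t0\tgene_id \"G1\";",
])

cds_records = list(filter_cds(io.StringIO(example)))
for rec in cds_records:
    print(rec)
```

    For a real file you would pass an open file handle instead of the `io.StringIO` example and write the yielded lines to a new file.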



    • #3
      Thanks for the prompt reply Wallysb01!

      I've decided to work with exons rather than CDS, since my analysis also needs to consider the UTR regions. I'd be happy to have it in FASTA format. I think I may have gotten the data from the UCSC browser, but I'm unsure. It shows something like this (a single line):
      585 NM_001031401 chr1 + 77009 89017 79071 88071 15 77009,79071,80408,80609,81604,82298,83944,84189,84590,85139,85419,86072,86552,87495,87934, 77075,79155,80482,80739,81715,82353,84055,84300,84646,85209,85553,86180,86847,87567,89017, 0 HCLS1 cmpl cmpl -1,0,0,2,0,0,1,1,1,0,1,0,0,1,1,

      The format is described as follows:
      bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames.

      Does this mean that I can use txStart/txEnd as the whole exon region? What exactly do txStart and txEnd stand for? Transcription start and end?
      Btw, the introns are filtered out.

      For my analysis, I'm looking for small RNA matches within the exon region.

      Regards.
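      On the txStart/txEnd question: in UCSC gene tables these are the transcription start and end, so they span the whole transcript including introns, while the individual exons are given by the paired exonStarts/exonEnds lists. A minimal sketch of pulling the exon intervals out of the line quoted above, assuming it follows the 16-column UCSC refGene layout (the function name `parse_refgene_line` is made up for illustration):

```python
# Column names assumed from the UCSC refGene table layout.
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_refgene_line(line):
    """Split one refGene row and pair up its exon start/end coordinates."""
    record = dict(zip(REFGENE_COLUMNS, line.split()))
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in record["exonStarts"].rstrip(",").split(",")]
    ends = [int(x) for x in record["exonEnds"].rstrip(",").split(",")]
    # UCSC tables store starts 0-based and ends exclusive.
    record["exons"] = list(zip(starts, ends))
    return record


# The single line quoted in the post above.
line = (
    "585 NM_001031401 chr1 + 77009 89017 79071 88071 15 "
    "77009,79071,80408,80609,81604,82298,83944,84189,84590,85139,"
    "85419,86072,86552,87495,87934, "
    "77075,79155,80482,80739,81715,82353,84055,84300,84646,85209,"
    "85553,86180,86847,87567,89017, "
    "0 HCLS1 cmpl cmpl -1,0,0,2,0,0,1,1,1,0,1,0,0,1,1,"
)

record = parse_refgene_line(line)
for start, end in record["exons"]:
    print(record["chrom"], start, end, record["name2"], record["strand"])
```

      Each printed row is one exon interval, which could be saved as a BED-like file and used to pull exon sequences or to intersect with small RNA matches.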



      • #4
        Hey,

        Does anyone know of, or have, an annotation of hg19 with only Entrez Gene IDs?



        • #5
          NCBI provides the seq_gene.md file for the galGal4 assembly. The description of the columns is provided in the README. Is that acceptable?



          • #6
            I use Mutalyzer, which provides a web service: https://mutalyzer.nl/positionConverter



            • #7
              Sorry, I was asking about the human genome, hg19.
