
  • How to get exon (cds) annotation of a genome?

    Hi,

    I need an exons-only annotation (coding sequence region information) of the chicken genome (Gallus gallus).
    What is the simplest way to get it? I've tried looking into the tables provided by the UCSC browser without much success (db=galGal4&hgta_group=allTables&hgta_track=galGal4&hgta_table=cds&hgta_regionType=genome&position=chr5%3A55031036-55105194&hgta_outputType=primaryTable&).

    UCSC provides a table named cds. Does this hold the coding sequence information? I'm not able to understand the table: there is no information about which chromosome each line refers to, etc.

    I'd be grateful for any suggestions!

    Thanks.
    Inuk

  • #2
    What kind of form do you want it in exactly? Are you thinking a multiple fasta file of CDS genes? Or do you want a gff3/gtf of just CDS regions? Both are possible, and not that difficult. For a fasta file, you could just download it from Ensembl. The CDS only annotation file (gff3/gtf) would take a little manipulation, but it wouldn't be too hard.
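For the "little manipulation" mentioned above, here is a minimal sketch in Python, assuming you already have a standard 9-column, tab-separated GTF file (for example one downloaded from Ensembl). The function name `filter_gtf_features` and the sample records are made up for illustration; the only thing taken from the GTF format itself is that the third column holds the feature type (e.g. `CDS`, `exon`).

```python
from typing import Iterable, Iterator, List


def filter_gtf_features(lines: Iterable[str], feature: str = "CDS") -> Iterator[str]:
    """Yield only the GTF records whose feature column (column 3) matches."""
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(cols) >= 9 and cols[2] == feature:
            yield line


# Hypothetical records, for illustration only (not real annotation).
SAMPLE: List[str] = [
    "#!genome-build example",
    "1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id \"G1\";",
    "1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
    "1\tsrc\tCDS\t150\t300\t.\t+\t0\tgene_id \"G1\"; transcript_id \"T1\";",
    "1\tsrc\texon\t600\t900\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
    "1\tsrc\tCDS\t600\t800\t.\t+\t2\tgene_id \"G1\"; transcript_id \"T1\";",
]

cds_only = list(filter_gtf_features(SAMPLE, "CDS"))
exons_only = list(filter_gtf_features(SAMPLE, "exon"))
```

On a real file you would pass an open file handle instead of `SAMPLE` and write the yielded lines to a new file. Passing `"exon"` instead of `"CDS"` keeps the UTR-containing exon records.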



    • #3
      Thanks for the prompt reply Wallysb01!

      I figured that I should work with exons instead of CDS, since UTR regions need to be considered for my analysis. I'd be happy to have it in FASTA format. I think I might have gotten the data from the UCSC browser, but I'm unsure. It shows something like this (single line):
      585 NM_001031401 chr1 + 77009 89017 79071 88071 15 77009,79071,80408,80609,81604,82298,83944,84189,84590,85139,85419,86072,86552,87495,87934, 77075,79155,80482,80739,81715,82353,84055,84300,84646,85209,85553,86180,86847,87567,89017, 0 HCLS1 cmpl cmpl -1,0,0,2,0,0,1,1,1,0,1,0,0,1,1,

      Here, they say that the format is as follows:
      bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames.

      Does this mean that I can use txStart/txEnd as the whole exon region? What exactly do txStart and txEnd stand for? Are they the transcription start and end?
      Btw, the introns are filtered out.

      For my analysis, I'm looking for small RNA matches within the exon region.

      Regards.
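A rough sketch of how the line quoted above could be unpacked, assuming it follows UCSC's extended genePred layout (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...). In that layout txStart/txEnd bound the whole transcript, introns included, while the individual exons come from pairing exonStarts with exonEnds. UCSC table coordinates are 0-based with an exclusive end. The names `Transcript`, `parse_genepred_line` and `cds_blocks` are illustrative, not part of any UCSC tool.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # 0-based, end-exclusive (start, end) pairs
    gene: str


def parse_genepred_line(line: str) -> Transcript:
    """Parse one whitespace-separated extended genePred row (with leading bin)."""
    f = line.split()
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in f[9].rstrip(",").split(",")]
    ends = [int(x) for x in f[10].rstrip(",").split(",")]
    if len(starts) != int(f[8]) or len(ends) != int(f[8]):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return Transcript(f[1], f[2], f[3], int(f[4]), int(f[5]),
                      int(f[6]), int(f[7]), list(zip(starts, ends)), f[12])


def cds_blocks(tx: Transcript) -> List[Tuple[int, int]]:
    """Clip each exon to [cds_start, cds_end) and drop exons that are pure UTR."""
    blocks = []
    for start, end in tx.exons:
        s, e = max(start, tx.cds_start), min(end, tx.cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


# The row quoted in the post above.
EXAMPLE = (
    "585 NM_001031401 chr1 + 77009 89017 79071 88071 15 "
    "77009,79071,80408,80609,81604,82298,83944,84189,84590,85139,85419,86072,86552,87495,87934, "
    "77075,79155,80482,80739,81715,82353,84055,84300,84646,85209,85553,86180,86847,87567,89017, "
    "0 HCLS1 cmpl cmpl -1,0,0,2,0,0,1,1,1,0,1,0,0,1,1,"
)

tx = parse_genepred_line(EXAMPLE)
exon_intervals = tx.exons         # exons including UTRs
coding_intervals = cds_blocks(tx)  # coding parts only
```

For small RNA matching against exons including UTRs, `exon_intervals` is the list to use; `coding_intervals` is only needed if the UTRs should be excluded.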



      • #4
        Hey,

        Does anyone know of, or have, an annotation of hg19 with only Entrez gene IDs?



        • #5
          NCBI provides the seq_gene.md file for the galGal4 assembly. The description of the columns is provided in the README. Is that acceptable?



          • #6
            I use Mutalyzer, which provides a web service: https://mutalyzer.nl/positionConverter



            • #7
              Sorry, I was asking about the human genome, hg19.
