  • Downloading UCSC annotation files

    Hello,

    Does anyone know how to download the UCSC annotation files (for the genomes of Campylobacter jejuni, Campylobacter jejuni 81-176, and Campylobacter jejuni RM1221) from the UCSC browser?

  • #2
    I don't think the microbial UCSC genome browser allows data downloads the way the regular UCSC browser does. Check NCBI instead (e.g. https://www.ncbi.nlm.nih.gov/genome/149).


    • #3
      Downloading annotation files from the UCSC browser?

      Thanks for the quick answer.
      Do you know how to download the annotation files from the regular UCSC browser?


      • #4
        Download links for genomes are at: http://hgdownload.soe.ucsc.edu/downloads.html

        Once you select an organism, there are links to its annotation database (the page above lists only eukaryotes and a few exceptional genomes, no bacteria).

        If you need GTF/GFF format, use the "Table Browser" tool.
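
For the regular UCSC site, the per-organism annotation links lead to table dumps that can be fetched directly. A minimal sketch, assuming the usual goldenPath/<db>/database/ layout on hgdownload; `annotation_urls` is a hypothetical helper name, and `hg38` / `refGene` are illustrative choices that do not come from this thread:

```shell
#!/bin/sh
# Print the download URLs for one annotation table dump on hgdownload.
# annotation_urls is a hypothetical helper; the goldenPath/<db>/database/
# layout is an assumption, so confirm it for your assembly before relying on it.
annotation_urls() {
    db=$1; table=$2
    base="http://hgdownload.soe.ucsc.edu/goldenPath/${db}/database"
    echo "${base}/${table}.txt.gz"   # tab-separated table contents
    echo "${base}/${table}.sql"      # MySQL schema describing the columns
}

# Usage (hg38 and refGene are illustrative, not from this thread):
#   annotation_urls hg38 refGene | wget -i -
```

The `.sql` file is worth downloading alongside the `.txt.gz`, since the dump itself has no header row.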


        • #5
          Yikes. I actually didn't expect it to work ... but ...

          This script ~seems~ to work ...

          #!/bin/sh
          # Fetch the refSeq gene table as GFF/GTF from the UCSC microbial Table Browser.
          # POSITION is the region to export (here: the whole campJeju chromosome).
          POSITION="chr:1-1641481"
          wget --progress=dot \
          'http://microbes.ucsc.edu/cgi-bin/hgTables?db=campJeju&hgta_compressType=none&'\
          'hgta_group=genes&hgta_outputType=gff&outGff=1&hgta_regionType=range&'\
          'hgta_table=refSeq&hgta_track=refSeq&org=Campylobacter&position='"${POSITION}"\
          '&submit=submit&hgta_doTopSubmit=1' \
          -O "genscan.${POSITION}.gtf"


          Edit the query fields to get the table you want, e.g. "db" for the organism code and "hgta_table" for the annotation table. You can probably also set the position range to something very large instead of the exact genome size.

          I can't vouch for the output, so check that it is doing the right thing,
          but it looks like this ...
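
To make the field-swapping less error-prone, the URL can be assembled from parameters. This is only a sketch: `hgtables_url` is a hypothetical helper name, the query parameters are copied from the script above rather than from UCSC documentation, and the db codes for the 81-176 and RM1221 strains are not given in this thread, so they would need to be looked up on the microbial browser.

```shell
#!/bin/sh
# Build a Table Browser GFF request URL from the fields used in the script above.
# hgtables_url is a hypothetical helper; the parameter names are taken from the
# post as-is and have not been checked against UCSC documentation.
hgtables_url() {
    db=$1; table=$2; org=$3; position=$4
    printf '%s' "http://microbes.ucsc.edu/cgi-bin/hgTables?db=${db}"
    printf '%s' "&hgta_compressType=none&hgta_group=genes&hgta_outputType=gff&outGff=1"
    printf '%s' "&hgta_regionType=range&hgta_table=${table}&hgta_track=${table}"
    printf '%s\n' "&org=${org}&position=${position}&submit=submit&hgta_doTopSubmit=1"
}

# Example: same request as the post, but with a deliberately oversized range.
hgtables_url campJeju refSeq Campylobacter "chr:1-10000000"
```

The printed URL can be passed to wget, e.g. `wget -O out.gtf "$(hgtables_url campJeju refSeq Campylobacter chr:1-10000000)"`. Whether the server accepts a range larger than the genome is the same guess made in the post, so check the result.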

          head genscan.chr\:1-1641481.gtf
          chr campJeju_refSeq start_codon 1 3 1.000000 + . gene_id "Cj0001"; transcript_id "Cj0001";
          chr campJeju_refSeq CDS 1 1320 1.000000 + 0 gene_id "Cj0001"; transcript_id "Cj0001";
          chr campJeju_refSeq stop_codon 1321 1323 1.000000 + . gene_id "Cj0001"; transcript_id "Cj0001";
          chr campJeju_refSeq exon 1 1323 1.000000 + . gene_id "Cj0001"; transcript_id "Cj0001";
          chr campJeju_refSeq start_codon 1483 1485 2.000000 + . gene_id "Cj0002"; transcript_id "Cj0002";
          chr campJeju_refSeq CDS 1483 2547 2.000000 + 0 gene_id "Cj0002"; transcript_id "Cj0002";
          chr campJeju_refSeq stop_codon 2548 2550 2.000000 + . gene_id "Cj0002"; transcript_id "Cj0002";
          chr campJeju_refSeq exon 1483 2550 2.000000 + . gene_id "Cj0002"; transcript_id "Cj0002";
          chr campJeju_refSeq start_codon 2579 2581 3.000000 + . gene_id "Cj0003"; transcript_id "Cj0003";
          chr campJeju_refSeq CDS 2579 4885 3.000000 + 0 gene_id "Cj0003"; transcript_id "Cj0003";
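
One way to check that the download is doing the right thing is to count the features per type and the number of distinct gene IDs, then compare those against what is expected for the genome. A rough sketch, assuming the 9-column layout shown in the output above; `gtf_summary` is a hypothetical helper name:

```shell
#!/bin/sh
# Summarize a GTF file: lines per feature type and number of distinct gene_ids.
# gtf_summary is a hypothetical helper; it assumes the column layout shown
# above (feature type in column 3, gene_id value in the 10th whitespace field).
gtf_summary() {
    awk '
        $1 !~ /^#/ && NF >= 10 {
            feature[$3]++
            id = $10
            gsub(/[";]/, "", id)      # strip quotes and semicolon from "Cj0001";
            genes[id] = 1
        }
        END {
            for (f in feature) print f, feature[f]
            n = 0
            for (g in genes) n++
            print "genes", n
        }
    ' "$1" | sort
}

# Usage:
#   gtf_summary "genscan.chr:1-1641481.gtf"
```

If the server returned an HTML error page instead of GTF, few or no lines will have 10 fields, so the counts come out as zero or near zero.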



          It looks like the separate microbial genomes site at UCSC uses the same "table browser" code as the regular UCSC site. There might be a way to use a mysql client to snag the data, too.
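
As a sketch of the mysql route: the regular UCSC site runs a public MySQL server (host genome-mysql.soe.ucsc.edu, user genome, no password). Whether the microbial databases such as campJeju are served there is an assumption that has not been verified here. `ucsc_query` is a hypothetical wrapper name, and the hg38 / refGene query in the usage comment is illustrative only:

```shell
#!/bin/sh
# Run one SQL query against the public UCSC MySQL server and print
# tab-separated rows without a header line.
# ucsc_query is a hypothetical wrapper; it requires a mysql client on PATH.
ucsc_query() {
    db=$1; query=$2
    # -A: skip table-name caching, -N: no column names, -B: tab-separated output
    mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -N -B -e "$query" "$db"
}

# Usage (illustrative database and table, not from this thread):
#   ucsc_query hg38 "SELECT name, chrom, txStart, txEnd FROM refGene LIMIT 5"
```

Unlike the Table Browser, this returns raw table rows, not GTF, so the coordinates would still need converting if a GTF file is the goal.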
          Last edited by Richard Finney; 04-11-2016, 09:58 AM.


          • #6
            Thanks so much!


            • #7
              Wow, that's really great!!! Wonderful!
