  • Virus reference annotation

    Hi,

    I am trying to assemble a virus transcriptome using TopHat/Cufflinks. The reference genome is http://www.ncbi.nlm.nih.gov/nuccore/EF999921, and I wonder how I can get the annotation file (gtf/gff).

    Thank you!
    Dawn

  • #2
    You could try this script to see whether it converts a GenBank-format file to GFF: http://www.hpa-bioinformatics.org.uk...s/snippets/115

    • #3
      > library(cummeRbund)
      Loading required package: BiocGenerics
      Loading required package: parallel

      Attaching package: ‘BiocGenerics’

      The following objects are masked from ‘package:parallel’:

      clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
      clusterExport, clusterMap, parApply, parCapply, parLapply,
      parLapplyLB, parRapply, parSapply, parSapplyLB

      The following object is masked from ‘package:stats’:

      xtabs

      The following objects are masked from ‘package:base’:

      anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
      do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
      is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
      pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
      rownames, sapply, setdiff, sort, table, tapply, union, unique,
      unlist, unsplit

      Loading required package: RSQLite
      Loading required package: DBI
      Loading required package: ggplot2
      Loading required package: reshape2
      Loading required package: fastcluster

      Attaching package: ‘fastcluster’

      The following object is masked from ‘package:stats’:

      hclust

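      Loading required package: rtracklayer
      Loading required package: GenomicRanges
      Loading required package: S4Vectors
      Loading required package: stats4
      Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
      Loading required package: IRanges
      Loading required package: GenomeInfoDb
      Loading required package: Gviz
      Loading required package: grid

      Attaching package: ‘cummeRbund’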

      The following object is masked from ‘package:GenomicRanges’:

      promoters

      The following object is masked from ‘package:IRanges’:

      promoters

      The following object is masked from ‘package:BiocGenerics’:

      conditions

      • #4
        Hi, thanks for the response. My question is NOT about file format conversion, but about how to obtain the annotation file for the virus.

        Thanks!
        Dawn

        • #5
          @Dawn: I don't think your genome of interest is in NCBI's list of viral genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/), so there is no ready-made GFF file. That is why I suggested downloading the GenBank-format file for your accession from the link in your post and converting it to GFF yourself.
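
          The GenBank-to-GFF conversion described here can be sketched in plain Python. This is a minimal illustration, not the script linked in #2: the names genbank_to_gff3, parse_features and EXPORTED_QUALIFIERS, and the choice of which qualifiers to export, are assumptions made for this example. It only handles locations within the record itself, collapses join(...) locations to a single start..end span, and has not been checked against what TopHat/Cufflinks expect.

```python
import re
from urllib.parse import quote

# Qualifiers copied into the GFF3 attributes column (an arbitrary choice for this sketch).
EXPORTED_QUALIFIERS = ("gene", "locus_tag", "product", "protein_id")


def parse_features(genbank_text):
    """Return (seqid, features); each feature is [key, location, qualifiers]."""
    seqid, features, in_features, last_qual = "unknown", [], False, None
    for line in genbank_text.splitlines():
        if line.startswith("VERSION") and len(line.split()) > 1:
            seqid = line.split()[1]                  # accession.version
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.strip() and not line.startswith(" "):
            in_features = False                      # any other top-level keyword ends the table
        elif in_features and line.strip():
            text = line.strip()
            if line.startswith("     ") and line[5] != " ":
                # Feature keys start in column 6; the location follows on the same line.
                key, _, location = text.partition(" ")
                features.append([key, location.strip(), {}])
                last_qual = None
            elif not features:
                continue
            elif text.startswith("/"):               # a new /qualifier=value line
                name, _, value = text[1:].partition("=")
                features[-1][2][name] = value.strip('"')
                last_qual = name
            elif last_qual:                          # wrapped qualifier value
                features[-1][2][last_qual] += " " + text.strip('"')
            else:                                    # wrapped location
                features[-1][1] += text
    return seqid, features


def genbank_to_gff3(genbank_text, source="GenBank"):
    """Convert the feature table of one GenBank record to GFF3 text."""
    seqid, features = parse_features(genbank_text)
    rows = ["##gff-version 3"]
    for key, location, quals in features:
        coords = [int(n) for n in re.findall(r"\d+", location)]
        if not coords:
            continue
        # Simplification: any "complement" puts the whole feature on the minus strand.
        strand = "-" if "complement" in location else "+"
        # GFF3 phase is 0-based; GenBank /codon_start is 1-based.
        phase = str(int(quals.get("codon_start", "1")) - 1) if key == "CDS" else "."
        attrs = ";".join(
            f"{name}={quote(quals[name], safe=' ')}"
            for name in EXPORTED_QUALIFIERS if name in quals
        ) or "."
        rows.append("\t".join(
            [seqid, source, key, str(min(coords)), str(max(coords)), ".", strand, phase, attrs]
        ))
    return "\n".join(rows) + "\n"
```

          For example, after saving the GenBank record as EF999921.gb (a hypothetical filename), print(genbank_to_gff3(open("EF999921.gb").read())) would write the GFF3 lines to standard output.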

          • #6
            Originally posted by dawn1313
            The reference genome is http://www.ncbi.nlm.nih.gov/nuccore/EF999921, and I wonder how I can get the annotation file (gtf/gff).
            Code:
            wget http://togows.dbcls.jp/entry/nucleotide/EF999921.1.gff
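
            The same download can be done from Python with the standard library. A minimal sketch, assuming the TogoWS address in the wget command is still served; the helper names (togows_url, fetch_annotation, looks_like_gff) are made up for this example, and only the "gff" suffix is confirmed by this thread.

```python
from urllib.request import urlopen

# URL pattern taken from the wget command in this post.
TOGOWS_ENTRY = "http://togows.dbcls.jp/entry/nucleotide/{accession}.{fmt}"


def togows_url(accession, fmt="gff"):
    """Build the TogoWS entry URL for a nucleotide accession."""
    return TOGOWS_ENTRY.format(accession=accession, fmt=fmt)


def fetch_annotation(accession, fmt="gff", timeout=30):
    """Download the entry and return it as text (needs network access)."""
    with urlopen(togows_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


def looks_like_gff(text):
    """Rough sanity check: every feature line has nine tab-separated columns."""
    rows = []
    for line in text.splitlines():
        if line.startswith("##FASTA"):      # an embedded sequence section may follow
            break
        if line.strip() and not line.startswith("#"):
            rows.append(line)
    return bool(rows) and all(len(row.split("\t")) == 9 for row in rows)
```

            Calling fetch_annotation("EF999921.1") and passing the result to looks_like_gff is a cheap way to notice an error page before handing the file to TopHat/Cufflinks.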

            • #7
              Thank you so much, Piet. That's exactly what I was looking for.

              Best,
              Dawn
