  • GTF file for rat

    I want to run TopHat on rat samples. Where can I download the GTF file?
    Thanks

  • #2
    You can find it here:



    --
    Phillip



    • #3
      This file is much bigger than the one from UCSC. Why?



      • #4
        Originally posted by HSV-1
        This file is much bigger than the one from UCSC. Why?
        How did you get your file from UCSC? If you make a RefGene-based GTF from the Table Browser, it only includes coding features. The pre-built GTF from Ensembl includes all coding and non-coding features. In addition, the annotations themselves are longer text strings (all the Ensembl accessions for gene ID, exon ID, transcript ID, name, biotype, ...), so in raw text the Ensembl file will be larger.

        Also note that the UCSC file uses the notation "chr1", etc., while the first column in the Ensembl file will just be "1", etc. (some software expects the "chr" prefix).
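
If the "chr" prefix is what your software expects, one way to reconcile the two naming schemes is to rewrite the first column of the Ensembl GTF. The following is a minimal sketch, not a tested pipeline step: the function name `add_chr_prefix` is made up for illustration, and it assumes the only differences are the missing "chr" prefix and the mitochondrial name ("MT" in Ensembl, "chrM" in UCSC). Unplaced scaffolds are usually named differently in the two sources and would need their own mapping.

```python
def add_chr_prefix(lines):
    """Yield GTF lines with 'chr' prepended to the sequence name (column 1).

    Assumption: input is tab-delimited GTF text, one feature per line.
    """
    for line in lines:
        # Pass through comments, blank lines, and anything without tabs.
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        seqname, rest = line.split("\t", 1)
        # Ensembl calls the mitochondrial genome "MT"; UCSC calls it "chrM".
        if seqname == "MT":
            seqname = "M"
        # Do not double-prefix names that already start with "chr".
        if not seqname.startswith("chr"):
            seqname = "chr" + seqname
        yield seqname + "\t" + rest
```

Because the function accepts any iterable of lines, it can be fed an open file handle and its output written line by line to a new file. Whichever direction you convert, the names in the GTF must match the names in the genome index you align against.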
        Michael Black, Ph.D.
        ScitoVation LLC. RTP, N.C.



        • #5
          That is probably the reason.
          How can I fix it?
          With the same sequence data and the Ensembl GTF, I should get more accepted hits from TopHat.



          Originally posted by mbblack
          How did you get your file from UCSC? If you make a RefGene-based GTF from the Table Browser, it only includes coding features. The pre-built GTF from Ensembl includes all coding and non-coding features. In addition, the annotations themselves are longer text strings (all the Ensembl accessions for gene ID, exon ID, transcript ID, name, biotype, ...), so in raw text the Ensembl file will be larger.

          Also note that the UCSC file uses the notation "chr1", etc., while the first column in the Ensembl file will just be "1", etc. (some software expects the "chr" prefix).



          • #6
            Originally posted by HSV-1
            With the same sequence data and the Ensembl GTF, I should get more accepted hits from TopHat.
            No, not for a reasonably mature genome such as the rat. Ensembl's build may include a handful of novel and/or predicted coding genes, but not many. Ensembl Rat rel. 66.34 had 22,938 coding genes, 22,921 of which were known and have RefSeq annotation (I only know this because I'm writing up data that used 66.34 as the reference; you would have to look on Ensembl's web site for the stats for the current release).

            The annotation really should not have any significant effect on your summarized mapping results for a mature feature set like the rat. It would only matter if there were a large number of novel, unknown or predicted genes in one annotation versus another, or if the splice boundaries of the annotation features were still largely undetermined. But once summarized by gene, your mapped count data should be unaffected, given that the genome build is fairly well characterized and stable at this point.
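
To see how much two annotations actually differ at the gene level, you could count distinct gene IDs per biotype in each GTF. This is only a rough sketch under stated assumptions: the function name `genes_per_biotype` is hypothetical, and it assumes the biotype is stored in a `gene_biotype` attribute in column 9, as in recent Ensembl GTFs. Older releases kept the biotype in column 2 instead, and a UCSC Table Browser GTF may not carry it at all, in which case every gene is counted under "unknown".

```python
import re
from collections import defaultdict

# Matches quoted key/value pairs in the GTF attribute column, e.g. gene_id "X".
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def genes_per_biotype(lines):
    """Return {biotype: number of distinct gene_ids} for GTF lines."""
    genes = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Assumption: biotype lives in the gene_biotype attribute.
        biotype = attrs.get("gene_biotype", "unknown")
        genes[biotype].add(gene_id)
    return {biotype: len(ids) for biotype, ids in genes.items()}
```

Running this on both files and comparing the counts shows whether the size difference comes from extra non-coding features rather than from additional coding genes.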
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.
