  • How to use Gene Annotation Model in Tophat2

    I'm using Galaxy to analyze RNA-seq results for differential gene expression in Drosophila melanogaster.

    In TopHat2, there is an option to supply a gene annotation model by uploading a file.

    Where do I get the file? Drosophila melanogaster obviously has an annotated genome, but when I go to download it, I don't know what I'm looking for. There are so many files. (http://flybase.org/static_pages/docs/datafiles.html#)

    There are genes, insertions, alleles... Do I need all of them, or is there a single complete file?

  • #2
    You get the annotations from a GTF file. An example file for fruit fly can be found here: ftp://ftp.ensembl.org/pub/release-77...a_melanogaster. You want to make sure that the GTF file corresponds to the genome build you are using for alignments.
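
As a rough illustration of what such a file contains: each GTF record is one line of nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), and the last column holds key "value"; pairs such as gene_id and transcript_id. The Python sketch below reads records that way. It is a minimal, simplified reader written for illustration, not part of TopHat or Galaxy; the example lines and the GENE_A/TX_A1 identifiers are made up, and it assumes no semicolons appear inside quoted attribute values.

```python
from collections import Counter

def parse_attributes(field):
    """Parse GTF column 9, e.g. 'gene_id "GENE_A"; transcript_id "TX_A1";'."""
    attrs = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs

def read_gtf(lines):
    """Yield one dict per GTF record, skipping blank and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqname, source, feature, start, end, score, strand, frame, attributes = cols
        yield {
            "seqname": seqname,
            "feature": feature,
            "start": int(start),  # GTF coordinates are 1-based and end-inclusive
            "end": int(end),
            "strand": strand,
            "attributes": parse_attributes(attributes),
        }

# Hypothetical records for illustration only (not taken from a real annotation).
example_lines = [
    "# comment lines are skipped\n",
    '2L\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n',
    '2L\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n',
]

records = list(read_gtf(example_lines))
exons_per_gene = Counter(r["attributes"]["gene_id"] for r in records if r["feature"] == "exon")
print(exons_per_gene)  # Counter({'GENE_A': 2})
```

For real work an established GTF parser is a safer choice; this sketch only shows the layout TopHat2 expects.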

    BTW: Are you using a custom local mirror of Galaxy? I don't see an option to supply a GTF file at http://usegalaxy.org (PSU public Galaxy).



    • #3
      Thanks!

      Is there a way to import that from BioMart in Galaxy? I clicked through to Drosophila, but then the options stop...

      "Is it a custom local mirror of galaxy?"
      Maybe? Some features differ between the Galaxy I use and usegalaxy.org, and I have a specific login that is not an email address.

      Using a gene annotation model is an option in TopHat2, but the tutorial I'm following doesn't say where they found one or which file type it requires.



      • #4
        BioMart does not export results as GTF files. You can probably get one using the UCSC Main table browser tool in Galaxy.

        See the explanation of how to use annotation with TopHat2 on this manual page: http://ccb.jhu.edu/software/tophat/manual.shtml (look for the -G/--GTF option). TopHat accepts annotation in GTF or GFF3 format.
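
Outside Galaxy, the same annotation is passed on the command line. The sketch below only assembles an argument list following the manual's general form (options, then the Bowtie index base name, then the read files) and does not run TopHat. The file names are hypothetical, and the executable may be installed as tophat or tophat2 depending on the setup, so it is left as a parameter.

```python
import shlex

def tophat2_command(index_base, reads, gtf=None, output_dir="tophat_out", executable="tophat2"):
    """Build a TopHat2 argument list; reads holds one file (single-end) or two (paired-end)."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one single-end file or a pair of paired-end files")
    cmd = [executable, "-o", output_dir]  # -o/--output-dir
    if gtf is not None:
        cmd += ["-G", gtf]                # -G/--GTF: gene model annotation (GTF or GFF3)
    cmd.append(index_base)                # index base name, without the index file extensions
    cmd.extend(reads)
    return cmd

# Hypothetical file names for illustration.
cmd = tophat2_command("dm3_index", ["sample_R1.fastq", "sample_R2.fastq"], gtf="genes.gtf")
print(shlex.join(cmd))
# prints: tophat2 -o tophat_out -G genes.gtf dm3_index sample_R1.fastq sample_R2.fastq
```

Keeping the command as a list means it could be handed to subprocess.run(cmd, check=True) without shell quoting issues.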



        • #5
          Thanks for the help!

          I was able to get the annotation through Galaxy by right-clicking on UCSC Main table (opening it in a new tab or clicking it directly didn't do anything), then entering the info for dmel and sending the result to Galaxy.



          • #6
            Be careful to make sure that the GTF data from UCSC is for the same genome build you used for the rest of the analysis. Otherwise the annotations could be inaccurate, since coordinates from one build do not line up with another.
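
One quick sanity check along these lines is to compare the GTF against the chromosome names and lengths of the reference actually used for alignment, for example from a UCSC chrom.sizes file for the assembly or the first two columns of a samtools faidx .fai index. The sketch below flags sequence names missing from the reference (a common symptom of mixing UCSC-style names such as chr2L with Ensembl-style names such as 2L) and features that run past a chromosome's end. It is a heuristic written for illustration, not a Galaxy or TopHat feature: passing it does not prove two files come from the same build. The toy lengths and records are made up.

```python
def read_chrom_sizes(lines):
    """Parse 'name<TAB>length[<TAB>...]' lines (chrom.sizes or .fai) into a dict."""
    sizes = {}
    for line in lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            sizes[name] = int(length)
    return sizes

def check_gtf_against_genome(gtf_lines, chrom_sizes):
    """Return messages for GTF sequence names or coordinates that don't fit the genome."""
    max_end = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        seqname, end = cols[0], int(cols[4])  # column 1 = seqname, column 5 = end
        max_end[seqname] = max(end, max_end.get(seqname, 0))

    problems = []
    for seqname, end in sorted(max_end.items()):
        if seqname not in chrom_sizes:
            problems.append(f"{seqname}: not in reference (naming or build mismatch?)")
        elif end > chrom_sizes[seqname]:
            problems.append(f"{seqname}: feature ends at {end}, past chromosome length {chrom_sizes[seqname]}")
    return problems

# Toy inputs (made-up lengths and records, for illustration only).
sizes = read_chrom_sizes(["chr2L\t1000\n", "chr2R\t2000\n"])
gtf = [
    'chr2L\tdemo\texon\t1\t500\t.\t+\t.\tgene_id "G1";\n',
    'chr2L\tdemo\texon\t900\t1200\t.\t+\t.\tgene_id "G2";\n',
    '2L\tdemo\texon\t10\t20\t.\t-\t.\tgene_id "G3";\n',
]
for message in check_gtf_against_genome(gtf, sizes):
    print(message)
```

With these toy inputs it reports the unprefixed 2L record and the chr2L feature ending beyond the made-up length of 1000.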



            • #7
              The reference genome is: dmel 2006 BDGP R5/dm3 (dm3)
              The UCSC assembly is dm3.

              I'm assuming those are the same, with "dm3" being the build used for both?



              • #8
                That is correct.

                I had mainly added that note for the benefit of others, who may find this thread by searching, as something to be aware of.



                • #9
                  "I had mainly added that note for the benefit of others, who may find this thread by searching, as something to be aware of."
                  That was actually why I included the info.

