  • GFF3 to RefFlat for non UCSC genomes

    Dear all
    I need to convert GFF3 files to RefFlat, or to find out how to generate RefFlat files. The reason is that I need to run the Picard module CollectRnaSeqMetrics, which requires RefFlat files.

    I am working on Arabidopsis thaliana, which is not in the UCSC database. This means that I cannot use UCSC RefFlat files and have to generate them myself.

    UCSC provides a program that seems to fit my need: gff3ToGenePred (link: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/gff3ToGenePred). However, the program does not work with the GFF3 files available for my genome, TAIR10 (links to the original TAIR10 GFF3 files: ftp://ftp.arabidopsis.org/home/tair/...GFF3_genes.gff and ftp://ftp.arabidopsis.org/home/tair/...ransposons.gff), because of GFF3 errors: the files do not seem to be standard GFF3.

    For this reason I am also looking for a GFF3 validator, but I have only found online validators that do not work.

    Can anyone help me create a RefFlat file for a genome that is not in the UCSC database?

    Thank you
    Andrea

  • #2
    OK, problem solved! Here is what I did:

    1. Download your GFF3 file from ftp://ftp.ensemblgenomes.org/pub/ (I also found the TAIR10 files there).
    2. Download gff3ToGenePred from the UCSC web site.
    3. Run gff3ToGenePred to convert your file to genePred format (nearly refFlat; refFlat additionally has the gene name as its first column).
    4. Run Picard CollectRnaSeqMetrics with the resulting file as the refFlat reference.

    Done ;-)
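
    A note on step 3: refFlat is the 10 genePred columns preceded by a gene-name column, so a genePred file may need one extra reshaping pass. What follows is only a minimal Python sketch, not the poster's method. It assumes the converter output uses the 15-column extended layout (genePredExt) with the gene name in column 12 (name2), and falls back to the transcript name for plain 10-column genePred. The function names are hypothetical.

```python
def genepred_to_refflat_line(line: str) -> str:
    """Convert one genePred or genePredExt line to the refFlat column layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(
            f"expected at least 10 tab-separated columns, got {len(fields)}"
        )
    # genePredExt keeps the gene name in column 12 (name2). Plain genePred has
    # no gene name, so fall back to the transcript name in column 1.
    gene_name = fields[11] if len(fields) >= 12 and fields[11] else fields[0]
    # refFlat = gene name + the first 10 genePred columns.
    return "\t".join([gene_name] + fields[:10])


def genepred_to_refflat(in_path: str, out_path: str) -> None:
    """Rewrite a whole genePred(Ext) file as refFlat, skipping blank lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(genepred_to_refflat_line(line) + "\n")
```

    With hypothetical file names, `genepred_to_refflat("tair10.genePred", "tair10.refFlat")` would write the file to hand to CollectRnaSeqMetrics.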


    • #3
      Thanks for letting everyone know; I was curious about that question. So, was that because of a difference between the TAIR GFF3 annotations from UCSC and the one from Ensembl?
      Last edited by steven; 08-19-2011, 01:22 AM. Reason: question


      • #4
        was that because of a difference between the TAIR GFF3 annotations from UCSC and the one from Ensembl?
        No. I could not find GFF3 files for my genome (Arabidopsis thaliana) at UCSC, so I had to download them from another source. With the files from the official TAIR10 web site, UCSC gff3ToGenePred returned formatting errors. Rather than fixing the file myself, I looked for a source of valid files that I could also use in all future cases, and I found ensemblgenomes for this purpose.
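
        Formatting errors like these can often be located offline before running the converter. Below is a minimal Python sketch of a few structural checks based on the GFF3 column layout: nine tab-separated columns, integer start <= end, a valid strand and phase, and tag=value attributes. It is not a full validator and does not reproduce the checks gff3ToGenePred performs. The function name is hypothetical.

```python
def check_gff3_line(line: str) -> list[str]:
    """Return a list of problems found in one GFF3 line (empty if none)."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return []  # blank lines, comments and ## directives are not features
    cols = line.split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]

    problems = []
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    try:
        start_i, end_i = int(start), int(end)
        if start_i < 1 or start_i > end_i:
            problems.append("start must be >= 1 and <= end")
    except ValueError:
        problems.append("start and end must be integers")
    if strand not in {"+", "-", ".", "?"}:
        problems.append(f"invalid strand {strand!r}")
    if phase not in {".", "0", "1", "2"}:
        problems.append(f"invalid phase {phase!r}")
    if ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")
    if attrs != ".":
        for pair in attrs.split(";"):
            if pair and "=" not in pair:
                problems.append(f"attribute {pair!r} is not in tag=value form")
    return problems
```

        Running this over every line of a file and printing the line number with any non-empty result gives a quick list of places to inspect.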


        • #5
          PS: I was able to run Picard (version 1.51) CollectRnaSeqMetrics, but the CHART option did not work, so I still have to figure out why.


          • #6
            Yes, right, I meant TAIR (there is no Arabidopsis in UCSC). So the GFF3 from Ensembl worked but the original one from TAIR did not. Interesting.


            • #7
              Help!!!

              HELP!!! My research requires me to do something similar to what "aner" and "steven" did,
              but I couldn't find the Arabidopsis GFF3 file at the FTP address "aner" gave.


              • #8
                My email address is [email protected]
                Please help!!!
                I need a proper .gff3 file for Arabidopsis that the gff3ToGenePred program can recognize and process.


                • #9
                  Hi Johnny,
                  use this file:

                  ftp://ftp.ensemblgenomes.org/pub/rel...psis_thaliana/

                  Then, since it is GTF rather than GFF3, use gtfToGenePred instead of gff3ToGenePred.

                  Let me know if this works now.
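
                  The GTF conversion can also be scripted. This is a minimal Python sketch, assuming the gtfToGenePred binary has already been downloaded and is on PATH. The file names are placeholders and the function names are hypothetical. The -genePredExt option is passed so that gene names are kept in the output, since refFlat needs a gene-name column.

```python
import shutil
import subprocess


def gtf_to_genepred_cmd(gtf_path: str, out_path: str) -> list[str]:
    """Build the argument list: gtfToGenePred -genePredExt <in.gtf> <out.genePred>."""
    return ["gtfToGenePred", "-genePredExt", gtf_path, out_path]


def run_gtf_to_genepred(gtf_path: str, out_path: str) -> None:
    """Run the conversion, failing early if the binary is not installed."""
    if shutil.which("gtfToGenePred") is None:
        raise FileNotFoundError("gtfToGenePred not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error,
    # for example on a malformed annotation file.
    subprocess.run(gtf_to_genepred_cmd(gtf_path, out_path), check=True)
```

                  For example, `run_gtf_to_genepred("Arabidopsis_thaliana.gtf", "Arabidopsis_thaliana.genePred")`, where both file names are made up for illustration.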


                  • #10
                    I Love You, aner!!!

                    It's great, and it saved me lots of time and energy.
                    I used to think that I had to change the format of the GFF3 file to use the gff3ToGenePred program... Your way is much easier.
                    Let's be friends, and let me know if there is anything I can help with.


                    • #11
                      ;-) It is good to know that my suggestions led you to your results!! The real meaning of a network!
