  • Build expanded genome for ERANGE

    Hi all,

    We're trying to use ERANGE on our human samples. The goal is to identify any possible new transcripts. We looked into TopHat and now want to try ERANGE.
    We are pretty confused about how to build the expanded genome for ERANGE. How should we do this? In particular, how or where can we obtain the spike sequences? Any suggestions or opinions would be highly appreciated!

    Many thanks in advance!

    Zhuzhu

  • #2
    I also have a question about using ERANGE. Where can I get the 'knownGene.txt' file mentioned in the README.build.rds file? I need to see its field definitions so I can format my gene annotation file for building the RDS database (-RNA option). It seems 'knownGene.txt' is not in GFF format. Thank you.



    • #3
      The 'knownGene.txt' file is one of the standard files used for the UCSC Genome browser. You can download the version for the human hg18 assembly from their FTP site here:

      ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

      You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
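For reference, here is a minimal Python sketch of reading one row of such a file. It assumes the 12-column layout that "describe table schema" reports for hg18 knownGene (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID); please check that order against the schema page before relying on it. The helper name `parse_known_gene_line` and the example row are made up for illustration.

```python
from typing import List, NamedTuple


class KnownGene(NamedTuple):
    # Assumed column order for hg18 knownGene.txt; verify against the schema page.
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]


def _int_list(field: str) -> List[int]:
    # Exon coordinates are comma-separated lists that end with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_known_gene_line(line: str) -> KnownGene:
    """Parse one tab-separated row (hypothetical helper, not part of ERANGE)."""
    f = line.rstrip("\n").split("\t")
    return KnownGene(
        f[0], f[1], f[2],
        int(f[3]), int(f[4]), int(f[5]), int(f[6]), int(f[7]),
        _int_list(f[8]), _int_list(f[9]),
    )


# Invented example row, not real annotation data.
example = "tx_demo\tchr1\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,\tprot_demo\talign_demo"
gene = parse_known_gene_line(example)
print(gene.name, gene.strand, gene.exon_starts, gene.exon_ends)
```

The last two columns (proteinID, alignID) are ignored here because they are not needed to describe the gene structure.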



      • #4
        Thank you, kmcarr. That helps a lot. I found the table schema of this file.



        • #5
          spike sequences

          Hi,

          I am also wondering about the OP's question on spike sequences. They seem to be mentioned only in the ERANGE paper and online help files, with no explanation of what they actually are! Any ideas?

          Thanks,

          Kasycas



          • #6
            Hi All -

            I might be missing the point of Kasycas and Zhuzhu's questions, but I'll give it a shot anyway:

            If you added spiked-in standards to your sample, then you should be able to find their sequences through any reference source. Only you can know what your spikes are. Mortazavi et al. used in vitro synthesized RNA from Arabidopsis and phage.



            • #7
              Hi all and kmcarr,

              Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file of gene annotations.



              • #8
                Originally posted by elisa*_*
                Hi all and kmcarr,

                Do you know how to convert a gff file to knownGene format? I am working with species not on UCSC genome browser, but do have a gff file for gene annotations.
                I created a knownGeneTable for TAIR8 starting with the GFF3 file from TAIR. Whether you consider my method easy or hard depends on whether you are familiar and comfortable with BioPerl, specifically the Bio::DB::SeqFeature module. This is the new preferred back end for GBrowse, so I already had many of the components in place. I'll give you the outline, but if it all sounds Greek to you then I'm afraid I'll be no help at all.

                - Install the latest versions of BioPerl and MySQL.

                - Create an empty MySQL database to hold your annotation.

                - Load the database with the annotations from your GFF file using the bp_seqfeature_load.pl script (installed as part of BioPerl).

                - Use the attached script (changing the -dsn and -user parameters as needed) to query the newly created DB and output the knownGenesTable file.

                I know this seems like the long way around to get the file you want but as I said, once you have the database created it is useful for many projects.
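For anyone who does not want to set up BioPerl and MySQL, the same idea can be sketched in plain Python. This is not the attached script, only a rough standard-library alternative under several assumptions: transcripts are GFF3 features of type "mRNA" with an ID attribute, exons and CDS features point to them via Parent, and the output uses the first ten UCSC knownGene columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based starts. The function names and the demo records are made up, and the exact layout ERANGE expects should be checked against README.build.rds.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_known_gene_rows(lines, transcript_type="mRNA"):
    """Build knownGene-like rows from GFF3 lines (hypothetical helper)."""
    transcripts = {}            # transcript ID -> (chrom, strand, start0, end)
    exons = defaultdict(list)   # transcript ID -> [(start0, end), ...]
    cds = defaultdict(list)     # transcript ID -> [(start0, end), ...]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        chrom, _src, ftype, start, end, _score, strand, _phase, attr = cols
        attrs = parse_attributes(attr)
        # GFF3 is 1-based inclusive; UCSC tables are 0-based half-open.
        start0, end = int(start) - 1, int(end)
        if ftype == transcript_type and "ID" in attrs:
            transcripts[attrs["ID"]] = (chrom, strand, start0, end)
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    target[parent].append((start0, end))
    rows = []
    for tid, (chrom, strand, tx_start, tx_end) in transcripts.items():
        # Fall back to a single block when no exon features are present.
        blocks = sorted(exons.get(tid) or [(tx_start, tx_end)])
        if cds.get(tid):
            cds_start = min(s for s, _ in cds[tid])
            cds_end = max(e for _, e in cds[tid])
        else:
            # Assumption: mark non-coding transcripts with an empty CDS at txEnd.
            cds_start = cds_end = tx_end
        starts = "".join(f"{s}," for s, _ in blocks)
        ends = "".join(f"{e}," for _, e in blocks)
        rows.append("\t".join([
            tid, chrom, strand, str(tx_start), str(tx_end),
            str(cds_start), str(cds_end), str(len(blocks)), starts, ends,
        ]))
    return rows


# Invented demo records, not real annotation data.
demo = [
    "##gff-version 3\n",
    "chr1\tdemo\tmRNA\t101\t500\t.\t+\t.\tID=tx1;Parent=gene1\n",
    "chr1\tdemo\texon\t101\t200\t.\t+\t.\tParent=tx1\n",
    "chr1\tdemo\texon\t401\t500\t.\t+\t.\tParent=tx1\n",
    "chr1\tdemo\tCDS\t151\t200\t.\t+\t0\tParent=tx1\n",
    "chr1\tdemo\tCDS\t401\t450\t.\t+\t1\tParent=tx1\n",
]
rows = gff3_to_known_gene_rows(demo)
print("\n".join(rows))
```

To run it on a real file, pass an open file handle instead of the demo list. GFF files differ a lot between sources, so the feature type names and attribute keys will probably need adjusting.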


                • #9
                  kmcarr,
                  Thank you very much for the info. I will try this.



                  • #10
                    Where to download annotation file for human genome

                    Hello,

                    I have the sequence coordinates of ChIP-seq peaks. I am trying to map them so that I can find the nearest genes. However, I'm having difficulty finding an annotation file to download. I tried UCSC and also RefSeq, but I can't find .gff files. There are so many files and I don't know which to download. I downloaded some of them, but they don't have gene names. I just want a file that contains the transcription start site, strand (+/-) and gene name for the human genome. Can anybody please help me with this?

                    Thanks
