
  • Build expanded genome for ERANGE

    Hi all,

    We're trying to use ERANGE for our human samples. The goal is to identify any possible new transcripts. We looked into TopHat and now want to try ERANGE.
    We are confused about how to build the expanded genome that ERANGE needs. How should we do this? In particular, how and where can we obtain the spike sequences? Any suggestions or opinions would be highly appreciated!

    Many thanks in advance!

    Zhuzhu

  • #2
    I also have a question about using ERANGE. Where can I get the 'knownGene.txt' file mentioned in the README.build.rds file? I need to see its field definitions so I can format my gene annotation file for building the RDS database (-RNA option). It seems 'knownGene.txt' is not in GFF format. Thank you.


    • #3
      The 'knownGene.txt' file is one of the standard files used by the UCSC Genome Browser. You can download the version for the human hg18 assembly from their FTP site here:

      ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

      You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
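To make the column layout concrete, here is a minimal sketch of reading one knownGene.txt row in Python. It assumes the file is tab-delimited and that the first ten columns are name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts and exonEnds; please confirm this against the "describe table schema" page for your assembly. The example row, the `KnownGene` class and the helper names are made up for illustration.

```python
from typing import List, NamedTuple


class KnownGene(NamedTuple):
    """One transcript row; field order is an assumption, check the schema page."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_coord_list(field: str) -> List[int]:
    # Coordinate lists are comma-separated and usually end with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_known_gene_line(line: str) -> KnownGene:
    cols = line.rstrip("\n").split("\t")
    return KnownGene(
        name=cols[0],
        chrom=cols[1],
        strand=cols[2],
        tx_start=int(cols[3]),
        tx_end=int(cols[4]),
        cds_start=int(cols[5]),
        cds_end=int(cols[6]),
        exon_count=int(cols[7]),
        exon_starts=parse_coord_list(cols[8]),
        exon_ends=parse_coord_list(cols[9]),
    )


# Hypothetical row, not taken from the real hg18 table.
example_row = "\t".join(
    ["exampleTx.1", "chr1", "+", "100", "900", "150", "850", "3",
     "100,300,500,", "200,400,900,"]
)
gene = parse_known_gene_line(example_row)
print(gene.name, gene.chrom, gene.strand, gene.exon_starts, gene.exon_ends)
```

Any columns after the tenth (for example protein or alignment IDs) are simply ignored by this sketch.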


      • #4
        Thank you, kmcarr. That helps a lot. I found the table schema of this file.


        • #5
          spike sequences

          Hi,

          I am also wondering about the OP's question about spike sequences. They seem to be mentioned only in the ERANGE paper and the online help files, with no explanation of what they actually are. Any ideas?

          Thanks,

          Kasycas


          • #6
            Hi All -

            I might be missing the point of Kasycas and Zhuzhu's questions, but I'll give it a shot anyway:

            If you added spiked-in standards to your sample, then you should be able to find their sequences in any reference source. Only you can know what your spikes are. Mortazavi et al. used in vitro synthesized RNA from Arabidopsis and phage.


            • #7
              Hi all and kmcarr,

              Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file of gene annotations.


              • #8
                I created a knownGeneTable for TAIR8 starting with the GFF3 file from TAIR. Whether you consider my method easy or hard depends on whether you are comfortable with BioPerl, specifically the Bio::DB::SeqFeature module. This is the new preferred back end for GBrowse, so I already had many of the components in place. I'll give you the outline, but if it all sounds like Greek to you then I'm afraid I'll be no help at all.

                - Install the latest versions of BioPerl and MySQL.

                - Create an empty MySQL database to hold your annotation.

                - Load the database with the annotations from your GFF file using the bp_seqfeature_load.pl script (installed as part of BioPerl).

                - Use the attached script (changing the -dsn and -user parameters as needed) to query the newly created DB and output the knownGenesTable file.

                I know this seems like the long way around to get the file you want but as I said, once you have the database created it is useful for many projects.
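For readers who would rather not set up BioPerl and MySQL, the same idea can be sketched in plain Python. This is not the attached script and not the BioPerl route described above; it is a rough alternative under stated assumptions: transcripts are `mRNA` features with an `ID` attribute, `exon` and `CDS` features point to them through `Parent`, and the output uses the first ten knownGene-style columns with 0-based starts (GFF3 starts are 1-based). Setting cdsStart and cdsEnd to txStart for transcripts without CDS features is also an assumption. Check the README.build.rds file for the exact columns ERANGE expects.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_known_gene_rows(gff_lines):
    transcripts = {}             # transcript ID -> (chrom, strand, start0, end)
    exons = defaultdict(list)    # transcript ID -> [(start0, end), ...]
    cds = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        chrom, _src, ftype, start, end, _score, strand, _phase, attr_field = cols
        attrs = parse_attributes(attr_field)
        start0, end1 = int(start) - 1, int(end)   # 1-based closed -> 0-based half-open
        if ftype == "mRNA" and "ID" in attrs:
            transcripts[attrs["ID"]] = (chrom, strand, start0, end1)
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    target[parent].append((start0, end1))

    rows = []
    for tx_id, (chrom, strand, tx_start, tx_end) in transcripts.items():
        # Fall back to a single exon spanning the transcript if none are listed.
        tx_exons = sorted(exons.get(tx_id, [(tx_start, tx_end)]))
        tx_cds = cds.get(tx_id)
        if tx_cds:
            cds_start = min(s for s, _ in tx_cds)
            cds_end = max(e for _, e in tx_cds)
        else:
            cds_start = cds_end = tx_start   # assumption for non-coding transcripts
        starts = ",".join(str(s) for s, _ in tx_exons) + ","
        ends = ",".join(str(e) for _, e in tx_exons) + ","
        rows.append("\t".join([tx_id, chrom, strand, str(tx_start), str(tx_end),
                               str(cds_start), str(cds_end),
                               str(len(tx_exons)), starts, ends]))
    return rows


# Hypothetical two-exon transcript, for illustration only.
example_gff = ["\t".join(fields) for fields in [
    ["chr1", ".", "mRNA", "101", "900", ".", "+", ".", "ID=tx1"],
    ["chr1", ".", "exon", "101", "300", ".", "+", ".", "Parent=tx1"],
    ["chr1", ".", "exon", "501", "900", ".", "+", ".", "Parent=tx1"],
    ["chr1", ".", "CDS", "201", "300", ".", "+", "0", "Parent=tx1"],
    ["chr1", ".", "CDS", "501", "700", ".", "+", "2", "Parent=tx1"],
]]
for row in gff3_to_known_gene_rows(example_gff):
    print(row)
```

The database route described above stays more useful if you also want the annotations available to GBrowse or other projects; this sketch only covers the one-off conversion.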

                • #9
                  kmcarr,
                  Thank you very much for the info. I will try this.


                  • #10
                    Where to download annotation file for human genome

                    Hello,

                    I have the sequence coordinates of ChIP-seq peaks and I am trying to map them to the nearest genes. However, I'm having difficulty finding an annotation file to download. I tried UCSC and RefSeq, but I can't find .gff files. There are so many files that I don't know which to download, and the ones I did download don't contain gene names. I just want a file that contains the transcription start site, strand (+/-) and gene name for the human genome. Can anybody please help me with this?

                    Thanks
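One possible approach, offered as a sketch rather than a tested pipeline: a UCSC-style gene table such as the knownGene.txt file discussed earlier in this thread gives the strand, txStart and txEnd of each transcript, and the transcription start site follows from those (txStart on the + strand, the last base before txEnd on the - strand, assuming 0-based half-open coordinates). Gene names would have to come from a table that carries symbols, which is not shown here. The gene records and function names below are hypothetical.

```python
import bisect
from collections import defaultdict


def tss_position(strand, tx_start, tx_end):
    """0-based TSS: transcript start on '+', last transcribed base on '-'."""
    return tx_start if strand == "+" else tx_end - 1


def build_tss_index(genes):
    """genes: iterable of (name, chrom, strand, tx_start, tx_end)."""
    index = defaultdict(list)
    for name, chrom, strand, tx_start, tx_end in genes:
        index[chrom].append((tss_position(strand, tx_start, tx_end), strand, name))
    for chrom in index:
        index[chrom].sort()
    return index


def nearest_gene(index, chrom, peak_pos):
    """Return (name, strand, tss, distance) for the closest TSS, or None."""
    entries = index.get(chrom)
    if not entries:
        return None
    # Rebuilt on every call for simplicity; cache it for large peak sets.
    positions = [entry[0] for entry in entries]
    i = bisect.bisect_left(positions, peak_pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = entries[max(i - 1, 0):i + 1]
    tss, strand, name = min(candidates, key=lambda entry: abs(entry[0] - peak_pos))
    return name, strand, tss, abs(tss - peak_pos)


# Hypothetical gene records, for illustration only.
example_genes = [
    ("GENE_A", "chr1", "+", 1000, 5000),
    ("GENE_B", "chr1", "-", 8000, 12000),
    ("GENE_C", "chr2", "+", 300, 900),
]
tss_index = build_tss_index(example_genes)
print(nearest_gene(tss_index, "chr1", 10500))
```

Here the peak position is a single coordinate, for example the peak summit or midpoint; which one to use is a choice left to the analysis.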
