  • How do I go from a fasta and a chromosome to gtf/gff file?

    Hi all,

    I work on a single celled eukaryote whose genome has been sequenced. However the sequencing was not complete as there were several leftover scaffolds that contain more than a hunderd genes. Trying to get this information into cufflinks, I merged all these scaffolds toghether in one articial chromosomes with 4000N spacers in between scaffolds.

    Alignment works great, but I would like to use a GTF of this 'chromosome' with the original gene models to see whether the gene models were substantially altered by Cufflinks RABT. Is there a way to go from the FASTA files of the gene models to a GTF file for the 'new' chromosome? I could write a script, but a pre-existing solution would be great as I am not a Perl expert.
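
For anyone reproducing the merge step described above, here is a minimal Python sketch of joining scaffolds with 4000-N spacers while recording where each scaffold starts. The function names and the `offsets` dictionary are made up for illustration; the coordinate shift is only useful if the original gene models already have scaffold-level coordinates, which is an assumption and not something stated in the post.

```python
SPACER = "N" * 4000  # spacer length taken from the post


def concatenate_scaffolds(scaffolds, spacer=SPACER):
    """Join (name, sequence) pairs into one artificial chromosome.

    Returns the joined sequence and a dict mapping each scaffold name
    to its 0-based start offset within that sequence.
    """
    parts = []
    offsets = {}
    position = 0
    for index, (name, seq) in enumerate(scaffolds):
        if index > 0:
            # A spacer goes between scaffolds, never before the first one.
            parts.append(spacer)
            position += len(spacer)
        offsets[name] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets


def shift_to_chromosome(offsets, scaffold, start, end):
    """Shift 1-based inclusive scaffold coordinates onto the chromosome."""
    offset = offsets[scaffold]
    return start + offset, end + offset


chromosome, offsets = concatenate_scaffolds([("s1", "ACGT"), ("s2", "GGCC")])
print(offsets["s2"], shift_to_chromosome(offsets, "s2", 1, 4))
```

Keeping the offsets around means any scaffold-based annotation can later be moved onto the artificial chromosome by simple addition, with no realignment needed.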

  • #2
    I'm no Perl expert either; maybe a C program?

    But I don't follow the acronyms you are using.

    • #3
      I know even less about C than Perl, so writing it in C is not an option.

      I just think this type of program should already exist as people doing annotation and such would also use this type of tool.

      • #4
        That's probably right, and I may already have such a program,
        or could easily adapt another one with small changes,
        if only I understood the details correctly.

        • #5
          Basically I need to get the orientation, start and stop of several small sequences (genes) contained in one big sequence (chromosome). I've run a BLAT to find these things but it looks like not every gene has only one hit and it does not look so trivial to find every exon-intron boundary and combine these into a transcript.
          Do you have such a program, gsgs?
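
As a rough illustration of the BLAT route described above, the Python sketch below keeps the hit with the most matching bases for each gene and writes each alignment block as a GTF `exon` line. It assumes nucleotide (untranslated) PSL output with the standard 21 tab-separated columns; the function name and the `BLAT` source label are made up for this example. PSL blocks are not guaranteed to be true exons, since a small indel also splits a block, so the result would still need checking.

```python
def psl_to_gtf(psl_lines, source="BLAT"):
    """Convert the best PSL hit per query into GTF exon lines."""
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 21 or not fields[0].isdigit():
            continue  # skip PSL header lines and anything malformed
        matches = int(fields[0])
        qname = fields[9]
        # Keep only the hit with the most matching bases for each gene.
        if qname not in best or matches > best[qname][0]:
            best[qname] = (matches, fields)

    gtf = []
    for qname, (_, f) in best.items():
        strand, tname = f[8], f[13]
        sizes = [int(x) for x in f[18].split(",") if x]
        tstarts = [int(x) for x in f[20].split(",") if x]
        attributes = f'gene_id "{qname}"; transcript_id "{qname}";'
        for size, tstart in zip(sizes, tstarts):
            # PSL is 0-based half-open; GTF is 1-based inclusive.
            gtf.append("\t".join([tname, source, "exon", str(tstart + 1),
                                  str(tstart + size), ".", strand, ".",
                                  attributes]))
    return gtf


example = "\t".join(["100", "0", "0", "0", "0", "0", "1", "200", "+",
                     "gene1", "100", "0", "100", "chrA", "10000",
                     "999", "1299", "2", "60,40,", "0,60,", "999,1259,"])
for gtf_line in psl_to_gtf([example]):
    print(gtf_line)
```

Picking the single best hit is the simplest way to deal with genes that have more than one hit; it will be wrong for genuinely duplicated genes.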

          • #6
            No, but I'm thinking about how to write one.
            I still don't know what a GTF is.
            As I understand it, you have one big sequence and want to separate the
            exons and introns in it.
            You could even align it to other existing exons or introns,
            but is it still difficult to find the correct areas?
            Maybe because there are gaps?

            What did you align it with?
            A known complete genome of a similar species?
            Last edited by gsgs; 12-03-2012, 03:06 AM.

            • #7
              GTF is pretty standard in genome annotation: a fixed, tab-delimited format that describes the positions of genes on a chromosome or genome.

              • #8
                Can I just search for long substrings without a stop codon,
                in all 6 reading frames of the DNA?
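
The six-frame idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `min_codons` threshold are made up, the standard stop codons TAA/TAG/TGA are assumed, and codons containing N are treated as breaks so that the spacers are not reported as open. It finds stop-free stretches only, so it will not reconstruct genes that are split by introns.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _to_forward(strand, start0, end0, length):
    # start0/end0 are 0-based half-open on the scanned strand;
    # the result is 1-based inclusive on the forward strand.
    if strand == "+":
        return (strand, start0 + 1, end0)
    return (strand, length - end0 + 1, length - start0)


def open_stretches(seq, min_codons=100):
    """List (strand, start, end) for stop-free runs in all six frames."""
    seq = seq.upper()
    length = len(seq)
    results = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            pos = frame
            while pos + 3 <= length:
                codon = scanned[pos:pos + 3]
                if codon in STOPS or "N" in codon:
                    # A stop (or an N) closes the current run.
                    if (pos - run_start) // 3 >= min_codons:
                        results.append(
                            _to_forward(strand, run_start, pos, length))
                    run_start = pos + 3
                pos += 3
            # Report a run that reaches the end of the sequence.
            if (pos - run_start) // 3 >= min_codons:
                results.append(_to_forward(strand, run_start, pos, length))
    return results


print(open_stretches("ATGAAACCCTAA", min_codons=3))
```

Reported coordinates are always on the forward strand, which matches how GTF and GFF3 record features on either strand.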

                • #9
                  After not looking for a week I found the answer: GMAP has an option to output to GFF3 format. Just wanted to post it here in case anybody else encounters the same problem
