  • Eukaryotic orf finder

    Hi All,

    I am looking for Eukaryotic orf finder algorithm/source code. I am trying to build training model for unknown eukaryotic genome using Glimmerhmm. I need collect orf's for the Glimmerhmm training model. So I did BLASTp against known eukaryotic protein sequences (closest neighbour to the unknown eukaryote) but am unable to build the training model with resultant orf's. The error I get after trainGlimmerhmm is:
    Training data created successfully! Check exons.dat and seqs for accuracy.


    Acceptor sites for training: 18292
    False acceptor sites for training: 853751
    Donor sites for training: 18219
    False donor sites for training: 672464


    ERROR 69: /GlimmerHMM/train/score exited funny: 35584


    If this process for building a training model is right, can anyone help me with this situation? If not, what should I do to build the training model? Should I look for acceptor and donor sites upstream and downstream of the ORFs I got from BLASTp?
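One possible reading of the number in the error message, assuming it is a raw 16-bit wait status of the kind Perl's `system` leaves in `$?` (the thread does not confirm how trainGlimmerHMM produces it): the high byte is the child's exit code, and shells conventionally report "killed by signal N" as exit code 128 + N. The function names below are illustrative only.

```python
import signal


def decode_wait_status(status: int) -> dict:
    """Split a 16-bit POSIX-style wait status into its parts."""
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    term_signal = status & 0x7F        # low 7 bits: signal that killed the child directly
    core_dumped = bool(status & 0x80)  # bit 7: core-dump flag
    # Assumption: the command ran through a shell, which reports
    # "child killed by signal N" as exit code 128 + N.
    shell_signal = exit_code - 128 if exit_code > 128 else None
    return {
        "exit_code": exit_code,
        "term_signal": term_signal,
        "core_dumped": core_dumped,
        "shell_signal": shell_signal,
    }


def signal_name(number: int) -> str:
    """Return the platform's name for a signal number, or 'unknown'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


if __name__ == "__main__":
    info = decode_wait_status(35584)
    print(info)
    if info["shell_signal"] is not None:
        print("signal reported via shell:", signal_name(info["shell_signal"]))
```

Under that assumption, 35584 is 139 × 256, so the exit code is 139 = 128 + 11, and signal 11 on Linux is SIGSEGV. That would be consistent with the "Segmentation fault (core dumped)" line reported in post #4, but it only says that `score` crashed, not why.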

  • #2
    Have you solved this problem?



    • #3
      I would like to know whether anyone has solved this problem.

      Thanks in advance,
      Hideaki



      • #4
        Hi!

        Did you manage to solve this problem?

        I am getting a similar error:

        Simple Consensus = cgttgtggtggtgggggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtggtgg
        Markov Consensus = ggatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatgatg
        ******** Old Way = ctctgaggatgatgaggatgatgatgatgatgatgatgatgatgatgagatgatgatgatgatgatgatgatgatgatga
        Segmentation fault (core dumped)
        ERROR 69: /media/sdb1/genome_assembly/GlimmerHMM/train/score exited funny: 35584 at ./../trainGlimmerHMM line 445.


        And the log file:


        Code:
        more TrainGlimmM2014-08-18D15\:53\:24.log
            Training data created successfully! Check exons.dat and seqs for accuracy.
        
        
            Acceptor sites for training: 35581
            False acceptor sites for training: 412224
            Donor sites for training: 35572
            False donor sites for training: 410763

        The training files look like this:
        Code:
        1. mfasta
        
            >supercontig_01
            GATCATACAAATCATCCCCTTGGCCTCTGTTAGCCTTCTGCGATCTATCGTGCTCGGAGCAGCTGCAAGC
            CCCGCCAAGTGACAATCCGAAACGGACTCAATAAGATTTGGCGTTGTCGACTTCATTTCAGTTCCGCCGA
            CCTTCCAGCTGCAGCTATCGACTGTCGAAGCCGACCCTCCACGAGTCAAACAGATTGGAAACGATAATAA
            ACCGATCTCCCGAGATAAGAATGGCGCTTTGGTCAAACATGAAGGCGTGAGTGAACACTCTGCTGACTTC
            ATGTAAGTGAGGAGAATATCGCTAAATGTGATACGGACATGACATTAGACTTGCAACAGAAAGAATAATA
            CATGCAGGTCCGAGATGAACAACGAGACAAACCTTGTGTGGTGCTCAACATAGTTTGCTAATAGAAACGT
            GATTGACCGTCACATGGCTCCTTGACTGTCTAGATACATCCGGCTGATCATACTTTGTTCTAGTGTATCC
            ATGACGGAGAAAAGTGCATTTATGATTTTTATGATCGATCTGTTGAATGCCAATAGGCACTTGCGGCTGG
            CCGGCGGAATTGGAAAGGAGCAGGTAGCACTCAACATCAGAGGTGTAACAACCAGCGAACCCATTCAACG
            TTGGAGTCATTTATTGTTTATCTCCGCTCTAGTTTCAGTTTCCTCTCGCGACTTGCTTGTTTGTATCTGA
            GTAAGCACCCGATAATAAAGTAGTTGTCATCACTGGCTTGAAAAATCAAACAATTACTCGCATCTCGCGA
            GAAAGAACAGACTGCTCGTAACAAGCAAGCAAACGCCAAGCTCTTATTCAGATAACATTACTGGATCCCC
            TTCTGCTATCTGATTTATTTAGTGACTGGTCCCGGGCCCGAAGCCGCCACCCTGTGCCACCTCATTTTAA
        
        
        2. exon file
        
            supercontig_01 678584 678745
            supercontig_01 678804 678855
            supercontig_01 678924 679629
            supercontig_01 679711 679801
        
            supercontig_01 681196 681196
            supercontig_01 681108 681102
            supercontig_01 680978 680798
            supercontig_01 680562 680452
            supercontig_01 680342 680256
        
            supercontig_01 683416 683414
            supercontig_01 683197 682953
            supercontig_01 682896 682791
            supercontig_01 682737 682599
            supercontig_01 682548 682162
            supercontig_01 682111 681695
            supercontig_01 681579 681549
            supercontig_01 681489 681408
            supercontig_01 681372 681265
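A quick sanity check of an exon file like the one above can rule out some input problems before a multi-day training run. The sketch below assumes the layout seen in this listing: three whitespace-separated columns (sequence name, start, end), genes separated by blank lines, and reverse-strand exons written with start greater than end. The function names, the `seq_lengths` dictionary and the `min_exon_len` threshold are illustrative choices, not part of GlimmerHMM, and a flagged exon is not necessarily wrong: very short first or last exons can be real.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[str, int, int]  # (sequence name, start, end), 1-based coordinates


def parse_exon_file(text: str) -> List[List[Exon]]:
    """Group exon lines into genes; a blank line ends the current gene."""
    genes: List[List[Exon]] = []
    current: List[Exon] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            if current:
                genes.append(current)
                current = []
            continue
        current.append((fields[0], int(fields[1]), int(fields[2])))
    if current:
        genes.append(current)
    return genes


def check_genes(
    genes: List[List[Exon]],
    seq_lengths: Optional[Dict[str, int]] = None,
    min_exon_len: int = 1,
) -> List[str]:
    """Return human-readable warnings about suspicious exons."""
    problems: List[str] = []
    for number, gene in enumerate(genes, start=1):
        strands = set()
        for seq_id, start, end in gene:
            if start != end:  # a 1 bp exon carries no orientation
                strands.add("+" if start < end else "-")
            length = abs(end - start) + 1
            if length < min_exon_len:
                problems.append(
                    f"gene {number}: exon {start}-{end} is only {length} bp"
                )
            if seq_lengths is not None:
                if seq_id not in seq_lengths:
                    problems.append(f"gene {number}: unknown sequence {seq_id}")
                elif min(start, end) < 1 or max(start, end) > seq_lengths[seq_id]:
                    problems.append(
                        f"gene {number}: exon {start}-{end} lies outside {seq_id}"
                    )
        if len(strands) > 1:
            problems.append(f"gene {number}: exons have mixed orientation")
    return problems
```

Applied to the excerpt above with `min_exon_len=3`, this would flag the 1 bp exon `681196 681196` in the second gene. Whether such exons have anything to do with the crash is not established in this thread.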

        Thanks in advance!!
        Victoria
        --
        M. Victoria Aguilar Pontes
        PhD student, Fungal Physiology

        CBS-KNAW FUNGAL BIODIVERSITY CENTRE
        Institute of the Royal Netherlands Academy of Arts and Sciences(KNAW)
        Fungal Molecular Physiology, Utrecht University
        [email protected]



        • #5
          Hi Victoria,

          Which Linux OS did you use?
          I tried to run the training on Ubuntu, but it failed.
          Then I tried it on CentOS and it worked.

          I am not sure of the reason, but that is how I managed to solve the problem.
          Once the training had succeeded, I could run Glimmer with the trained files on Ubuntu.

          Cheers,
          Hideaki



          • #6
            Hi Hi-koike,

            I am using a server running Ubuntu 12.04.5 LTS precise.

            I also tried to train on another dataset, and after running for 4 days I got the same error. Any ideas?

            Thanks in advance,
            Victoria


            • #7
              Hi Victoria,

              Can you run Glimmer using already-trained files?
              If you can, it might be the same problem I experienced.

              Can you get a computer running CentOS or Red Hat?

              I used an old computer that formerly ran Windows.
              It is easy to install CentOS, and you can then install Glimmer
              on that machine.

              You might need to install some libraries (I forget the exact names,
              but you can find them by searching the web for the error message).
              In my case, the training on CentOS finished within a day.

              Cheers,
              Hideaki



              • #8
                Hi Hi-koike,

                I ran trainGlimmerHMM on our server (Ubuntu 12.04.5 LTS Precise) with both the trained files and my own files, and I always got the same error (see my previous post).

                Now I am running trainGlimmerHMM on my own computer, which also runs Ubuntu 12.04.5 LTS Precise, and there at least the trained files work. I am now waiting for the results with my own files, but this might take longer.

                As a backup plan, I am installing CentOS in VirtualBox, just in case.

                Thank you very much for your help.

                Victoria


                • #9
                  Hi Hi-koike,

                  As I said before, I got trainGlimmerHMM running with the example data on Ubuntu 12.04.5 LTS Precise, but it crashes with my own files. It is always the same error.

                  Now I am running the example file on CentOS 7 and I get the same error. Do you remember which CentOS version you used?

                  Thanks

                  Victoria
                  Last edited by MVictoria; 08-22-2014, 05:20 AM.


                  • #10
                    Hi Victoria,

                    I am sorry to hear that it did not run on CentOS either.

                    I am not sure which version of CentOS I used, because I am traveling
                    abroad. It might be CentOS 6, because I installed it in April.

                    I have succeeded in running it on two Red Hat machines and one CentOS machine,
                    but it failed on two Ubuntu machines.

                    On one Red Hat machine, it did not run because the machine did not
                    have libstdc++ installed.

                    If you got the same error, the problem might be different from mine.
                    I am very sorry that I cannot help.

                    Best regards,
                    Hideaki
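For the missing-library case described in post #10, one way to check a binary is to look for "not found" entries in the output of the standard Linux `ldd` tool. The sketch below is only an illustration: the function names are made up, and the path passed to `check_binary` is whatever location the `score` binary has on your system.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "not found" in line:
            # Typical line: "\tlibstdc++.so.6 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

An empty list only means the dynamic linker can resolve every library. It does not rule out a segmentation fault such as the one reported earlier in the thread.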
