  • GlimmerHmm program...

    Does anyone have experience using trainGlimmerHMM from the GlimmerHMM package?
    I am trying to use trainGlimmerHMM, but I am having trouble creating the exon file it requires.
    Would anybody be willing to share how to create the exon file for trainGlimmerHMM?
    What other tools can be used to create the exon file?
    Thanks for sharing.

  • #2
    We use GlimmerHMM, but we use the models which are provided. We do not have enough validated data to train the models ourselves.



    • #3
      Hi strob,
      As I understand it, trainGlimmerHMM needs a multi-FASTA file of training sequences and a matching exon file.
      I did try the provided models.
      Because only a few models are available for GlimmerHMM, none of them seems suitable for my case.
      For example, my query sequence is a fungal genome, and the only available models are arabidopsis, celegans, human, rice and zebrafish.
      None of these seems closely related to my fungal genome, so I plan to create my own model file and use it for gene prediction.
      Thanks for sharing your info.



      • #4
        Just make sure that your training set is quite large, preferably made up of experimentally validated genes, and as heterogeneous as possible (single-exon vs. multi-exon genes, long vs. short genes, ...); otherwise you will create a biased annotation.
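
One way to check how heterogeneous a training set is before running trainGlimmerHMM is to summarize the exon file. The sketch below is only an illustration: it assumes the three-column exon-file layout shown on the GlimmerHMM website (sequence name, start, end, with a blank line between genes), and the function names `read_genes` and `summarize` are hypothetical helpers, not part of GlimmerHMM.

```python
from statistics import median


def read_genes(text):
    """Parse exon-file text into genes, each a list of (seq, start, end).

    Assumes three whitespace-separated columns per exon and a blank
    line between genes (the layout assumed in the lead-in above).
    """
    genes, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:                # a blank line closes the current gene
            if current:
                genes.append(current)
                current = []
            continue
        current.append((fields[0], int(fields[1]), int(fields[2])))
    if current:
        genes.append(current)
    return genes


def summarize(genes):
    """Return simple composition statistics for a training set."""
    # Coding length of a gene = sum of its exon lengths (inclusive coordinates).
    lengths = [sum(abs(end - start) + 1 for _, start, end in g) for g in genes]
    return {
        "genes": len(genes),
        "single_exon": sum(1 for g in genes if len(g) == 1),
        "multi_exon": sum(1 for g in genes if len(g) > 1),
        # Orientation is read from the first exon: start > end means reverse.
        "forward": sum(1 for g in genes if g[0][1] <= g[0][2]),
        "reverse": sum(1 for g in genes if g[0][1] > g[0][2]),
        "min_coding_len": min(lengths) if lengths else 0,
        "median_coding_len": median(lengths) if lengths else 0,
        "max_coding_len": max(lengths) if lengths else 0,
    }
```

Calling `summarize(read_genes(open("exons.txt").read()))` (with `exons.txt` as a placeholder file name) gives a quick view of whether the set is dominated by, say, short single-exon genes.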



        • #5
          Hi strob,
          My plan is to build the training set from a fungal genome ("fungi genome 1") that is closely related to my query fungal genome, and then use that training set to run GlimmerHMM on my query genome.
          I think this will give better gene predictions. What do you think?
          Also, of the models available in GlimmerHMM (arabidopsis, celegans, human, rice and zebrafish), which would you use for a fungal query genome?
          I would choose celegans.
          Thanks again for sharing.



          • #6
            Hi edge,

            I am trying to build a species-specific model using trainGlimmerHMM. I think my situation is similar to yours. May I ask whether you solved your problem? Would you mind sharing your experience and pipeline?

            Thank you.



            • #7
              Originally posted by ckuanglim
              Hi edge,

              I am trying to build a species-specific model using trainGlimmerHMM. I think my situation is similar to yours. May I ask whether you solved your problem? Would you mind sharing your experience and pipeline?

              Thank you.

              Hi ckuanglim,
              Is there anything in particular you would like me to share?
              I generate the exon file from the gene predictions of other programs, such as GeneMark and Augustus.
              I then use it as the input file to train my own models in GlimmerHMM.
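
A conversion like this could look roughly as follows. This is a minimal sketch, not the poster's actual pipeline. It assumes the predictor writes GTF-style output with nine tab-separated columns, `CDS` rows, and a `transcript_id "..."` attribute. The function name `gtf_to_exon_lines` is hypothetical. It writes reverse-strand exons with start greater than end, as in the example on the GlimmerHMM website, and it does not settle whether the stop codon should be included.

```python
import re
from collections import defaultdict


def gtf_to_exon_lines(gtf_text):
    """Convert CDS rows of GTF-style text into exon-file text.

    Assumptions (not confirmed by the thread): nine tab-separated
    columns, feature type "CDS", and a transcript_id "..." attribute
    naming the gene model each CDS row belongs to.
    """
    models = defaultdict(list)   # (seq, transcript) -> [(start, end, strand)]
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match is None:
            continue
        key = (cols[0], match.group(1))
        models[key].append((int(cols[3]), int(cols[4]), cols[6]))

    out = []
    for (seq, _transcript), parts in models.items():
        reverse = parts[0][2] == "-"
        # Forward strand: ascending order. Reverse strand: descending order.
        parts.sort(key=lambda p: p[0], reverse=reverse)
        for start, end, _strand in parts:
            if reverse:
                start, end = end, start   # write reverse exons as "high low"
            out.append(f"{seq} {start} {end}")
        out.append("")                    # blank line closes this gene
    return "\n".join(out)
```

Predictions used this way are not experimentally validated, so any bias in the source predictor may carry over into the trained model.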



              • #8
                Hi edge,
                In my case, I use Splign to align mRNA sequences to genomic sequences, then take the nucleotide coordinates and convert them into an exon file. But I have some questions about the exon file format.
                Do the nucleotide coordinates include the stop codon?
                Can we put partial genes (without a start or stop codon) in the exon file?
                Do you have any documentation on the exon file format?
                Thanks.



                • #9
                  GlimmerHMM error 69

                  These notes might be helpful for novice users:

                  I was trying to run GlimmerHMM on a novel fungal genome. I created the exon file using Glimmer for microbes (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi) at NCBI. Since that is a microbial predictor, I am also trying to self-train the GeneMark-ES predictor on my genome sequence.

                  The important thing to highlight is the exon file. Look at the exon file format on the GlimmerHMM website:
                  seq1 5 15
                  seq1 20 34

                  seq1 50 48
                  seq1 45 36

                  Notice two things:
                  1. ORFs predicted on different strands are separated by a blank line. I got errors when I did not separate ORFs on different strands.

                  2. The order is important. ORFs on the leading (forward) strand should be listed in ascending order, while ORFs on the lagging (reverse) strand should be listed in descending order.

                  After this I got the message "Training dataset is correctly created"; however, it was followed by Error 69, which says "exited funny: 35584". I have not been able to resolve this error, but I will still try to use this training set to predict the final genes.
                  Last edited by sagarutturkar; 10-22-2012, 02:17 PM. Reason: Spelling mistake
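
The two rules above can be checked automatically before training. The following is a hedged sketch that tests only those two rules. It assumes the three-column layout from the example, and the function name `check_exon_file` is hypothetical. It does not reproduce or explain Error 69.

```python
def check_exon_file(text):
    """Return a list of problems found in exon-file text (empty if none).

    Checks only the two rules described in the post: exons within one
    blank-line-separated block must share an orientation, and they must
    be listed ascending (forward) or descending (reverse).
    """
    problems = []
    blocks, current = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:                       # blank line ends a block
            if current:
                blocks.append(current)
                current = []
            continue
        if len(fields) != 3:
            problems.append(f"line {lineno}: expected 3 columns, got {len(fields)}")
            continue
        current.append((lineno, int(fields[1]), int(fields[2])))
    if current:
        blocks.append(current)

    for block in blocks:
        first_line = block[0][0]
        # Orientation per exon; single-base exons (start == end) are neutral.
        orientations = {start < end for _, start, end in block if start != end}
        if len(orientations) > 1:
            problems.append(
                f"line {first_line}: block mixes strands; separate with a blank line"
            )
            continue
        forward = orientations.pop() if orientations else True
        starts = [start for _, start, _ in block]
        if starts != sorted(starts, reverse=not forward):
            order = "ascending" if forward else "descending"
            problems.append(f"line {first_line}: exons should be in {order} order")
    return problems
```

An empty list means the file passes these two checks. It says nothing about other constraints trainGlimmerHMM may impose.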



                  • #10
                    Hi sagarutturkar,

                    How many genes are in your exon file? Error 69 might be caused by a limit on the array size.



                    • #11
                      Hi,

                      You need to write your own code to sort them according to the specified format. I did that with my exon files. I have 100 files of exon coordinates and used a few of them to train my model, but for some odd reason I am unable to train GlimmerHMM. For now I have switched to HMMgene, but I am still working on GlimmerHMM. Let me know if you get your model trained. Thanks.



                      • #12
                        Hi sagarutturkar, I tried training GlimmerHMM following your lead and arrived at the same error 69. I found that, at least in my case, the score.c script was not working and I kept getting "Segmentation fault : Core dumped". I'm in talks with the authors about it. Did you manage to solve it?



                        • #13
                          Hey guys,
                          I got the same error message (Error 69) while trying to train on a coral genome. I used the minimum number of genes, i.e. 51.
                          Did somebody solve this error?
                          Thanks,
                          Didier



                          • #14
                            Hi everyone,
                            I also tried to train on my coral genome with 51 genes, but I got the same "Segmentation fault : Core dumped". Did somebody solve "error 69"?
                            Thanks,
                            Didier



                            • #15
                              Hi,

                              Were you able to resolve the "ERROR 69: segmentation fault"? I am facing the same problem.
