SEQanswers

Old 04-07-2010, 11:49 PM   #1
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199
Default GlimmerHmm program...

Does anyone have experience using trainGlimmerHMM from the GlimmerHMM package?
I am trying to use trainGlimmerHMM, but I am having trouble creating the exon file it requires.
Would anybody be willing to share how to create the exon file for trainGlimmerHMM?
Is another tool required to create the exon file?
Thanks for sharing.
Old 04-08-2010, 12:06 AM   #2
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79
Default

We use GlimmerHMM, but we use the models which are provided. We do not have enough validated data to train the models ourselves.
Old 04-08-2010, 12:13 AM   #3
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199
Default

Hi strob,
As I understand trainGlimmerHMM, we need to provide a multi-FASTA sequence file for training together with its exon file.
I did try the provided models.
Because only a few models ship with GlimmerHMM, there does not seem to be a suitable one for my case:
my query sequence is a fungal genome, and the only available models are arabidopsis, celegans, human, rice and zebrafish.
None of these seems closely related to my fungal genome, so I plan to create my own model file and use it for gene prediction.
Thanks for sharing your info.
Old 04-08-2010, 12:19 AM   #4
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79
Default

Just make sure that your training set is quite large, consists preferably of experimentally validated genes, and is as heterogeneous as possible (single-exon vs. multi-exon genes; long vs. short genes; ...), otherwise you will create a biased annotation.
Old 04-08-2010, 12:28 AM   #5
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199
Default

Hi strob,
My plan is to use a fungal genome (genome 1) that is closely related to my query fungal genome to create the training set. After that, I will use the trained model to run GlimmerHMM on my query fungal genome.
I think this will give better gene predictions. What do you think?
Besides that, of the model files available in GlimmerHMM (arabidopsis, celegans, human, rice and zebrafish), which would you use for a query fungal genome?
I would choose celegans.
Thanks again for sharing.
Old 12-08-2011, 04:40 PM   #6
ckuanglim
Junior Member
 
Location: Malaysia

Join Date: Aug 2011
Posts: 7
Default

Hi edge,

I am trying to build a species-specific model using trainGlimmerHMM. I think my situation is similar to yours. May I ask whether you solved your problem? Would you mind sharing your experience and pipeline?

Thank you.
Old 12-08-2011, 11:19 PM   #7
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199
Default

Hi ckuanglim,
Is there anything in particular you would like me to share?
I generate the exon file from the gene predictions of other gene prediction programs, such as GeneMark and Augustus.
I then use it as the input file to train my own models in GlimmerHMM.
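
A conversion like the one described here could be sketched as follows, assuming the other predictor's output is GFF3 with `CDS` features that carry a `Parent` attribute (other output formats would need a different parser). The function name `gff3_to_exon_lines` is hypothetical. The output layout (one `name start end` line per exon, a blank line after each gene, reverse-strand exons written with start and end swapped and listed in descending order) is an assumption based on the example exon file quoted later in this thread.

```python
from collections import defaultdict


def gff3_to_exon_lines(gff3_text):
    """Convert GFF3 CDS features into exon-file lines, one block per parent."""
    genes = defaultdict(list)  # parent ID -> [(seqid, start, end, strand), ...]
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        genes[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    out = []
    for exons in genes.values():
        reverse = exons[0][3] == "-"
        # Forward strand: ascending by start. Reverse strand: descending.
        exons.sort(key=lambda e: e[1], reverse=reverse)
        for seqid, start, end, _strand in exons:
            if reverse:
                start, end = end, start  # assumed: written high-to-low on "-"
            out.append(f"{seqid} {start} {end}")
        out.append("")  # blank line closes the gene block
    return out


# Made-up GFF3 records chosen to reproduce the example exon file in this thread.
example = "\n".join([
    "seq1\tpred\tCDS\t5\t15\t.\t+\t0\tID=c1;Parent=t1",
    "seq1\tpred\tCDS\t20\t34\t.\t+\t1\tID=c2;Parent=t1",
    "seq1\tpred\tCDS\t36\t45\t.\t-\t0\tID=c3;Parent=t2",
    "seq1\tpred\tCDS\t48\t50\t.\t-\t0\tID=c4;Parent=t2",
])
exon_lines = gff3_to_exon_lines(example)
print("\n".join(exon_lines))
```

Whether the coordinates should include the stop codon is not settled in this thread, so the sketch passes the CDS coordinates through unchanged.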
Old 12-09-2011, 03:34 PM   #8
ckuanglim
Junior Member
 
Location: Malaysia

Join Date: Aug 2011
Posts: 7
Default

Hi edge,
In my case, I use Splign to align mRNA sequences to the genomic sequence, then take the nucleotide coordinates and convert them into an exon file. But I have some questions about the exon file format:
Do the coordinates include the stop codon?
Can we put partial genes (without a start or stop codon) in the exon file?
Do you have any documentation on the exon file format?
Thanks.
Old 10-22-2012, 02:14 PM   #9
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default GlimmerHMM error 69

These instructions might be helpful for novice users:

I was trying to run GlimmerHMM on a novel fungal genome. I created the exon file using Glimmer for microbes (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi) at NCBI. Since that is a microbial gene predictor, I am also trying to self-train the GeneMark-ES predictor on my genome sequence.

The important thing to highlight is the exon file. Look at the exon file format on the GlimmerHMM website:
seq1 5 15
seq1 20 34

seq1 50 48
seq1 45 36

Notice two things:
1. ORFs predicted on different strands are separated by a blank line. I got errors when I did not separate ORFs on different strands.

2. The order is important. On the leading strand the ORFs should be listed in ascending order, while on the lagging strand they should be listed in descending order.
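
These two points can be checked mechanically before training. The sketch below assumes the layout shown in the example (whitespace-separated `name start end` lines, blank lines between blocks, start greater than end on the reverse strand). `check_exon_blocks` is a hypothetical helper, not part of GlimmerHMM.

```python
def check_exon_blocks(text):
    """Return a list of problems found in exon-file text (empty list if none)."""
    problems = []
    blocks, current = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():  # a blank line closes the current block
            if current:
                blocks.append(current)
                current = []
            continue
        fields = raw.split()
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append(f"line {lineno}: expected 'name start end'")
            continue
        current.append((lineno, int(fields[1]), int(fields[2])))
    if current:
        blocks.append(current)

    for block in blocks:
        first_line = block[0][0]
        # Assumption: start > end marks the reverse strand. Single-base exons
        # (start == end) carry no strand information and are ignored here.
        strands = {
            "-" if start > end else "+"
            for _, start, end in block
            if start != end
        }
        if len(strands) > 1:
            problems.append(f"block at line {first_line}: exons on both strands")
            continue
        starts = [start for _, start, _ in block]
        if starts != sorted(starts, reverse=(strands == {"-"})):
            problems.append(f"block at line {first_line}: exons out of order")
    return problems


good = "seq1 5 15\nseq1 20 34\n\nseq1 50 48\nseq1 45 36\n"
mixed = "seq1 5 15\nseq1 50 48\n"
unsorted_block = "seq1 20 34\nseq1 5 15\n"
for name, text in [("good", good), ("mixed", mixed), ("unsorted", unsorted_block)]:
    print(name, check_exon_blocks(text))
```

An empty list means the file follows the assumed layout; it does not guarantee that training will succeed.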

After this I got the message "Training dataset is correctly created"; however, it was followed by Error 69, which says "exited funny: 35584". I have not been able to resolve this error, but I will still try to use this training set to predict the final genes.
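
The number 35584 looks like a raw wait status of the kind Perl's `system` returns, where the child's exit code sits in the high byte. That reading is an assumption about how the training wrapper reports failures. Under it, the status decodes to exit code 139, which shells conventionally report for a command killed by signal 11. The arithmetic is sketched below; `decode_status` is a hypothetical helper.

```python
def decode_status(status):
    """Split a 16-bit wait status into (exit_code, signal_number)."""
    exit_code = status >> 8        # high byte: exit code of the child
    signal_number = status & 0x7F  # low 7 bits: signal that killed the child
    return exit_code, signal_number


exit_code, signal_number = decode_status(35584)
# Shells report 128 + N when a command dies from signal N.
implied_signal = exit_code - 128 if exit_code > 128 else None
print(exit_code, signal_number, implied_signal)
```

Signal 11 is SIGSEGV on Linux, which would agree with the "Segmentation fault" reports later in this thread.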

Old 10-22-2012, 05:32 PM   #10
ckuanglim
Junior Member
 
Location: Malaysia

Join Date: Aug 2011
Posts: 7
Default

Hi sagarutturkar,

How many genes are in your exon file? Error 69 might be caused by a limit on the array size.
Old 02-19-2013, 02:03 PM   #11
svj
Junior Member
 
Location: USA

Join Date: Jul 2012
Posts: 8
Default

Hi,

You need to write your own code to sort them according to the specified format. I did that with my exon files. I have 100 files with exon coordinates and used a few of them to train my model, but for some weird reason I am unable to train Glimmer. For now I have switched to HMMgene, but I am still working on GlimmerHMM. Let me know if you get your model trained. Thanks.
Old 11-07-2013, 01:48 AM   #12
r_sitaram
Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 10
Default

Hi sagarutturkar, I tried training GlimmerHMM following your lead and arrived at the same error 69. I found that, at least in my case, the score.c code was not functioning and I kept getting "Segmentation fault: Core dumped". I'm in talks with the authors about it. Did you manage to solve it?
Old 05-18-2015, 02:22 AM   #13
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Hi,

Were you able to resolve the "ERROR 69: segmentation fault"? I am facing the same problem.
Old 05-26-2015, 06:12 AM   #14
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default Genemark

Hi,

I was not able to resolve some of the later errors with GlimmerHMM. I ran my genome with GeneMark-ES instead, which was a simpler option: http://exon.gatech.edu/GeneMark/

Thanks
Sagar
Old 05-26-2015, 06:26 AM   #15
r_sitaram
Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 10
Default

Hello ersgupta, we were finally able to solve it by editing the code and changing the array size to accommodate the large genome that we were dealing with. If I remember correctly, our error came from the score.c and score2.c files. Hope this helps.
Old 05-26-2015, 10:25 PM   #16
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

@sagarutturkar.. I will try that out. Thanks.

@r_sitaram... Thanks for the reply. I found the same issue and changed the array size in score.c, but then it gave an error in another file, and then another. After changing 2-3 files I stopped, as I wasn't 100% sure what algorithmic changes that might cause. I hope your predictions were not affected by all these changes.