Hello, I have a question about GlimmerHMM.
Could you look at the description below and tell me whether my understanding is right?
My organism is Meloidogyne chitwoodi.
Thank you in advance.
1. What I have as input data:
- contig.fasta (output of ABySS): 185,458 contigs
- cDNA.fasta (cDNA cluster from nematode.net): 5,880 genes
contig.fasta (output of ABySS, de novo assembler)
>contig1
...
>contig2
...
>contig185458
...

cDNA.fasta (M. chitwoodi cDNA cluster from nematode.net)
>MC1
...
>MC2
...
>MC5880
...
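To double-check the record counts quoted above (185,458 contigs and 5,880 cDNA clusters), a minimal Python sketch like the one below could count the records and total residues in each FASTA file. It assumes standard FASTA (a `>` header line followed by one or more sequence lines) and takes the record ID to be the first word of the header; the script name `fasta_summary.py` is hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    The record ID is the first whitespace-separated word after '>',
    so ">contig1" and "> contig1" both give "contig1".
    """
    record_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)  # a sequence may span several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


def summarize(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of records, total sequence length) for FASTA lines."""
    count = 0
    total = 0
    for _record_id, seq in parse_fasta(lines):
        count += 1
        total += len(seq)
    return count, total


if __name__ == "__main__":
    # usage (hypothetical script name): python fasta_summary.py contig.fasta cDNA.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            n_records, n_residues = summarize(handle)
        print(f"{path}: {n_records} records, {n_residues} residues")
```

If the files match the description, the record counts printed for contig.fasta and cDNA.fasta should be 185,458 and 5,880.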
2. There are two options to run GlimmerHMM:
2-1. glimmerhmm
2-2. trainGlimmerHMM
2-1. glimmerhmm
Input: only the longest contig, extracted from contig.fasta
If I run only "glimmerhmm", I do not need to use the whole contig.fasta file.
The input sequence can be just the longest contig.
I can use the built-in training directory (Celegans) to predict genes on the longest contig.
(+): easy, fast
(-): Results could be biased, and genes are predicted on only one contig.
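Extracting the longest contig for the glimmerhmm run could look like the minimal sketch below. It streams contig.fasta record by record, picks the longest sequence, and writes it back out as FASTA. The script and output names (`longest_contig.py`, `longest_contig.fasta`) and the 60-character line width are hypothetical choices.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the first word after '>'."""
    record_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def longest_record(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return the longest (record_id, sequence), or None if there are no records.

    Records are streamed, so the whole file is never held in memory at once.
    max() returns the first maximal item, so ties keep the earlier record.
    """
    return max(parse_fasta(lines), key=lambda rec: len(rec[1]), default=None)


def to_fasta(record_id: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{record_id}\n{body}\n"


if __name__ == "__main__":
    # usage (hypothetical names): python longest_contig.py contig.fasta > longest_contig.fasta
    with open(sys.argv[1]) as handle:
        best = longest_record(handle)
    if best is None:
        sys.exit("no FASTA records found")
    sys.stdout.write(to_fasta(*best))
```

The resulting file could then be passed to glimmerhmm together with the Celegans training directory, roughly `glimmerhmm longest_contig.fasta /path/to/GlimmerHMM/trained_dir/Celegans`; the exact path depends on where GlimmerHMM is installed, so it is worth checking against the GlimmerHMM README.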
2-2. trainGlimmerHMM
Input: the whole contig.fasta file and an exon file
If I train on the whole contig file, contig.fasta itself is used as the training sequence set.
However, I need to create an exon file.
To find the start and end sites of exons, align one contig from contig.fasta against the whole cDNA.fasta set.
The alignment can be done with BLAST or SIM4.
Repeat this for all 185,458 contigs.
Merge the 185,458 exon files into one (first column: contig ID, second column: exon start, third column: exon end).
Run trainGlimmerHMM on contig.fasta together with the exon file.
(+): reliable results, gene prediction on every contig
(-): BLAST alignment and exon-file creation take too much time and computation
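One way to avoid 185,458 separate alignments might be a single BLAST+ run with all contigs as the query, for example `makeblastdb -in cDNA.fasta -dbtype nucl` followed by `blastn -query contig.fasta -db cDNA.fasta -outfmt 6 -out hits.tsv`, and then a script that turns the hits into one exon file. The Python sketch below assumes the default 12-column `-outfmt 6` layout, assumes the cDNA sequences are in sense orientation, treats each contig/cDNA/strand combination as one gene, and writes exons in an assumed trainGlimmerHMM layout (contig ID, start, end; a blank line between genes; start greater than end for reverse-strand genes) that should be checked against the trainGlimmerHMM documentation. The thresholds `MAX_EVALUE` and `MIN_IDENTITY` and the file names are hypothetical. BLAST HSPs only approximate exon boundaries because BLAST does not model splice sites, so SIM4 alignments would likely give more precise coordinates.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical filtering thresholds, for illustration only.
MAX_EVALUE = 1e-10
MIN_IDENTITY = 95.0

Exon = Tuple[int, int]


def blast_to_exons(lines: Iterable[str]) -> List[Tuple[str, List[Exon]]]:
    """Turn BLAST -outfmt 6 hits (contigs as query) into per-gene exon lists.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    Each (contig, cDNA, strand) combination is treated as one gene.
    Overlapping or duplicate HSPs are NOT merged in this sketch.
    """
    hits: Dict[Tuple[str, str, bool], List[Exon]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 style comment lines
        cols = line.rstrip("\n").split("\t")
        contig, cdna = cols[0], cols[1]
        pident, evalue = float(cols[2]), float(cols[10])
        qstart, qend = int(cols[6]), int(cols[7])
        sstart, send = int(cols[8]), int(cols[9])
        if evalue > MAX_EVALUE or pident < MIN_IDENTITY:
            continue
        # A reversed subject range means the cDNA aligns to the contig's minus
        # strand; assuming sense-oriented cDNAs, the gene is on the minus strand.
        minus = sstart > send
        hits[(contig, cdna, minus)].append((min(qstart, qend), max(qstart, qend)))

    genes: List[Tuple[str, List[Exon]]] = []
    for (contig, _cdna, minus), spans in hits.items():
        spans.sort()
        if minus:
            # Assumed convention: reverse-strand exons listed right to left, start > end.
            exons = [(hi, lo) for lo, hi in reversed(spans)]
        else:
            exons = spans
        genes.append((contig, exons))
    return genes


def format_exon_file(genes: List[Tuple[str, List[Exon]]]) -> str:
    """Render 'contig start end' lines with one blank line between genes."""
    blocks = ["\n".join(f"{contig} {start} {end}" for start, end in exons)
              for contig, exons in genes]
    return "\n\n".join(blocks) + ("\n" if blocks else "")


if __name__ == "__main__":
    # usage (hypothetical names): python blast_to_exons.py hits.tsv > exons.txt
    with open(sys.argv[1]) as handle:
        sys.stdout.write(format_exon_file(blast_to_exons(handle)))
```

This produces a single exon file directly, in place of merging 185,458 per-contig files; it would then be passed to trainGlimmerHMM along with contig.fasta.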