#1 |
Member
Location: Scotland Join Date: Feb 2014
Posts: 27
Hello Everyone,
I am trying to get the CEGMA score for our new transcriptome assembly, but when I run CEGMA on my assembled transcripts I get the following errors:

RUNNING: local_map -n local -f -h /sw/opt/CEGMA_v2.5/data/hmm_profiles -i KOG genome.chunks.fa 2>output.cegma.errors
FATAL ERROR when running local map 6400: "No such file or directory"

and

genewise: error while loading shared libraries: libglib-1.2.so.0: cannot open shared object file: No such file or directory
Can't run genewise -splice_gtag -quiet -gff -pretty -alb -hmmer /sw/opt/CEGMA_v2.5/data/hmm_profiles/KOG0002.hmm genomic12139.fa >genewise12139

I am running it with every dependency set in the path. libglib-1.2.so.0 is also in the path; I even tried running with libglib-2.0.so.0 (as mentioned in https://gist.github.com/robsyme/1153173).

Here are the commands I am using to set the path:

source .bashrc
export PATH=$PATH:/sw/opt/geneid/bin/
export PATH=$PATH:/sw/opt/blast+/bin/
export CEGMA=/sw/opt/CEGMA_v2.5
export PERL5LIB=/sw/opt/CEGMA_v2.5/lib/
export PATH=$PATH:/sw/opt/wise2.4.1/src/bin/
export WISECONFIGDIR=/sw/opt/wise2.4.1/wisecfg
export LD_LIBRARY_PATH=/usr/lib64:${LD_LIBRARY_PATH}
export PATH=$PATH:/sw/opt/CEGMA_v2.5/bin/

I have tried many times with small changes, but still no luck. Has anyone come across this kind of problem? Any suggestions would be very helpful.

Many Thanks,
Reema Singh
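The genewise message is a dynamic-loader failure: the binary cannot resolve libglib-1.2.so.0 on the machine where it actually runs. Shared libraries are looked up through LD_LIBRARY_PATH, the loader cache, and the default library directories, not through PATH. A minimal sketch for listing unresolved libraries follows; the function name `missing_libs` is hypothetical, and it assumes `ldd` and `awk` are available on the host.

```shell
#!/usr/bin/env bash
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd-style output on stdin, so it can be checked without the real binary.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Only inspect genewise if it is actually on PATH on this host.
if command -v genewise >/dev/null 2>&1; then
    ldd "$(command -v genewise)" | missing_libs
fi
```

On a cluster it is worth running this on the execution host (inside a submitted job) as well as on the login node, since the two may not have the same libraries installed.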
#2 |
Senior Member
Location: Durham, NH Join Date: Sep 2009
Posts: 108
I made an Amazon Machine Image (AMI) for CEGMA. Search for CEGMA on AWS and you should find it; its ID is ami-18935a70. It's preconfigured, so just upload your data and run.
#3 |
Member
Location: Davis, CA Join Date: May 2011
Posts: 53
Genewise is usually the problem step in most errant runs of CEGMA. As well as the Amazon instance that peromhc mentioned there are many other ways of running CEGMA where you don't have to install it yourself (including a VM). Check the CEGMA FAQ:
http://korflab.ucdavis.edu/Datasets/...aq.html#link14
#4 |
Member
Location: Scotland Join Date: Feb 2014
Posts: 27
Hello peromhc and kbradnam,
Thank you very much for your replies. Sorry for the late response; I wanted to run CEGMA on my assembly before replying. Here's the update:

1) CEGMA now works fine on our cluster. The problem was a missing library on the execution host, and that has been fixed by our IT people.

2) I would like to ask one more question: what is the best CEGMA score? I was looking at http://korflab.ucdavis.edu/Datasets/...faq.html#link4 but couldn't relate it to our results. Our assemblies give:

a) Assembly 1 = 202 (complete) and 229 (partial)
b) Assembly 2 = 226 (complete) and 239 (partial)

Do we have a good score? Any explanation or suggestion would be very helpful.

Thanks,
Reema
#5 |
Member
Location: Davis, CA Join Date: May 2011
Posts: 53
202 is better than 201 but not as good as 203. It's all relative. CEGMA is most useful in this regard only if you have made multiple assemblies from the same input data. This allows you to assess the relative performance of different assemblers and/or assembly parameters.
I have previously reported on variation in many different runs of CEGMA: http://figshare.com/articles/Variati...etrics/1011961
#6 |
Member
Location: Scotland Join Date: Feb 2014
Posts: 27
Hello kbradnam,
Thanks for sharing the link. But I have one more quick question. The partial score is higher than the complete score for both assemblies (generated from two different samples). As far as I understand from http://korflab.ucdavis.edu/Datasets/cegma/, the partial number would be higher as it also includes the complete set. So my questions are: if the partial score is higher than the complete score, does this indicate that the assembly is fragmented? And should the partial score be lower than the complete score in an ideal situation?

Thanks,
Reema

Last edited by reema; 09-12-2014 at 07:52 AM.
#7 |
Member
Location: Davis, CA Join Date: May 2011
Posts: 53
Our ideal (fantasy) result, for the purpose of qualifying the completeness of the gene space, is to have 248 complete proteins present. This would also give a partial figure of 248, as this category is really a superset (complete + partial).

Note that even if CEGMA says something is 'complete' there is still the possibility that parts of the protein are missing. You have to decide on some artificial cut-off, as expecting 100% of the sequence to be present is a) unrealistic and b) not possible, because you may not know what 100% means in a newly sequenced species (e.g. in that species there may have been a 3 bp insertion leading to 1 extra amino acid). So from CEGMA's point of view, 'complete' means about 70% present (I say 'about' because this is based on alignments to 6 different profile HMMs, which may each vary in length).

What if you don't have 248 core genes 'completely' present? Well, the next thing is to look at the partial results: how close to 248 are they? If you have 200 (complete) and 240 (complete + partial), then this at least suggests that most of the core gene set is present in your assembly, but some genes may be split across contigs or missing from the assembly. Remember, CEGMA only looks for genes that are inside individual contigs or scaffolds. You could have an assembly that splits every gene across contigs, which might lead to a 'complete' result of zero and a partial result of 248.

From looking at results of many different runs of CEGMA, it is common to see something like 90–95% of core genes present in the 'complete' category, and another 1–5% present as partial genes.

On their own, the 'complete' and 'partial' figures are not that useful. But when you compare results from multiple genome assemblies (all using the same input data), then you might be able to say something about the differences.

Update: just looking through some old CEGMA results, I also found one case where the results were 157/223. This is more unusual, suggesting that a relatively large number (27%) of the core genes were present as fragments. This might simply reflect lots of short contigs/scaffolds in the assembly. In contrast to this, one of the best results that I have seen is 245/248. It is rare to see all core genes present, even when you allow for partial matches.

Last edited by kbradnam; 09-12-2014 at 04:32 PM.
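The arithmetic behind these figures can be sketched as follows. This is only an illustration, not part of CEGMA: the function name `cegma_summary` and the `fragments_only` label are hypothetical, and the sketch assumes the 248-gene core set and a 'partial' count that already includes the complete genes, as described above.

```shell
#!/usr/bin/env bash
# Size of the CEGMA core gene set used in this thread.
CORE_GENES=248

# cegma_summary COMPLETE PARTIAL
# PARTIAL is the "partial" count, which already includes the complete genes,
# so (PARTIAL - COMPLETE) is the number of genes found only as fragments.
cegma_summary() {
    awk -v c="$1" -v p="$2" -v n="$CORE_GENES" 'BEGIN {
        printf "complete=%.1f%% partial=%.1f%% fragments_only=%.1f%%\n",
               100 * c / n, 100 * p / n, 100 * (p - c) / n
    }'
}

cegma_summary 202 229   # Assembly 1 from this thread
cegma_summary 226 239   # Assembly 2 from this thread
cegma_summary 157 223   # the unusual 157/223 case
```

For the 157/223 case this prints complete=63.3% partial=89.9% fragments_only=26.6%, which matches the roughly 27% of core genes described as present only as fragments.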
#8 |
Member
Location: Scotland Join Date: Feb 2014
Posts: 27
Thanks kbradnam for explaining this so clearly and nicely. I understand now.

Many Thanks,
Reema Singh