SEQanswers




08-31-2014, 12:30 PM   #1
reema
Member
Location: Scotland
Join Date: Feb 2014
Posts: 27

CEGMA - FATAL ERROR when running local map 6400

Hello Everyone,

I am trying to get the CEGMA score for our new transcriptome assembly, but when I run CEGMA on my assembled transcripts, I get the following errors:

"RUNNING: local_map -n local -f -h /sw/opt/CEGMA_v2.5/data/hmm_profiles -i KOG genome.chunks.fa 2>output.cegma.errors
FATAL ERROR when running local map 6400: "No such file or directory"

AND

"genewise: error while loading shared libraries: libglib-1.2.so.0: cannot open shared object file: No such file or directory
Can't run genewise -splice_gtag -quiet -gff -pretty -alb -hmmer /sw/opt/CEGMA_v2.5/data/hmm_profiles/KOG0002.hmm genomic12139.fa >genewise12139 "

I am running it after adding every dependency to my path. libglib-1.2.so.0 is also in the path, and I even tried running with libglib-2.0.so.0 (as mentioned in https://gist.github.com/robsyme/1153173). Here are the commands I am using to set the path:

source .bashrc
export PATH=$PATH:/sw/opt/geneid/bin/
export PATH=$PATH:/sw/opt/blast+/bin/
export CEGMA=/sw/opt/CEGMA_v2.5
export PERL5LIB=/sw/opt/CEGMA_v2.5/lib/
export PATH=$PATH:/sw/opt/wise2.4.1/src/bin/
export WISECONFIGDIR=/sw/opt/wise2.4.1/wisecfg
export LD_LIBRARY_PATH=/usr/lib64:${LD_LIBRARY_PATH}
export PATH=$PATH:/sw/opt/CEGMA_v2.5/bin/

I have tried many times with small changes, but still no luck. Has anyone come across this kind of problem? Any suggestions would be very helpful.
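The `libglib-1.2.so.0: cannot open shared object file` message comes from the dynamic loader. It looks for shared libraries in the directories listed in LD_LIBRARY_PATH, in paths embedded in the binary, in the ldconfig cache and in default system directories such as /usr/lib64; it never consults PATH, and LD_LIBRARY_PATH has to name the directory that holds the .so file, not the file itself. As a rough check, a minimal sketch of a hypothetical `find_lib` helper (not part of CEGMA or Wise2) that lists which LD_LIBRARY_PATH directories, if any, contain a given library file:

```shell
#!/usr/bin/env bash
# find_lib (hypothetical helper): print each LD_LIBRARY_PATH directory that
# contains the named shared library file, and return 0 if at least one does.
# PATH is deliberately ignored: the dynamic loader never searches it for .so files.
# Note: the ldconfig cache and the default system directories are NOT checked.
find_lib() {
    # Split LD_LIBRARY_PATH on ':' into one directory per line, then test each
    printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | (
        found=1
        while IFS= read -r dir; do
            [ -n "$dir" ] || continue        # skip empty entries, e.g. in "a::b"
            if [ -e "$dir/$1" ]; then
                printf '%s\n' "$dir/$1"
                found=0
            fi
        done
        exit "$found"                        # exits only this ( ) subshell
    )
}
```

For example, `find_lib libglib-1.2.so.0 || echo "not on LD_LIBRARY_PATH"`. Because the sketch ignores the ldconfig cache and the default directories, a miss here is not conclusive on its own; `ldconfig -p | grep libglib` shows what the cache knows about.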

Many Thanks,
Reema Singh

08-31-2014, 06:48 PM   #2
peromhc
Senior Member
Location: Durham, NH
Join Date: Sep 2009
Posts: 108

I made an Amazon Machine Image (AMI) for CEGMA; search for CEGMA on AWS and you should find it. It is ami-18935a70. It's preconfigured: just upload your data and run.

09-08-2014, 07:53 AM   #3
kbradnam
Member
Location: Davis, CA
Join Date: May 2011
Posts: 53

Genewise is usually the step that causes trouble in errant runs of CEGMA. As well as the Amazon instance that peromhc mentioned, there are many other ways of running CEGMA that don't require you to install it yourself (including a VM). Check the CEGMA FAQ:

http://korflab.ucdavis.edu/Datasets/...aq.html#link14

09-11-2014, 06:02 AM   #4
reema
Member
Location: Scotland
Join Date: Feb 2014
Posts: 27

Hello peromhc and kbradnam,

Thank you very much for your replies. Sorry for getting back late; I wanted to run CEGMA on my assembly before replying. Here's the update:

1) CEGMA now works fine on our cluster. The problem was a missing library on the execution host, which our IT people have now updated.

2) I would like to ask one more question: what is the best CEGMA score? I was looking at http://korflab.ucdavis.edu/Datasets/...faq.html#link4, but couldn't relate it to our results. Our assemblies contain:

a) Assembly 1 = 202 (complete) and 229 (partial)
b) Assembly 2 = 226 (complete) and 239 (partial)

Do we have a good score?

Any explanation/suggestion would be very helpful.
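Point 1) is a common trap on clusters: a library can be present on the node where paths are set up and tested, yet missing on the execution host that actually runs the job. A hedged sketch of a guard that could sit at the top of a job script, so that it runs on the execution host itself. `require_libs` is a hypothetical helper, and it assumes glibc's `ldd`, which marks each unresolved dependency as `=> not found`:

```shell
#!/usr/bin/env bash
# require_libs (hypothetical helper): for each program named, ask ldd which
# shared libraries cannot be resolved on THIS host; report them on stderr and
# return non-zero if any program is missing or has unresolved libraries.
require_libs() {
    local prog path missing status=0
    local host="${HOSTNAME:-this host}"
    for prog in "$@"; do
        if ! path=$(command -v "$prog"); then
            echo "$host: $prog not found on PATH" >&2
            status=1
            continue
        fi
        # glibc's ldd prints lines such as "libglib-1.2.so.0 => not found"
        missing=$(ldd "$path" 2>/dev/null | awk '/=> not found/ {printf "%s ", $1}')
        if [ -n "$missing" ]; then
            echo "$host: $prog cannot resolve: $missing" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `require_libs genewise || exit 1` near the top of a job script would stop the job with the host name and the missing library listed, instead of letting CEGMA fail later at the genewise step.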

Thanks,
Reema

09-11-2014, 08:32 AM   #5
kbradnam
Member
Location: Davis, CA
Join Date: May 2011
Posts: 53

202 is better than 201 but not as good as 203. It's all relative. CEGMA is most useful in this regard only if you have made multiple assemblies from the same input data. This allows you to assess the relative performance of different assemblers and/or assembly parameters.
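To make that relative comparison across several assemblies of the same input data, a minimal sketch that orders assemblies by their 'complete' count, breaking ties on the 'partial' count. The tab-separated name/complete/partial layout and the `rank_assemblies` name are assumptions for illustration, not a CEGMA output format; the demo line simply reuses the two pairs of counts posted earlier in the thread to show the mechanics.

```shell
#!/usr/bin/env bash
# rank_assemblies (hypothetical helper): read "name<TAB>complete<TAB>partial"
# lines on stdin and print them sorted by the complete count (highest first),
# breaking ties by the partial count (highest first).
rank_assemblies() {
    sort -t "$(printf '\t')" -k2,2nr -k3,3nr
}

# Demo input: the counts for Assembly 1 and Assembly 2 from this thread
printf 'assembly_1\t202\t229\nassembly_2\t226\t239\n' | rank_assemblies
```

Sorting on the complete count first reflects the point that the full-length category is the stricter measure; the partial count only separates assemblies that tie on it.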

I have previously reported on variation in many different runs of CEGMA: http://figshare.com/articles/Variati...etrics/1011961

09-12-2014, 06:47 AM   #6
reema
Member
Location: Scotland
Join Date: Feb 2014
Posts: 27

Hello kbradnam

Thanks for sharing the link. I have one more quick question.

The partial score is higher than the complete score for both assemblies (generated from two different samples). As far as I understand from http://korflab.ucdavis.edu/Datasets/cegma/, the partial number will be higher because it also includes the complete set. So my question is: if the partial score is higher than the complete score, does this indicate that the assembly is fragmented?
Also, should the partial score be lower than the complete score in an ideal situation?

Thanks,
Reema

Last edited by reema; 09-12-2014 at 06:52 AM.

09-12-2014, 03:27 PM   #7
kbradnam
Member
Location: Davis, CA
Join Date: May 2011
Posts: 53

Quote:
Originally Posted by reema
If the partial score is higher than the complete score, does this indicate that the assembly is fragmented?
Also, should the partial score be lower than the complete score in an ideal situation?
Remember, these are not scores per se. 'Complete' and 'partial' refer to the number of full-length core genes, or of full-length *and* partial-length core genes, detected by the CEGMA pipeline.

Our ideal (fantasy) result, for the purpose of quantifying the completeness of the gene space, is to have all 248 core proteins present as complete. This would also give a partial figure of 248, because the partial category is really a superset: complete plus partial.
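Since the ideal is all 248 core genes, a count is often easier to read as a percentage of 248. A small sketch of that conversion (`cegma_pct` is a hypothetical name), applied to the counts posted earlier in the thread:

```shell
#!/usr/bin/env bash
# cegma_pct (hypothetical helper): express a CEGMA count as a percentage of
# the 248 core eukaryotic genes, rounded to two decimal places.
cegma_pct() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 100 * n / 248 }'
}

# Complete and partial counts for Assembly 1 (202, 229) and Assembly 2 (226, 239)
for count in 202 229 226 239; do
    printf '%s/248 = %s%%\n' "$count" "$(cegma_pct "$count")"
done
```

This prints 81.45%, 92.34%, 91.13% and 96.37% for 202, 229, 226 and 239 respectively.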

Note that even if CEGMA says something is 'complete', it is still possible that parts of the protein are missing. You have to decide on some artificial cut-off, as expecting 100% of the sequence to be present is a) unrealistic and b) not possible, because you may not know what 100% means in a newly sequenced species (e.g. in that species there may have been a 3 bp insertion leading to 1 extra amino acid).


So from CEGMA's point of view, 'complete' means about 70% present (I say 'about' because this is based on alignments to 6 different profile HMMs, which may each vary in length).
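As an illustration of such an artificial cut-off, a hypothetical `classify_hit` helper that labels one alignment 'complete' if it covers at least 70% of the protein length and 'partial' otherwise. This is not CEGMA's code: CEGMA's own threshold is approximate because it rests on alignments to several profile HMMs of differing length, and the single length used here is a simplification.

```shell
#!/usr/bin/env bash
# classify_hit (hypothetical helper, NOT part of CEGMA): given the number of
# aligned residues and the full protein length, print "complete" if the
# alignment covers at least the cut-off fraction (default 0.70), else "partial".
classify_hit() {
    local aligned="$1" length="$2" cutoff="${3:-0.70}"
    # Reject a missing, zero, negative or non-integer length
    if ! [ "$length" -gt 0 ] 2>/dev/null; then
        echo "classify_hit: length must be a positive integer" >&2
        return 1
    fi
    awk -v a="$aligned" -v l="$length" -v c="$cutoff" \
        'BEGIN { print ((a / l >= c + 0) ? "complete" : "partial") }'
}

classify_hit 350 500   # 70.0% of the length aligned: complete
classify_hit 349 500   # 69.8% of the length aligned: partial
```

The optional third argument makes the cut-off explicit, which mirrors the point that any such threshold is a choice rather than a natural boundary.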

What if you don't have all 248 core genes 'completely' present? Well, the next thing is to look at the partial results: how close to 248 are they? If you have 200 (complete) and 240 (complete + partial), then this at least suggests that most of the core gene set is present in your assembly, but some genes may be split across contigs or missing from the assembly. Remember, CEGMA only looks for genes that are inside individual contigs or scaffolds. You could have an assembly that splits every gene across contigs, which might lead to a 'complete' result of zero and a partial result of 248.

From looking at the results of many different CEGMA runs, it is common to see something like 90–95% of core genes present in the 'complete' category, and another 1–5% present as partial genes.

On their own, the 'complete' and 'partial' figures are not that useful. But when you compare results from multiple genome assemblies (all using the same input data), then you might be able to say something about the differences.

Update: just looking through some old CEGMA results, I also found one case where the results were 157/223. This is more unusual, suggesting that a relatively large number (27%) of the core genes were present as fragments. This might simply reflect lots of short contigs/scaffolds in the assembly. In contrast to this, one of the best results that I have seen is 245/248. It is rare to see all core genes present, even when you allow for partial matches.
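The 27% figure is the share of all 248 core genes found only as fragments: (223 - 157) / 248 = 66 / 248, or about 26.6%. A sketch of that arithmetic under the hypothetical name `cegma_fragmented`, which also rejects input where the partial count is below the complete count, since the partial figure includes the complete genes:

```shell
#!/usr/bin/env bash
# cegma_fragmented (hypothetical helper): percentage of the 248 core genes
# that were found only as partial matches, i.e. (partial - complete) / 248.
cegma_fragmented() {
    local complete="$1" partial="$2"
    # The partial figure includes the complete genes, so it can never be smaller
    if [ "$partial" -lt "$complete" ]; then
        echo "cegma_fragmented: partial ($partial) < complete ($complete)" >&2
        return 1
    fi
    awk -v c="$complete" -v p="$partial" \
        'BEGIN { printf "%.1f\n", 100 * (p - c) / 248 }'
}

cegma_fragmented 157 223   # 66 of 248 core genes only partial: 26.6
cegma_fragmented 200 240   # 40 of 248: 16.1
```

For the two assemblies in this thread, 202/229 gives 10.9% and 226/239 gives 5.2%.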

Last edited by kbradnam; 09-12-2014 at 03:32 PM.

09-15-2014, 12:19 AM   #8
reema
Member
Location: Scotland
Join Date: Feb 2014
Posts: 27

Thanks, kbradnam, for explaining this so clearly and nicely. I understand now.

Many Thanks,
Reema Singh

All times are GMT -8.