Old 09-22-2014, 07:19 AM   #4
kbradnam
Member
 
Location: Davis, CA

Join Date: May 2011
Posts: 53

Basically, all output from CEGMA apart from the output.completeness_report file refers to the larger set of 458 CEGs. The output.completeness_report file is the last one CEGMA generates, and it is the only place where results for the subset of 248 CEGs are calculated.

Your last point is mostly correct: the 248 CEGs are based on different filtering criteria than the 458 CEGs, but I think that everything counted in the 248 results should be in the 458 results (but not vice versa).

Are you trying to do exact pattern matching? Note that KOG IDs get various numerical suffixes added during the CEGMA run. E.g. an ortholog of KOG0062 might be detected in your input file, but CEGMA might name it KOG0062.5 (I won't go into the reasons why this is the case). Might this explain the discrepancy in your numbers?
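
If exact matching is the problem, one workaround is to strip the suffix before comparing or counting IDs. Here is a minimal Python sketch. It assumes the suffix is always a dot followed by digits, as in the KOG0062.5 example, and the function names and sample IDs are hypothetical, not anything CEGMA provides:

```python
import re

# Assumption: a suffix is ".<digits>" at the end of the ID (e.g. KOG0062.5).
_SUFFIX = re.compile(r"\.\d+$")


def base_kog_id(name):
    """Return the KOG ID with any trailing numeric suffix removed."""
    return _SUFFIX.sub("", name.strip())


def count_unique_kogs(names):
    """Count distinct KOGs after collapsing suffixed variants."""
    return len({base_kog_id(n) for n in names})


# Hypothetical IDs, for illustration only.
reported = ["KOG0062.5", "KOG0062", "KOG0002.12", "KOG0019"]
print(sorted({base_kog_id(n) for n in reported}))
# prints ['KOG0002', 'KOG0019', 'KOG0062']
```

Counting on the stripped IDs rather than the raw names should show whether the suffixes account for the difference.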