09-23-2014, 08:14 AM #8
Location: NYC

Join Date: Aug 2010
Posts: 48

Sorry, I wasn't referring to the two /data files (kogs.fa and completeness_cutoff.tbl); I was referring to my output files. Let's call them my.completeness_report and my.cegma.fa.

We've established that:
- my.completeness_report reports matches to a master set of 248 CEGs (listed in completeness_cutoff.tbl).
- my.cegma.fa reports matches to a master set of 458 CEGs (listed in kogs.fa).
- The smaller master set is wholly contained within the larger master set.
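If you want to double-check the containment claim yourself, it is just a subset test on the two ID sets. This is a hypothetical sketch: it assumes the CEG IDs have already been pulled out of completeness_cutoff.tbl and kogs.fa into Python sets (it does not try to parse either file format), and the IDs below are made-up placeholders, not real KOG IDs.

```python
# Hypothetical placeholder IDs. Real IDs would first have to be extracted
# from completeness_cutoff.tbl (248 CEGs) and kogs.fa (458 CEGs).
core_248 = {"ceg01", "ceg02", "ceg03"}
all_458 = {"ceg01", "ceg02", "ceg03", "ceg04", "ceg05"}


def is_contained(small, large):
    """Return True if every ID in `small` also appears in `large`."""
    return set(small) <= set(large)


def missing_from(small, large):
    """Sorted IDs in `small` that are absent from `large` (empty if contained)."""
    return sorted(set(small) - set(large))


print(is_contained(core_248, all_458))  # True
print(missing_from(core_248, all_458))  # []
```

Printing `missing_from` rather than only the True/False result makes it easy to see which IDs break the containment if the check ever fails.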

In my CEGMA run, I got these results:
156 CEGs reported in my.completeness_report
205 CEGs reported in my.cegma.fa

I assumed that all 156 CEGs in the first set would also be in the second set of 205.

But this is not the case.

If I compare the 156 CEG IDs (which are not listed explicitly, but can be derived as the complement of the list of NON-matches in my.completeness_report) to the 205 CEG IDs in my.cegma.fa, I find just 101 CEGs shared by both.
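The comparison above boils down to two set operations: the hits are the 248-CEG master set minus the reported non-matches, and the overlap is the intersection of those hits with the IDs in my.cegma.fa. A minimal sketch, assuming the IDs have already been extracted into sets; the helper names and the placeholder IDs are hypothetical, not anything CEGMA itself provides.

```python
def derive_hits(master_ids, non_matches):
    """IDs in the master set that are NOT listed as non-matches."""
    return set(master_ids) - set(non_matches)


def shared(a, b):
    """IDs present in both collections."""
    return set(a) & set(b)


# Toy stand-ins (hypothetical) for the real lists.
master = {"ceg01", "ceg02", "ceg03", "ceg04", "ceg05"}  # 248-CEG master set
missing = {"ceg04", "ceg05"}                            # non-matches in the report
cegma_fa = {"ceg01", "ceg02", "ceg06"}                  # IDs found in my.cegma.fa

hits = derive_hits(master, missing)  # analogue of the 156 hits
common = shared(hits, cegma_fa)      # analogue of the 101 shared CEGs
only_in_report = hits - cegma_fa     # hits that never show up in my.cegma.fa

print(len(hits), len(common), sorted(only_in_report))  # 3 2 ['ceg03']
```

With the real lists, `only_in_report` would hold the 156 − 101 = 55 hits that my.cegma.fa does not contain.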

If I compare the master set of 248 CEG IDs in /data/completeness_cutoff.tbl to the 205 CEG IDs in my.cegma.fa, I find 105 CEGs shared by both. (i.e., my.cegma.fa not only fails to contain all 156 hits from the 248-CEG master set, it also reports 4 CEGs from that master set that are absent from the 156 in my.completeness_report!)
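The "4" falls out of a set difference. Because the 156 hits are a subset of the 248 master set, the CEGs that my.cegma.fa shares with the 248 set but not with the hits are exactly 105 − 101 = 4. Here is a small sketch of that bookkeeping. The IDs are hypothetical placeholders, and the real lists are assumed to have been extracted into sets already.

```python
# Hypothetical placeholder IDs standing in for the real lists.
master_248 = {"ceg01", "ceg02", "ceg03", "ceg04", "ceg05"}  # completeness_cutoff.tbl
report_hits = {"ceg01", "ceg02", "ceg03"}                   # the "156" hits
cegma_fa = {"ceg01", "ceg02", "ceg04", "ceg06"}             # the "205" in my.cegma.fa

in_master = cegma_fa & master_248     # analogue of the 105
in_both = cegma_fa & report_hits      # analogue of the 101
extra = in_master - report_hits       # analogue of the 4
outside_core = cegma_fa - master_248  # my.cegma.fa IDs not in the 248 set

# report_hits is a subset of master_248, so in_both is a subset of in_master,
# and the extras account for exactly the difference in counts.
assert len(extra) == len(in_master) - len(in_both)
print(sorted(extra), sorted(outside_core))  # ['ceg04'] ['ceg06']
```

By the same arithmetic, `outside_core` for the real data would hold 205 − 105 = 100 IDs that come only from the larger 458-CEG set.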

So I conclude that the my.completeness_report number (156 hits to a master set of 248 CEGs) is derived very differently from, and independently of, the set reported to me in my.cegma.fa (205 hits to a master set of 458 CEGs).

Last edited by ssully; 09-23-2014 at 09:30 AM.