Seqanswers Leaderboard Ad

**dpryan** · 09-14-2013, 12:06 PM

Well, it makes sense to just use Fisher's test, which uses a hypergeometric distribution but more directly does what you apparently want to do. So something like:

Code:

d <- as.matrix(read.csv("help.csv", row.names=1))
totals <- apply(d, 2, sum)
pvals <- c(rep(NA, nrow(d)))
for(i in seq(nrow(d))) {
    pvals[i] <- fisher.test(matrix(c(d[i,1], totals[1]-d[i,1], d[i,2], totals[2]-d[i,2]), nrow=2))$p.value
}
padj <- p.adjust(pvals)

The adjusted p-values for OSP_8.100.Spring.Plain vs. CH_1.Crater.Hills are then in the padj vector (the order is the same as in your csv file). In R, loops are actually pretty slow, so you could use "apply" instead to make things faster if you have more data. For the other comparisons, just change the "pvals[i] <- ..." line appropriately.

**JackieBadger** · 09-14-2013, 02:17 PM

A collection of bioconductor methods to visualize gene-list annotations - BMC Research Notes

http://www.biomedcentral.com/1756-0500/3/10

Background Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes. Findings We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists. Conclusions These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers " described in this publication.

**dpryan** · 09-16-2013, 12:14 AM

Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. You could come up with an example where just one COG was actually different and the others were the same, however this could result in the observed counts for each COG being different do to using the total counts. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNAseq experiments use different library size normalizations. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Ander's question in the other thread, since I suspect he had similar thoughts.

**SDPA_Pet** · 09-16-2013, 07:14 AM

Originally posted by dpryan View Post

Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. You could come up with an example where just one COG was actually different and the others were the same, however this could result in the observed counts for each COG being different do to using the total counts. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNAseq experiments use different library size normalizations. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Ander's question in the other thread, since I suspect he had similar thoughts.

Hi. Do you mean the COG B = COG category? I just give you an example. Actually, I will do all COG functions. It is a big data set. Thousands of functions. I use COG category here, instead.

**dpryan** · 09-16-2013, 07:25 AM

Originally posted by SDPA_Pet View Post

Hi. Do you mean the COG B = COG category? I just give you an example. Actually, I will do all COG functions. It is a big data set. Thousands of functions. I use COG category here, instead.

It doesn't really matter if these are categories or functions, my concern would hold either way.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 18 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 17 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Find representative genes

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News