SEQanswers

Old 09-14-2013, 09:43 AM   #1
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
Find representative genes

Hi, I still don't know how to find over-represented genes using phyper (someone recommended this function) or another R package. I attached a CSV file as an example. Can anyone write an R script for me with my dataset?

In my dataset, the first row contains my sample names. The first column is the COG category ID. The numbers are gene counts.

Thank you.
Attached Files
File Type: zip help.zip (428 Bytes, 5 views)
Old 09-14-2013, 12:06 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Well, it makes sense to just use Fisher's exact test, which is based on the hypergeometric distribution but more directly does what you apparently want to do. So something like:
Code:
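# Rows are COG categories, columns are samples
d <- as.matrix(read.csv("help.csv", row.names=1))
# Total gene count per sample
totals <- colSums(d)
pvals <- rep(NA_real_, nrow(d))
for(i in seq_len(nrow(d))) {
    # 2x2 table: this COG vs. all other COGs, in sample 1 vs. sample 2
    pvals[i] <- fisher.test(matrix(c(d[i,1], totals[1]-d[i,1], d[i,2], totals[2]-d[i,2]), nrow=2))$p.value
}
# Multiple-testing correction (p.adjust defaults to the Holm method)
padj <- p.adjust(pvals)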
The adjusted p-values for OSP_8.100.Spring.Plain vs. CH_1.Crater.Hills are then in the padj vector (the order is the same as in your csv file). In R, explicit loops can be slow, so you could use "apply" instead if you have more data. For the other comparisons, just change the column indices in the "pvals[i] <- ..." line appropriately.
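As a rough sketch of the "apply" variant, something along these lines should work. The matrix below is invented and only stands in for help.csv, and the names fisher_row, sampleA and sampleB are hypothetical. Picking the two columns by name also shows how to switch to another comparison. Any speed gain is probably modest, since most of the time is likely spent inside fisher.test itself.

```r
# Hypothetical counts standing in for help.csv: rows are COG categories,
# columns are samples (values invented for illustration).
d <- matrix(c(120, 80, 45, 60,
              100, 95, 90, 30),
            ncol = 2,
            dimnames = list(c("C", "E", "G", "J"), c("sampleA", "sampleB")))

# Per-sample totals, used to build the "all other COGs" cells.
totals <- colSums(d)

# Fisher's exact test for one COG row between two sample columns.
# counts: the two counts for this COG; totals: the two sample totals.
fisher_row <- function(counts, totals) {
  tab <- matrix(c(counts[1], totals[1] - counts[1],
                  counts[2], totals[2] - counts[2]), nrow = 2)
  fisher.test(tab)$p.value
}

# apply() over rows replaces the explicit for loop.
pair <- c("sampleA", "sampleB")
pvals <- apply(d[, pair], 1, fisher_row, totals = totals[pair])

# Holm correction, which is what p.adjust() uses by default.
padj <- p.adjust(pvals, method = "holm")
```

To test a different pair of samples, change the two names in pair.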
Old 09-14-2013, 02:17 PM   #3
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

http://www.biomedcentral.com/1756-0500/3/10
Old 09-16-2013, 12:14 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. However, you could come up with an example where just one COG was actually different and the others were the same, and every COG would still look different, because each count is compared against the sample total and that total has changed. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNAseq experiments use different library size normalizations. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders' question in the other thread, since I suspect he had similar thoughts.
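To make that concern concrete, here is a small made-up example (the counts are invented, not taken from the attached file). Only COG "A" differs between the two samples, yet Fisher's test on the unchanged COG "B" still gives a small p-value because the totals differ. The last two lines are a simplified sketch of the median-of-ratios idea behind DESeq's size factors, not the package's actual code, and they assume there are no zero counts.

```r
# Invented counts: only COG "A" differs between the two samples.
d <- matrix(c(1000, 1000, 1000, 1000,
              5000, 1000, 1000, 1000),
            ncol = 2,
            dimnames = list(c("A", "B", "C", "D"), c("s1", "s2")))
totals <- colSums(d)   # 4000 for s1, 8000 for s2

# Fisher's test for COG "B", whose raw count is identical in both samples.
tab_B <- matrix(c(d["B", "s1"], totals["s1"] - d["B", "s1"],
                  d["B", "s2"], totals["s2"] - d["B", "s2"]), nrow = 2)
# B makes up 25% of s1 but only 12.5% of s2, so the test flags it.
p_B <- fisher.test(tab_B)$p.value

# Median-of-ratios size factors: each sample's median ratio to the
# per-row geometric mean, which is less sensitive to one shifted row.
geo_means <- exp(rowMeans(log(d)))
size_factors <- apply(d, 2, function(col) median(col / geo_means))
```

Here the totals differ by a factor of two, while the median-based size factors come out equal for both samples, which matches the fact that three of the four COGs did not change.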
Old 09-16-2013, 07:14 AM   #5
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222

Hi. By "COG B", do you mean a COG category? The file I attached is just an example. In practice I will test all COG functions, which is a large data set with thousands of functions. I used COG categories here instead.
Old 09-16-2013, 07:25 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

It doesn't really matter whether these are categories or functions; my concern would hold either way.
All times are GMT -8.