SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   How to extract Common Genes from 2 spreadsheets (http://seqanswers.com/forums/showthread.php?t=14494)

byou678 10-03-2011 07:53 AM

How to extract Common Genes from 2 spreadsheets
 
Now I have two Excel spreadsheets, parts of them are as below:

Spreadsheet1

Gene Change Padj
1 212 0.0006
2 99 0.09
3 15 0.7
4 59 0.456
5 418 0.00001
6 566 0

Spreadsheet2

Gene Statistic FDR
1 28 0.0005
2 17 0.007
3 3 0.81
4 8 0.28
5 48 0.002
6 111 0

Using Excel, how can I create one new spreadsheet that only includes the
Genes with both Padj and FDR are less than 0.05?

And how can I do this in any other statsitical software, Like R, SPSS?
Thanks very much for any idea for any question!

ECO 10-03-2011 08:51 AM

This is cake in R...

Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

Code:

> df1 <- read.table(pipe("pbpaste"),header=TRUE)
> df1

  Gene Change    Padj
1    1    212 0.00060
2    2    99 0.09000
3    3    15 0.70000
4    4    59 0.45600
5    5    418 0.00001
6    6    566 0.00000
> df2 <- read.table(pipe("pbpaste"),header=TRUE)
> df2

  Gene Statistic    FDR
1    1        28 0.0005
2    2        17 0.0070
3    3        3 0.8100
4    4        8 0.2800
5    5        48 0.0020
6    6      111 0.0000
> df3 <- merge(df1,df2)
> df3

  Gene Change    Padj Statistic    FDR
1    1    212 0.00060        28 0.0005
2    2    99 0.09000        17 0.0070
3    3    15 0.70000        3 0.8100
4    4    59 0.45600        8 0.2800
5    5    418 0.00001        48 0.0020
6    6    566 0.00000      111 0.0000
> df3[df3$Padj< 0.05 & df3$FDR < 0.05,]
  Gene Change  Padj Statistic  FDR
1    1    212 6e-04        28 5e-04
5    5    418 1e-05        48 2e-03
6    6    566 0e+00      111 0e+00
> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])
"","Gene","Change","Padj","Statistic","FDR"
"1",1,212,6e-04,28,5e-04
"5",5,418,1e-05,48,0.002
"6",6,566,0,111,0


byou678 10-03-2011 11:57 AM

Thanks ECO, I run the similar codes in R, too. And I can get the result like yours Just based on the Spreadsheets showed in this thread. Sorry for not telling more info about my data:But I have more Genes(37970) in Spreadsheet2 than Spreadsheet1 (17980). It supposed to have 17980 Genes left after merging [the 19990(37970-17980) Genes which only in Spreadsheet2 will not appear in the merged sheets]. Unfortunately, it messed up after merging. Any idea will be greatly appreciated.

Quote:

Originally Posted by ECO (Post 52868)
This is cake in R...

Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

Code:

> df1 <- read.table(pipe("pbpaste"),header=TRUE)
> df1

  Gene Change    Padj
1    1    212 0.00060
2    2    99 0.09000
3    3    15 0.70000
4    4    59 0.45600
5    5    418 0.00001
6    6    566 0.00000
> df2 <- read.table(pipe("pbpaste"),header=TRUE)
> df2

  Gene Statistic    FDR
1    1        28 0.0005
2    2        17 0.0070
3    3        3 0.8100
4    4        8 0.2800
5    5        48 0.0020
6    6      111 0.0000
> df3 <- merge(df1,df2)
> df3

  Gene Change    Padj Statistic    FDR
1    1    212 0.00060        28 0.0005
2    2    99 0.09000        17 0.0070
3    3    15 0.70000        3 0.8100
4    4    59 0.45600        8 0.2800
5    5    418 0.00001        48 0.0020
6    6    566 0.00000      111 0.0000
> df3[df3$Padj< 0.05 & df3$FDR < 0.05,]
  Gene Change  Padj Statistic  FDR
1    1    212 6e-04        28 5e-04
5    5    418 1e-05        48 2e-03
6    6    566 0e+00      111 0e+00
> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])
"","Gene","Change","Padj","Statistic","FDR"
"1",1,212,6e-04,28,5e-04
"5",5,418,1e-05,48,0.002
"6",6,566,0,111,0



ECO 10-03-2011 01:06 PM

Just need to tell merge what to do...(note I added two extra genes into df1)

Code:

> df1
  Gene Change    Padj
1    1    212 0.00060
2    2    99 0.09000
3    3    15 0.70000
4    4    59 0.45600
5    5    418 0.00001
6    6    566 0.00000
7    7    123 0.00000
8    8    125 0.04000
> df2
  Gene Statistic    FDR
1    1        28 0.0005
2    2        17 0.0070
3    3        3 0.8100
4    4        8 0.2800
5    5        48 0.0020
6    6      111 0.0000
> merge(df1,df2,on=Gene)
  Gene Statistic    FDR Change    Padj
1    1        28 0.0005    212 0.00060
2    2        17 0.0070    99 0.09000
3    3        3 0.8100    15 0.70000
4    4        8 0.2800    59 0.45600
5    5        48 0.0020    418 0.00001
6    6      111 0.0000    566 0.00000

edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

byou678 10-04-2011 07:42 PM

Thanks again ECO. Yes, I like using R too and I have figured it out.

Quote:

Originally Posted by ECO (Post 52893)
Just need to tell merge what to do...(note I added two extra genes into df1)

Code:

> df1
  Gene Change    Padj
1    1    212 0.00060
2    2    99 0.09000
3    3    15 0.70000
4    4    59 0.45600
5    5    418 0.00001
6    6    566 0.00000
7    7    123 0.00000
8    8    125 0.04000
> df2
  Gene Statistic    FDR
1    1        28 0.0005
2    2        17 0.0070
3    3        3 0.8100
4    4        8 0.2800
5    5        48 0.0020
6    6      111 0.0000
> merge(df1,df2,on=Gene)
  Gene Statistic    FDR Change    Padj
1    1        28 0.0005    212 0.00060
2    2        17 0.0070    99 0.09000
3    3        3 0.8100    15 0.70000
4    4        8 0.2800    59 0.45600
5    5        48 0.0020    418 0.00001
6    6      111 0.0000    566 0.00000

edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...



All times are GMT -8. The time now is 04:02 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.