Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • byou678
    Member
    • Aug 2011
    • 52

    How to extract Common Genes from 2 spreadsheets

    Now I have two Excel spreadsheets, parts of them are as below:

    Spreadsheet1

    Gene Change Padj
    1 212 0.0006
    2 99 0.09
    3 15 0.7
    4 59 0.456
    5 418 0.00001
    6 566 0

    Spreadsheet2

    Gene Statistic FDR
    1 28 0.0005
    2 17 0.007
    3 3 0.81
    4 8 0.28
    5 48 0.002
    6 111 0

    Using Excel, how can I create one new spreadsheet that only includes the
    Genes with both Padj and FDR are less than 0.05?

    And how can I do this in any other statsitical software, Like R, SPSS?
    Thanks very much for any idea for any question!
  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    #2
    This is cake in R...

    Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

    Code:
    [B]> df1 <- read.table(pipe("pbpaste"),header=TRUE)
    > df1[/B]
      Gene Change    Padj
    1    1    212 0.00060
    2    2     99 0.09000
    3    3     15 0.70000
    4    4     59 0.45600
    5    5    418 0.00001
    6    6    566 0.00000
    [B]> df2 <- read.table(pipe("pbpaste"),header=TRUE)
    > df2[/B]
      Gene Statistic    FDR
    1    1        28 0.0005
    2    2        17 0.0070
    3    3         3 0.8100
    4    4         8 0.2800
    5    5        48 0.0020
    6    6       111 0.0000
    [B]> df3 <- merge(df1,df2)
    > df3[/B]
      Gene Change    Padj Statistic    FDR
    1    1    212 0.00060        28 0.0005
    2    2     99 0.09000        17 0.0070
    3    3     15 0.70000         3 0.8100
    4    4     59 0.45600         8 0.2800
    5    5    418 0.00001        48 0.0020
    6    6    566 0.00000       111 0.0000
    [B]> df3[df3$Padj< 0.05 & df3$FDR < 0.05,][/B]
      Gene Change  Padj Statistic   FDR
    1    1    212 6e-04        28 5e-04
    5    5    418 1e-05        48 2e-03
    6    6    566 0e+00       111 0e+00
    [B]> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])[/B]
    "","Gene","Change","Padj","Statistic","FDR"
    "1",1,212,6e-04,28,5e-04
    "5",5,418,1e-05,48,0.002
    "6",6,566,0,111,0

    Comment

    • byou678
      Member
      • Aug 2011
      • 52

      #3
      Thanks ECO, I run the similar codes in R, too. And I can get the result like yours Just based on the Spreadsheets showed in this thread. Sorry for not telling more info about my data:But I have more Genes(37970) in Spreadsheet2 than Spreadsheet1 (17980). It supposed to have 17980 Genes left after merging [the 19990(37970-17980) Genes which only in Spreadsheet2 will not appear in the merged sheets]. Unfortunately, it messed up after merging. Any idea will be greatly appreciated.

      Originally posted by ECO View Post
      This is cake in R...

      Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

      Code:
      [B]> df1 <- read.table(pipe("pbpaste"),header=TRUE)
      > df1[/B]
        Gene Change    Padj
      1    1    212 0.00060
      2    2     99 0.09000
      3    3     15 0.70000
      4    4     59 0.45600
      5    5    418 0.00001
      6    6    566 0.00000
      [B]> df2 <- read.table(pipe("pbpaste"),header=TRUE)
      > df2[/B]
        Gene Statistic    FDR
      1    1        28 0.0005
      2    2        17 0.0070
      3    3         3 0.8100
      4    4         8 0.2800
      5    5        48 0.0020
      6    6       111 0.0000
      [B]> df3 <- merge(df1,df2)
      > df3[/B]
        Gene Change    Padj Statistic    FDR
      1    1    212 0.00060        28 0.0005
      2    2     99 0.09000        17 0.0070
      3    3     15 0.70000         3 0.8100
      4    4     59 0.45600         8 0.2800
      5    5    418 0.00001        48 0.0020
      6    6    566 0.00000       111 0.0000
      [B]> df3[df3$Padj< 0.05 & df3$FDR < 0.05,][/B]
        Gene Change  Padj Statistic   FDR
      1    1    212 6e-04        28 5e-04
      5    5    418 1e-05        48 2e-03
      6    6    566 0e+00       111 0e+00
      [B]> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])[/B]
      "","Gene","Change","Padj","Statistic","FDR"
      "1",1,212,6e-04,28,5e-04
      "5",5,418,1e-05,48,0.002
      "6",6,566,0,111,0
      Last edited by byou678; 10-03-2011, 12:08 PM.

      Comment

      • ECO
        --Site Admin--
        • Oct 2007
        • 1360

        #4
        Just need to tell merge what to do...(note I added two extra genes into df1)

        Code:
        [B]> df1[/B]
          Gene Change    Padj
        1    1    212 0.00060
        2    2     99 0.09000
        3    3     15 0.70000
        4    4     59 0.45600
        5    5    418 0.00001
        6    6    566 0.00000
        7    7    123 0.00000
        8    8    125 0.04000
        [B]> df2[/B]
          Gene Statistic    FDR
        1    1        28 0.0005
        2    2        17 0.0070
        3    3         3 0.8100
        4    4         8 0.2800
        5    5        48 0.0020
        6    6       111 0.0000
        [B]> merge(df1,df2,on=Gene)[/B]
          Gene Statistic    FDR Change    Padj
        1    1        28 0.0005    212 0.00060
        2    2        17 0.0070     99 0.09000
        3    3         3 0.8100     15 0.70000
        4    4         8 0.2800     59 0.45600
        5    5        48 0.0020    418 0.00001
        6    6       111 0.0000    566 0.00000
        edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

        Comment

        • byou678
          Member
          • Aug 2011
          • 52

          #5
          Thanks again ECO. Yes, I like using R too and I have figured it out.

          Originally posted by ECO View Post
          Just need to tell merge what to do...(note I added two extra genes into df1)

          Code:
          [B]> df1[/B]
            Gene Change    Padj
          1    1    212 0.00060
          2    2     99 0.09000
          3    3     15 0.70000
          4    4     59 0.45600
          5    5    418 0.00001
          6    6    566 0.00000
          7    7    123 0.00000
          8    8    125 0.04000
          [B]> df2[/B]
            Gene Statistic    FDR
          1    1        28 0.0005
          2    2        17 0.0070
          3    3         3 0.8100
          4    4         8 0.2800
          5    5        48 0.0020
          6    6       111 0.0000
          [B]> merge(df1,df2,on=Gene)[/B]
            Gene Statistic    FDR Change    Padj
          1    1        28 0.0005    212 0.00060
          2    2        17 0.0070     99 0.09000
          3    3         3 0.8100     15 0.70000
          4    4         8 0.2800     59 0.45600
          5    5        48 0.0020    418 0.00001
          6    6       111 0.0000    566 0.00000
          edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

          Comment

          Latest Articles

          Collapse

          • SEQadmin2
            Nine Things a Sample Prep Scientist Thinks About Before Sequencing
            by SEQadmin2


            I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

            Here are nine questions we think about, in roughly the order they matter, before...
            06-18-2026, 07:11 AM
          • SEQadmin2
            From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
            by SEQadmin2


            Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


            The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
            ...
            06-02-2026, 10:05 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, 06-26-2026, 11:10 AM
          0 responses
          12 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-17-2026, 06:09 AM
          0 responses
          48 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-09-2026, 11:58 AM
          0 responses
          107 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-05-2026, 10:09 AM
          0 responses
          125 views
          0 reactions
          Last Post SEQadmin2  
          Working...