  • filtering - find-if commands

    Hi, could someone please tell me how to do this simple search? I don't know much programming and this is beyond me, but I think it should be a simple find-if loop.
    Thanks so much. The file below has my query:


    # need to perform
    # for each value j (X, Y, Z) in column 2, if the values i in column 1 for that j include A and B and C and D, then output j
    # so in this example, I should only get 'X' as the output
    # as only X has rows with all four value i (A and B and C and D)

    A X
    A X
    B X
    C X
    D X
    D X
    D X
    A Y
    A Y
    B Y
    A Z
    B Z
    C Z
    C Z
    C Z
    C Z
    C Z
    C Z
    C Z
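The rule described above can be sketched as an explicit find-if loop in R. This is a minimal illustration, assuming the two columns are held in character vectors `v1` and `v2` (names chosen here for illustration) and built inline from the example rows rather than read from a file. Note that it keeps any group whose column-1 values include all four of A, B, C and D, even if other values also appear.

```r
# Example data from the question (hypothetical vector names v1, v2)
v1 <- c("A", "A", "B", "C", "D", "D", "D",  # rows for X
        "A", "A", "B",                      # rows for Y
        "A", "B", rep("C", 7))              # rows for Z
v2 <- c(rep("X", 7), rep("Y", 3), rep("Z", 9))

required <- c("A", "B", "C", "D")
hits <- character(0)

# For each distinct value j in column 2 ...
for (j in unique(v2)) {
  # ... collect the column-1 values that occur together with j
  seen <- v1[v2 == j]
  # ... and keep j only if every required value is among them
  if (all(required %in% seen)) {
    hits <- c(hits, j)
  }
}

print(hits)
```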

  • #2
    Saving that file as "foo.txt", one could just use R:

    Code:
    #Read in the file
    d <- read.table("foo.txt", sep=" ")
    #Split it according to the second column
    dl <- split(d, d$V2)
    #Return TRUE only if a group's first column contains exactly the values A, B, C and D
    foo <- function(x) {
        if(length(unique(x$V1)) != 4) {
            FALSE
        } else if(!all(c("A","B","C","D") %in% unique(x$V1))) {
            FALSE
        } else {
            TRUE
        }
    }
    #Run things
    sapply(dl,foo)
    The output is:
    Code:
        X     Y     Z 
     TRUE FALSE FALSE
    Last edited by dpryan; 01-23-2014, 03:14 PM. Reason: Swap order
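A more compact variant of the same idea, sketched here under the assumption that the columns carry the default `V1`/`V2` names that `read.table()` assigns when there is no header. `tapply()` applies the test to each `V2` group, and the names of the passing groups are returned directly, so the result is "X" rather than a TRUE/FALSE vector. The data frame is built inline instead of being read from foo.txt.

```r
# Same example data, with read.table()'s default column names
d <- data.frame(
  V1 = c("A", "A", "B", "C", "D", "D", "D", "A", "A", "B", "A", "B", rep("C", 7)),
  V2 = c(rep("X", 7), rep("Y", 3), rep("Z", 9)),
  stringsAsFactors = FALSE
)
required <- c("A", "B", "C", "D")

# For each V2 group: TRUE if every required value appears in that group's V1
has_all <- tapply(d$V1, d$V2, function(v) all(required %in% v))

# Keep only the names of the groups that passed
result <- names(has_all)[has_all]
print(result)
```

Unlike `foo`, this accepts groups that contain extra values besides A to D. To require exactly those four values, the test could use `setequal(unique(v), required)` instead.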



    • #3
      Hi dpryan,

      This was really useful, thanks! I tried it on my actual data, which is a bit more complex than the example above. I had to switch the %in% operator to come after the c("A","B","C","D") instead of before, and then it ran fine.

      Thanks again!!
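Why the operand order of `%in%` matters: `x %in% table` asks, for each element of `x`, whether it occurs in `table`. The sketch below uses the column-1 values seen with Y in the example (the variable names are illustrative) to show that the two orders answer different questions.

```r
required <- c("A", "B", "C", "D")
group_y  <- c("A", "A", "B")   # column-1 values that occur with Y

# Is each required value present in the group?
required %in% group_y          # TRUE TRUE FALSE FALSE
# Is each observed value one of the required ones?
group_y %in% required          # TRUE TRUE TRUE

present  <- all(required %in% group_y)  # FALSE: Y lacks C and D
only_req <- all(group_y %in% required)  # TRUE: Y would wrongly pass this test
print(c(present = present, only_req = only_req))
```

Only the first form, with the required values on the left, checks that all four values are present.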



      • #4
        True, I missed that. I updated my reply to incorporate that fix, should someone need it later.
