
  • Matching a column from a data frame on the columns of another data frame

    I have two big data frames. One (df1) has this structure:

    V1 V2 V3
    1 Chr1 7507 10944
    2 Chr1 10944 13170
    3 Chr1 13170 20065
    4 Chr1 20065 28273
    5 Chr1 28273 29960
    6 Chr1 29960 36599
    7 Chr1 36599 37513
    8 Chr1 37513 40360
    9 Chr1 40360 48796
    10 Chr1 48796 50661

    The other (df2) has this structure:

    V1 V2 V3 V4 V5
    1 Chr1 7507 7507 1 1
    2 Chr1 10944 10944 1 2
    3 Chr1 13170 13170 1 22
    4 Chr1 20065 20065 1 3
    5 Chr1 28273 28273 1 161
    6 Chr1 29960 29960 1 10
    7 Chr1 36599 36599 1 604
    8 Chr1 37513 37513 1 117
    9 Chr1 40360 40360 1 8
    10 Chr1 48796 48796 1 3

    What I'm trying to do is check whether V2 of df2 (V2 and V3 are identical there) is equal to, or falls within, the range from V2 to V3 of a row in df1. If it does, I want to write the V5 value from df2 into a new column in df1; if not, write 0. The result I want would look like this:

    Chr1 7507 10944 1
    Chr1 10944 13170 2
    Chr1 13170 20065 22
    Chr1 20065 28273 3
    Chr1 28273 29960 161
    Chr1 29960 36599 10
    Chr1 36599 37513 604
    Chr1 37513 40360 117
    Chr1 40360 48796 8
    .
    .
    .
    Do you know any good way to do this?
    Thank you very much.
    Last edited by zisis86; 05-28-2014, 05:38 AM.

  • #2
    The simplest solution is to make these GRanges objects and then use findOverlaps. You can then add meta information columns to the first dataset (just 2 columns of 0s) and then increment those values according to the overlap values. This has the benefit of taking care of cases where there are multiple overlaps.
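
A minimal sketch of the GRanges approach, assuming the Bioconductor package GenomicRanges is installed. The toy data copies the first three rows from the question. Two choices here are assumptions rather than anything stated in the thread: each df1 interval is treated as half-open, [V2, V3), so that a position sitting on a shared boundary is counted once, and V5 values are summed when several positions fall into one interval.

```r
library(GenomicRanges)  # Bioconductor package, installed separately

# Toy data: first three rows of df1 and df2 from the question
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507L, 10944L, 13170L),
                  V3 = c(10944L, 13170L, 20065L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507L, 10944L, 13170L),
                  V3 = c(7507L, 10944L, 13170L),
                  V4 = 1,
                  V5 = c(1, 2, 22),
                  stringsAsFactors = FALSE)

# Assumption: intervals are half-open, so the end coordinate is V3 - 1
gr1 <- GRanges(seqnames = df1$V1,
               ranges = IRanges(start = df1$V2, end = df1$V3 - 1L))
gr2 <- GRanges(seqnames = df2$V1,
               ranges = IRanges(start = df2$V2, end = df2$V3))

# Each hit pairs a df1 row (query) with a df2 row (subject)
hits <- findOverlaps(gr1, gr2)

# Start from 0 and add up V5 over all df2 rows hitting each df1 row
df1$V4 <- 0
sums <- tapply(df2$V5[subjectHits(hits)], queryHits(hits), sum)
df1$V4[as.integer(names(sums))] <- as.numeric(sums)

print(df1)
```

Rows of df1 without any overlap keep the initial 0. If closed intervals are wanted instead, drop the `- 1L`, but then a boundary position such as 10944 is counted in both neighbouring intervals.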

    • #3
      Do you want to compare these line by line? Also, are V2 and V3 always the same?

      If so, why not use an ifelse statement in R?

      df3 <- cbind(df1, V4 = 0)  ## adds a fourth column to df1, initialised to 0
      df3[,4] <- ifelse(df2[,2] >= df1[,2] & df2[,2] <= df1[,3], df2[,5], df3[,4])  ## vectorised &, not ||; needs equal row counts

      not tested, but something like this should work if you are going line by line
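
If the rows of the two data frames do not line up one to one, a per-interval lookup is one alternative. This is a minimal base R sketch under stated assumptions: `lookup_v5` is a hypothetical helper name, intervals are treated as half-open, [V2, V3), and only the first matching df2 row is used. The toy data is adapted from the question, with df2 shuffled and one df1 interval left without a match.

```r
# Toy data adapted from the question; df2 is deliberately out of order
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170, 20065),
                  V3 = c(10944, 13170, 20065, 28273),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 13170, 10944),
                  V3 = c(7507, 13170, 10944),
                  V4 = 1,
                  V5 = c(1, 22, 2),
                  stringsAsFactors = FALSE)

# Hypothetical helper: V5 of the first df2 row whose position lies in
# [start, end) on the same chromosome, or 0 when no row matches
lookup_v5 <- function(chr, start, end) {
  hit <- df2$V1 == chr & df2$V2 >= start & df2$V2 < end
  if (any(hit)) df2$V5[which(hit)[1]] else 0
}

# Apply the lookup to every row of df1
df1$V4 <- mapply(lookup_v5, df1$V1, df1$V2, df1$V3, USE.NAMES = FALSE)

print(df1)
```

This scans df2 once per df1 row, so it is simple but slow on very large inputs; an overlap-based method scales better there.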
