  • Matching a column from a data frame on the columns of another data frame

    I have two big data frames. The first (df1) has this structure:

         V1    V2    V3
    1  Chr1  7507 10944
    2  Chr1 10944 13170
    3  Chr1 13170 20065
    4  Chr1 20065 28273
    5  Chr1 28273 29960
    6  Chr1 29960 36599
    7  Chr1 36599 37513
    8  Chr1 37513 40360
    9  Chr1 40360 48796
    10 Chr1 48796 50661

    The other (df2) has this structure:

         V1    V2    V3 V4  V5
    1  Chr1  7507  7507  1   1
    2  Chr1 10944 10944  1   2
    3  Chr1 13170 13170  1  22
    4  Chr1 20065 20065  1   3
    5  Chr1 28273 28273  1 161
    6  Chr1 29960 29960  1  10
    7  Chr1 36599 36599  1 604
    8  Chr1 37513 37513  1 117
    9  Chr1 40360 40360  1   8
    10 Chr1 48796 48796  1   3
    What I'm trying to do is check whether V2 or V3 of df2 (the two are the same) is equal to, or falls within, the range from V2 to V3 of df1. If it does, I want to write the V5 value of df2 into a new column of df1; if not, write 0. The result that I want would look like this:

    Chr1 7507 10944 1
    Chr1 10944 13170 2
    Chr1 13170 20065 22
    Chr1 20065 28273 3
    Chr1 28273 29960 161
    Chr1 29960 36599 10
    Chr1 36599 37513 604
    Chr1 37513 40360 117
    Chr1 40360 48796 8
    .
    .
    .
    Do you know any good way to do this?
    Thank you very much.
    Last edited by zisis86; 05-28-2014, 05:38 AM.

  • #2
    The simplest solution is to convert these into GRanges objects (GenomicRanges package, Bioconductor) and then use findOverlaps. You can then add metadata columns to the first dataset (just 2 columns of 0s) and increment those values according to the overlapping entries. This has the benefit of handling cases where there are multiple overlaps.
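
    A minimal sketch of that idea, assuming the GenomicRanges package is installed. The sample rows come from the question, except the position 8000, which is a hypothetical extra row added to show a double overlap. Only one metadata column (the summed V5) is added here. Because the end of one df1 interval equals the start of the next, a boundary position would hit two intervals; the sketch assumes half-open intervals (end minus 1), which matches the desired output in the question.

    ```r
    library(GenomicRanges)

    # First four intervals from the question
    df1 <- data.frame(V1 = "Chr1",
                      V2 = c(7507, 10944, 13170, 20065),
                      V3 = c(10944, 13170, 20065, 28273))
    # Positions from the question, plus a hypothetical one at 8000;
    # nothing falls into the third interval
    df2 <- data.frame(V1 = "Chr1",
                      V2 = c(7507, 8000, 10944, 20065),
                      V3 = c(7507, 8000, 10944, 20065),
                      V4 = 1,
                      V5 = c(1, 5, 2, 3))

    # Assumption: treat df1 intervals as [V2, V3) by shortening the end by 1
    gr1 <- GRanges(seqnames = df1$V1,
                   ranges = IRanges(start = df1$V2, end = df1$V3 - 1))
    gr2 <- GRanges(seqnames = df2$V1,
                   ranges = IRanges(start = df2$V2, end = df2$V3))

    hits <- findOverlaps(gr1, gr2)  # query = df1 rows, subject = df2 rows

    # Start from 0, then add up V5 over all df2 rows hitting each df1 row
    df1$V4 <- 0
    sums <- tapply(df2$V5[subjectHits(hits)], queryHits(hits), sum)
    df1$V4[as.integer(names(sums))] <- as.vector(sums)

    print(df1)  # V4 is 6, 2, 0, 3
    ```

    Summing is only one way to combine multiple overlaps; taking the maximum or the first hit would work the same way by swapping the function passed to tapply.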

    • #3
      Do you want to compare these line by line? Also, are V2 and V3 always the same?

      If so, why not use an ifelse statement in R?

      df3 <- cbind(df1, V4 = 0)  ## adds a fourth column to the df, filled with 0
      df3[, 4] <- ifelse(df2[, 2] >= df1[, 2] & df2[, 2] <= df1[, 3], df2[, 5], df3[, 4])  ## vectorised &, not ||

      Not tested, but something like this should work if you are going line by line (both data frames need the same number of rows, in the same order).
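
      If the rows do not line up one to one, a base R sketch along these lines may help. It looks up the matching df2 rows for each df1 interval, so no extra package is needed. The sample rows are taken from the question, with one position left out on purpose to show the 0 case. The half-open test (start <= position < end) is an assumption chosen to match the desired output, and summing V5 over several matches is likewise just one possible choice.

      ```r
      # First four intervals from the question
      df1 <- data.frame(V1 = "Chr1",
                        V2 = c(7507, 10944, 13170, 20065),
                        V3 = c(10944, 13170, 20065, 28273))
      # df2 deliberately lacks a position for the third interval
      df2 <- data.frame(V1 = "Chr1",
                        V2 = c(7507, 10944, 20065),
                        V3 = c(7507, 10944, 20065),
                        V4 = 1,
                        V5 = c(1, 2, 3))

      # For each df1 interval, sum V5 over the df2 positions on the same
      # chromosome with start <= position < end; no match sums to 0
      df1$V4 <- sapply(seq_len(nrow(df1)), function(i) {
        hit <- df2$V1 == df1$V1[i] &
               df2$V2 >= df1$V2[i] &
               df2$V2 <  df1$V3[i]
        sum(df2$V5[hit])
      })

      print(df1)  # V4 is 1, 2, 0, 3
      ```

      This compares every interval against every position, so it will get slow on very large data frames; an overlap-based approach scales better there.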
