SEQanswers

05-28-2014, 06:19 AM   #1
zisis86
Location: Poznan, Poland

Join Date: Mar 2013
Posts: 10
Matching a column from a data frame on the columns of another data frame

I have two big data frames. The first one (df1) has this structure:

V1 V2 V3
1 Chr1 7507 10944
2 Chr1 10944 13170
3 Chr1 13170 20065
4 Chr1 20065 28273
5 Chr1 28273 29960
6 Chr1 29960 36599
7 Chr1 36599 37513
8 Chr1 37513 40360
9 Chr1 40360 48796
10 Chr1 48796 50661

The other one (df2) has this structure:

V1 V2 V3 V4 V5
1 Chr1 7507 7507 1 1
2 Chr1 10944 10944 1 2
3 Chr1 13170 13170 1 22
4 Chr1 20065 20065 1 3
5 Chr1 28273 28273 1 161
6 Chr1 29960 29960 1 10
7 Chr1 36599 36599 1 604
8 Chr1 37513 37513 1 117
9 Chr1 40360 40360 1 8
10 Chr1 48796 48796 1 3
What I'm trying to do is check whether V2 (or V3, which is the same) of df2 is equal to or between V2 and V3 of df1. If it is, I want to write the value of V5 from df2 into a new column in df1; if not, I want to write 0. The result I want would look like this:

Chr1 7507 10944 1
Chr1 10944 13170 2
Chr1 13170 20065 22
Chr1 20065 28273 3
Chr1 28273 29960 161
Chr1 29960 36599 10
Chr1 36599 37513 604
Chr1 37513 40360 117
Chr1 40360 48796 8
.
.
.
Do you know a good way to do this?
Thank you very much.

Last edited by zisis86; 05-28-2014 at 06:38 AM.
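A minimal base-R sketch of what the question asks for (not taken from any reply in this thread), assuming the V1/V2/V3/V5 column layout shown above and that the df1 intervals on a chromosome do not overlap each other. In the expected output, a boundary position such as 10944 is counted only in the row where it is the start, so the sketch treats each df1 range as half-open, V2 <= position < V3. The name add_position_values() is a hypothetical helper; it relies on base R's findInterval() and rowsum(), which avoid comparing every position against every interval.

```r
# Toy copies of the first rows of df1 and df2 from the question
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170, 20065, 28273),
                  V3 = c(10944, 13170, 20065, 28273, 29960))
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170, 20065, 28273),
                  V3 = c(7507, 10944, 13170, 20065, 28273),
                  V4 = 1,
                  V5 = c(1, 2, 22, 3, 161))

# Hypothetical helper (not from the thread): adds column V4 to df1 holding the
# sum of df2$V5 over every df2 position with V2 <= position < V3 on the same
# chromosome, or 0 if there is none. Assumes df1 intervals do not overlap.
add_position_values <- function(df1, df2) {
  df1$V4 <- 0
  for (chr in unique(df2$V1)) {
    rows <- which(df1$V1 == chr)
    if (length(rows) == 0) next
    rows <- rows[order(df1$V2[rows])]        # findInterval() needs sorted starts
    pos_idx <- which(df2$V1 == chr)
    pos <- df2$V2[pos_idx]
    i <- findInterval(pos, df1$V2[rows])     # index of the last start <= position
    ok <- i > 0
    ok[ok] <- pos[ok] < df1$V3[rows[i[ok]]]  # position must lie before that end
    if (!any(ok)) next
    sums <- rowsum(df2$V5[pos_idx[ok]], rows[i[ok]])  # one sum per df1 row
    hit <- as.integer(rownames(sums))
    df1$V4[hit] <- df1$V4[hit] + sums[, 1]
  }
  df1
}

add_position_values(df1, df2)
```

If several df2 positions fall in the same df1 row, their V5 values are added together; drop the rowsum() step and keep only the first match if a single value per row is wanted instead.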
05-28-2014, 06:46 AM   #2
dpryan
Devon Ryan
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The simplest solution is to make these GRanges objects and then use findOverlaps. You can then add metadata columns to the first dataset (just 2 columns of 0s) and increment those values according to the overlaps. This has the benefit of handling cases where there are multiple overlaps.
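The GRanges approach needs the Bioconductor GenomicRanges package (GRanges(), findOverlaps()). The sketch below is not that code: it is a base-R stand-in, written on the assumption that Bioconductor may not be installed, that builds the same kind of query/subject hit table and then increments a column of 0s once per hit, as described in the reply. overlap_hits() and add_overlap_scores() are hypothetical helper names, and the all-pairs comparison only suits small inputs; for big tables findOverlaps() is the better tool. It also shows a detail worth checking: GRanges ranges are closed, so a position such as 10944 overlaps both Chr1:7507-10944 and Chr1:10944-13170 and is counted twice unless the df1 ends are shortened by one base.

```r
# Toy copies of the first rows of df1 and df2 from the question
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170, 20065),
                  V3 = c(10944, 13170, 20065, 28273))
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170, 20065),
                  V3 = c(7507, 10944, 13170, 20065),
                  V4 = 1,
                  V5 = c(1, 2, 22, 3))

# Hypothetical stand-in for findOverlaps(): every (query row, subject row) pair
# whose closed ranges [V2, V3] overlap on the same chromosome. Compares all
# pairs, so it is only suitable for small inputs.
overlap_hits <- function(query, subject, end_offset = 0) {
  query_end <- query$V3 - end_offset  # end_offset = 1 makes query ranges [V2, V3)
  do.call(rbind, lapply(seq_len(nrow(subject)), function(j) {
    q <- which(query$V1 == subject$V1[j] &
               query$V2 <= subject$V3[j] &
               query_end >= subject$V2[j])
    data.frame(query = q, subject = rep(j, length(q)))
  }))
}

# Hypothetical helper: add a column of 0s to the query, then increment it once
# per hit, so a row with several overlaps accumulates all of their V5 values
add_overlap_scores <- function(query, subject, end_offset = 0) {
  hits <- overlap_hits(query, subject, end_offset)
  query$score <- 0
  for (k in seq_len(nrow(hits))) {
    r <- hits$query[k]
    query$score[r] <- query$score[r] + subject$V5[hits$subject[k]]
  }
  query
}

add_overlap_scores(df1, df2)                  # closed ranges: boundaries count twice
add_overlap_scores(df1, df2, end_offset = 1)  # half-open ranges
```

With the default closed ranges the scores are 3, 24, 25, 3 (for example, the first row collects 1 + 2); with end_offset = 1 they are 1, 2, 22, 3, matching the expected column in the question.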
05-29-2014, 05:32 AM   #3
bioBob
Location: Virginia

Join Date: Mar 2011
Posts: 72

Do you want to compare these line by line? Also, are V2 and V3 always the same?

If so, why not use an ifelse statement in R?

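df3 <- cbind(df1, V4 = 0)  ## adds another column to the data frame, filled with 0
## & compares element-wise (|| only looks at a single value); the test is start <= V2 <= end
df3[, 4] <- ifelse(df2[, 2] >= df1[, 2] & df2[, 2] <= df1[, 3], df2[, 5], df3[, 4])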

Not tested, but something like this should work if you are comparing line by line.

Tags
bioinformatics, column, matching, rscript

All times are GMT -8.

