R issues with merge/rbind/concatenate two data frames -
I am starting with R, if I am asking for questions then I apologize in advance. Here's my problem:
I have two data frames, DF1 and DF2, which have different numbers of rows and columns, in two frames there is only one customer (column) called "customer_no" . I want to record merged frames based on "customer_no" and only according to rows in DF2. All data for each client. The frame has multiple rows.
I tried the following:
merged df My questions are:
1) How can I tell the command to leave unmatched columns empty? 2) How can I see a merged file, which row was found? I think that if I solve the above question it should be easy to see it from the empty columns.
I am missing something in my order, but I do not know if the question has been answered somewhere, can you still write it again in English for the beginning of R?
Thank you!
Data example:
df1: customer_no country year 10 UK 2001 10 UK 2002 10 UK 2003 20 US 2007 30 AU 2006 DF 2: customer_on income 10 700 10 800 The merged file should look like this: merged df: customer_no income country year 10 UK 2001 10 UK 2002 10 UK 2003 10 700 10 800 10 900 30 AU 2006 30 1000 So: This column keeps all together; it is based on the same customer_NO after the last one of df2, df2 values And only matches df2 (merged Dd is not customer_no 20). In addition, it empties all other cells.
I add app to STATA but not sure in R ... maybe join?
Thank you !!
Try:
df1 $ id & lt; - Paste (df1 $ customer_no, 1, sep = "_") Df2 $ id & lt; - Paste (df2 $ customer_no, 2, sep = "_") res & lt; - Merge (df1, df2, by = c ('id', 'customer_no'), all = TRUE) [, - 1] race 1 and lieutenant [res $ customer_no %% df2 $ customer_no,] res1 # customer_ a country Year Revenue # 1 10 UK 2001 NA # 2 10 UK 2002 NA # 3 10 UK 2003 NA # 4 10 & LT; NA & gt; NA 700 # 5 10 & gt; NA & gt; NA 800 # 6 10 & gt; NA & gt; NA 900 # 8 30 AU 2006 NA # 9 30 and lieutenant; NA & gt; NA 1000 If you want to change NA to '' , res1 .na (res1)] & lt; - '' # # But, I will leave it as `NA` because the 'numeric' columns are. Or, use data.table from rbindlist (using original dataset) < Code> library (data capable) indx & lt; - df1 $ customer_no% df2 $ customer_no rbindlist (list (df1 [indx,], df2), fill ==] [order (customer_no)] # customer_one country year revenue # 1: 10 UK 2001 NA # 2: 10 UK 2002 NA # 3: 10 UK 2003 NA # 4: 10 NA No. 700 # 5: 10 NA 800 # 6: 10 NA NA 900 # 7: 30 AU 2006 NA # 8: 30 NA NA 1000
Comments
Post a Comment