Piping in Execute R
Hello everybody,
I am trying to use the dplyr package for data manipulation in the Execute R operator in RapidMiner. Almost everything works fine except the piping!
The following chunk of code does not produce an output dataset; instead I get "Memory buffered file" as the output message. I checked the log file and everything seems to be working flawlessly. The code reads:
rm_main = function(data)
{
  library(datasets)
  library(dplyr)
  NewData <- (
    data %>%
      mutate(Product2 = Tot_Product * 2) %>%
      filter(Life_Phase != "Family") %>%
      group_by(City, P_PLUS, Life_Phase) %>%
      summarise(AVG_Prem = mean(TOT_prem, na.rm = TRUE), AVG_Test = mean(Product2, na.rm = TRUE)) %>%
      select(City, P_PLUS, Life_Phase, AVG_Prem, AVG_Test)
  )
  return(NewData)
}
This code actually works in RStudio.
I appreciate your help.
Mahdi
Best Answers
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
To build on what @mschmitz said, the Execute R operator lets you pass R objects to another Execute R operator, but to output the results into RapidMiner you first have to convert them to a data frame inside your function.
MahdiP Member Posts: 9 Contributor II
Thank you so much for such a quick reply!
It seems that things go wrong at the group_by step: from there on the result is no longer a plain data frame. It can be solved simply by converting the final result to a data frame before returning it to RapidMiner, like this:
rm_main = function(data)
{
  library(datasets)
  library(dplyr)
  NewData <-
    data.frame(
      data %>%
        mutate(Product2 = Tot_Product * 2) %>%
        filter(Life_Phase != "Family") %>%
        group_by(City, P_PLUS, Life_Phase) %>%
        summarise(AVG_Prem = mean(TOT_prem, na.rm = TRUE), AVG_Test = mean(Product2, na.rm = TRUE)) %>%
        select(City, P_PLUS, Life_Phase, AVG_Prem, AVG_Test)
    )
  return(NewData)
}
Now it outputs a dataset to RapidMiner.
Thank you so much again.
Mahdi
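For anyone who prefers to keep the conversion inside the pipeline, here is a minimal sketch of a variant of the fix above. It assumes the same column names as in the question; the toy table is made up purely for illustration, and this has only been reasoned through in plain R, not verified inside RapidMiner.

```r
library(dplyr)

rm_main <- function(data) {
  data %>%
    mutate(Product2 = Tot_Product * 2) %>%
    filter(Life_Phase != "Family") %>%
    group_by(City, P_PLUS, Life_Phase) %>%
    summarise(
      AVG_Prem = mean(TOT_prem, na.rm = TRUE),
      AVG_Test = mean(Product2, na.rm = TRUE)
    ) %>%
    ungroup() %>%      # drop the grouping that summarise leaves behind
    as.data.frame()    # strip the tibble classes, leaving a plain data.frame
}

# Hypothetical toy input, only to try the function outside RapidMiner
toy <- data.frame(
  City        = c("A", "A", "B", "B"),
  P_PLUS      = c(1, 1, 2, 2),
  Life_Phase  = c("Single", "Single", "Couple", "Family"),
  Tot_Product = c(1, 2, 3, 4),
  TOT_prem    = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

out <- rm_main(toy)
class(out)  # "data.frame"
```

The select() step from the original is left out here because summarise() already returns exactly the grouping columns plus the two averages.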
Answers
Dear Mahdi,
RapidMiner cannot interpret every type of object you can create in R. If you send back something it cannot interpret, it arrives in RapidMiner as a memory buffered file. This cannot be used in RapidMiner directly, but it can be piped into another R operator and used there. This is very useful, e.g. for modelling.
In your case, NewData does not seem to be an R data frame.
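A quick way to check this in plain R is to look at class() of the pipeline result. The sketch below uses a made-up toy table (not the real data) and shows that group_by plus summarise returns a grouped tibble rather than a plain data.frame. That the extra classes are what RapidMiner trips over is a plausible explanation, not something confirmed against the extension itself.

```r
library(dplyr)

# Hypothetical toy data standing in for the RapidMiner input table
toy <- data.frame(
  City       = c("A", "A", "B", "B"),
  P_PLUS     = c(1, 1, 2, 2),
  Life_Phase = c("Single", "Single", "Couple", "Family"),
  TOT_prem   = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

result <- toy %>%
  filter(Life_Phase != "Family") %>%
  group_by(City, P_PLUS, Life_Phase) %>%
  summarise(AVG_Prem = mean(TOT_prem, na.rm = TRUE))

class(result)   # carries tibble classes such as "tbl_df" in addition to "data.frame"

plain <- as.data.frame(result)
class(plain)    # "data.frame"
```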
~Martin
Dortmund, Germany