What is the best VLOOKUP equivalent in RapidMiner?
Hi there,
I'm looking for an equivalent for Excel Vlookup in RapidMiner. I tried it with the Join Operator but it didn't solve my problem.
I have one table with two columns: Id and Person.
"0" ; "Person A"
I have a second table with two columns which show relationships between the persons :
"Person A" ; "Person B"
"Person C" ; "Person X"
Now I want to replace the names of the persons in the second table with just their Id. So it looks like:
"1" ; "0"
"3" ; "25"
I tried it with two different Join operators but it always mixes up the relationships.
Hoping for your help.
Greetings
Joshua
Best Answer
-
SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
Hi,
Interesting question! What you need is a dictionary that maps id and name values. Then you can use the Replace (Dictionary) operator.
Here is a sample process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Create Id,Name Table" width="90" x="112" y="187">
<parameter key="script" value="# rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none) rm_main = function() { data = data.frame(id = 1:10) data$Name = paste("Person", data$id) # connect 2 output ports to see the results return(list(data)) } "/>
</operator>
<operator activated="true" class="numerical_to_polynominal" compatibility="8.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="187">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Create relationships Table" width="90" x="112" y="34">
<parameter key="script" value="# rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none) rm_main = function() { people = paste("Person", 1:10) data = data.frame(First = sample(people), Second = sample(people)) # connect 2 output ports to see the results return(list(data)) } "/>
</operator>
<operator activated="true" class="replace_dictionary" compatibility="8.1.000" expanded="true" height="103" name="Replace (Dictionary)" width="90" x="581" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="First"/>
<parameter key="attributes" value="Second|First"/>
<parameter key="from_attribute" value="Name"/>
<parameter key="to_attribute" value="id"/>
</operator>
<connect from_op="Create Id,Name Table" from_port="output 1" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Replace (Dictionary)" to_port="dictionary"/>
<connect from_op="Create relationships Table" from_port="output 1" to_op="Replace (Dictionary)" to_port="example set input"/>
<connect from_op="Replace (Dictionary)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>Remember to convert id attributes to nominal, otherwise the replacement won't work.
Regards,
Sebastian
3