RapidMiner

Learner II joshua_gelhaar
Learner II

What is the best VLOOKUP equivalent in RapidMiner?

Hi there,

I'm looking for an equivalent for Excel Vlookup in RapidMiner. I tried it with the Join Operator but it didn't solve my problem.

I have one table with two columns: Id and Person.

"0" ; "Person A"

I have a second table with two columns which show relationships between the persons :

"Person A" ; "Person B"

"Person C" ; "Person X"

Now I want to replace the names of the persons in the second table with just their Id. So it looks like:

"1" ; "0"

"3" ; "25"

I tried it with two different Join operators but it always mixes up the relationships.

 

Hoping for your help.

Greetings

Joshua

1 REPLY
Highlighted
RM Staff
RM Staff

Re: What is the best VLOOKUP equivalent in RapidMiner?

Hi,

 

Interesting question! What you need is a dictionary that maps id and name values. Then you can use the Replace (Dictionary) operator.

 

Here is a sample process:

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Create Id,Name Table" width="90" x="112" y="187">
        <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;    data = data.frame(id = 1:10)&#10;    data$Name = paste(&quot;Person&quot;, data$id)&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data))&#10;}&#10;"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="8.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="187">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="id"/>
      </operator>
      <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Create relationships Table" width="90" x="112" y="34">
        <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;    people = paste(&quot;Person&quot;, 1:10)&#10;&#10;    data = data.frame(First = sample(people), Second = sample(people))&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data))&#10;}&#10;"/>
      </operator>
      <operator activated="true" class="replace_dictionary" compatibility="8.1.000" expanded="true" height="103" name="Replace (Dictionary)" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value="First"/>
        <parameter key="attributes" value="Second|First"/>
        <parameter key="from_attribute" value="Name"/>
        <parameter key="to_attribute" value="id"/>
      </operator>
      <connect from_op="Create Id,Name Table" from_port="output 1" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Replace (Dictionary)" to_port="dictionary"/>
      <connect from_op="Create relationships Table" from_port="output 1" to_op="Replace (Dictionary)" to_port="example set input"/>
      <connect from_op="Replace (Dictionary)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Remember to convert id attributes to nominal, otherwise the replacement won't work.

 

Regards,

Sebastian

"5 Minutes with Ingo" - Season 2