What is the best VLOOKUP equivalent in RapidMiner?

joshua_gelhaarjoshua_gelhaar Member Posts: 5 Contributor I
edited December 2018 in Help

Hi there,

I'm looking for an equivalent for Excel Vlookup in RapidMiner. I tried it with the Join Operator but it didn't solve my problem.

I have one table with two columns: Id and Person.

"0" ; "Person A"

I have a second table with two columns which show relationships between the persons :

"Person A" ; "Person B"

"Person C" ; "Person X"

Now I want to replace the names of the persons in the second table with just their Id. So it looks like:

"1" ; "0"

"3" ; "25"

I tried it with two different Join operators but it always mixes up the relationships.

 

Hoping for your help.

Greetings

Joshua

Best Answer

  • SGolbertSGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Solution Accepted

    Hi,

     

    Interesting question! What you need is a dictionary that maps id and name values. Then you can use the Replace (Dictionary) operator.

     

    Here is a sample process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Create Id,Name Table" width="90" x="112" y="187">
    <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10; data = data.frame(id = 1:10)&#10; data$Name = paste(&quot;Person&quot;, data$id)&#10;&#10; # connect 2 output ports to see the results&#10; return(list(data))&#10;}&#10;"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="8.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="187">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Create relationships Table" width="90" x="112" y="34">
    <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10; people = paste(&quot;Person&quot;, 1:10)&#10;&#10; data = data.frame(First = sample(people), Second = sample(people))&#10;&#10; # connect 2 output ports to see the results&#10; return(list(data))&#10;}&#10;"/>
    </operator>
    <operator activated="true" class="replace_dictionary" compatibility="8.1.000" expanded="true" height="103" name="Replace (Dictionary)" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value="First"/>
    <parameter key="attributes" value="Second|First"/>
    <parameter key="from_attribute" value="Name"/>
    <parameter key="to_attribute" value="id"/>
    </operator>
    <connect from_op="Create Id,Name Table" from_port="output 1" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Replace (Dictionary)" to_port="dictionary"/>
    <connect from_op="Create relationships Table" from_port="output 1" to_op="Replace (Dictionary)" to_port="example set input"/>
    <connect from_op="Replace (Dictionary)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Remember to convert id attributes to nominal, otherwise the replacement won't work.

     

    Regards,

    Sebastian

Sign In or Register to comment.