Macros in Cost Matrix (Performance)

FBT Member Posts: 106 Unicorn
edited December 2018 in Help

Hello community,

 

I am trying to optimize a model based on the costs of wrong and correct classifications. However, instead of assigning fixed values in the cost matrix, I would like to use macros and loop over my example set to set the required cost values. To give a bit more color: imagine you are classifying customer churn and want to assign a different cost value to each customer, e.g. the customer's revenue in the past 6 months.

 

Building the logic is not a big problem (at least I believe it isn't). However, the operator "Performance (Costs)" apparently does not accept macros as input values. Is there anything I can do about it, or is there any other workaround?

 

This would be a short sample process based on the Titanic data:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
<parameter key="attribute_name" value="Survived"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
<list key="function_descriptions">
<parameter key="Scoring" value="(if(Age &lt; 5, 1.1, if(Age &gt;= 5 &amp;&amp; Age &lt; 25, 1.2, 1.3)))-1"/>
</list>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="7.6.001" expanded="true" height="103" name="Optimize Parameters (Grid)" width="90" x="514" y="34">
<list key="parameters"/>
<process expanded="true">
<operator activated="true" class="split_validation" compatibility="7.6.001" expanded="true" height="124" name="Validation" width="90" x="246" y="34">
<parameter key="sampling_type" value="stratified sampling"/>
<process expanded="true">
<operator activated="true" class="naive_bayes" compatibility="7.6.001" expanded="true" height="82" name="Naive Bayes" width="90" x="112" y="30"/>
<connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="loop_examples" compatibility="7.6.001" expanded="true" height="103" name="Loop Examples" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="extract_macro" compatibility="7.6.001" expanded="true" height="68" name="Extract Macro" width="90" x="112" y="34">
<parameter key="macro" value="Scoring"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="Scoring"/>
<parameter key="example_index" value="%{example}"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="performance_costs" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="313" y="34">
<parameter key="cost_matrix" value="[0.0 1.0;1.0 0.0]"/>
<enumeration key="class_order_definition">
<parameter key="class_name" value="Yes"/>
<parameter key="class_name" value="No"/>
</enumeration>
</operator>
<connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_port="output 1"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="average" compatibility="7.6.001" expanded="true" height="82" name="Average" width="90" x="313" y="34"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Loop Examples" to_port="example set"/>
<connect from_op="Loop Examples" from_port="output 1" to_op="Average" to_port="averagable 1"/>
<connect from_op="Average" from_port="average" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Titanic" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

 

Alternatively, does anybody have an example process for the operator "Performance (User-Based)"? It does not have a tutorial process and I am having a hard time figuring out how exactly it works.

 

 

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi,

     

    You can use Generate Attributes and Extract Performance to get a similar result. Build a "cost" attribute that takes the value %{churnChurn} for a churner who is predicted as churn, and so on for the other combinations of label and prediction. Afterwards, extract the average of this attribute as the performance.

     

    With that, you are already halfway to a customer-based performance (e.g. one based on each customer's Customer Lifetime Value).

     

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
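
    The idea in the answer above (one "cost" value per example, then the average of that column as the performance) can be sketched outside RapidMiner in plain Python. This is only an illustration of the arithmetic, not a RapidMiner process: the function names, the row layout, the field revenue_6m, and the flat false-alarm cost of 10.0 are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical flat cost of a needless retention offer (assumed value).
FALSE_ALARM_COST = 10.0


def example_cost(label, prediction, customer_value):
    """Cost of a single prediction under an assumed, illustrative rule."""
    if label == prediction:
        # Correct classifications cost nothing in this sketch.
        return 0.0
    if label == "churn":
        # Missed churner: assume the customer's recent revenue is lost.
        return customer_value
    # Predicted churn for a customer who stays: flat cost.
    return FALSE_ALARM_COST


def average_cost(rows):
    """Average the per-example costs, as the 'performance' to minimize."""
    return mean(
        example_cost(r["label"], r["prediction"], r["revenue_6m"]) for r in rows
    )


# Made-up scored examples: true label, prediction, and 6-month revenue.
rows = [
    {"label": "churn", "prediction": "churn", "revenue_6m": 200.0},
    {"label": "churn", "prediction": "stay", "revenue_6m": 500.0},
    {"label": "stay", "prediction": "churn", "revenue_6m": 300.0},
    {"label": "stay", "prediction": "stay", "revenue_6m": 100.0},
]

print(average_cost(rows))
```

    Because the cost is computed per row, it can depend on any attribute of that row, which a single fixed cost matrix cannot express.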

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I've been thinking about this, and conceptually I don't believe the Performance (Costs) operator can use different values for different cases, unless you are literally building a separate model for each case (inside a Loop Examples, for instance). The quantity being minimized is the misclassification cost across all observations, compared between different models. If it needed a different calculation for each observation, a different model would potentially be required for each one.

     

    Having said that, I am not sure why it wouldn't accept a macro to set the values in the cost matrix. That's a question for the developers, I think.

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • FBT Member Posts: 106 Unicorn

    Thanks @mschmitz! That is exactly what I was looking for and I officially found my new favorite performance operator. :-)

     

    Also thanks to @Telcontar120, you are probably right that my initial workaround proposal is flawed and would not work as intended. Luckily, RM apparently has a great solution for every possible problem.
