Multiple Labels for Binary Classification Problems in one model

rodienne_zammit Member Posts: 3 Contributor I
edited February 28 in Help
Hello,

@sgenzer I read them to the end!

Current approach: Separate models for different binary labels
I have built a decision tree model that correctly predicts a binary label for Product A when Product A is set as the label.

For Product B, I re-run the process to train a similar model with B as the label.

I would then need to train yet another model to predict Product C, with C as the label. This continues, because in reality I have more products.

Desired approach: One model to predict different binary labels
Is there a way I can combine this into one model so that the model can tell me the binary predictions (true/false) for Product A, B and C in one go? This would be ideal when applying the model on new data so that I don't need to run all separate product models. 

I tried "Loop Labels", but it loops over the labels to create a separate model for each one, and I did not find a way to apply the resulting models to new data. What I would need is something like a "loop model" operator that applies each model in turn, and this doesn't exist.

Maybe I could achieve this by combining the different binary classification values into one value? 
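
To make the "one combined value" idea concrete, here is a minimal sketch in Python rather than RapidMiner. It folds the true/false values of several products into a single class string and unfolds a predicted string back into one value per product. The product names A, B, C and the helper names `combine` and `split` are assumptions made up for illustration. A single multi-class learner trained on the combined value would return all product predictions at once, but note that n products can produce up to 2^n distinct classes, so rare combinations may have very few training examples.

```python
# Sketch: fold several binary product labels into one combined class value
# and unfold it again. Product names and helper names are illustrative only.
PRODUCTS = ["A", "B", "C"]


def combine(row):
    """Encode e.g. {"A": True, "B": False, "C": True} as "A+C"."""
    bought = [p for p in PRODUCTS if row[p]]
    return "+".join(bought) if bought else "none"


def split(combined):
    """Decode "A+C" back into one true/false value per product."""
    bought = set() if combined == "none" else set(combined.split("+"))
    return {p: p in bought for p in PRODUCTS}


rows = [
    {"A": True, "B": False, "C": True},
    {"A": False, "B": False, "C": False},
    {"A": True, "B": True, "C": True},
]

# One combined label per example; this is what a single model would be trained on.
labels = [combine(r) for r in rows]
print(labels)

# A predicted combined value is turned back into per-product answers.
print(split(labels[0]))
```

In RapidMiner terms, the combined value would become a single polynominal label, and the decoding step would be done on the prediction column after applying the model.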

I would appreciate feedback on how best to approach this problem.

Thank you!

Best Answer

  • rodienne_zammit Posts: 3 Contributor I
    edited March 2 Solution Accepted
    Thanks a lot @mschmitz for putting me on the right track. I looked into Polynominal by Binominal classification but I didn't manage to get what I want with it. 

    There might be other ways of doing this, but this is what worked for me:

    I got the desired behaviour by looping over the product attributes with "Loop Attributes", which exposes the name of the current attribute as a macro. Inside the loop I set the attribute %{loop_attribute} as the label and saved the model with the product name in the output file name, for example "C:\Documents\%{loop_attribute}.mod". I also applied the "Annotate" operator to the performance and model outputs, so that the annotation in the results tells me which product each performance relates to.

    To read the models and apply them to new data, I again used "Loop Attributes", set the role of the product attribute inside the loop, and read the model from the file whose name is built from the macro value %{loop_attribute}. Annotating the performance output again helps me recognise which product's performance I am looking at.

    XML sample for reading below:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Products" width="90" x="112" y="34">
            <parameter key="repository_entry" value="//Samples/data/Products"/>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="9.2.000" expanded="true" height="103" name="Loop Attributes" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="Product ID"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="attribute_name_macro" value="loop_attribute"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="set_role" compatibility="9.2.000" expanded="true" height="82" name="Set Role" width="90" x="45" y="34">
                <parameter key="attribute_name" value="%{loop_attribute}"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="legacy:read_model" compatibility="9.2.000" expanded="true" height="68" name="Read Model" width="90" x="45" y="136">
                <parameter key="model_file" value="%{loop_attribute}_NewFeatures.mod"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="9.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="179" y="85">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="annotate" compatibility="9.2.000" expanded="true" height="68" name="Annotate" width="90" x="313" y="85">
                <list key="annotations">
                  <parameter key="Product" value="%{loop_attribute}"/>
                </list>
                <parameter key="duplicate_annotations" value="overwrite"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="9.2.000" expanded="true" height="82" name="Performance (Test Set)" width="90" x="447" y="85">
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="true"/>
                <parameter key="classification_error" value="false"/>
                <parameter key="kappa" value="false"/>
                <parameter key="AUC (optimistic)" value="false"/>
                <parameter key="AUC" value="false"/>
                <parameter key="AUC (pessimistic)" value="false"/>
                <parameter key="precision" value="false"/>
                <parameter key="recall" value="false"/>
                <parameter key="lift" value="false"/>
                <parameter key="fallout" value="false"/>
                <parameter key="f_measure" value="false"/>
                <parameter key="false_positive" value="false"/>
                <parameter key="false_negative" value="false"/>
                <parameter key="true_positive" value="false"/>
                <parameter key="true_negative" value="false"/>
                <parameter key="sensitivity" value="false"/>
                <parameter key="specificity" value="false"/>
                <parameter key="youden" value="false"/>
                <parameter key="positive_predictive_value" value="false"/>
                <parameter key="negative_predictive_value" value="false"/>
                <parameter key="psep" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="annotate" compatibility="9.2.000" expanded="true" height="68" name="Annotate (2)" width="90" x="581" y="34">
                <list key="annotations">
                  <parameter key="Product" value="%{loop_attribute}"/>
                </list>
                <parameter key="duplicate_annotations" value="overwrite"/>
              </operator>
              <connect from_port="input 1" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Read Model" from_port="output" to_op="Apply Model (2)" to_port="model"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Annotate" to_port="input"/>
              <connect from_op="Annotate" from_port="output" to_op="Performance (Test Set)" to_port="labelled data"/>
              <connect from_op="Performance (Test Set)" from_port="performance" to_op="Annotate (2)" to_port="input"/>
              <connect from_op="Performance (Test Set)" from_port="example set" to_port="output 2"/>
              <connect from_op="Annotate (2)" from_port="output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">I looped on attribute because all my products were in a different column</description>
          </operator>
          <connect from_op="Retrieve Products" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
          <connect from_op="Loop Attributes" from_port="output 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
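
For readers who want to see the same two-loop pattern outside RapidMiner, the sketch below mirrors it in plain Python: one pass trains and saves a model per product under a file name built from the product name, and a second pass reads each file back and applies it, so new data gets a true/false prediction for every product in one go. Everything in it is an assumption for illustration: the product names, the `age` and `spend` features, the toy data, the one-split "stump" that stands in for a decision tree, and `pickle` standing in for the model file written by RapidMiner.

```python
import pickle
import tempfile
from pathlib import Path

PRODUCTS = ["A", "B", "C"]  # hypothetical label columns, one per product


def train_stump(rows, label):
    """Pick the single feature/threshold split with the best training accuracy."""
    features = [k for k in rows[0] if k not in PRODUCTS]
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for above in (True, False):
                # Prediction is True when (value >= t) matches `above`.
                hits = sum(((r[f] >= t) == above) == r[label] for r in rows)
                if best is None or hits > best[0]:
                    best = (hits, {"feature": f, "threshold": t, "above": above})
    return best[1]


def predict(model, row):
    return (row[model["feature"]] >= model["threshold"]) == model["above"]


def train_and_save(rows, model_dir):
    """First loop: one model per product, file named after the product."""
    for product in PRODUCTS:
        model = train_stump(rows, product)  # this product is the label in this pass
        with open(model_dir / f"{product}.mod", "wb") as fh:
            pickle.dump(model, fh)


def load_and_apply(rows, model_dir):
    """Second loop: read each product's model and predict every product per row."""
    models = {}
    for product in PRODUCTS:
        with open(model_dir / f"{product}.mod", "rb") as fh:
            models[product] = pickle.load(fh)
    return [{p: predict(m, row) for p, m in models.items()} for row in rows]


train = [
    {"age": 22, "spend": 10, "A": False, "B": True, "C": False},
    {"age": 35, "spend": 80, "A": True, "B": True, "C": False},
    {"age": 48, "spend": 95, "A": True, "B": False, "C": True},
    {"age": 61, "spend": 20, "A": False, "B": False, "C": True},
]
new_rows = [{"age": 30, "spend": 90}, {"age": 55, "spend": 15}]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    train_and_save(train, model_dir)
    saved = sorted(p.name for p in model_dir.iterdir())
    predictions = load_and_apply(new_rows, model_dir)

print(saved)
print(predictions)
```

Keying the returned dictionary by product name plays the same role as the Annotate step in the process: each prediction stays attached to the product it belongs to.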

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,061  RM Data Scientist
    Hi,
    Did you have a look at Polynominal by Binominal Classification? Otherwise, you can build something with Loop Values.

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • SGolbert RapidMiner Certified Analyst, Member Posts: 327   Unicorn

    I'm happy that you found an answer. I have some questions about the use case: Why do you need an individual classifier for each product? Is it possible for a sample to belong to more than one product category?

    Regards,
    Sebastian
