Special role attribute in applying model

Gottfried Member Posts: 17 Maven
edited December 2018 in Help


I notice that the Apply Model operator requires the attributes in the unlabelled dataset to correspond exactly to the attributes in the training dataset used for building the model, even if the attributes have a customized special role. I would have expected that only regular attributes have to match (plus possibly weight attributes), since in my understanding modelling works only with regular attributes, the label attribute and possibly some weight attributes. I would like to know whether this is correct, or whether attributes with a customized role affect modelling. If they don't, why does the Apply Model operator request them? Thanks in advance for your hints!

Best Answer

    rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Solution Accepted

    Hi @Gottfried,


    Yes, roles are part of this function signature. No, there is no way to use a custom role to make Apply Model pass attributes through. However, I can give you a tip here.


    • Don't select the values out.
    • If you have an attribute with the ID role, it will pass through. If you don't, you can use Generate ID to create one.
    • After Apply Model, you can join the results back using the id attribute as the key.

    Please see the attached process.


    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
      <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.0.002" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Age|Sex|Survived|Passenger Class"/>
          </operator>
          <operator activated="true" class="h2o:deep_learning" compatibility="9.0.000" expanded="true" height="82" name="Deep Learning" width="90" x="313" y="34">
            <parameter key="activation" value="Maxout"/>
            <enumeration key="hidden_layer_sizes">
              <parameter key="hidden_layer_sizes" value="50"/>
              <parameter key="hidden_layer_sizes" value="50"/>
            </enumeration>
            <enumeration key="hidden_dropout_ratios"/>
            <list key="expert_parameters"/>
            <list key="expert_parameters_"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Titanic Unlabeled" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Samples/data/Titanic Unlabeled"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.0.002" expanded="true" height="82" name="Generate ID" width="90" x="179" y="187">
            <parameter key="create_nominal_ids" value="true"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.0.002" expanded="true" height="103" name="Multiply" width="90" x="313" y="187"/>
          <operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model" width="90" x="447" y="85">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.0.002" expanded="true" height="82" name="Join" width="90" x="581" y="187">
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Deep Learning" to_port="training set"/>
          <connect from_op="Deep Learning" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Retrieve Titanic Unlabeled" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="147"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Here is a visual representation of what I usually do when I need to score using only a few attributes but want to keep all of them in the result. Notice that I'm selecting attributes only for training the Deep Learning operator, generating the IDs because the Titanic dataset doesn't have any (but you should double-check whether your dataset has an ID or not), and then joining the scored results with the rest of the table through these IDs.
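For readers who think in code rather than in operators, the generate-ID, score, join-back idea can be sketched in C. This is only an illustration under assumptions: the `Row`, `Scored` and `Joined` types, the function names and the age-based placeholder "model" are all hypothetical, and integer ids are used for simplicity. It is not how RapidMiner implements these operators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row of the unlabelled table: one attribute the model uses (age)
   and one it does not (name). */
typedef struct { int id; double age; const char *name; } Row;

/* Hypothetical output of scoring: only the id and the prediction. */
typedef struct { int id; int prediction; } Scored;

/* Full row with the prediction attached. */
typedef struct { int id; double age; const char *name; int prediction; } Joined;

/* Step 1: give every row a unique id (1, 2, 3, ...). */
void generate_ids(Row *rows, size_t n) {
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++) rows[i].id = (int)i + 1;
}

/* Step 2: placeholder "model" that looks at age only (not a real learner). */
void score(const Row *rows, size_t n, Scored *out) {
    for (size_t i = 0; i < n; i++) {
        out[i].id = rows[i].id;
        out[i].prediction = rows[i].age < 18.0 ? 1 : 0;
    }
}

/* Step 3: inner join on id; returns the number of joined rows written to out. */
size_t join_by_id(const Row *rows, size_t n_rows,
                  const Scored *scored, size_t n_scored, Joined *out) {
    size_t k = 0;
    for (size_t i = 0; i < n_rows; i++) {
        for (size_t j = 0; j < n_scored; j++) {
            if (rows[i].id == scored[j].id) {
                out[k].id = rows[i].id;
                out[k].age = rows[i].age;
                out[k].name = rows[i].name;
                out[k].prediction = scored[j].prediction;
                k++;
                break;
            }
        }
    }
    return k;
}
```

Calling `generate_ids`, then `score`, then `join_by_id` on the same array gives back every original column plus the prediction, which is the point of keeping an id around.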


    Hope this clarifies it!





    rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hello @Gottfried,


    The Apply Model operator requires that your unlabelled data has the same function signature as the labelled data, with the exception of the label. If you trained your model on data with an id, three regular attributes and a weight attribute, you would expect the algorithm to ignore the id and any attributes that aren't part of the model, match the regular attributes, and consume the weight attribute if the training algorithm uses it as input.


    Since Apply Model has no logic for knowing what each model requires unless those requirements are passed as parameters (and I have found no evidence that they are), it is much easier to ask for the same function signature (same names, types and roles), no matter which algorithm you are using. Extra attributes are simply ignored and attached to the resulting data.


    A function signature in this context is what a function takes as input and what it returns as output. Let's take an example from the C programming language:


    int sum(int a, int b) {
        return a + b;
    }


    The signature of this function is always the same: it takes two integers, a and b, and returns an integer. Nothing guarantees a sensible result if you pass something else: a floating point number is typically truncated to an integer without complaint, and a string is either rejected by the compiler or produces a meaningless value. Basically, Apply Model works the same way: when you train a model you generate a function that takes a certain data structure, and when you apply that model, you are applying that function to the same data structure (with different values).
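To make the analogy concrete, here is a minimal sketch in C of the kind of check described in this post: every attribute seen at training time must appear in the data to score with the same name, type and role, while extra attributes are tolerated. The `Attribute` struct, the type and role strings, and `signature_matches` are hypothetical names made up for illustration. This is not RapidMiner's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one attribute: name, value type and role. */
typedef struct {
    const char *name;
    const char *type;   /* e.g. "real", "nominal" */
    const char *role;   /* e.g. "regular", "id", "weight" */
} Attribute;

/* True if both attributes agree on name, type and role. */
static bool same_attribute(const Attribute *a, const Attribute *b) {
    return strcmp(a->name, b->name) == 0 &&
           strcmp(a->type, b->type) == 0 &&
           strcmp(a->role, b->role) == 0;
}

/* True if every training attribute also appears in the data to score with the
   same name, type and role. Extra attributes in the data are allowed. */
bool signature_matches(const Attribute *trained, size_t n_trained,
                       const Attribute *data, size_t n_data) {
    assert(trained != NULL && data != NULL);
    for (size_t i = 0; i < n_trained; i++) {
        bool found = false;
        for (size_t j = 0; j < n_data && !found; j++) {
            found = same_attribute(&trained[i], &data[j]);
        }
        if (!found) return false;
    }
    return true;
}
```

With this sketch, data that carries an extra attribute still matches, while data in which a trained attribute shows up under a different role does not.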


    Hope this helps,


    Gottfried Member Posts: 17 Maven
    Thanks for this detailed reply. Yes, it helps. If I understand you right, roles are part of the function signature used by Apply Model, and there is no way I can use a role to pass attributes through without them being considered part of the signature. So I guess the only way is to select them out before applying the model and join them back with the output of the Apply Model operator. Is that right?
    Gottfried Member Posts: 17 Maven

    Thanks @rfuentealba,

    That was actually the trick I meant: selecting (attributes, not values) beforehand and joining back (yes, using the ID) afterwards. So be it, then. Thanks for your help!

    rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Awesome @Gottfried, glad it helped!


    Have fun!

