Set condition for cross distance

online360 Member Posts: 34 Contributor I
edited November 2018 in Help
Dear everyone!

As you may have seen (http://rapid-i.com/rapidforum/index.php/topic,9557.0.html), I'm working on a process that matches similar products.
Currently, I use several text fields to calculate the distance, such as "manufacturer", "class", "article number", "longtext" and so on.

To get even better results, I now want to tell the Cross Distances operator: "Please only match those products that belong to the same category and are from the same brand."
Is that possible?

At the moment I just set k to "10" in the Cross Distances operator, but if, for example, there is only a short description, a product may get matched with a similar product from another brand.

The result should still be 10 matching products per product, but only products with the same category and the same brand.
In addition, I might also set a minimum similarity percentage that a match has to reach.
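A minimal sketch of the requested behavior in plain Python (not RapidMiner): products are grouped by (category, brand) first, so the top-k search and the similarity threshold only ever see candidates from the same group. The field names and the Jaccard word-overlap measure are assumptions for illustration; the thread itself uses TF-IDF vectors from Process Documents and the Cross Distances operator.

```python
# Sketch only: "top k similar products, but only within the same category
# and brand", outside RapidMiner. Assumptions: products are dicts with the
# hypothetical keys "id", "category", "brand", "text", and Jaccard word
# overlap stands in for the TF-IDF vectors + Cross Distances of the thread.
from collections import defaultdict


def jaccard(a, b):
    """Share of words two texts have in common, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_products(products, k=10, min_similarity=0.0):
    """Map each id to at most k (other_id, similarity) pairs, best first,
    comparing only products with the same (category, brand) pair."""
    groups = defaultdict(list)
    for p in products:
        groups[(p["category"], p["brand"])].append(p)

    result = {}
    for members in groups.values():
        words = {p["id"]: set(p["text"].lower().split()) for p in members}
        for p in members:
            scored = [
                (q["id"], jaccard(words[p["id"]], words[q["id"]]))
                for q in members
                if q["id"] != p["id"]  # a product never matches itself
            ]
            # Apply the minimum-similarity filter, then keep the k best.
            scored = [s for s in scored if s[1] >= min_similarity]
            scored.sort(key=lambda s: (-s[1], s[0]))
            result[p["id"]] = scored[:k]
    return result


products = [
    {"id": "A1", "category": "EC000011", "brand": "X", "text": "LED lamp 5W warm white"},
    {"id": "A2", "category": "EC000011", "brand": "X", "text": "LED lamp 7W warm white"},
    {"id": "A3", "category": "EC000011", "brand": "Y", "text": "LED lamp 5W warm white"},
    {"id": "A4", "category": "EC001909", "brand": "X", "text": "LED lamp 5W warm white"},
]
matches = similar_products(products, k=10, min_similarity=0.5)
# A3 and A4 have the same text as A1 but a different brand or category,
# so A1's only match is A2 (4 shared words out of 6 distinct words).
```

Grouping before ranking is what keeps the result at up to k matches per product: candidates from other brands or categories never compete for the k slots in the first place.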

Thanks!

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,408 RM Data Scientist
    Hi,

    what about wrapping a Loop Values operator around it?
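For readers unfamiliar with the operator, here is a rough plain-Python model of what looping over values does: the inner steps run once per distinct value of one attribute, each time on only the matching examples, and the outputs are collected (RapidMiner returns them as a collection). The function and field names are hypothetical; this is not RapidMiner's implementation.

```python
# Rough model of "loop over values" (illustration, not RapidMiner code).
def loop_values(rows, attribute, subprocess):
    """Run `subprocess` once per distinct value of `attribute`, each time
    on only the rows with that value, and collect the outputs in a list."""
    collection = []
    # dict.fromkeys keeps the distinct values in first-seen order.
    for value in dict.fromkeys(r[attribute] for r in rows):
        # Equivalent of Filter Examples with "attribute.equals.%{loop_value}".
        subset = [r for r in rows if r[attribute] == value]
        collection.append(subprocess(subset))
    return collection


rows = [
    {"sku": 1, "manufacturer": "X"},
    {"sku": 2, "manufacturer": "Y"},
    {"sku": 3, "manufacturer": "X"},
]
collection = loop_values(rows, "manufacturer", lambda subset: [r["sku"] for r in subset])
print(collection)  # [[1, 3], [2]]
```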

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • online360 Member Posts: 34 Contributor I
    Hi!

    Please excuse my late reply. I've been trying many versions of the process, but unfortunately I didn't get what I wanted.

    I added a loop to the process, but the document column still shows ids that belong to other categories or other manufacturers.
    Do I understand correctly that the "Loop Attributes" operator executes its subprocess again and again, once for each set of examples that share the same value in the specified attribute?
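The distinction matters here: Loop Attributes iterates over the selected attributes (columns), and each iteration still receives the whole example set; it does not split the rows by value. The toy model below (plain Python, hypothetical names) shows that with etim and manufacturer selected, both iterations see every category, which would be consistent with the cross-category matches described above.

```python
# Toy model of "loop over attributes" (illustration, not RapidMiner code).
def loop_attributes(rows, attributes, subprocess):
    """One iteration per selected attribute NAME; each iteration receives
    the complete data set, not a subset of the rows."""
    return {name: subprocess(name, rows) for name in attributes}


rows = [
    {"sku": 1, "etim": "EC000011", "manufacturer": "X"},
    {"sku": 2, "etim": "EC000011", "manufacturer": "Y"},
    {"sku": 3, "etim": "EC001909", "manufacturer": "X"},
]
# This "subprocess" ignores the attribute name and only reports which
# categories it was handed.
seen = loop_attributes(
    rows, ["etim", "manufacturer"],
    lambda name, data: sorted({r["etim"] for r in data}),
)
print(seen)
# {'etim': ['EC000011', 'EC001909'], 'manufacturer': ['EC000011', 'EC001909']}
```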

    Any idea what's wrong with the following process? (To be clear: I want the process to return only those similar documents/ids that are in the same category and from the same manufacturer.)
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve t123_product_import_23032016" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/data/t123_product_import_23032016"/>
          </operator>
          <operator activated="false" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples (3)" width="90" x="45" y="136">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="etim.equals.EC000011"/>
            </list>
          </operator>
          <operator activated="true" class="sample" compatibility="7.0.001" expanded="true" height="82" name="Sample" width="90" x="179" y="34">
            <list key="sample_size_per_class"/>
            <list key="sample_ratio_per_class"/>
            <list key="sample_probability_per_class"/>
          </operator>
          <operator activated="false" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="179" y="136">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="etim.equals.EC001909"/>
            </list>
            <parameter key="filters_logic_and" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="description"/>
            <parameter key="attributes" value="sku|description|etim|manufacturer|teg_prodnumber|short_description"/>
          </operator>
          <operator activated="true" class="trim" compatibility="7.0.001" expanded="true" height="82" name="Trim" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="etim"/>
          </operator>
          <operator activated="false" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="136">
            <parameter key="attribute_name" value="etim"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="7.0.001" expanded="true" height="82" name="Loop Attributes" width="90" x="179" y="289">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="etim"/>
            <parameter key="attributes" value="etim|manufacturer"/>
            <process expanded="true">
              <operator activated="true" class="nominal_to_text" compatibility="7.0.001" expanded="true" height="82" name="Nominal to Text (3)" width="90" x="246" y="289"/>
              <operator activated="true" class="text:process_document_from_data" compatibility="7.0.000" expanded="true" height="82" name="Process Documents from Data (3)" width="90" x="380" y="391">
                <parameter key="keep_text" value="true"/>
                <parameter key="prune_method" value="absolute"/>
                <parameter key="prune_below_absolute" value="2"/>
                <parameter key="prune_above_absolute" value="9999"/>
                <list key="specify_weights"/>
                <process expanded="true">
                  <operator activated="true" class="text:tokenize" compatibility="7.0.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="85"/>
                  <operator activated="true" class="text:transform_cases" compatibility="7.0.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="246" y="85"/>
                  <operator activated="true" class="text:filter_stopwords_german" compatibility="7.0.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="380" y="85"/>
                  <operator activated="true" class="text:stem_snowball" compatibility="7.0.000" expanded="true" height="68" name="Stem (3)" width="90" x="581" y="85">
                    <parameter key="language" value="German"/>
                  </operator>
                  <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
                  <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (3)" to_port="document"/>
                  <connect from_op="Transform Cases (3)" from_port="document" to_op="Filter Stopwords (3)" to_port="document"/>
                  <connect from_op="Filter Stopwords (3)" from_port="document" to_op="Stem (3)" to_port="document"/>
                  <connect from_op="Stem (3)" from_port="document" to_port="document 1"/>
                  <portSpacing port="source_document" spacing="0"/>
                  <portSpacing port="sink_document 1" spacing="0"/>
                  <portSpacing port="sink_document 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="103" name="Multiply (3)" width="90" x="514" y="391"/>
              <operator activated="true" class="cross_distances" compatibility="7.0.001" expanded="true" height="103" name="Cross Distances (3)" width="90" x="648" y="391">
                <parameter key="measure_types" value="NumericalMeasures"/>
                <parameter key="only_top_k" value="true"/>
                <parameter key="k" value="11"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples (4)" width="90" x="715" y="646">
                <parameter key="parameter_expression" value="request!=document"/>
                <parameter key="condition_class" value="expression"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role (3)" width="90" x="849" y="391">
                <parameter key="attribute_name" value="request"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles">
                  <parameter key="document" value="label"/>
                </list>
              </operator>
              <operator activated="true" class="numerical_to_polynominal" compatibility="7.0.001" expanded="true" height="82" name="Numerical to Polynominal (3)" width="90" x="983" y="391">
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="aggregate" compatibility="7.0.001" expanded="true" height="82" name="Aggregate (3)" width="90" x="1184" y="340">
                <parameter key="include_special_attributes" value="true"/>
                <parameter key="default_aggregation_function" value="concatenation"/>
                <list key="aggregation_attributes">
                  <parameter key="document" value="concatenation"/>
                </list>
                <parameter key="group_by_attributes" value="request"/>
              </operator>
              <connect from_port="example set" to_op="Nominal to Text (3)" to_port="example set input"/>
              <connect from_op="Nominal to Text (3)" from_port="example set output" to_op="Process Documents from Data (3)" to_port="example set"/>
              <connect from_op="Process Documents from Data (3)" from_port="example set" to_op="Multiply (3)" to_port="input"/>
              <connect from_op="Multiply (3)" from_port="output 1" to_op="Cross Distances (3)" to_port="request set"/>
              <connect from_op="Multiply (3)" from_port="output 2" to_op="Cross Distances (3)" to_port="reference set"/>
              <connect from_op="Cross Distances (3)" from_port="result set" to_op="Filter Examples (4)" to_port="example set input"/>
              <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
              <connect from_op="Set Role (3)" from_port="example set output" to_op="Numerical to Polynominal (3)" to_port="example set input"/>
              <connect from_op="Numerical to Polynominal (3)" from_port="example set output" to_op="Aggregate (3)" to_port="example set input"/>
              <connect from_op="Aggregate (3)" from_port="example set output" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="false" class="loop_labels" compatibility="7.0.001" expanded="true" height="82" name="Loop Labels" width="90" x="313" y="391">
            <process expanded="true">
              <operator activated="false" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
                <parameter key="parameter_string" value="EC000138"/>
                <parameter key="parameter_expression" value="EC000133"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="nominal_to_text" compatibility="7.0.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="45" y="187"/>
              <operator activated="true" class="text:process_document_from_data" compatibility="7.0.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="179" y="289">
                <parameter key="keep_text" value="true"/>
                <parameter key="prune_method" value="absolute"/>
                <parameter key="prune_below_absolute" value="2"/>
                <parameter key="prune_above_absolute" value="9999"/>
                <list key="specify_weights"/>
                <process expanded="true">
                  <operator activated="true" class="text:tokenize" compatibility="7.0.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="85"/>
                  <operator activated="true" class="text:transform_cases" compatibility="7.0.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="85"/>
                  <operator activated="true" class="text:filter_stopwords_german" compatibility="7.0.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="380" y="85"/>
                  <operator activated="true" class="text:stem_snowball" compatibility="7.0.000" expanded="true" height="68" name="Stem (2)" width="90" x="581" y="85">
                    <parameter key="language" value="German"/>
                  </operator>
                  <connect from_port="document" to_op="Tokenize" to_port="document"/>
                  <connect from_op="Tokenize" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
                  <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
                  <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
                  <connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
                  <portSpacing port="source_document" spacing="0"/>
                  <portSpacing port="sink_document 1" spacing="0"/>
                  <portSpacing port="sink_document 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="289"/>
              <operator activated="true" class="cross_distances" compatibility="7.0.001" expanded="true" height="103" name="Cross Distances (2)" width="90" x="447" y="289">
                <parameter key="measure_types" value="NumericalMeasures"/>
                <parameter key="only_top_k" value="true"/>
                <parameter key="k" value="11"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples (5)" width="90" x="514" y="544">
                <parameter key="parameter_expression" value="request!=document"/>
                <parameter key="condition_class" value="expression"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role (2)" width="90" x="648" y="289">
                <parameter key="attribute_name" value="request"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles">
                  <parameter key="document" value="label"/>
                </list>
              </operator>
              <operator activated="true" class="numerical_to_polynominal" compatibility="7.0.001" expanded="true" height="82" name="Numerical to Polynominal (2)" width="90" x="782" y="289">
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="false" class="aggregate" compatibility="7.0.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="1050" y="238">
                <parameter key="include_special_attributes" value="true"/>
                <parameter key="default_aggregation_function" value="concatenation"/>
                <list key="aggregation_attributes">
                  <parameter key="document" value="concatenation"/>
                </list>
                <parameter key="group_by_attributes" value="request"/>
              </operator>
              <operator activated="false" class="normalize" compatibility="7.0.001" expanded="true" height="103" name="Normalize (2)" width="90" x="380" y="85">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="distance"/>
              </operator>
              <connect from_port="example set" to_op="Nominal to Text (2)" to_port="example set input"/>
              <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
              <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Multiply (2)" to_port="input"/>
              <connect from_op="Multiply (2)" from_port="output 1" to_op="Cross Distances (2)" to_port="request set"/>
              <connect from_op="Multiply (2)" from_port="output 2" to_op="Cross Distances (2)" to_port="reference set"/>
              <connect from_op="Cross Distances (2)" from_port="result set" to_op="Filter Examples (5)" to_port="example set input"/>
              <connect from_op="Filter Examples (5)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
              <connect from_op="Set Role (2)" from_port="example set output" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
              <connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="store" compatibility="7.0.001" expanded="true" height="68" name="Store (2)" width="90" x="447" y="238">
            <parameter key="repository_entry" value="//Local Repository/data/t123_data_similarity_29032016"/>
          </operator>
          <operator activated="false" class="loop_attributes" compatibility="7.0.001" expanded="true" height="82" name="Loop Attributes (2)" width="90" x="112" y="391">
            <process expanded="true">
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="false" class="loop_attribute_subsets" compatibility="7.0.001" expanded="true" height="68" name="Loop Subsets" width="90" x="447" y="544">
            <process expanded="true">
              <portSpacing port="source_example set" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve t123_product_import_23032016" from_port="output" to_op="Sample" to_port="example set input"/>
          <connect from_op="Sample" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Trim" to_port="example set input"/>
          <connect from_op="Trim" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="example set" to_op="Store (2)" to_port="input"/>
          <connect from_op="Store (2)" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Thanks!
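The chain inside the loop above (Cross Distances with the same data as request and reference set, then Filter Examples with request!=document, then Aggregate with concatenation per request) can be modeled in plain Python as below. Because each product is usually its own nearest neighbor at distance 0 when the request and reference sets are identical, k=11 typically leaves 10 real matches once the self-match is removed. The tuple layout and the "|" separator are arbitrary choices for this sketch.

```python
# Plain-Python model of: Cross Distances (top k) -> Filter Examples
# (request != document) -> Aggregate (concatenate documents per request).
# Assumption: the distance table is a list of (request, document, distance).
from collections import defaultdict


def top_k(distances, k):
    """Keep the k smallest distances per request; self-matches included."""
    by_request = defaultdict(list)
    for request, document, distance in distances:
        by_request[request].append((distance, document))
    return [
        (request, document, distance)
        for request, pairs in by_request.items()
        for distance, document in sorted(pairs)[:k]
    ]


def drop_self_matches(rows):
    """Equivalent of the request!=document filter."""
    return [row for row in rows if row[0] != row[1]]


def concatenate_per_request(rows, sep="|"):
    """Group by request and join the matched documents into one string."""
    grouped = defaultdict(list)
    for request, document, _ in rows:
        grouped[request].append(str(document))
    return {request: sep.join(docs) for request, docs in grouped.items()}


# Every pairwise distance between three products, including self-distances.
distances = [
    (1, 1, 0.0), (1, 2, 0.3), (1, 3, 0.9),
    (2, 2, 0.0), (2, 1, 0.3), (2, 3, 0.5),
    (3, 3, 0.0), (3, 2, 0.5), (3, 1, 0.9),
]
# k = 3 yields 2 real neighbors each once the self-match is dropped.
result = concatenate_per_request(drop_self_matches(top_k(distances, k=3)))
print(result)  # {1: '2|3', 2: '1|3', 3: '2|1'}
```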
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,408 RM Data Scientist
    Hey,

    have a look at this process; this is how I would do it. Of course, you can also put your Process Documents steps etc. inside the loop.

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="7.0.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34">
            <description align="center" color="transparent" colored="false" width="126">For Later Joins</description>
          </operator>
          <operator activated="true" class="loop_values" compatibility="7.0.001" expanded="true" height="82" name="Loop Values" width="90" x="514" y="34">
            <parameter key="attribute" value="Passenger Class"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="85">
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="Passenger Class.equals.%{loop_value}"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Filter for the current value</description>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="124" name="Multiply (2)" width="90" x="179" y="238">
                <description align="center" color="transparent" colored="false" width="126">Split to join the unnormalized data afterwards</description>
              </operator>
              <operator activated="true" class="normalize" compatibility="7.0.001" expanded="true" height="103" name="Normalize" width="90" x="313" y="34">
                <description align="center" color="transparent" colored="false" width="126">Normalize because of distance</description>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
              <operator activated="true" class="cross_distances" compatibility="7.0.001" expanded="true" height="103" name="Cross Distances" width="90" x="581" y="34">
                <parameter key="only_top_k" value="true"/>
                <parameter key="k" value="3"/>
                <description align="center" color="transparent" colored="false" width="126">Only take the 3 closest</description>
              </operator>
              <operator activated="true" class="join" compatibility="7.0.001" expanded="true" height="82" name="Join" width="90" x="715" y="187">
                <parameter key="remove_double_attributes" value="false"/>
                <parameter key="use_id_attribute_as_key" value="false"/>
                <list key="key_attributes">
                  <parameter key="document" value="id"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Join back for interpretability</description>
              </operator>
              <operator activated="true" class="join" compatibility="7.0.001" expanded="true" height="82" name="Join (2)" width="90" x="916" y="289">
                <parameter key="remove_double_attributes" value="false"/>
                <parameter key="use_id_attribute_as_key" value="false"/>
                <list key="key_attributes">
                  <parameter key="request" value="id"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Join the other back for interpretability</description>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
              <connect from_op="Multiply (2)" from_port="output 1" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Multiply (2)" from_port="output 2" to_op="Join" to_port="right"/>
              <connect from_op="Multiply (2)" from_port="output 3" to_op="Join (2)" to_port="right"/>
              <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Cross Distances" to_port="request set"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Cross Distances" to_port="reference set"/>
              <connect from_op="Cross Distances" from_port="result set" to_op="Join" to_port="left"/>
              <connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
              <connect from_op="Join (2)" from_port="join" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Loop Values iterates over the values of a nominal attribute, in this case the passenger class. It provides the current value of this attribute as a macro, which can be used inside the loop.&lt;br/&gt;&lt;br/&gt;The result is returned as a collection. This can be reduced to a single example set with Append or processed further with Loop Collection.</description>
          </operator>
          <connect from_op="Retrieve Titanic" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
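In plain Python, the two Join operators and the Append step mentioned in the loop's description amount to the sketch below: the distance pairs only carry ids, so the original attributes are looked up for both the request and the document, and the per-value tables produced by the loop are then concatenated into one. The field names ("id", "name") and the helper functions are hypothetical.

```python
# Sketch: join original attributes back onto id pairs, then append the
# per-group tables into one table. Field names are hypothetical.
def join_back(pairs, originals, key="id"):
    """Attach the "name" of both the request and the document to each pair."""
    lookup = {row[key]: row for row in originals}
    return [
        {
            "request": request,
            "document": document,
            "distance": distance,
            "request_name": lookup[request]["name"],
            "document_name": lookup[document]["name"],
        }
        for request, document, distance in pairs
    ]


def append_all(collection):
    """Flatten a collection of tables (lists of rows) into a single table."""
    return [row for table in collection for row in table]


originals = [
    {"id": 1, "name": "lamp 5W"},
    {"id": 2, "name": "lamp 7W"},
    {"id": 3, "name": "cable 3m"},
    {"id": 4, "name": "cable 5m"},
]
# One table per loop iteration, e.g. one per passenger class or manufacturer.
collection = [
    join_back([(1, 2, 0.2)], originals),
    join_back([(3, 4, 0.1)], originals),
]
merged = append_all(collection)
print(len(merged), merged[0]["document_name"], merged[1]["request_name"])
# 2 lamp 7W cable 3m
```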

  • online360 Member Posts: 34 Contributor I
    Hi Martin!

    Thanks again for your support!
    I made a few changes to the process to better meet my needs:
    - disabled "Generate ID", as the example set already has an ID attribute
    - sampled the data (to make the process run faster while testing)
    - added "Filter Examples" after "Cross Distances" to remove rows where request and document are the same (using the expression request!=document)
    - aggregated the results, concatenating all matched documents per request

    The only remaining problem is that when I loop over "manufacturer", the categories of request and document can still differ.
    Is adding a filter after "Join (2)" the only way to keep just the results where request and document have the same category? (That would of course reduce the number of results; I'd prefer to apply that filter before "Cross Distances" calculates the k nearest.)
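The trade-off described here can be checked on a toy example (hypothetical ids, categories and distances to a single request product): filtering after the top-k step can leave fewer than k matches, while restricting the candidates to the request's category first still returns k.

```python
# Toy comparison: filter after top-k versus restrict candidates first.
# Rows are (id, category, distance to one request product); all hypothetical.
candidates = [
    ("B", "EC000011", 0.10),
    ("C", "EC001909", 0.15),
    ("D", "EC001909", 0.20),
    ("E", "EC000011", 0.30),
    ("F", "EC000011", 0.40),
]
request_category = "EC000011"
k = 3


def nearest(rows, k):
    """Ids of the k rows with the smallest distance."""
    return [row[0] for row in sorted(rows, key=lambda row: row[2])[:k]]


category_of = {row[0]: row[1] for row in candidates}

# Filter after the top-k step: C and D are removed, only B survives.
post_filtered = [i for i in nearest(candidates, k) if category_of[i] == request_category]

# Restrict candidates first: the k nearest within the same category.
pre_filtered = nearest([row for row in candidates if row[1] == request_category], k)

print(post_filtered, pre_filtered)  # ['B'] ['B', 'E', 'F']
```

Restricting first corresponds to looping over both attributes (for example, over a combined category-and-manufacturer value) rather than over manufacturer alone; that variant is a suggestion, not something shown in the thread.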

    Is it possible to merge the resulting example sets into one?
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="false" class="generate_id" compatibility="7.0.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34">
            <description align="center" color="transparent" colored="false" width="126">For Later Joins</description>
          </operator>
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve t123_product_import_23032016" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/data/t123_product_import_23032016"/>
          </operator>
          <operator activated="true" class="sample" compatibility="7.0.001" expanded="true" height="82" name="Sample" width="90" x="313" y="34">
            <list key="sample_size_per_class"/>
            <list key="sample_ratio_per_class"/>
            <list key="sample_probability_per_class"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="7.0.001" expanded="true" height="82" name="Loop Values" width="90" x="514" y="34">
            <parameter key="attribute" value="manufacturer"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="85">
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="manufacturer.equals.%{loop_value}"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Filter for the current value</description>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="124" name="Multiply (2)" width="90" x="179" y="238">
                <description align="center" color="transparent" colored="false" width="126">Split to join the unnormalized data afterwards</description>
              </operator>
              <operator activated="true" class="normalize" compatibility="7.0.001" expanded="true" height="103" name="Normalize" width="90" x="313" y="34">
                <description align="center" color="transparent" colored="false" width="126">Normalize because of distance</description>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
              <operator activated="true" class="cross_distances" compatibility="7.0.001" expanded="true" height="103" name="Cross Distances" width="90" x="581" y="34">
                <parameter key="only_top_k" value="true"/>
                <parameter key="k" value="5"/>
                <description align="center" color="transparent" colored="false" width="126">Only take the 5 closest</description>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="715" y="34">
                <parameter key="parameter_expression" value="request!=document"/>
                <parameter key="condition_class" value="expression"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="join" compatibility="7.0.001" expanded="true" height="82" name="Join" width="90" x="782" y="187">
                <parameter key="remove_double_attributes" value="false"/>
                <parameter key="use_id_attribute_as_key" value="false"/>
                <list key="key_attributes">
                  <parameter key="document" value="sku"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Join back for interpretability</description>
              </operator>
              <operator activated="true" class="join" compatibility="7.0.001" expanded="true" height="82" name="Join (2)" width="90" x="916" y="289">
                <parameter key="remove_double_attributes" value="false"/>
                <parameter key="use_id_attribute_as_key" value="false"/>
                <list key="key_attributes">
                  <parameter key="request" value="sku"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Join the other back for interpretability</description>
              </operator>
              <operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role (3)" width="90" x="1050" y="340">
                <parameter key="attribute_name" value="request"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles">
                  <parameter key="document" value="label"/>
                </list>
              </operator>
              <operator activated="true" class="numerical_to_polynominal" compatibility="7.0.001" expanded="true" height="82" name="Numerical to Polynominal (3)" width="90" x="1184" y="340">
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="aggregate" compatibility="7.0.001" expanded="true" height="82" name="Aggregate (3)" width="90" x="1385" y="289">
                <parameter key="include_special_attributes" value="true"/>
                <parameter key="default_aggregation_function" value="concatenation"/>
                <list key="aggregation_attributes">
                  <parameter key="document" value="concatenation"/>
                </list>
                <parameter key="group_by_attributes" value="request"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
              <connect from_op="Multiply (2)" from_port="output 1" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Multiply (2)" from_port="output 2" to_op="Join" to_port="right"/>
              <connect from_op="Multiply (2)" from_port="output 3" to_op="Join (2)" to_port="right"/>
              <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Cross Distances" to_port="request set"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Cross Distances" to_port="reference set"/>
              <connect from_op="Cross Distances" from_port="result set" to_op="Filter Examples (2)" to_port="example set input"/>
              <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Join" to_port="left"/>
              <connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
              <connect from_op="Join (2)" from_port="join" to_op="Set Role (3)" to_port="example set input"/>
              <connect from_op="Set Role (3)" from_port="example set output" to_op="Numerical to Polynominal (3)" to_port="example set input"/>
              <connect from_op="Numerical to Polynominal (3)" from_port="example set output" to_op="Aggregate (3)" to_port="example set input"/>
              <connect from_op="Aggregate (3)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Loop Values iterates over the values of a nominal attribute, in this case the class. In each iteration it provides the current value of this attribute as a macro, which can be used inside the subprocess.&lt;br/&gt;&lt;br/&gt;The result is returned as a collection. It can be reduced to a single example set with Append or processed further with Loop Collection</description>
          </operator>
          <connect from_op="Retrieve t123_product_import_23032016" from_port="output" to_op="Sample" to_port="example set input"/>
          <connect from_op="Sample" from_port="example set output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
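    For readers who want the logic of the process above without the XML, here is a minimal Python sketch of the same idea. It is not RapidMiner's implementation; it only assumes that each product is a dict with a hypothetical `sku` key, the grouping keys `category` and `brand`, and already-numeric feature columns. Like the loop subprocess, it normalizes within each group (z-transformation) and then finds the nearest products using Euclidean distance (assumed here as the measure). One detail worth noting: in the process, the self-match is removed by Filter Examples (2) only after Cross Distances has already picked the top k, so k = 5 usually leaves just four other products. The sketch excludes the product itself before taking k, so each product gets up to k real matches. The optional `max_distance` cutoff is a hypothetical stand-in for the "minimum similarity" filter asked about in the question.

    ```python
    import math
    from collections import defaultdict
    from statistics import mean, pstdev


    def z_normalize(rows, features):
        """Z-transform each feature within the given rows (mean 0, std 1)."""
        stats = {}
        for f in features:
            values = [r[f] for r in rows]
            sd = pstdev(values)
            stats[f] = (mean(values), sd if sd > 0 else 1.0)  # avoid division by zero
        return [[(r[f] - stats[f][0]) / stats[f][1] for f in features] for r in rows]


    def top_k_within_groups(products, features, group_keys=("category", "brand"),
                            k=10, max_distance=None):
        """Return {sku: [neighbour skus]}; neighbours always share every group key."""
        groups = defaultdict(list)
        for p in products:
            groups[tuple(p[g] for g in group_keys)].append(p)

        result = {}
        # One iteration per (category, brand) combination, similar to Loop Values.
        for rows in groups.values():
            vectors = z_normalize(rows, features)
            for i, request in enumerate(rows):
                neighbours = sorted(
                    (math.dist(vectors[i], vectors[j]), rows[j]["sku"])
                    for j in range(len(rows))
                    if j != i  # drop the self-match before picking k
                )
                if max_distance is not None:
                    neighbours = [(d, s) for d, s in neighbours if d <= max_distance]
                result[request["sku"]] = [sku for _, sku in neighbours[:k]]
        return result


    # Usage: B1 has the same features as A1 but a different brand, so it is never matched.
    products = [
        {"sku": "A1", "category": "drills", "brand": "X", "price": 10.0, "weight": 1.0},
        {"sku": "A2", "category": "drills", "brand": "X", "price": 11.0, "weight": 1.1},
        {"sku": "A3", "category": "drills", "brand": "X", "price": 30.0, "weight": 3.0},
        {"sku": "B1", "category": "drills", "brand": "Y", "price": 10.0, "weight": 1.0},
    ]
    print(top_k_within_groups(products, ["price", "weight"], k=2))
    # {'A1': ['A2', 'A3'], 'A2': ['A1', 'A3'], 'A3': ['A2', 'A1'], 'B1': []}
    ```

    Grouping first and then searching only inside each group is what guarantees that every match shares category and brand. It also keeps the distance computation small, because products are never compared across groups.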
  • online360online360 Member Posts: 34 Contributor I
    Regarding "merging the example sets":
    This turned out to be easier than I thought: just add the "Append" operator after Loop Values:
    https://rapid-i.com/rapidforum/index.php?topic=4510.0
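    Conceptually, Append stacks the example sets of a collection (here, one per Loop Values iteration) into a single example set. As far as I know, it expects all inputs to have the same attributes. The following minimal Python sketch illustrates that idea, assuming each example set is a list of dicts. This representation and the function name `append_example_sets` are hypothetical, not RapidMiner's data model or API.

    ```python
    def append_example_sets(example_sets):
        """Stack several example sets (lists of dicts) row-wise into one list."""
        expected = None
        merged = []
        for es in example_sets:
            for row in es:
                columns = set(row)
                if expected is None:
                    expected = columns  # the first row defines the required attributes
                elif columns != expected:
                    raise ValueError(
                        f"incompatible attributes: {sorted(columns)} vs {sorted(expected)}"
                    )
                merged.append(dict(row))  # copy so the inputs stay untouched
        return merged


    # Usage: two per-group results, as produced by two loop iterations.
    per_group = [
        [{"request": "A1", "document": "A2"}, {"request": "A2", "document": "A1"}],
        [{"request": "B1", "document": "B2"}],
    ]
    print(append_example_sets(per_group))
    ```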