AUTOMODEL K-MEANS GRAPHIC RESULT PROBLEMS

zhuoyinzhuoyin Member Posts: 7 Contributor I
edited November 10 in Help

Answers

  • zhuoyinzhuoyin Member Posts: 7 Contributor I
    1. Something is "on average larger/smaller" than what?
    2. How should "on average" be understood? The average of what?
    3. How are the percentage numbers calculated? What is the formula?
  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager Posts: 1,832  Community Manager

    tagging @IngoRM

     

    Scott

     

  • IngoRMIngoRM Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 1,406  RM Founder

    Hi,

     

    When it says "X is on average Y% higher" here is what it means:

     

    The average value of feature / attribute X for the examples / cases in the cluster is Y% higher than the average value of feature X for the examples that are not in the cluster.

     

    BTW, those statements are a textual summary of the top 3 contributors as shown in the Heat Map chart.
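
    As a rough illustration of the statement above, here is a minimal Python sketch (outside RapidMiner) that compares the in-cluster average of one feature with the average over all examples not in the cluster. The function name, the made-up numbers, and the choice to divide by the absolute value of the outside average are assumptions for illustration only; this is not taken from the Auto Model source code.

```python
from statistics import mean


def percent_difference(values, in_cluster):
    """Percent by which the in-cluster mean differs from the mean of the rest.

    values     -- feature X for every example
    in_cluster -- parallel list of booleans, True if the example is in the cluster
    Assumption: the difference is taken relative to the outside-cluster mean.
    """
    inside = [v for v, flag in zip(values, in_cluster) if flag]
    outside = [v for v, flag in zip(values, in_cluster) if not flag]
    if not inside or not outside:
        raise ValueError("need examples both inside and outside the cluster")

    inside_mean = mean(inside)
    outside_mean = mean(outside)
    if outside_mean == 0:
        raise ValueError("outside-cluster mean is zero, percentage undefined")

    # Positive result: "on average Y% higher"; negative: "on average Y% lower".
    return (inside_mean - outside_mean) / abs(outside_mean) * 100.0


# Made-up example: two examples in the cluster, three outside.
x_values = [10.0, 12.0, 2.0, 4.0, 6.0]
membership = [True, True, False, False, False]
result = percent_difference(x_values, membership)
print(f"X is on average {result:.1f}% higher")  # in-cluster mean 11 vs. 4 outside
```

    With these numbers the in-cluster average is 11 and the average of the remaining examples is 4, so the sketch reports a difference of 175%.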

     

    Hope this helps,

    Ingo

     

     

  • zhuoyinzhuoyin Member Posts: 7 Contributor I
    "...Y% higher than the features". The features belong to the original samples I entered, right? But it does not seem to work.
    2.PNG 42.2K
  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    I have tried the sample market data. It is not working.

    2.PNG 40.4K
  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    2.PNG

    How to understand this result?

  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    HOW TO CALCULATE THE AVERAGE?

  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    tagging @IngoRM

  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 562   Unicorn

    Try putting your data into a process like this. As you can see, I added an aggregation step to the Auto Model results so we can see some additional averages.
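
    For readers who want to check the numbers by hand, the added aggregation operators near the end of the process (Aggregate (3), Aggregate (4), Generate Attributes (2), Append) appear to build a table of per-cluster averages plus one overall "Total" row. A hedged Python sketch of that idea follows; the attribute names a1, a3, a4 and the cluster_label column come from the process, while the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def averages_by_cluster(rows, attributes, cluster_key="cluster_label"):
    """Return one row of averages per cluster, followed by a 'Total' row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)

    table = []
    # Per-cluster averages (grouped aggregation).
    for label in sorted(groups):
        summary = {cluster_key: label}
        for name in attributes:
            summary[f"average({name})"] = mean([r[name] for r in groups[label]])
        table.append(summary)

    # Overall averages (ungrouped aggregation), labelled "Total" and appended.
    total = {cluster_key: "Total"}
    for name in attributes:
        total[f"average({name})"] = mean([r[name] for r in rows])
    table.append(total)
    return table


# Made-up rows, only to show the shape of the output.
sample_rows = [
    {"cluster_label": "cluster_0", "a1": 1.0, "a3": 2.0, "a4": 3.0},
    {"cluster_label": "cluster_0", "a1": 3.0, "a3": 4.0, "a4": 5.0},
    {"cluster_label": "cluster_1", "a1": 5.0, "a3": 6.0, "a4": 7.0},
]
summary_table = averages_by_cluster(sample_rows, ["a1", "a3", "a4"])
for line in summary_table:
    print(line)
```

    Comparing a cluster's row with the averages of the examples outside that cluster is then a matter of simple arithmetic. The process XML follows.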

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" automodel="EXPORTED" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Data" width="90" x="45" y="85">
    <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
    <description align="center" color="transparent" colored="false" width="126">Load data.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Preprocessing" width="90" x="179" y="85">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Define Target?" width="90" x="45" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Define Target" width="90" x="45" y="34">
    <parameter key="attribute_name" value="Survived"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">Define the target column for the predictive model.</description>
    </operator>
    <connect from_port="input 1" to_op="Define Target" to_port="example set input"/>
    <connect from_op="Define Target" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should define a target column?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Should Discretize?" width="90" x="179" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="discretize_by_bins" compatibility="8.1.001" expanded="true" height="103" name="Binning" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Age"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="range_name_type" value="short"/>
    <description align="center" color="transparent" colored="false" width="126">Discretize by binning (same range per bin).</description>
    </operator>
    <connect from_port="input 1" to_op="Binning" to_port="example set input"/>
    <connect from_op="Binning" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="discretize_by_frequency" compatibility="8.1.001" expanded="true" height="103" name="Frequency" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Age"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="range_name_type" value="short"/>
    <description align="center" color="transparent" colored="false" width="126">Discretize by frequency (same count per bin).</description>
    </operator>
    <connect from_port="input 1" to_op="Frequency" to_port="example set input"/>
    <connect from_op="Frequency" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should discretize numerical target column?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Map Values?" width="90" x="313" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="map" compatibility="8.1.001" expanded="true" height="82" name="Map Values" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Survived"/>
    <parameter key="include_special_attributes" value="true"/>
    <list key="value_mappings"/>
    <description align="center" color="transparent" colored="false" width="126">Map some nominal target values to new values.</description>
    </operator>
    <connect from_port="input 1" to_op="Map Values" to_port="example set input"/>
    <connect from_op="Map Values" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should map nominal values?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Positive Class?" width="90" x="447" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="nominal_to_binominal" compatibility="8.1.001" expanded="true" height="103" name="Nominal to Binominal" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Survived"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Make sure that target is binary for positive class mapping.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="remap_binominals" compatibility="8.1.001" expanded="true" height="82" name="Define Positive Class" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Survived"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="negative_value" value="No"/>
    <parameter key="positive_value" value="Yes"/>
    <description align="center" color="transparent" colored="false" width="126">Potentially define which one should be the positive class.</description>
    </operator>
    <connect from_port="input 1" to_op="Nominal to Binominal" to_port="example set input"/>
    <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Define Positive Class" to_port="example set input"/>
    <connect from_op="Define Positive Class" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should define positive class?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Remove Columns?" width="90" x="581" y="34">
    <parameter key="select_which" value="2"/>
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Remove Columns" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="\Qlabel\E"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Potentially remove columns.</description>
    </operator>
    <connect from_port="input 1" to_op="Remove Columns" to_port="example set input"/>
    <connect from_op="Remove Columns" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should remove columns?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Unify Value Types" width="90" x="715" y="34">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Remove Dates" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="date_time"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Remove all date columns.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Transform all nominal columns to text so that we make sure that all will have polynominal type after the next transformation.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="text_to_nominal" compatibility="8.1.001" expanded="true" height="82" name="Text to Nominal" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Transform all text columns into polynominal columns.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="numerical_to_real" compatibility="8.1.001" expanded="true" height="82" name="Numerical to Real" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="use_value_type_exception" value="true"/>
    <parameter key="except_value_type" value="integer"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Turn all numerical columns (not integers though) into real columns.</description>
    </operator>
    <connect from_port="in 1" to_op="Remove Dates" to_port="example set input"/>
    <connect from_op="Remove Dates" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/>
    <connect from_op="Text to Nominal" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
    <connect from_op="Numerical to Real" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Unify all value types</description>
    </operator>
    <connect from_port="in 1" to_op="Define Target?" to_port="input 1"/>
    <connect from_op="Define Target?" from_port="output 1" to_op="Should Discretize?" to_port="input 1"/>
    <connect from_op="Should Discretize?" from_port="output 1" to_op="Map Values?" to_port="input 1"/>
    <connect from_op="Map Values?" from_port="output 1" to_op="Positive Class?" to_port="input 1"/>
    <connect from_op="Positive Class?" from_port="output 1" to_op="Remove Columns?" to_port="input 1"/>
    <connect from_op="Remove Columns?" from_port="output 1" to_op="Unify Value Types" to_port="in 1"/>
    <connect from_op="Unify Value Types" from_port="out 1" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">All general preprocessing steps happen inside this operator - double click on it to see the details.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Replace Missing Values" width="90" x="313" y="85">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Nominal Missings" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="default" value="value"/>
    <list key="columns"/>
    <parameter key="replenishment_value" value="MISSING"/>
    <description align="center" color="transparent" colored="false" width="126">Replace nominal missings with the word 'missing'.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="replace_infinite_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Pos Infinite Values" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="default" value="missing"/>
    <list key="columns"/>
    <description align="center" color="transparent" colored="false" width="126">Replace positive infinity values by missing.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="replace_infinite_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Neg Infinite Values" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="default" value="missing"/>
    <list key="columns"/>
    <parameter key="replenish_what" value="negative_infinity"/>
    <description align="center" color="transparent" colored="false" width="126">Replace negative infinity values by missing.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Numerical Missings" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="numeric"/>
    <list key="columns"/>
    <description align="center" color="transparent" colored="false" width="126">Replace numerical missings with the average of the column.</description>
    </operator>
    <connect from_port="in 1" to_op="Replace Nominal Missings" to_port="example set input"/>
    <connect from_op="Replace Nominal Missings" from_port="example set output" to_op="Replace Pos Infinite Values" to_port="example set input"/>
    <connect from_op="Replace Pos Infinite Values" from_port="example set output" to_op="Replace Neg Infinite Values" to_port="example set input"/>
    <connect from_op="Replace Neg Infinite Values" from_port="example set output" to_op="Replace Numerical Missings" to_port="example set input"/>
    <connect from_op="Replace Numerical Missings" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Replace missing values.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="447" y="85">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="nominal"/>
    <description align="center" color="transparent" colored="false" width="126">Check if there are any nominal attributes in the data</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="branch" compatibility="8.1.001" expanded="true" height="103" name="Branch (2)" width="90" x="581" y="85">
    <parameter key="condition_type" value="min_attributes"/>
    <parameter key="condition_value" value="1"/>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="concurrency:loop_attributes" compatibility="8.1.001" expanded="true" height="82" name="Loop Attributes" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="reuse_results" value="true"/>
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate" width="90" x="45" y="34">
    <list key="aggregation_attributes"/>
    <parameter key="group_by_attributes" value="%{loop_attribute}"/>
    <description align="center" color="transparent" colored="false" width="126">Create a new data set with one row for each nominal value of the current column (loop).</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="branch" compatibility="8.1.001" expanded="true" height="103" name="Branch" width="90" x="179" y="34">
    <parameter key="condition_type" value="min_examples"/>
    <parameter key="condition_value" value="10"/>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">More than 10 values? Remove current column.</description>
    </operator>
    <connect from_port="input 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="45" y="34">
    <list key="aggregation_attributes">
    <parameter key="%{loop_attribute}" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="%{loop_attribute}"/>
    <description align="center" color="transparent" colored="false" width="126">Count number of occurrences for each value.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sort" compatibility="8.1.001" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
    <parameter key="attribute_name" value="count(%{loop_attribute})"/>
    <description align="center" color="transparent" colored="false" width="126">Sort counts.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="extract_macro" compatibility="8.1.001" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="34">
    <parameter key="macro" value="least_common"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="%{loop_attribute}"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    <description align="center" color="transparent" colored="false" width="126">Remember value with smallest count.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="nominal_to_numerical" compatibility="8.1.001" expanded="true" height="103" name="Nominal to Numerical (2)" width="90" x="447" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    <parameter key="use_comparison_groups" value="true"/>
    <list key="comparison_groups">
    <parameter key="%{loop_attribute}" value="%{least_common}"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Transform to binary using dummy coding and a comparison group for the least frequent value.</description>
    </operator>
    <connect from_port="input 1" to_op="Aggregate (2)" to_port="example set input"/>
    <connect from_op="Aggregate (2)" from_port="example set output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Aggregate (2)" from_port="original" to_op="Nominal to Numerical (2)" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_port="input 2"/>
    <connect from_op="Nominal to Numerical (2)" from_port="example set output" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="66" resized="false" width="126" x="40" y="210">Less than 10 values? Transform into binary.</description>
    </process>
    <description align="center" color="transparent" colored="false" width="126">If more than 10, remove column. If less, transform to binary.</description>
    </operator>
    <connect from_port="input 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Branch" to_port="condition"/>
    <connect from_op="Aggregate" from_port="original" to_op="Branch" to_port="input 1"/>
    <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Remove nominal columns with too many values, transform the others to binary.</description>
    </operator>
    <connect from_port="input 1" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <connect from_port="input 1" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">If there are nominal attributes, handle them inside</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sample_stratified" compatibility="8.1.001" expanded="true" height="82" name="Sample (Stratified)" width="90" x="715" y="85">
    <parameter key="sample_size" value="500000"/>
    <description align="center" color="transparent" colored="false" width="126">Sample down to 500,000 examples in case there are more.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="remove_useless_attributes" compatibility="8.1.001" expanded="true" height="82" name="Remove Useless Attributes" width="90" x="849" y="85">
    <description align="center" color="transparent" colored="false" width="126">Remove constant columns, can happen especially for MISSING columns after dummy coding.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="normalize" compatibility="8.1.001" expanded="true" height="103" name="Normalize" width="90" x="983" y="85">
    <description align="center" color="transparent" colored="false" width="126">Standardize all columns.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="order_attributes" compatibility="8.1.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="1117" y="85">
    <parameter key="sort_mode" value="alphabetically"/>
    <description align="center" color="transparent" colored="false" width="126">Order columns alphabetically.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="concurrency:k_means" compatibility="8.1.001" expanded="true" height="82" name="Clustering" width="90" x="1251" y="85"/>
    <operator activated="true" automodel="EXPORTED" class="multiply" compatibility="8.1.001" expanded="true" height="103" name="Multiply" width="90" x="1385" y="136">
    <description align="center" color="transparent" colored="false" width="126">Create a copy of the data for learning a tree.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="model_simulator:cluster_model_visualizer" compatibility="8.1.001" expanded="true" height="82" name="Cluster Model Visualizer" width="90" x="1519" y="85">
    <description align="center" color="transparent" colored="false" width="126">Creates the cluster model visualizations.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sort" compatibility="8.1.001" expanded="true" height="82" name="Sort (2)" width="90" x="1519" y="289">
    <parameter key="attribute_name" value="cluster"/>
    <description align="center" color="transparent" colored="false" width="126">Sort according to clusters so that decision tree colors match the cluster colors later on.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Attributes" width="90" x="1653" y="289">
    <list key="function_descriptions">
    <parameter key="cluster_label" value="cluster"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Generate a new label attribute from the sorted values to ensure consistent color schemes.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="1787" y="289">
    <parameter key="attribute_name" value="cluster_label"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">Turn the newly generated column into the label for the decision tree.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="1921" y="187">
    <parameter key="apply_prepruning" value="false"/>
    <description align="center" color="transparent" colored="false" width="126">Learn a model explaining the cluster assignments.</description>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate (3)" width="90" x="2055" y="238">
    <list key="aggregation_attributes">
    <parameter key="a3" value="average"/>
    <parameter key="a1" value="average"/>
    <parameter key="a4" value="average"/>
    </list>
    <parameter key="group_by_attributes" value="cluster_label"/>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate (4)" width="90" x="2189" y="340">
    <list key="aggregation_attributes">
    <parameter key="a3" value="average"/>
    <parameter key="a1" value="average"/>
    <parameter key="a4" value="average"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="2323" y="289">
    <list key="function_descriptions">
    <parameter key="cluster_label" value="&quot;Total&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.001" expanded="true" height="103" name="Append" width="90" x="2457" y="187"/>
    <connect from_op="Retrieve Data" from_port="output" to_op="Preprocessing" to_port="in 1"/>
    <connect from_op="Preprocessing" from_port="out 1" to_op="Replace Missing Values" to_port="in 1"/>
    <connect from_op="Replace Missing Values" from_port="out 1" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Branch (2)" to_port="condition"/>
    <connect from_op="Select Attributes (2)" from_port="original" to_op="Branch (2)" to_port="input 1"/>
    <connect from_op="Branch (2)" from_port="input 1" to_op="Sample (Stratified)" to_port="example set input"/>
    <connect from_op="Sample (Stratified)" from_port="example set output" to_op="Remove Useless Attributes" to_port="example set input"/>
    <connect from_op="Remove Useless Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
    <connect from_op="Clustering" from_port="cluster model" to_op="Cluster Model Visualizer" to_port="model"/>
    <connect from_op="Clustering" from_port="clustered set" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Cluster Model Visualizer" to_port="clustered data"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Sort (2)" to_port="example set input"/>
    <connect from_op="Cluster Model Visualizer" from_port="visualizer output" to_port="result 1"/>
    <connect from_op="Sort (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Set Role" from_port="original" to_port="result 3"/>
    <connect from_op="Decision Tree" from_port="model" to_port="result 2"/>
    <connect from_op="Decision Tree" from_port="exampleSet" to_op="Aggregate (3)" to_port="example set input"/>
    <connect from_op="Aggregate (3)" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Aggregate (3)" from_port="original" to_op="Aggregate (4)" to_port="example set input"/>
    <connect from_op="Aggregate (4)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Append" from_port="merged set" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="42"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    <description align="left" color="yellow" colored="false" height="105" resized="true" width="263" x="1062" y="266">Results:&lt;br&gt;1. Cluster Model Visualization&lt;br&gt;2. Decision tree for cluster explanation&lt;br&gt;3. Clustered data&lt;br&gt;4. Attribute averages per cluster, plus a Total row</description>
    </process>
    </operator>
    </process>
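
For readers without RapidMiner at hand, here is a rough Python sketch of what the added operators compute: Aggregate (3) averages a3, a1 and a4 per cluster_label, Aggregate (4) averages the same attributes over all examples, Generate Attributes (2) labels that row "Total", and Append stacks the two tables. The function name, the sample rows and the `average(...)` column names are illustrative assumptions, not taken from the process.

```python
from collections import defaultdict
from statistics import mean


def averages_with_total(rows, group_key, attributes):
    """Return one row of averages per group, followed by an overall 'Total' row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    table = []
    # Per-cluster averages (the role of "Aggregate (3)" with group_by_attributes).
    for label in sorted(groups):
        summary = {group_key: label}
        for attribute in attributes:
            summary[f"average({attribute})"] = mean(r[attribute] for r in groups[label])
        table.append(summary)

    # Overall averages labelled "Total" (the role of "Aggregate (4)" plus
    # "Generate Attributes (2)"), appended last (the role of "Append").
    total = {group_key: "Total"}
    for attribute in attributes:
        total[f"average({attribute})"] = mean(r[attribute] for r in rows)
    table.append(total)
    return table


# Hypothetical clustered data, for illustration only.
rows = [
    {"cluster_label": "cluster_0", "a1": 1.0, "a3": 2.0, "a4": 3.0},
    {"cluster_label": "cluster_0", "a1": 3.0, "a3": 4.0, "a4": 5.0},
    {"cluster_label": "cluster_1", "a1": 5.0, "a3": 6.0, "a4": 7.0},
]
table = averages_with_total(rows, "cluster_label", ["a3", "a1", "a4"])
for line in table:
    print(line)
```

Comparing each cluster row against the "Total" row gives a quick sanity check on the averages that Auto Model reports.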
  • IngoRMIngoRM Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 1,406  RM Founder

    Also keep in mind that the features are normalized before the clustering (and hence also before the calculation of the averages).
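
To see why normalization matters for the percentages, here is a minimal Python sketch. It assumes a z-score normalization and assumes the percentage is computed as (cluster average - non-cluster average) / |non-cluster average| * 100. Neither the normalization method nor the exact formula is confirmed in this thread, and the function names and sample values are made up for illustration.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def percent_difference(values, clusters, target):
    """Percent by which the target cluster's average differs from the average
    of all examples NOT in that cluster (assumed formula, see lead-in)."""
    inside = [v for v, c in zip(values, clusters) if c == target]
    outside = [v for v, c in zip(values, clusters) if c != target]
    base = mean(outside)
    if base == 0:
        raise ValueError("non-cluster average is 0, percentage is undefined")
    return (mean(inside) - base) / abs(base) * 100.0


# Hypothetical attribute values and cluster assignments.
a1 = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
clusters = ["cluster_0"] * 3 + ["cluster_1"] * 3

raw = percent_difference(a1, clusters, "cluster_1")
normalized = percent_difference(z_normalize(a1), clusters, "cluster_1")
print(f"raw values:        {raw:.1f}% higher")         # about 450%
print(f"normalized values: {normalized:.1f}% higher")  # about 200%
```

The same cluster gives a very different percentage on raw and on normalized values, so averages recomputed by hand from the original data will generally not match the numbers in the Auto Model summary.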
