AUTOMODEL K-MEANS GRAPHIC RESULT PROBLEMS

zhuoyinzhuoyin Member Posts: 7 Contributor I
edited June 2019 in Help

Answers

  • zhuoyinzhuoyin Member Posts: 7 Contributor I
    1. Something is "on average larger/smaller" than what?
    2. How should "on average" be interpreted? The average of what?
    3. How are the percentage numbers calculated? What is the formula?
  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    tagging @IngoRM

     

    Scott

     

  • IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    When it says "X is on average Y% higher" here is what it means:

     

    The average value of feature / attribute X for the examples / cases in the cluster is Y% higher than the average value of feature X for the examples that are not in the cluster.
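
    As a rough illustration of that definition, here is a minimal Python sketch. It assumes the percentage is taken relative to the out-of-cluster average, which is one natural reading of the sentence above; the thread does not confirm the exact formula or data scaling Auto Model uses. The function name, the toy data, and the cluster names are hypothetical.

```python
from statistics import mean


def percent_difference_vs_rest(rows, clusters, feature, cluster):
    """Percent by which the mean of `feature` inside `cluster`
    differs from its mean over all examples outside that cluster.

    Assumption (not confirmed in the thread): the difference is
    expressed relative to the out-of-cluster mean.
    """
    inside = [r[feature] for r, c in zip(rows, clusters) if c == cluster]
    outside = [r[feature] for r, c in zip(rows, clusters) if c != cluster]
    if not inside or not outside:
        raise ValueError("need at least one example inside and outside the cluster")

    mean_in = mean(inside)
    mean_out = mean(outside)
    if mean_out == 0:
        # A relative difference is undefined when the reference mean is zero.
        raise ValueError("out-of-cluster mean is zero; percentage is undefined")

    return (mean_in - mean_out) / abs(mean_out) * 100.0


# Hypothetical toy data: one feature, two clusters.
rows = [{"a1": 12.0}, {"a1": 18.0}, {"a1": 8.0}, {"a1": 12.0}]
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1"]

result = percent_difference_vs_rest(rows, clusters, "a1", "cluster_0")
direction = "higher" if result >= 0 else "lower"
print(f"a1 is on average {abs(result):.1f}% {direction} in cluster_0")
```

    In this toy data the in-cluster average is 15 and the average of the remaining examples is 10, so the sketch reports a1 as 50% higher for cluster_0. Note that the comparison is against the examples outside the cluster, not against the overall average of all examples.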

     

    BTW, those statements are a textual summary of the top 3 contributors as shown in the Heat Map chart.

     

    Hope this helps,

    Ingo

     

     

  • zhuoyinzhuoyin Member Posts: 7 Contributor I
    "...Y higher than the features". The features belong to the original samples I entered, right? But that does not seem to work.
    [Attachment: 2.PNG, 42.2K]
  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    I have tried the sample market data. It is not working.

    [Attachment: 2.PNG, 40.4K]
  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    [Attachment: 2.PNG]

    How should this result be interpreted?

  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    How is the average calculated?

  • zhuoyinzhuoyin Member Posts: 7 Contributor I

    tagging @IngoRM

  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Try putting your data into a process like this. As you can see, I added an aggregation step to the Auto Model results so we can see some additional averages.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" automodel="EXPORTED" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Data" width="90" x="45" y="85">
    <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
    <description align="center" color="transparent" colored="false" width="126">Load data.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Preprocessing" width="90" x="179" y="85">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Define Target?" width="90" x="45" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Define Target" width="90" x="45" y="34">
    <parameter key="attribute_name" value="Survived"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">Define the target column for the predictive model.</description>
    </operator>
    <connect from_port="input 1" to_op="Define Target" to_port="example set input"/>
    <connect from_op="Define Target" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should define a target column?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Should Discretize?" width="90" x="179" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="discretize_by_bins" compatibility="8.1.001" expanded="true" height="103" name="Binning" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Age"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="range_name_type" value="short"/>
    <description align="center" color="transparent" colored="false" width="126">Discretize by binning (same range per bin).</description>
    </operator>
    <connect from_port="input 1" to_op="Binning" to_port="example set input"/>
    <connect from_op="Binning" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="discretize_by_frequency" compatibility="8.1.001" expanded="true" height="103" name="Frequency" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Age"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="range_name_type" value="short"/>
    <description align="center" color="transparent" colored="false" width="126">Discretize by frequency (same count per bin).</description>
    </operator>
    <connect from_port="input 1" to_op="Frequency" to_port="example set input"/>
    <connect from_op="Frequency" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should discretize numerical target column?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Map Values?" width="90" x="313" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="map" compatibility="8.1.001" expanded="true" height="82" name="Map Values" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Survived"/>
    <parameter key="include_special_attributes" value="true"/>
    <list key="value_mappings"/>
    <description align="center" color="transparent" colored="false" width="126">Map some nominal target values to new values.</description>
    </operator>
    <connect from_port="input 1" to_op="Map Values" to_port="example set input"/>
    <connect from_op="Map Values" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should map nominal values?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Positive Class?" width="90" x="447" y="34">
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="nominal_to_binominal" compatibility="8.1.001" expanded="true" height="103" name="Nominal to Binominal" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Survived"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Make sure that target is binary for positive class mapping.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="remap_binominals" compatibility="8.1.001" expanded="true" height="82" name="Define Positive Class" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Survived"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="negative_value" value="No"/>
    <parameter key="positive_value" value="Yes"/>
    <description align="center" color="transparent" colored="false" width="126">Potentially define which one should be the positive class.</description>
    </operator>
    <connect from_port="input 1" to_op="Nominal to Binominal" to_port="example set input"/>
    <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Define Positive Class" to_port="example set input"/>
    <connect from_op="Define Positive Class" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should define positive class?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_subprocess" compatibility="8.1.001" expanded="true" height="82" name="Remove Columns?" width="90" x="581" y="34">
    <parameter key="select_which" value="2"/>
    <process expanded="true">
    <connect from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Remove Columns" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="\Qlabel\E"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Potentially remove columns.</description>
    </operator>
    <connect from_port="input 1" to_op="Remove Columns" to_port="example set input"/>
    <connect from_op="Remove Columns" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Should remove columns?</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Unify Value Types" width="90" x="715" y="34">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Remove Dates" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="date_time"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Remove all date columns.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Transform all nominal columns to text so that we make sure that all will have polynominal type after the next transformation.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="text_to_nominal" compatibility="8.1.001" expanded="true" height="82" name="Text to Nominal" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Transform all text columns into polynominal columns.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="numerical_to_real" compatibility="8.1.001" expanded="true" height="82" name="Numerical to Real" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="use_value_type_exception" value="true"/>
    <parameter key="except_value_type" value="integer"/>
    <parameter key="include_special_attributes" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Turn all numerical columns (not integers though) into real columns.</description>
    </operator>
    <connect from_port="in 1" to_op="Remove Dates" to_port="example set input"/>
    <connect from_op="Remove Dates" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/>
    <connect from_op="Text to Nominal" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
    <connect from_op="Numerical to Real" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Unify all value types</description>
    </operator>
    <connect from_port="in 1" to_op="Define Target?" to_port="input 1"/>
    <connect from_op="Define Target?" from_port="output 1" to_op="Should Discretize?" to_port="input 1"/>
    <connect from_op="Should Discretize?" from_port="output 1" to_op="Map Values?" to_port="input 1"/>
    <connect from_op="Map Values?" from_port="output 1" to_op="Positive Class?" to_port="input 1"/>
    <connect from_op="Positive Class?" from_port="output 1" to_op="Remove Columns?" to_port="input 1"/>
    <connect from_op="Remove Columns?" from_port="output 1" to_op="Unify Value Types" to_port="in 1"/>
    <connect from_op="Unify Value Types" from_port="out 1" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">All general preprocessing steps happen inside this operator - double click on it to see the details.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Replace Missing Values" width="90" x="313" y="85">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Nominal Missings" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="default" value="value"/>
    <list key="columns"/>
    <parameter key="replenishment_value" value="MISSING"/>
    <description align="center" color="transparent" colored="false" width="126">Replace nominal missings with the word 'missing'.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="replace_infinite_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Pos Infinite Values" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="default" value="missing"/>
    <list key="columns"/>
    <description align="center" color="transparent" colored="false" width="126">Replace positive infinity values by missing.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="replace_infinite_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Neg Infinite Values" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="default" value="missing"/>
    <list key="columns"/>
    <parameter key="replenish_what" value="negative_infinity"/>
    <description align="center" color="transparent" colored="false" width="126">Replace negative infinity values by missing.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Numerical Missings" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="numeric"/>
    <list key="columns"/>
    <description align="center" color="transparent" colored="false" width="126">Replace numerical missings with the average of the column.</description>
    </operator>
    <connect from_port="in 1" to_op="Replace Nominal Missings" to_port="example set input"/>
    <connect from_op="Replace Nominal Missings" from_port="example set output" to_op="Replace Pos Infinite Values" to_port="example set input"/>
    <connect from_op="Replace Pos Infinite Values" from_port="example set output" to_op="Replace Neg Infinite Values" to_port="example set input"/>
    <connect from_op="Replace Neg Infinite Values" from_port="example set output" to_op="Replace Numerical Missings" to_port="example set input"/>
    <connect from_op="Replace Numerical Missings" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Replace missing values.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="447" y="85">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="nominal"/>
    <description align="center" color="transparent" colored="false" width="126">Check if there are any nominal attributes in the data</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="branch" compatibility="8.1.001" expanded="true" height="103" name="Branch (2)" width="90" x="581" y="85">
    <parameter key="condition_type" value="min_attributes"/>
    <parameter key="condition_value" value="1"/>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="concurrency:loop_attributes" compatibility="8.1.001" expanded="true" height="82" name="Loop Attributes" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="reuse_results" value="true"/>
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate" width="90" x="45" y="34">
    <list key="aggregation_attributes"/>
    <parameter key="group_by_attributes" value="%{loop_attribute}"/>
    <description align="center" color="transparent" colored="false" width="126">Create a new data set with one row for each nominal value of the current column (loop).</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="branch" compatibility="8.1.001" expanded="true" height="103" name="Branch" width="90" x="179" y="34">
    <parameter key="condition_type" value="min_examples"/>
    <parameter key="condition_value" value="10"/>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">More than 10 values? Remove current column.</description>
    </operator>
    <connect from_port="input 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="45" y="34">
    <list key="aggregation_attributes">
    <parameter key="%{loop_attribute}" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="%{loop_attribute}"/>
    <description align="center" color="transparent" colored="false" width="126">Count number of occurrences for each value.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sort" compatibility="8.1.001" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
    <parameter key="attribute_name" value="count(%{loop_attribute})"/>
    <description align="center" color="transparent" colored="false" width="126">Sort counts.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="extract_macro" compatibility="8.1.001" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="34">
    <parameter key="macro" value="least_common"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="%{loop_attribute}"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    <description align="center" color="transparent" colored="false" width="126">Remember value with smallest count.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="nominal_to_numerical" compatibility="8.1.001" expanded="true" height="103" name="Nominal to Numerical (2)" width="90" x="447" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    <parameter key="use_comparison_groups" value="true"/>
    <list key="comparison_groups">
    <parameter key="%{loop_attribute}" value="%{least_common}"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Transform to binary using dummy coding and a comparison group for the least frequent value.</description>
    </operator>
    <connect from_port="input 1" to_op="Aggregate (2)" to_port="example set input"/>
    <connect from_op="Aggregate (2)" from_port="example set output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Aggregate (2)" from_port="original" to_op="Nominal to Numerical (2)" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_port="input 2"/>
    <connect from_op="Nominal to Numerical (2)" from_port="example set output" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="66" resized="false" width="126" x="40" y="210">Less than 10 values? Transform into binary.</description>
    </process>
    <description align="center" color="transparent" colored="false" width="126">If more than 10, remove column. If less, transform to binary.</description>
    </operator>
    <connect from_port="input 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Branch" to_port="condition"/>
    <connect from_op="Aggregate" from_port="original" to_op="Branch" to_port="input 1"/>
    <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Remove nominal columns with too many values, transform the others to binary.</description>
    </operator>
    <connect from_port="input 1" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <connect from_port="input 1" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">If there are nominal attributes, handle them inside</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sample_stratified" compatibility="8.1.001" expanded="true" height="82" name="Sample (Stratified)" width="90" x="715" y="85">
    <parameter key="sample_size" value="500000"/>
    <description align="center" color="transparent" colored="false" width="126">Sample down to 500,000 examples in case there are more.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="remove_useless_attributes" compatibility="8.1.001" expanded="true" height="82" name="Remove Useless Attributes" width="90" x="849" y="85">
    <description align="center" color="transparent" colored="false" width="126">Remove constant columns, can happen especially for MISSING columns after dummy coding.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="normalize" compatibility="8.1.001" expanded="true" height="103" name="Normalize" width="90" x="983" y="85">
    <description align="center" color="transparent" colored="false" width="126">Standardize all columns.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="order_attributes" compatibility="8.1.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="1117" y="85">
    <parameter key="sort_mode" value="alphabetically"/>
    <description align="center" color="transparent" colored="false" width="126">Order columns alphabetically.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="concurrency:k_means" compatibility="8.1.001" expanded="true" height="82" name="Clustering" width="90" x="1251" y="85"/>
    <operator activated="true" automodel="EXPORTED" class="multiply" compatibility="8.1.001" expanded="true" height="103" name="Multiply" width="90" x="1385" y="136">
    <description align="center" color="transparent" colored="false" width="126">Create a copy of the data for learning a tree.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="model_simulator:cluster_model_visualizer" compatibility="8.1.001" expanded="true" height="82" name="Cluster Model Visualizer" width="90" x="1519" y="85">
    <description align="center" color="transparent" colored="false" width="126">Creates the cluster model visualizations.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sort" compatibility="8.1.001" expanded="true" height="82" name="Sort (2)" width="90" x="1519" y="289">
    <parameter key="attribute_name" value="cluster"/>
    <description align="center" color="transparent" colored="false" width="126">Sort according to clusters so that decision tree colors match the cluster colors later on.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Attributes" width="90" x="1653" y="289">
    <list key="function_descriptions">
    <parameter key="cluster_label" value="cluster"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Generate a new label attribute from the sorted values to ensure consistent color schemes.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="1787" y="289">
    <parameter key="attribute_name" value="cluster_label"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">Turn the newly generated column into the label for the decision tree.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="1921" y="187">
    <parameter key="apply_prepruning" value="false"/>
    <description align="center" color="transparent" colored="false" width="126">Learn a model explaining the cluster assignments.</description>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate (3)" width="90" x="2055" y="238">
    <list key="aggregation_attributes">
    <parameter key="a3" value="average"/>
    <parameter key="a1" value="average"/>
    <parameter key="a4" value="average"/>
    </list>
    <parameter key="group_by_attributes" value="cluster_label"/>
    <description align="center" color="transparent" colored="false" width="126">Average of a1, a3 and a4 within each cluster.</description>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.1.001" expanded="true" height="82" name="Aggregate (4)" width="90" x="2189" y="340">
    <list key="aggregation_attributes">
    <parameter key="a3" value="average"/>
    <parameter key="a1" value="average"/>
    <parameter key="a4" value="average"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Average of a1, a3 and a4 over all examples (no grouping).</description>
    </operator>
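    <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="2323" y="289">
    <list key="function_descriptions">
    <parameter key="cluster_label" value="&quot;Total&quot;"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Label the overall averages row as "Total" so it can be appended to the per-cluster rows.</description>
    </operator>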
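    <operator activated="true" class="append" compatibility="8.1.001" expanded="true" height="103" name="Append" width="90" x="2457" y="187">
    <description align="center" color="transparent" colored="false" width="126">Combine the per-cluster averages and the "Total" row into one table.</description>
    </operator>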
    <connect from_op="Retrieve Data" from_port="output" to_op="Preprocessing" to_port="in 1"/>
    <connect from_op="Preprocessing" from_port="out 1" to_op="Replace Missing Values" to_port="in 1"/>
    <connect from_op="Replace Missing Values" from_port="out 1" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Branch (2)" to_port="condition"/>
    <connect from_op="Select Attributes (2)" from_port="original" to_op="Branch (2)" to_port="input 1"/>
    <connect from_op="Branch (2)" from_port="input 1" to_op="Sample (Stratified)" to_port="example set input"/>
    <connect from_op="Sample (Stratified)" from_port="example set output" to_op="Remove Useless Attributes" to_port="example set input"/>
    <connect from_op="Remove Useless Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
    <connect from_op="Clustering" from_port="cluster model" to_op="Cluster Model Visualizer" to_port="model"/>
    <connect from_op="Clustering" from_port="clustered set" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Cluster Model Visualizer" to_port="clustered data"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Sort (2)" to_port="example set input"/>
    <connect from_op="Cluster Model Visualizer" from_port="visualizer output" to_port="result 1"/>
    <connect from_op="Sort (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Set Role" from_port="original" to_port="result 3"/>
    <connect from_op="Decision Tree" from_port="model" to_port="result 2"/>
    <connect from_op="Decision Tree" from_port="exampleSet" to_op="Aggregate (3)" to_port="example set input"/>
    <connect from_op="Aggregate (3)" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Aggregate (3)" from_port="original" to_op="Aggregate (4)" to_port="example set input"/>
    <connect from_op="Aggregate (4)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Append" from_port="merged set" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="42"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    <description align="left" color="yellow" colored="false" height="105" resized="true" width="263" x="1062" y="266">Results:&lt;br&gt;1. Cluster Model Visualization&lt;br&gt;2. Decision tree for cluster explanation&lt;br&gt;3. Clustered data&lt;br&gt;4. Attribute averages per cluster plus overall ("Total")</description>
    </process>
    </operator>
    </process>
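
    For readers who want to check the numbers outside RapidMiner, here is a minimal Python sketch of what the added Aggregate / Generate Attributes / Append steps produce: one row of averages per cluster, plus a final "Total" row over all examples. The function name `cluster_averages` and the sample rows are made up for illustration; only the attribute names a1, a3, a4 and the `cluster_label` column come from the process above.

```python
from collections import defaultdict
from statistics import mean


def cluster_averages(rows, attributes, group_key="cluster_label"):
    """Return one row of attribute averages per cluster plus a final "Total" row."""
    # Group the examples by their cluster label (what Aggregate (3) does).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    def averages(members, label):
        out = {group_key: label}
        for name in attributes:
            out[f"average({name})"] = mean(r[name] for r in members)
        return out

    table = [averages(members, label) for label, members in sorted(groups.items())]
    # Overall averages without grouping, labelled "Total"
    # (Aggregate (4) + Generate Attributes (2) + Append).
    table.append(averages(rows, "Total"))
    return table


# Hypothetical, already-normalized example data (not from the thread).
rows = [
    {"cluster_label": "cluster_0", "a1": 1.0, "a3": 2.0, "a4": 0.0},
    {"cluster_label": "cluster_0", "a1": 3.0, "a3": 4.0, "a4": 2.0},
    {"cluster_label": "cluster_1", "a1": 5.0, "a3": 0.0, "a4": 4.0},
]
result = cluster_averages(rows, ["a3", "a1", "a4"])
for line in result:
    print(line)
```

    Note that the "Total" row averages over all examples, including those in the cluster, whereas the Auto Model text compares a cluster against the examples outside it.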
  • IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Also keep in mind that the features are normalized before the clustering (and hence also before the calculation of the averages).
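
    To see why a check on the raw data may not reproduce the displayed percentages, here is a rough Python sketch. It rests on two assumptions that are not confirmed in this thread: that the normalization is a z-transformation (population standard deviation is used here), and that "Y% higher" is the relative difference between the in-cluster average and the average of the remaining examples. The helper names and the four sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Z-transform a column (assumed normalization; population std dev)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def percent_difference(values, in_cluster):
    """Relative difference (in %) of the in-cluster average vs. the rest.

    Assumed formula: 100 * (avg_in - avg_out) / |avg_out|.
    """
    inside = [v for v, flag in zip(values, in_cluster) if flag]
    outside = [v for v, flag in zip(values, in_cluster) if not flag]
    avg_in, avg_out = mean(inside), mean(outside)
    return 100.0 * (avg_in - avg_out) / abs(avg_out)


# Hypothetical feature column and cluster membership (not from the thread).
raw = [10.0, 20.0, 30.0, 40.0]
in_cluster = [False, False, True, True]

raw_pct = percent_difference(raw, in_cluster)
norm_pct = percent_difference(z_normalize(raw), in_cluster)
print(f"raw: {raw_pct:.1f}%  normalized: {norm_pct:.1f}%")
```

    On this toy column the same cluster comes out about 133% higher on the raw values but 200% higher on the normalized ones, so the two are not interchangeable.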
