I need some help with agglomerative clustering please
Hello everyone! I have been working with RapidMiner for a week now and cannot figure out how to solve my problem. To be more specific: I need some inspiration for working with RapidMiner.
Here is my starting point:
- I have a CSV file containing example data from the sensors of a fictional production machine. The first column is a timestamp recording when the sensor collected the data. The second column is the name of the event that happened. Attached you will find some example data, as I cannot upload it here.
- As you can see, from time to time an error has occurred (yellow mark), and I want to analyse why it happened. The assumption is that events which happened shortly before "error occurred" are more likely to have caused the problem, while events which happened a long time before the error are less likely to have caused it.
- After doing the tutorial and reading some questions from the community, I decided to try agglomerative clustering to cluster all the events which occurred in the time before the event "error occurred".
- That is why I want to take the event "error occurred" as my zero point and measure the time distances between zero and the events that happened before it, in order to determine which sensor failure will probably lead to the event "error occurred".
- My thought was to split the data, as a first step, after each "error occurred" into smaller sub-files and then apply agglomerative clustering to each of them.
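The splitting and zero-point ideas in the list above can be sketched outside RapidMiner in a few lines of Python. This is only a minimal illustration under assumptions: the rows are the sample data from the process posted in this thread, and the helper names `to_minutes`, `split_at_errors`, and `minutes_before_error` are hypothetical, not RapidMiner operators.

```python
from datetime import datetime

# Sample rows taken from the "input_csv_text" parameter of the process in this thread.
EVENTS = [
    ("8:00", "Sensor A false"),
    ("8:15", "Sensor B false"),
    ("8:16", "Sensor C false"),
    ("8:34", "Sensor A false"),
    ("8:36", "Sensor C false"),
    ("8:40", "Sensor A false"),
    ("8:40", "Error occurred"),
    ("9:03", "Sensor B false"),
    ("9:10", "Sensor D false"),
    ("9:12", "Sensor B false"),
    ("9:15", "Sensor A false"),
    ("9:15", "Error occurred"),
    ("9:20", "Sensor B false"),
]


def to_minutes(hhmm):
    """Convert an H:MM timestamp into minutes since midnight."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def split_at_errors(events, error_name="Error occurred"):
    """Group events into segments that each end with an error row.

    Events after the last error are dropped, because there is no
    error to relate them to.
    """
    segments, current = [], []
    for stamp, name in events:
        current.append((stamp, name))
        if name == error_name:
            segments.append(current)
            current = []
    return segments


def minutes_before_error(segment):
    """Return (event name, minutes before the error) for each non-error row."""
    error_time = to_minutes(segment[-1][0])
    return [(name, error_time - to_minutes(stamp)) for stamp, name in segment[:-1]]


segments = split_at_errors(EVENTS)
distances = [minutes_before_error(segment) for segment in segments]
for number, rows in enumerate(distances, start=1):
    print(f"segment {number}:", rows)
```

Each segment then has the error at distance 0, so small values mark the events that happened shortly before it.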
Could you please give me some inspiration for solving my problem, or tell me whether it is possible the way I have presented my ideas?
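One way to get a feel for the agglomerative idea before building it in RapidMiner is a tiny single-linkage sketch in Python on one-dimensional time distances. The values are the minutes before the first error, worked out by hand from the sample timestamps in the process posted in this thread. The function name `agglomerative_1d` and the `max_gap` threshold of 5 minutes are assumptions chosen for illustration, not RapidMiner operators or parameters.

```python
def agglomerative_1d(values, max_gap):
    """Single-linkage agglomerative clustering of 1-D values.

    Starts with one cluster per value and repeatedly merges the two
    closest clusters until the smallest remaining gap exceeds max_gap.
    """
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > 1:
        # In sorted 1-D data the closest pair under single linkage is
        # always a pair of neighbouring clusters.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        closest = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[closest] > max_gap:
            break
        clusters[closest:closest + 2] = [clusters[closest] + clusters[closest + 1]]
    return clusters


# Minutes before the 8:40 error for the events at 8:00, 8:15, 8:16, 8:34, 8:36, 8:40.
minutes = [40, 25, 24, 6, 4, 0]
clusters = agglomerative_1d(minutes, max_gap=5)
print(clusters)
```

With this threshold the events fall into three groups: those within a few minutes of the error, those about 25 minutes before it, and the single early event.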
Thanks in advance and have a nice week!
Greetings
Janito
Best Answer
sgenzer (Community Manager)
Hi @Janito, I'm sure there is an easier way to do this, but this works:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="America/New_York"/>
        <parameter key="input_csv_text" value="timestamp,event name&#10;8:00,Sensor A false&#10;8:15,Sensor B false&#10;8:16,Sensor C false&#10;8:34,Sensor A false&#10;8:36,Sensor C false&#10;8:40,Sensor A false&#10;8:40,Error occurred&#10;9:03,Sensor B false&#10;9:10,Sensor D false&#10;9:12,Sensor B false&#10;9:15,Sensor A false&#10;9:15,Error occurred&#10;9:20,Sensor B false"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="nominal_to_date" compatibility="9.2.001" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="34">
        <parameter key="attribute_name" value="timestamp"/>
        <parameter key="date_type" value="time"/>
        <parameter key="date_format" value="HH:mm"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="keep_old_attribute" value="false"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
        <list key="function_descriptions">
          <parameter key="flag" value="if(prefix([event name],1)==&quot;E&quot;,1,0)"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="operator_toolbox:generate_session_id" compatibility="2.0.001" expanded="true" height="82" name="Generate Session ID" width="90" x="447" y="34">
        <parameter key="date_attribute" value="flag"/>
        <parameter key="gap_threshold" value="0.5"/>
        <parameter key="gap_unit" value="none"/>
        <parameter key="use_absolutes" value="false"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="9.2.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Session id"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="time_series:lag_series" compatibility="9.2.001" expanded="true" height="82" name="Lag" width="90" x="715" y="34">
        <list key="attributes">
          <parameter key="Session id" value="1"/>
        </list>
        <parameter key="overwrite_attributes" value="false"/>
        <parameter key="extend_exampleset" value="false"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="9.2.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="849" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Session id-1"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default" value="value"/>
        <list key="columns"/>
        <parameter key="replenishment_value" value="0"/>
      </operator>
      <operator activated="true" class="concurrency:loop_values" compatibility="9.2.001" expanded="true" height="82" name="Loop Values" width="90" x="983" y="34">
        <parameter key="attribute" value="Session id-1"/>
        <parameter key="iteration_macro" value="loop_value"/>
        <parameter key="reuse_results" value="false"/>
        <parameter key="enable_parallel_execution" value="false"/>
        <process expanded="true">
          <operator activated="true" class="filter_examples" compatibility="9.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Session id-1.equals.%{loop_value}"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.2.001" expanded="true" height="68" name="Extract Macro" width="90" x="179" y="34">
            <parameter key="macro" value="min"/>
            <parameter key="macro_type" value="statistics"/>
            <parameter key="statistics" value="min"/>
            <parameter key="attribute_name" value="timestamp"/>
            <list key="additional_macros"/>
            <description align="center" color="transparent" colored="false" width="126">min</description>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.2.001" expanded="true" height="68" name="Extract Macro (2)" width="90" x="313" y="34">
            <parameter key="macro" value="max"/>
            <parameter key="macro_type" value="statistics"/>
            <parameter key="statistics" value="max"/>
            <parameter key="attribute_name" value="timestamp"/>
            <list key="additional_macros"/>
            <description align="center" color="transparent" colored="false" width="126">max</description>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="447" y="34">
            <list key="function_descriptions">
              <parameter key="timeDifferenceInMinutes" value="(eval(%{max})-eval(%{min}))/(1000*60)"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
          <connect from_op="Extract Macro (2)" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="9.2.001" expanded="true" height="82" name="Append" width="90" x="1117" y="34">
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="merge_type" value="all"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
      <connect from_op="Nominal to Date" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Generate Session ID" to_port="exa"/>
      <connect from_op="Generate Session ID" from_port="exa" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Lag" to_port="example set input"/>
      <connect from_op="Lag" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
      <connect from_op="Loop Values" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Scott
Answers