Moving average for each ID
Cristina_daimiel
Member Posts: 2 Learner I
in Help
Hello all,
I have a dataset with the energy produced by several PV plants every 15 minutes over one year. It has a column with the datetime (around 18000 examples for each ID), a column with the ID (each PV plant has a different ID; there are 4 IDs in total) and a column with the energy produced. For each example, I'm calculating the moving average of the previous 3 hours with the "Moving Average Filter" operator. However, where the year of data for the first ID ends and the second ID begins, the moving average for the second ID is calculated from the last 3 hours of the previous ID instead of starting fresh. Is there a way to take the ID into account in this calculation? Or should I split the ExampleSet into 4 separate ExampleSets (one for each ID) and do the calculation on each one separately?
Many thanks in advance
Best Answer
MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
Hi @Cristina_daimiel,
you can use Group Into Collection to split the example set and then use Loop Collection to run the moving average per plant. There are definitely a few ways to do this, but this is the one I would use. An example process is attached.
May I ask what kind of project you are doing this for? It sounds very cool.
Best,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" breakpoints="after" class="subprocess" compatibility="9.6.000" expanded="true" height="82" name="Subprocess" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="concurrency:loop" compatibility="9.6.000" expanded="true" height="82" name="Loop" width="90" x="45" y="34">
<parameter key="number_of_iterations" value="5"/>
<parameter key="iteration_macro" value="iteration"/>
<parameter key="reuse_results" value="false"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="utility:create_exampleset" compatibility="9.6.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="380" y="34">
<parameter key="generator_type" value="attribute functions"/>
<parameter key="number_of_examples" value="100"/>
<parameter key="use_stepsize" value="false"/>
<list key="function_descriptions">
<parameter key="Consumption" value="round(rand()*1000)"/>
<parameter key="Date" value="date_add(date_now(),id,DATE_UNIT_DAY)"/>
<parameter key="Plant Id" value="%{a}"/>
</list>
<parameter key="add_id_attribute" value="true"/>
<list key="numeric_series_configuration"/>
<list key="date_series_configuration"/>
<list key="date_series_configuration (interval)"/>
<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="column_separator" value=","/>
<parameter key="parse_all_as_nominal" value="false"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="trim_attribute_names" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<connect from_op="Create ExampleSet" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="9.6.000" expanded="true" height="82" name="Append" width="90" x="179" y="34">
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="merge_type" value="all"/>
</operator>
<connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Generate Dummy Data</description>
</operator>
<operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="34">
<parameter key="group_by_attribute" value="Plant Id"/>
<parameter key="group_by_attribute (numerical)" value=""/>
<parameter key="sorting_order" value="none"/>
<description align="center" color="transparent" colored="false" width="126">Split into 5 example sets, one plant each</description>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="715" y="34">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="time_series:moving_average_filter" compatibility="9.6.000" expanded="true" height="68" name="Moving Average Filter" width="90" x="112" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Consumption"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="overwrite_attributes" value="true"/>
<parameter key="new_attributes_postfix" value="_filtered"/>
<parameter key="filter_type" value="simple"/>
<parameter key="filter_size_left" value="1"/>
<parameter key="filter_size_right" value="1"/>
<parameter key="filter_size" value="1"/>
</operator>
<connect from_port="single" to_op="Moving Average Filter" to_port="example set"/>
<connect from_op="Moving Average Filter" from_port="example set" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Do moving average per plant</description>
</operator>
<operator activated="true" class="operator_toolbox:advanced_append" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Append (Superset)" width="90" x="849" y="34"/>
<connect from_op="Subprocess" from_port="out 1" to_op="Group Into Collection" to_port="exa"/>
<connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Append (Superset)" to_port="example set 1"/>
<connect from_op="Append (Superset)" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
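For readers who want to see the split-then-filter idea outside of RapidMiner, here is a minimal Python sketch of the same pattern: group the rows by plant, then compute the moving average inside each group so that no window reaches into another plant's data. It is not a reproduction of the Moving Average Filter operator. The row layout `(plant_id, timestamp, energy)`, the function names, and the trailing window of 12 samples (3 hours at 15-minute steps) are all assumptions made for illustration.

```python
from collections import defaultdict


def group_into_collection(rows):
    """Split (plant_id, timestamp, energy) rows into one list per plant."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return groups


def trailing_average(values, window):
    """Mean of the current value and up to window-1 values before it."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def moving_average_per_plant(rows, window=12):
    """Return {plant_id: [(timestamp, energy, average), ...]}.

    window=12 is an assumption: 12 samples cover 3 hours at 15-minute steps.
    """
    result = {}
    for plant_id, plant_rows in group_into_collection(rows).items():
        # Sort by timestamp so the window only looks backwards in time.
        plant_rows = sorted(plant_rows, key=lambda r: r[1])
        averages = trailing_average([r[2] for r in plant_rows], window)
        result[plant_id] = [
            (r[1], r[2], avg) for r, avg in zip(plant_rows, averages)
        ]
    return result
```

Because each plant gets its own list before any averaging happens, the first rows of a plant are averaged over a shorter window rather than borrowing values from the plant that precedes it in the table.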
Answers
After the Loop Collection operator, the example set has been split into 4 different datasets (one per PV plant) within an IOObjectCollection. Do you happen to know how I can combine the data into a single ExampleSet again?
The objective of the project I'm working on is to predict failures in a photovoltaic plant. For this I have data from different variables, together with environmental conditions (irradiation, temperature, humidity, etc.), every 15 minutes for a full year.
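In the process posted in the best answer, the output of Loop Collection is wired into an Append (Superset) operator, which is the step that merges the per-plant sets back into one. As a rough illustration of that recombination step only, here is a small Python sketch. The function name `append_collection` and the sample rows are hypothetical, and it assumes every per-plant list has the same column layout.

```python
from itertools import chain


def append_collection(collection):
    """Concatenate a collection of per-plant row lists into one list."""
    return list(chain.from_iterable(collection))


# Hypothetical per-plant results: (plant_id, timestamp, averaged_energy)
plant_a = [("A", 1, 10.0), ("A", 2, 15.0)]
plant_b = [("B", 1, 100.0), ("B", 2, 150.0)]

combined = append_collection([plant_a, plant_b])
```

The plant ID travels with each row, so the combined table can still be filtered or grouped by plant afterwards.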