RapidMiner

RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2017

Community Manager

Boom! Well done, @Andrew! Anyone else coming in? There are prizes for 2nd and 3rd place.

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Contributor II jacobcybulski

Hi Scott,

Just a question on the submission method. I am sure it has been addressed in the original competition document and your post above, but I want to ensure I am following your instructions to the letter. Here are some of my assumptions:

  1. the submission is simply a post to this discussion area;
  2. the submission can include multiple processes, including those called by "Execute Process";
  3. we should refrain from using Python or R and instead focus on the pure RM solution;
  4. the post needs to explain how to run the included processes;
  5. all included XML inserts would be saved into the same folder with correct names;
  6. you are not happy accepting a zipped directory of all RMP files;
  7. we need to explain the method of data pre-processing and that we do not violate any rules;
  8. you are going to penalise any copy-cats and any attempts of plagiarism of the submitted solutions;
  9. finally, can I assume that the data provided to us has been nicely unzipped into two folders, or must we rely on the zipped data as it was provided to the competitors?

Jacob

P.S. Lots of questions and I am yet to get some good results to submit :)

Community Manager

Hello @jacobcybulski - all good questions.  Let me answer below.

 


> the submission is simply a post to this discussion area;

YES.

 

> the submission can include multiple processes, including those called by "Execute Process";

YES, as long as the processes that "Execute Process" calls are also included in your submission.

 

> we should refrain from using Python or R and instead focus on the pure RM solution;

NO. You can use Python and R if you like. There were no rules that stipulated otherwise. Of course, all scripts need to be executable, open-source, etc. We are not going to spend time making sure that dependencies are installed and so forth.

 

> the post needs to explain how to run the included processes;

I would expect that it is fairly self-evident how to run your process. If you think that it needs some explanation, by all means go ahead. Otherwise we may reach out to you (see my previous post) if we cannot get it to run or do not understand something.

 

> all included XML inserts would be saved into the same folder with correct names;

You can submit via XML posted directly in this thread using the </> tool, attached as .rmp files, or one attached .zip file with XML or .rmp inside. Any of these methods is fine.

 

> you are not happy accepting a zipped directory of all RMP files;

NO.  It is perfectly fine to attach a .zip file with all your .rmp files as long as the zip is able to be opened by anyone.
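A minimal sketch of how a folder of .rmp files could be bundled into one standard zip for attaching to a post, written in Python since the thread allows it. The function name `bundle_rmp_files` and the folder layout are assumptions for illustration, not part of the competition instructions.

```python
import zipfile
from pathlib import Path


def bundle_rmp_files(process_dir, zip_path):
    """Write every .rmp file under process_dir into a standard ZIP archive.

    Returns the list of entry names stored in the archive.
    """
    process_dir = Path(process_dir)
    rmp_files = sorted(process_dir.rglob("*.rmp"))
    if not rmp_files:
        raise FileNotFoundError(f"no .rmp files found in {process_dir}")

    # Plain DEFLATE compression, which common unzip tools can read.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in rmp_files:
            # Store paths relative to process_dir so the archive unpacks anywhere.
            archive.write(path, arcname=path.relative_to(process_dir).as_posix())

    # Re-open the archive to confirm it is readable before attaching it.
    with zipfile.ZipFile(zip_path) as archive:
        corrupt_entry = archive.testzip()
        if corrupt_entry is not None:
            raise zipfile.BadZipFile(f"corrupt entry: {corrupt_entry}")
        return archive.namelist()
```

Keeping relative paths inside the archive preserves any subfolder structure that "Execute Process" references may depend on.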

 

> we need to explain the method of data pre-processing

No. We do not require any explanation. We will of course look at your process to make sure that you are not gaming the system (e.g. rigging the process so that your score is high). I always assume that people are honest and have integrity until proven otherwise.

 

> and that we do not violate any rules;

Yes - for all rules stated in this thread by me.

 

> you are going to penalise any copy-cats and any attempts of plagiarism of the submitted solutions;

Again, this is a collegial competition and I always assume that people are honest and have integrity. In addition, all RM processes are rather similar (we all use the same operators), so trying to examine millions of subprocesses for code snippets is neither feasible nor desirable. All submissions are public and open for the purposes of transparency and so that we can learn from one another (the main objective of these competitions).

That said, the sponsor and I have reserved the right to disqualify a submission if we deem it necessary, and if someone really does something dishonest, I absolutely have the right to disqualify the submission and permanently ban the user from this community. :)

 

> finally, can I assume that the data provided to us has been nicely unzipped into two folders, or must we rely on the zipped data as it was provided to the competitors?

We will have the data in both zipped and unzipped forms - it does not matter. However, your process needs to grab the data as it was originally posted.
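Since the data may be present either zipped or unzipped, a quick local check of which of the originally posted files a file-name pattern would pick up can look like the Python sketch below. The helper name `list_data_files`, the default pattern, and any paths passed to it are hypothetical and only serve as an illustration.

```python
import fnmatch
import zipfile
from pathlib import Path, PurePosixPath


def list_data_files(source, pattern="*.xlsx"):
    """Return sorted file names matching pattern in a folder or a zip archive."""
    source = Path(source)
    if source.is_dir():
        # Unzipped form: walk the folder and keep plain files only.
        names = [p.name for p in source.rglob("*") if p.is_file()]
    elif zipfile.is_zipfile(source):
        # Zipped form: read the entry list without extracting anything.
        with zipfile.ZipFile(source) as archive:
            names = [
                PurePosixPath(entry).name
                for entry in archive.namelist()
                if not entry.endswith("/")  # skip directory entries
            ]
    else:
        raise ValueError(f"{source} is neither a folder nor a zip archive")
    return sorted(name for name in names if fnmatch.fnmatch(name, pattern))
```

If the folder and the archive return the same list, a process that reads from either form sees the same set of files.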

 

> Jacob
>
> P.S. Lots of questions and I am yet to get some good results to submit :)

Wahoo! Well done, Jacob. Six days left!

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Community Manager

Hello all competitors - FYI, the Competition Server is currently locked up, so any jobs sent to the server will not be queued. I will ask my colleagues to do a hard reboot first thing tomorrow morning.

UPDATED - COMPETITION SERVER IS BACK UP AND RUNNING (9:30AM EST).

Thanks for your understanding. Lots of lessons learned here for me too. Three more days to go!


Scott

 

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Learner III 16B543J

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve RC2_TestData_178" width="90" x="45" y="289">
        <parameter key="repository_entry" value="//Local Repository/data/RC2_TestData_178"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples (test)" width="90" x="179" y="289">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="no_missing_attributes"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes (test)" width="90" x="313" y="289">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="yieldIncrease|sensor9|sensor8|sensor7|sensor6|sensor5|sensor41|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1|hour|Label"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (test)" width="90" x="447" y="289">
        <parameter key="attribute_name" value="yieldIncrease"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="7.6.001" expanded="true" height="103" name="Nominal to Numerical (test)" width="90" x="581" y="289">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Label"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="coding_type" value="dummy coding"/>
        <parameter key="use_comparison_groups" value="false"/>
        <list key="comparison_groups"/>
        <parameter key="unexpected_value_handling" value="all 0 and warning"/>
        <parameter key="use_underscore_in_name" value="false"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../data/RC2_TrainDate_1475"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="no_missing_attributes"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="yieldIncrease|Label|hour|sensor9|sensor8|sensor7|sensor6|sensor5|sensor41|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
        <parameter key="attribute_name" value="yieldIncrease"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="7.6.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="581" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Label"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="coding_type" value="dummy coding"/>
        <parameter key="use_comparison_groups" value="false"/>
        <list key="comparison_groups"/>
        <parameter key="unexpected_value_handling" value="all 0 and warning"/>
        <parameter key="use_underscore_in_name" value="false"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize" width="90" x="715" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
      </operator>
      <operator activated="true" class="local_polynomial_regression" compatibility="7.6.001" expanded="true" height="82" name="Local Polynomial Regression" width="90" x="715" y="187">
        <parameter key="degree" value="2"/>
        <parameter key="ridge_factor" value="1.0E-9"/>
        <parameter key="use_robust_estimation" value="false"/>
        <parameter key="use_weights" value="true"/>
        <parameter key="iterations" value="20"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="neighborhood_type" value="Fixed Number"/>
        <parameter key="k" value="5"/>
        <parameter key="fixed_distance" value="5.0"/>
        <parameter key="distance" value="10.0"/>
        <parameter key="at_least" value="20"/>
        <parameter key="smoothing_kernel" value="Triweight"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize (test)" width="90" x="715" y="289">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="849" y="187">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="performance_regression" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="983" y="187">
        <parameter key="main_criterion" value="first"/>
        <parameter key="root_mean_squared_error" value="true"/>
        <parameter key="absolute_error" value="false"/>
        <parameter key="relative_error" value="false"/>
        <parameter key="relative_error_lenient" value="false"/>
        <parameter key="relative_error_strict" value="false"/>
        <parameter key="normalized_absolute_error" value="false"/>
        <parameter key="root_relative_squared_error" value="false"/>
        <parameter key="squared_error" value="false"/>
        <parameter key="correlation" value="false"/>
        <parameter key="squared_correlation" value="false"/>
        <parameter key="prediction_average" value="false"/>
        <parameter key="spearman_rho" value="false"/>
        <parameter key="kendall_tau" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="true"/>
      </operator>
      <operator activated="false" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="983" y="289">
        <list key="function_descriptions">
          <parameter key="DIFF" value="yieldIncrease-[prediction(yieldIncrease)]"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Retrieve RC2_TestData_178" from_port="output" to_op="Filter Examples (test)" to_port="example set input"/>
      <connect from_op="Filter Examples (test)" from_port="example set output" to_op="Select Attributes (test)" to_port="example set input"/>
      <connect from_op="Select Attributes (test)" from_port="example set output" to_op="Set Role (test)" to_port="example set input"/>
      <connect from_op="Set Role (test)" from_port="example set output" to_op="Nominal to Numerical (test)" to_port="example set input"/>
      <connect from_op="Nominal to Numerical (test)" from_port="example set output" to_op="Normalize (test)" to_port="example set input"/>
      <connect from_op="Retrieve" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Local Polynomial Regression" to_port="training set"/>
      <connect from_op="Local Polynomial Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Normalize (test)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 2"/>
      <connect from_op="Performance" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
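
For readers without RapidMiner at hand, the two numeric steps the process above relies on, the Z-transformation in Normalize and the root mean squared error reported by Performance, can be sketched in plain Python. This illustrates the formulas only and is not a reimplementation of the operators; in particular, the use of the population standard deviation here is an assumption, and RapidMiner's own estimator may differ.

```python
import math


def z_transform(values):
    """Scale values to zero mean and unit standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    # Assumption: population variance (divide by n), chosen for simplicity.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the process runs Normalize separately on the training and test branches, each set is scaled by its own mean and standard deviation.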
Community Manager

Hello @16B543J - thank you for your submission! Unfortunately, you are not pulling from the original data sets. I have put your model in the previously posted "scoring process". Can you please take a look and resubmit?

Thank you.

Scott

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="random_seed" value="-1"/>
<process expanded="true">
<operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="train" width="90" x="45" y="136">
<parameter key="directory" value="/Users/genzerconsulting/OneDrive - RapidMiner/OneDrive Repository/RM Competitions/Comp1-Mars Farming-Sept 2017/RM_Competition_TestData_random"/>
<parameter key="filter_by_glob" value="test*.xlsx"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="112" y="34">
<parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
<parameter key="imported_cell_range" value="A1:AT38"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Id.true.integer.attribute"/>
<parameter key="1" value="hour.true.integer.attribute"/>
<parameter key="2" value="Label.true.polynominal.attribute"/>
<parameter key="3" value="sensor1.true.integer.attribute"/>
<parameter key="4" value="sensor2.true.integer.attribute"/>
<parameter key="5" value="sensor3.true.integer.attribute"/>
<parameter key="6" value="sensor4.true.integer.attribute"/>
<parameter key="7" value="sensor5.true.integer.attribute"/>
<parameter key="8" value="sensor6.true.integer.attribute"/>
<parameter key="9" value="sensor7.true.integer.attribute"/>
<parameter key="10" value="sensor8.true.integer.attribute"/>
<parameter key="11" value="sensor9.true.integer.attribute"/>
<parameter key="12" value="sensor10.true.integer.attribute"/>
<parameter key="13" value="sensor11.true.integer.attribute"/>
<parameter key="14" value="sensor12.true.integer.attribute"/>
<parameter key="15" value="sensor13.true.integer.attribute"/>
<parameter key="16" value="sensor14.true.integer.attribute"/>
<parameter key="17" value="sensor15.true.integer.attribute"/>
<parameter key="18" value="sensor16.true.integer.attribute"/>
<parameter key="19" value="sensor17.true.integer.attribute"/>
<parameter key="20" value="sensor18.true.integer.attribute"/>
<parameter key="21" value="sensor19.true.integer.attribute"/>
<parameter key="22" value="sensor20.true.integer.attribute"/>
<parameter key="23" value="sensor21.true.integer.attribute"/>
<parameter key="24" value="sensor22.true.integer.attribute"/>
<parameter key="25" value="sensor23.true.integer.attribute"/>
<parameter key="26" value="sensor24.true.integer.attribute"/>
<parameter key="27" value="sensor25.true.integer.attribute"/>
<parameter key="28" value="sensor26.true.integer.attribute"/>
<parameter key="29" value="sensor27.true.integer.attribute"/>
<parameter key="30" value="sensor28.true.integer.attribute"/>
<parameter key="31" value="sensor29.true.integer.attribute"/>
<parameter key="32" value="sensor30.true.integer.attribute"/>
<parameter key="33" value="sensor31.true.integer.attribute"/>
<parameter key="34" value="sensor32.true.integer.attribute"/>
<parameter key="35" value="sensor33.true.integer.attribute"/>
<parameter key="36" value="sensor34.true.integer.attribute"/>
<parameter key="37" value="sensor35.true.integer.attribute"/>
<parameter key="38" value="sensor36.true.integer.attribute"/>
<parameter key="39" value="sensor37.true.integer.attribute"/>
<parameter key="40" value="sensor38.true.integer.attribute"/>
<parameter key="41" value="sensor39.true.numeric.attribute"/>
<parameter key="42" value="sensor40.true.numeric.attribute"/>
<parameter key="43" value="sensor41.true.numeric.attribute"/>
<parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
<parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
</list>
</operator>
<connect from_port="file object" to_op="Read Excel (3)" to_port="file"/>
<connect from_op="Read Excel (3)" from_port="output" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (3)" width="90" x="179" y="136"/>
<operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="0"/>
<description align="center" color="transparent" colored="false" width="126">replace missing yield values with zero</description>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess (4)" width="90" x="514" y="187">
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
<parameter key="condition_class" value="no_missing_attributes"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncrease|Label|hour|sensor9|sensor8|sensor7|sensor6|sensor5|sensor41|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="yieldIncrease"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="7.6.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Label"/>
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize" width="90" x="581" y="34">
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="local_polynomial_regression" compatibility="7.6.001" expanded="true" height="82" name="Local Polynomial Regression" width="90" x="715" y="34"/>
<connect from_port="in 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Local Polynomial Regression" to_port="training set"/>
<connect from_op="Local Polynomial Regression" from_port="model" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">MODELING</description>
</operator>
<operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="test" width="90" x="45" y="544">
<parameter key="directory" value="/Users/genzerconsulting/OneDrive - RapidMiner/OneDrive Repository/RM Competitions/Comp1-Mars Farming-Sept 2017/RM_Competition_TestData_random"/>
<parameter key="filter_by_glob" value="test*.xlsx"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="34">
<parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
<parameter key="imported_cell_range" value="A1:AT38"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Id.true.integer.attribute"/>
<parameter key="1" value="hour.true.integer.attribute"/>
<parameter key="2" value="Label.true.polynominal.attribute"/>
<parameter key="3" value="sensor1.true.integer.attribute"/>
<parameter key="4" value="sensor2.true.integer.attribute"/>
<parameter key="5" value="sensor3.true.integer.attribute"/>
<parameter key="6" value="sensor4.true.integer.attribute"/>
<parameter key="7" value="sensor5.true.integer.attribute"/>
<parameter key="8" value="sensor6.true.integer.attribute"/>
<parameter key="9" value="sensor7.true.integer.attribute"/>
<parameter key="10" value="sensor8.true.integer.attribute"/>
<parameter key="11" value="sensor9.true.integer.attribute"/>
<parameter key="12" value="sensor10.true.integer.attribute"/>
<parameter key="13" value="sensor11.true.integer.attribute"/>
<parameter key="14" value="sensor12.true.integer.attribute"/>
<parameter key="15" value="sensor13.true.integer.attribute"/>
<parameter key="16" value="sensor14.true.integer.attribute"/>
<parameter key="17" value="sensor15.true.integer.attribute"/>
<parameter key="18" value="sensor16.true.integer.attribute"/>
<parameter key="19" value="sensor17.true.integer.attribute"/>
<parameter key="20" value="sensor18.true.integer.attribute"/>
<parameter key="21" value="sensor19.true.integer.attribute"/>
<parameter key="22" value="sensor20.true.integer.attribute"/>
<parameter key="23" value="sensor21.true.integer.attribute"/>
<parameter key="24" value="sensor22.true.integer.attribute"/>
<parameter key="25" value="sensor23.true.integer.attribute"/>
<parameter key="26" value="sensor24.true.integer.attribute"/>
<parameter key="27" value="sensor25.true.integer.attribute"/>
<parameter key="28" value="sensor26.true.integer.attribute"/>
<parameter key="29" value="sensor27.true.integer.attribute"/>
<parameter key="30" value="sensor28.true.integer.attribute"/>
<parameter key="31" value="sensor29.true.integer.attribute"/>
<parameter key="32" value="sensor30.true.integer.attribute"/>
<parameter key="33" value="sensor31.true.integer.attribute"/>
<parameter key="34" value="sensor32.true.integer.attribute"/>
<parameter key="35" value="sensor33.true.integer.attribute"/>
<parameter key="36" value="sensor34.true.integer.attribute"/>
<parameter key="37" value="sensor35.true.integer.attribute"/>
<parameter key="38" value="sensor36.true.integer.attribute"/>
<parameter key="39" value="sensor37.true.integer.attribute"/>
<parameter key="40" value="sensor38.true.integer.attribute"/>
<parameter key="41" value="sensor39.true.numeric.attribute"/>
<parameter key="42" value="sensor40.true.numeric.attribute"/>
<parameter key="43" value="sensor41.true.numeric.attribute"/>
<parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
<parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="0"/>
</operator>
<connect from_port="file object" to_op="Read Excel (2)" to_port="file"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Replace Missing Values (2)" to_port="example set input"/>
<connect from_op="Replace Missing Values (2)" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (2)" width="90" x="179" y="544"/>
<operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values (3)" width="90" x="313" y="544">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="0"/>
<description align="center" color="transparent" colored="false" width="126">replace missing yield values with zero</description>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="Subprocess (3)" width="90" x="514" y="544">
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples (test)" width="90" x="45" y="136">
<parameter key="condition_class" value="no_missing_attributes"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes (test)" width="90" x="179" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncrease|sensor9|sensor8|sensor7|sensor6|sensor5|sensor41|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1|hour|Label"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (test)" width="90" x="313" y="136">
<parameter key="attribute_name" value="yieldIncrease"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="7.6.001" expanded="true" height="103" name="Nominal to Numerical (test)" width="90" x="447" y="136">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Label"/>
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize (test)" width="90" x="581" y="136">
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="715" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="849" y="34"/>
<connect from_port="in 1" to_op="Apply Model" to_port="model"/>
<connect from_port="in 2" to_op="Filter Examples (test)" to_port="example set input"/>
<connect from_op="Filter Examples (test)" from_port="example set output" to_op="Select Attributes (test)" to_port="example set input"/>
<connect from_op="Select Attributes (test)" from_port="example set output" to_op="Set Role (test)" to_port="example set input"/>
<connect from_op="Set Role (test)" from_port="example set output" to_op="Nominal to Numerical (test)" to_port="example set input"/>
<connect from_op="Nominal to Numerical (test)" from_port="example set output" to_op="Normalize (test)" to_port="example set input"/>
<connect from_op="Normalize (test)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="out 2"/>
<connect from_op="Performance" from_port="example set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">APPLY MODEL</description>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="Subprocess (2)" width="90" x="715" y="544">
<process expanded="true">
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="45" y="34">
<list key="function_descriptions">
<parameter key="nutrientCorrect" value="if(Label==nutrientPrediction,TRUE,FALSE)"/>
</list>
<description align="center" color="transparent" colored="false" width="126">nutrientCorrect</description>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="SCORE" value="if(nutrientCorrect==TRUE&amp;&amp;Label==&quot;A&quot;,yieldIncreaseA,&#10;if(nutrientCorrect==TRUE&amp;&amp;Label==&quot;B&quot;,yieldIncreaseB,-100))"/>
</list>
<description align="center" color="transparent" colored="false" width="126">SCORE</description>
</operator>
<operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="Reorder Attributes (2)" width="90" x="313" y="34">
<parameter key="attribute_ordering" value="Label|hour|hourPrediction|nutrientPrediction|nutrientCorrect|yieldIncreaseA|yieldIncreaseB|SCORE"/>
</operator>
<operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
<list key="aggregation_attributes">
<parameter key="SCORE" value="sum"/>
</list>
</operator>
<connect from_port="in 1" to_op="Generate Attributes (3)" to_port="example set input"/>
<connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Generate Attributes (4)" to_port="example set input"/>
<connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Reorder Attributes (2)" to_port="example set input"/>
<connect from_op="Reorder Attributes (2)" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="out 1"/>
<connect from_op="Aggregate" from_port="original" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">SCORING</description>
</operator>
<connect from_op="train" from_port="output 1" to_op="Append (3)" to_port="example set 1"/>
<connect from_op="Append (3)" from_port="merged set" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Subprocess (4)" to_port="in 1"/>
<connect from_op="Subprocess (4)" from_port="out 1" to_op="Subprocess (3)" to_port="in 1"/>
<connect from_op="test" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
<connect from_op="Append (2)" from_port="merged set" to_op="Replace Missing Values (3)" to_port="example set input"/>
<connect from_op="Replace Missing Values (3)" from_port="example set output" to_op="Subprocess (3)" to_port="in 2"/>
<connect from_op="Subprocess (3)" from_port="out 1" to_op="Subprocess (2)" to_port="in 1"/>
<connect from_op="Subprocess (2)" from_port="out 1" to_port="result 1"/>
<connect from_op="Subprocess (2)" from_port="out 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<description align="center" color="yellow" colored="false" height="52" resized="true" width="454" x="324" y="23">User 16B543J - submitted Oct 12 11:43am EST</description>
<description align="center" color="yellow" colored="false" height="254" resized="true" width="194" x="670" y="480">DO NOT CHANGE</description>
<description align="center" color="yellow" colored="false" height="266" resized="true" width="462" x="16" y="84">DO NOT CHANGE</description>
<description align="center" color="yellow" colored="false" height="268" resized="true" width="465" x="11" y="483">DO NOT CHANGE</description>
</process>
</operator>
</process>
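
For anyone who wants to sanity-check the scoring logic outside RapidMiner, the SCORING subprocess above can be sketched in a few lines of Python. This is only an illustration of the SCORE expression (the matching yield increase when the nutrient prediction is correct, -100 otherwise, summed over all rows). The function names and the sample rows are made up for the example and are not part of the competition files.

```python
# Illustrative sketch only: mirrors the SCORE expression in the SCORING subprocess.
# Function names and sample rows are hypothetical.

PENALTY = -100  # value the SCORE expression assigns to a wrong nutrient prediction


def score_row(label, nutrient_prediction, yield_increase_a, yield_increase_b):
    """Return the matching yield increase if the prediction is correct, else the penalty."""
    if label != nutrient_prediction:
        return PENALTY
    if label == "A":
        return yield_increase_a
    if label == "B":
        return yield_increase_b
    # Any other label falls through to the penalty, as in the nested if().
    return PENALTY


def total_score(rows):
    """Sum the per-row scores, like the Aggregate operator with SCORE -> sum."""
    return sum(
        score_row(r["Label"], r["nutrientPrediction"], r["yieldIncreaseA"], r["yieldIncreaseB"])
        for r in rows
    )


# Made-up rows; missing yields are assumed to be 0 already (Replace Missing Values).
rows = [
    {"Label": "A", "nutrientPrediction": "A", "yieldIncreaseA": 12.5, "yieldIncreaseB": 0},
    {"Label": "B", "nutrientPrediction": "A", "yieldIncreaseA": 0, "yieldIncreaseB": 7.0},
    {"Label": "B", "nutrientPrediction": "B", "yieldIncreaseA": 0, "yieldIncreaseB": 3.0},
]
print(total_score(rows))  # 12.5 - 100 + 3.0 = -84.5
```

A single wrong prediction costs far more than a typical correct one gains, which is why shifting the decision threshold can improve the total.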
Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Certified Expert

Re: RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2

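Here is my process. It needs one folder containing the training files and another containing the test files; set the "directory" parameter of the two Loop Files operators ("training" and "test") to those folders before running.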

 


<?xml version="1.0" encoding="UTF-8"?><process version="7.5.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="concurrency:loop_files" compatibility="7.5.000" expanded="true" height="82" name="training" width="90" x="45" y="34"> <parameter key="directory" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random"/> <parameter key="filter_by_glob" value="training*.xlsx"/> <parameter key="enable_parallel_execution" value="false"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="7.5.000" expanded="true" height="68" name="Read Excel" width="90" x="246" y="34"> <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/> <parameter key="imported_cell_range" value="A1:AT38"/> <parameter key="first_row_as_names" value="false"/> <list key="annotations"> <parameter key="0" value="Name"/> </list> <list key="data_set_meta_data_information"> <parameter key="0" value="Id.true.integer.attribute"/> <parameter key="1" value="hour.true.integer.attribute"/> <parameter key="2" value="Label.true.polynominal.attribute"/> <parameter key="3" value="sensor1.true.integer.attribute"/> <parameter key="4" value="sensor2.true.integer.attribute"/> <parameter key="5" value="sensor3.true.integer.attribute"/> <parameter key="6" value="sensor4.true.integer.attribute"/> <parameter key="7" value="sensor5.true.integer.attribute"/> <parameter key="8" value="sensor6.true.integer.attribute"/> <parameter key="9" value="sensor7.true.integer.attribute"/> <parameter key="10" value="sensor8.true.integer.attribute"/> <parameter key="11" value="sensor9.true.integer.attribute"/> <parameter key="12" value="sensor10.true.integer.attribute"/> <parameter key="13" 
value="sensor11.true.integer.attribute"/> <parameter key="14" value="sensor12.true.integer.attribute"/> <parameter key="15" value="sensor13.true.integer.attribute"/> <parameter key="16" value="sensor14.true.integer.attribute"/> <parameter key="17" value="sensor15.true.integer.attribute"/> <parameter key="18" value="sensor16.true.integer.attribute"/> <parameter key="19" value="sensor17.true.integer.attribute"/> <parameter key="20" value="sensor18.true.integer.attribute"/> <parameter key="21" value="sensor19.true.integer.attribute"/> <parameter key="22" value="sensor20.true.integer.attribute"/> <parameter key="23" value="sensor21.true.integer.attribute"/> <parameter key="24" value="sensor22.true.integer.attribute"/> <parameter key="25" value="sensor23.true.integer.attribute"/> <parameter key="26" value="sensor24.true.integer.attribute"/> <parameter key="27" value="sensor25.true.integer.attribute"/> <parameter key="28" value="sensor26.true.integer.attribute"/> <parameter key="29" value="sensor27.true.integer.attribute"/> <parameter key="30" value="sensor28.true.integer.attribute"/> <parameter key="31" value="sensor29.true.integer.attribute"/> <parameter key="32" value="sensor30.true.integer.attribute"/> <parameter key="33" value="sensor31.true.integer.attribute"/> <parameter key="34" value="sensor32.true.integer.attribute"/> <parameter key="35" value="sensor33.true.integer.attribute"/> <parameter key="36" value="sensor34.true.integer.attribute"/> <parameter key="37" value="sensor35.true.integer.attribute"/> <parameter key="38" value="sensor36.true.integer.attribute"/> <parameter key="39" value="sensor37.true.integer.attribute"/> <parameter key="40" value="sensor38.true.integer.attribute"/> <parameter key="41" value="sensor39.true.numeric.attribute"/> <parameter key="42" value="sensor40.true.numeric.attribute"/> <parameter key="43" value="sensor41.true.numeric.attribute"/> <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/> <parameter key="45" 
value="yieldIncreaseB.true.real.attribute"/> </list> </operator> <connect from_port="file object" to_op="Read Excel" to_port="file"/> <connect from_op="Read Excel" from_port="output" to_port="output 1"/> <portSpacing port="source_file object" spacing="0"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="concurrency:loop_files" compatibility="7.5.000" expanded="true" height="82" name="test" width="90" x="45" y="391"> <parameter key="directory" value="D:\RMCompetition\RM_Competition_TestData_random\RM_Competition_TestData_random"/> <parameter key="filter_by_glob" value="test*.xlsx"/> <parameter key="enable_parallel_execution" value="false"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="7.5.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="246" y="34"> <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/> <parameter key="imported_cell_range" value="A1:AT38"/> <parameter key="first_row_as_names" value="false"/> <list key="annotations"> <parameter key="0" value="Name"/> </list> <list key="data_set_meta_data_information"> <parameter key="0" value="Id.true.integer.attribute"/> <parameter key="1" value="hour.true.integer.attribute"/> <parameter key="2" value="Label.true.polynominal.attribute"/> <parameter key="3" value="sensor1.true.integer.attribute"/> <parameter key="4" value="sensor2.true.integer.attribute"/> <parameter key="5" value="sensor3.true.integer.attribute"/> <parameter key="6" value="sensor4.true.integer.attribute"/> <parameter key="7" value="sensor5.true.integer.attribute"/> <parameter key="8" value="sensor6.true.integer.attribute"/> <parameter key="9" value="sensor7.true.integer.attribute"/> <parameter key="10" value="sensor8.true.integer.attribute"/> 
<parameter key="11" value="sensor9.true.integer.attribute"/> <parameter key="12" value="sensor10.true.integer.attribute"/> <parameter key="13" value="sensor11.true.integer.attribute"/> <parameter key="14" value="sensor12.true.integer.attribute"/> <parameter key="15" value="sensor13.true.integer.attribute"/> <parameter key="16" value="sensor14.true.integer.attribute"/> <parameter key="17" value="sensor15.true.integer.attribute"/> <parameter key="18" value="sensor16.true.integer.attribute"/> <parameter key="19" value="sensor17.true.integer.attribute"/> <parameter key="20" value="sensor18.true.integer.attribute"/> <parameter key="21" value="sensor19.true.integer.attribute"/> <parameter key="22" value="sensor20.true.integer.attribute"/> <parameter key="23" value="sensor21.true.integer.attribute"/> <parameter key="24" value="sensor22.true.integer.attribute"/> <parameter key="25" value="sensor23.true.integer.attribute"/> <parameter key="26" value="sensor24.true.integer.attribute"/> <parameter key="27" value="sensor25.true.integer.attribute"/> <parameter key="28" value="sensor26.true.integer.attribute"/> <parameter key="29" value="sensor27.true.integer.attribute"/> <parameter key="30" value="sensor28.true.integer.attribute"/> <parameter key="31" value="sensor29.true.integer.attribute"/> <parameter key="32" value="sensor30.true.integer.attribute"/> <parameter key="33" value="sensor31.true.integer.attribute"/> <parameter key="34" value="sensor32.true.integer.attribute"/> <parameter key="35" value="sensor33.true.integer.attribute"/> <parameter key="36" value="sensor34.true.integer.attribute"/> <parameter key="37" value="sensor35.true.integer.attribute"/> <parameter key="38" value="sensor36.true.integer.attribute"/> <parameter key="39" value="sensor37.true.integer.attribute"/> <parameter key="40" value="sensor38.true.integer.attribute"/> <parameter key="41" value="sensor39.true.numeric.attribute"/> <parameter key="42" value="sensor40.true.numeric.attribute"/> <parameter 
key="43" value="sensor41.true.numeric.attribute"/> <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/> <parameter key="45" value="yieldIncreaseB.true.real.attribute"/> </list> </operator> <connect from_port="file object" to_op="Read Excel (2)" to_port="file"/> <connect from_op="Read Excel (2)" from_port="output" to_port="output 1"/> <portSpacing port="source_file object" spacing="0"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="append" compatibility="7.5.000" expanded="true" height="82" name="Append (2)" width="90" x="179" y="391"/> <operator activated="true" class="append" compatibility="7.5.000" expanded="true" height="82" name="Append" width="90" x="179" y="34"/> <operator activated="true" class="sort" compatibility="7.5.000" expanded="true" height="82" name="Sort" width="90" x="179" y="136"> <parameter key="attribute_name" value="Id"/> </operator> <operator activated="true" class="replace_missing_values" compatibility="7.5.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB|sensor41"/> <parameter key="default" value="value"/> <list key="columns"/> <parameter key="replenishment_value" value="0"/> </operator> <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples (3)" width="90" x="447" y="34"> <list key="filters_list"> <parameter key="filters_entry_key" value="hour.eq.8"/> </list> </operator> <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attributes" 
value="Label|sensor41|yieldIncreaseA|yieldIncreaseB"/> </operator> <operator activated="true" class="set_role" compatibility="7.5.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="34"> <parameter key="attribute_name" value="Label"/> <parameter key="target_role" value="label"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.000" expanded="true" height="145" name="Validation" width="90" x="849" y="34"> <parameter key="sampling_type" value="shuffled sampling"/> <parameter key="use_local_random_seed" value="true"/> <parameter key="enable_parallel_execution" value="false"/> <process expanded="true"> <operator activated="true" class="neural_net" compatibility="7.5.000" expanded="true" height="82" name="Neural Net" width="90" x="313" y="34"> <list key="hidden_layers"/> <parameter key="use_local_random_seed" value="true"/> </operator> <operator activated="false" class="logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression (3)" width="90" x="313" y="238"/> <connect from_port="training set" to_op="Neural Net" to_port="training set"/> <connect from_op="Neural Net" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="7.5.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34"> <list key="application_parameters"/> </operator> <operator activated="true" class="performance" compatibility="7.5.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" 
to_port="labelled data"/> <connect from_op="Performance" from_port="performance" to_port="performance 1"/> <connect from_op="Performance" from_port="example set" to_port="test set results"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <operator activated="true" class="find_threshold" compatibility="7.5.000" expanded="true" height="82" name="Find Threshold" width="90" x="983" y="34"> <parameter key="define_labels" value="true"/> <parameter key="first_label" value="A"/> <parameter key="second_label" value="B"/> <parameter key="misclassification_costs_second" value="3.53"/> <parameter key="use_example_weights" value="false"/> <parameter key="roc_bias" value="neutral"/> </operator> <operator activated="true" class="sort" compatibility="7.5.000" expanded="true" height="82" name="Sort (2)" width="90" x="179" y="493"> <parameter key="attribute_name" value="Id"/> </operator> <operator activated="true" class="replace_missing_values" compatibility="7.5.000" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="313" y="391"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB|sensor41"/> <parameter key="default" value="value"/> <list key="columns"/> <parameter key="replenishment_value" value="0"/> </operator> <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples (4)" width="90" x="447" y="391"> <list key="filters_list"> <parameter key="filters_entry_key" value="hour.eq.8"/> </list> </operator> <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes (3)" 
width="90" x="581" y="391"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attributes" value="Label|sensor41|yieldIncreaseA|yieldIncreaseB"/> </operator> <operator activated="true" class="set_role" compatibility="7.5.000" expanded="true" height="82" name="Set Role (2)" width="90" x="715" y="391"> <parameter key="attribute_name" value="Label"/> <parameter key="target_role" value="label"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="apply_model" compatibility="7.5.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="849" y="391"> <list key="application_parameters"/> </operator> <operator activated="true" class="apply_threshold" compatibility="7.5.000" expanded="true" height="82" name="Apply Threshold" width="90" x="983" y="391"/> <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="1184" y="391"> <list key="function_descriptions"> <parameter key="Score" value="if(Label != [prediction(Label)], -100, if(Label == &quot;A&quot;, yieldIncreaseA, yieldIncreaseB))"/> </list> </operator> <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="1184" y="493"> <list key="aggregation_attributes"> <parameter key="Score" value="sum"/> </list> </operator> <connect from_op="training" from_port="output 1" to_op="Append" to_port="example set 1"/> <connect from_op="test" from_port="output 1" to_op="Append (2)" to_port="example set 1"/> <connect from_op="Append (2)" from_port="merged set" to_op="Sort (2)" to_port="example set input"/> <connect from_op="Append" from_port="merged set" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/> <connect from_op="Replace Missing Values" from_port="example set output" to_op="Filter Examples (3)" 
to_port="example set input"/> <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/> <connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="example set"/> <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/> <connect from_op="Validation" from_port="test result set" to_op="Find Threshold" to_port="example set"/> <connect from_op="Find Threshold" from_port="threshold" to_op="Apply Threshold" to_port="threshold"/> <connect from_op="Sort (2)" from_port="example set output" to_op="Replace Missing Values (2)" to_port="example set input"/> <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Filter Examples (4)" to_port="example set input"/> <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/> <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/> <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/> <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Threshold" to_port="example set"/> <connect from_op="Apply Threshold" from_port="example set" to_op="Generate Attributes" to_port="example set input"/> <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/> <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <description align="center" color="yellow" colored="false" height="50" resized="true" width="786" x="51" y="594">Enhancement: 
Combine this processing into a single process called using Execute Process to avoid having to duplicate and possibly make errors</description> <description align="center" color="yellow" colored="false" height="215" resized="true" width="141" x="298" y="158">Naively replace any missing values with 0. This means that there will be no yield increase if classification is correct, and -100 if it is incorrect. Imputation may yield a minor improvement.</description> <description align="center" color="yellow" colored="false" height="120" resized="false" width="180" x="74" y="244">Loop through all files simply reading them in and Appending to make a single example set. Sorting is important to get a reproducible result</description> <description align="center" color="yellow" colored="false" height="90" resized="true" width="145" x="502" y="203">Select only sensor 41, yieldIncreaseA and yieldIncreaseB at the selected hour</description> <description align="center" color="yellow" colored="false" height="199" resized="true" width="180" x="670" y="132">Build a model that assumes that the selected hour is when the additional nutrient is added. Setting the random seed ensures reproducibility when using Optimize Parameters to find the best hour and threshold that can be used in this process.</description> <description align="center" color="yellow" colored="false" height="120" resized="false" width="180" x="1090" y="588">Calculate the final score assuming the yield values at the selected hour are the ones to use with correct classification, and -100 otherwise</description> <description align="center" color="yellow" colored="false" height="156" resized="false" width="187" x="1104" y="41">Finding a threshold allows the final score to be optimized along with the best hour using an Optimize Parameters approach. It is vital to specify the explicit labels to get a reproducible result.</description> </process> </operator> </process>
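
The data preparation steps described in the annotations (append all runs, sort by Id, replace missing values with 0, keep only the selected hour and columns) can be sketched in plain Python as below. This is a rough illustration and not a replacement for the process: the function name, the list-of-dicts representation and the sample rows are assumptions made for the example, while the hour 8 and the column names are taken from the process XML.

```python
# Illustrative sketch only: a plain-Python stand-in for
# Append -> Sort -> Replace Missing Values -> Filter Examples -> Select Attributes.
# The function name and the data layout (lists of dicts) are hypothetical.

FILL_COLUMNS = ("yieldIncreaseA", "yieldIncreaseB", "sensor41")
KEEP_COLUMNS = ("Label", "sensor41", "yieldIncreaseA", "yieldIncreaseB")


def prepare(runs, hour=8):
    """Combine several runs into one table restricted to the selected hour."""
    # Append: concatenate the rows of every run (copying so inputs stay untouched).
    rows = [dict(row) for run in runs for row in run]
    # Sort by Id so the result does not depend on the order the files were read in.
    rows.sort(key=lambda r: r["Id"])
    # Replace missing values (None here) with 0 in the same columns as the process.
    for r in rows:
        for col in FILL_COLUMNS:
            if r.get(col) is None:
                r[col] = 0
    # Keep only rows at the selected hour and only the selected columns.
    return [{c: r[c] for c in KEEP_COLUMNS} for r in rows if r["hour"] == hour]


# Made-up runs, each standing in for one Excel file.
run_1 = [{"Id": 2, "hour": 8, "Label": "B", "sensor41": None,
          "yieldIncreaseA": None, "yieldIncreaseB": 4.0}]
run_2 = [{"Id": 1, "hour": 8, "Label": "A", "sensor41": 0.3,
          "yieldIncreaseA": 5.0, "yieldIncreaseB": None},
         {"Id": 1, "hour": 7, "Label": "A", "sensor41": 0.1,
          "yieldIncreaseA": None, "yieldIncreaseB": None}]

for row in prepare([run_1, run_2]):
    print(row)
```

Keeping the hour as a parameter reflects the idea in the annotations that the best hour is something to search over, for example with Optimize Parameters.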

 

Community Manager

Re: RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2

** JUST A REMINDER TO ALL - APPROXIMATELY 1 DAY REMAINS FOR SUBMISSIONS.  I RECOMMEND THAT YOU INCLUDE THE POSTED "SCORING PROCESS" WITH YOUR SUBMISSION TO MAKE IT EASIER FOR US TO SCORE.  THANK YOU AND GOOD LUCK! **

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="random_seed" value="-1"/>
<process expanded="true">
<operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="train" width="90" x="45" y="136">
<parameter key="directory" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/RM Competitions/Comp1-Mars Farming-Sept 2017/RM_Competition_TrainingData_random"/>
<parameter key="filter_type" value="regex"/>
<parameter key="filter_by_glob" value="tra*.xlsx"/>
<parameter key="filter_by_regex" value="train.*\.xlsx"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="112" y="34">
<parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
<parameter key="imported_cell_range" value="A1:AT38"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Id.true.integer.attribute"/>
<parameter key="1" value="hour.true.integer.attribute"/>
<parameter key="2" value="Label.true.polynominal.attribute"/>
<parameter key="3" value="sensor1.true.integer.attribute"/>
<parameter key="4" value="sensor2.true.integer.attribute"/>
<parameter key="5" value="sensor3.true.integer.attribute"/>
<parameter key="6" value="sensor4.true.integer.attribute"/>
<parameter key="7" value="sensor5.true.integer.attribute"/>
<parameter key="8" value="sensor6.true.integer.attribute"/>
<parameter key="9" value="sensor7.true.integer.attribute"/>
<parameter key="10" value="sensor8.true.integer.attribute"/>
<parameter key="11" value="sensor9.true.integer.attribute"/>
<parameter key="12" value="sensor10.true.integer.attribute"/>
<parameter key="13" value="sensor11.true.integer.attribute"/>
<parameter key="14" value="sensor12.true.integer.attribute"/>
<parameter key="15" value="sensor13.true.integer.attribute"/>
<parameter key="16" value="sensor14.true.integer.attribute"/>
<parameter key="17" value="sensor15.true.integer.attribute"/>
<parameter key="18" value="sensor16.true.integer.attribute"/>
<parameter key="19" value="sensor17.true.integer.attribute"/>
<parameter key="20" value="sensor18.true.integer.attribute"/>
<parameter key="21" value="sensor19.true.integer.attribute"/>
<parameter key="22" value="sensor20.true.integer.attribute"/>
<parameter key="23" value="sensor21.true.integer.attribute"/>
<parameter key="24" value="sensor22.true.integer.attribute"/>
<parameter key="25" value="sensor23.true.integer.attribute"/>
<parameter key="26" value="sensor24.true.integer.attribute"/>
<parameter key="27" value="sensor25.true.integer.attribute"/>
<parameter key="28" value="sensor26.true.integer.attribute"/>
<parameter key="29" value="sensor27.true.integer.attribute"/>
<parameter key="30" value="sensor28.true.integer.attribute"/>
<parameter key="31" value="sensor29.true.integer.attribute"/>
<parameter key="32" value="sensor30.true.integer.attribute"/>
<parameter key="33" value="sensor31.true.integer.attribute"/>
<parameter key="34" value="sensor32.true.integer.attribute"/>
<parameter key="35" value="sensor33.true.integer.attribute"/>
<parameter key="36" value="sensor34.true.integer.attribute"/>
<parameter key="37" value="sensor35.true.integer.attribute"/>
<parameter key="38" value="sensor36.true.integer.attribute"/>
<parameter key="39" value="sensor37.true.integer.attribute"/>
<parameter key="40" value="sensor38.true.integer.attribute"/>
<parameter key="41" value="sensor39.true.numeric.attribute"/>
<parameter key="42" value="sensor40.true.numeric.attribute"/>
<parameter key="43" value="sensor41.true.numeric.attribute"/>
<parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
<parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
</list>
</operator>
<connect from_port="file object" to_op="Read Excel (3)" to_port="file"/>
<connect from_op="Read Excel (3)" from_port="output" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (3)" width="90" x="179" y="136"/>
<operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="0"/>
<description align="center" color="transparent" colored="false" width="126">replace missing yield values with zero</description>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess (4)" width="90" x="514" y="187">
<process expanded="true">
<operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal (2)" width="90" x="45" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Id"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (3)" width="90" x="179" y="34">
<parameter key="attribute_name" value="Id"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values (2)" width="90" x="313" y="34">
<parameter key="attribute" value="Id"/>
<parameter key="iteration_macro" value="id"/>
<process expanded="true">
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (6)" width="90" x="45" y="34">
<list key="function_descriptions">
<parameter key="nutrientPrediction" value="&quot;A&quot;"/>
<parameter key="hourPrediction" value="13"/>
<parameter key="hourPredictionMatch" value="if(hour==hourPrediction,TRUE,FALSE)"/>
</list>
<description align="center" color="transparent" colored="false" width="126">THIS IS WHAT YOUR MODEL SHOULD DO - THIS PLACEHOLDER OPERATOR JUST HARD-CODES AN ARBITRARY NUTRIENT (A) AND HOUR (13)</description>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples (3)" width="90" x="179" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="Id.equals.%{id}"/>
<parameter key="filters_entry_key" value="hourPredictionMatch.equals.true"/>
</list>
</operator>
<connect from_port="input 1" to_op="Generate Attributes (6)" to_port="example set input"/>
<connect from_op="Generate Attributes (6)" from_port="example set output" to_op="Filter Examples (3)" to_port="example set input"/>
<connect from_op="Filter Examples (3)" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (4)" width="90" x="447" y="34"/>
<operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="Reorder Attributes (4)" width="90" x="581" y="34">
<parameter key="attribute_ordering" value="Label|hour|hourPrediction|nutrientPrediction|yieldIncreaseA|yieldIncreaseB"/>
</operator>
<connect from_port="in 1" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
<connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
<connect from_op="Set Role (3)" from_port="example set output" to_op="Loop Values (2)" to_port="input 1"/>
<connect from_op="Loop Values (2)" from_port="output 1" to_op="Append (4)" to_port="example set 1"/>
<connect from_op="Append (4)" from_port="merged set" to_op="Reorder Attributes (4)" to_port="example set input"/>
<connect from_op="Reorder Attributes (4)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">MODELING</description>
</operator>
<operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="test" width="90" x="45" y="544">
<parameter key="directory" value="/Users/genzerconsulting/OneDrive - RapidMiner/OneDrive Repository/RM Competitions/Comp1-Mars Farming-Sept 2017/RM_Competition_TestData_random"/>
<parameter key="filter_type" value="regex"/>
<parameter key="filter_by_glob" value="test*.xlsx"/>
<parameter key="filter_by_regex" value="test.*\.xlsx"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="34">
<parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
<parameter key="imported_cell_range" value="A1:AT38"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Id.true.integer.attribute"/>
<parameter key="1" value="hour.true.integer.attribute"/>
<parameter key="2" value="Label.true.polynominal.attribute"/>
<parameter key="3" value="sensor1.true.integer.attribute"/>
<parameter key="4" value="sensor2.true.integer.attribute"/>
<parameter key="5" value="sensor3.true.integer.attribute"/>
<parameter key="6" value="sensor4.true.integer.attribute"/>
<parameter key="7" value="sensor5.true.integer.attribute"/>
<parameter key="8" value="sensor6.true.integer.attribute"/>
<parameter key="9" value="sensor7.true.integer.attribute"/>
<parameter key="10" value="sensor8.true.integer.attribute"/>
<parameter key="11" value="sensor9.true.integer.attribute"/>
<parameter key="12" value="sensor10.true.integer.attribute"/>
<parameter key="13" value="sensor11.true.integer.attribute"/>
<parameter key="14" value="sensor12.true.integer.attribute"/>
<parameter key="15" value="sensor13.true.integer.attribute"/>
<parameter key="16" value="sensor14.true.integer.attribute"/>
<parameter key="17" value="sensor15.true.integer.attribute"/>
<parameter key="18" value="sensor16.true.integer.attribute"/>
<parameter key="19" value="sensor17.true.integer.attribute"/>
<parameter key="20" value="sensor18.true.integer.attribute"/>
<parameter key="21" value="sensor19.true.integer.attribute"/>
<parameter key="22" value="sensor20.true.integer.attribute"/>
<parameter key="23" value="sensor21.true.integer.attribute"/>
<parameter key="24" value="sensor22.true.integer.attribute"/>
<parameter key="25" value="sensor23.true.integer.attribute"/>
<parameter key="26" value="sensor24.true.integer.attribute"/>
<parameter key="27" value="sensor25.true.integer.attribute"/>
<parameter key="28" value="sensor26.true.integer.attribute"/>
<parameter key="29" value="sensor27.true.integer.attribute"/>
<parameter key="30" value="sensor28.true.integer.attribute"/>
<parameter key="31" value="sensor29.true.integer.attribute"/>
<parameter key="32" value="sensor30.true.integer.attribute"/>
<parameter key="33" value="sensor31.true.integer.attribute"/>
<parameter key="34" value="sensor32.true.integer.attribute"/>
<parameter key="35" value="sensor33.true.integer.attribute"/>
<parameter key="36" value="sensor34.true.integer.attribute"/>
<parameter key="37" value="sensor35.true.integer.attribute"/>
<parameter key="38" value="sensor36.true.integer.attribute"/>
<parameter key="39" value="sensor37.true.integer.attribute"/>
<parameter key="40" value="sensor38.true.integer.attribute"/>
<parameter key="41" value="sensor39.true.numeric.attribute"/>
<parameter key="42" value="sensor40.true.numeric.attribute"/>
<parameter key="43" value="sensor41.true.numeric.attribute"/>
<parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
<parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="0"/>
</operator>
<connect from_port="file object" to_op="Read Excel (2)" to_port="file"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Replace Missing Values (2)" to_port="example set input"/>
<connect from_op="Replace Missing Values (2)" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (2)" width="90" x="179" y="544"/>
<operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values (3)" width="90" x="313" y="544">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="0"/>
<description align="center" color="transparent" colored="false" width="126">replace missing yield values with zero</description>
</operator>
<operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="514" y="544">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="Subprocess (2)" width="90" x="715" y="544">
<process expanded="true">
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="45" y="34">
<list key="function_descriptions">
<parameter key="nutrientCorrect" value="if(Label==nutrientPrediction,TRUE,FALSE)"/>
</list>
<description align="center" color="transparent" colored="false" width="126">nutrientCorrect</description>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="SCORE" value="if(nutrientCorrect==TRUE&amp;&amp;Label==&quot;A&quot;,yieldIncreaseA,&#10;if(nutrientCorrect==TRUE&amp;&amp;Label==&quot;B&quot;,yieldIncreaseB,-100))"/>
</list>
<description align="center" color="transparent" colored="false" width="126">SCORE</description>
</operator>
<operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="Reorder Attributes (2)" width="90" x="313" y="34">
<parameter key="attribute_ordering" value="Label|hour|hourPrediction|nutrientPrediction|nutrientCorrect|yieldIncreaseA|yieldIncreaseB|SCORE"/>
</operator>
<operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
<list key="aggregation_attributes">
<parameter key="SCORE" value="sum"/>
</list>
</operator>
<connect from_port="in 1" to_op="Generate Attributes (3)" to_port="example set input"/>
<connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Generate Attributes (4)" to_port="example set input"/>
<connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Reorder Attributes (2)" to_port="example set input"/>
<connect from_op="Reorder Attributes (2)" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="out 1"/>
<connect from_op="Aggregate" from_port="original" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">SCORING</description>
</operator>
<connect from_op="train" from_port="output 1" to_op="Append (3)" to_port="example set 1"/>
<connect from_op="Append (3)" from_port="merged set" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Subprocess (4)" to_port="in 1"/>
<connect from_op="Subprocess (4)" from_port="out 1" to_op="Apply Model" to_port="model"/>
<connect from_op="test" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
<connect from_op="Append (2)" from_port="merged set" to_op="Replace Missing Values (3)" to_port="example set input"/>
<connect from_op="Replace Missing Values (3)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Subprocess (2)" to_port="in 1"/>
<connect from_op="Subprocess (2)" from_port="out 1" to_port="result 1"/>
<connect from_op="Subprocess (2)" from_port="out 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<description align="center" color="yellow" colored="false" height="309" resized="false" width="183" x="458" y="11">this is your model that you have built from the training set - it will generate two new attributes: nutrientPrediction and hourPrediction - my &amp;quot;model&amp;quot; here always picks nutrient A at hour 13.</description>
<description align="center" color="yellow" colored="false" height="267" resized="true" width="178" x="671" y="447">this is the scoring of my &amp;quot;model&amp;quot; - pretty terrible. The goal is to get this aggregate score &amp;#8805; 1000</description>
</process>
</operator>
</process>
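
For anyone who wants to sanity-check a score by hand, the scoring rule in the process above can be sketched outside Studio. This is only a rough Python illustration, not a submission format (submissions should stay pure RapidMiner). It assumes each row is a plain dict using the attribute names from the process (Label, hour, nutrientPrediction, yieldIncreaseA, yieldIncreaseB), that missing yields count as 0 as in the Replace Missing Values operators, and that the placeholder "model" keeps only the rows at its chosen hour. The function names and the sample rows are made up for the example.

```python
from typing import Dict, Iterable, List, Optional

PENALTY = -100.0  # score for a wrong nutrient prediction, as in the SCORE expression


def baseline_predict(rows: Iterable[Dict], nutrient: str = "A", hour: int = 13) -> List[Dict]:
    """Hypothetical stand-in for the placeholder model: always pick one nutrient at one hour."""
    # Keep only the rows at the chosen hour and attach the constant prediction.
    return [dict(r, nutrientPrediction=nutrient, hourPrediction=hour)
            for r in rows if r["hour"] == hour]


def score_row(label: str, predicted: str,
              yield_a: Optional[float], yield_b: Optional[float]) -> float:
    """Matching yield if the nutrient is predicted correctly, otherwise -100."""
    # Missing yields are treated as 0 (assumption mirroring Replace Missing Values).
    yield_a = 0.0 if yield_a is None else yield_a
    yield_b = 0.0 if yield_b is None else yield_b
    if predicted == label == "A":
        return yield_a
    if predicted == label == "B":
        return yield_b
    return PENALTY


def total_score(rows: Iterable[Dict]) -> float:
    """Sum of per-row scores, like the Aggregate operator summing SCORE."""
    return sum(score_row(r["Label"], r["nutrientPrediction"],
                         r.get("yieldIncreaseA"), r.get("yieldIncreaseB"))
               for r in rows)


# Made-up rows, only to show the arithmetic.
sample = [
    {"Id": 1, "hour": 13, "Label": "A", "yieldIncreaseA": 12.5, "yieldIncreaseB": None},
    {"Id": 2, "hour": 13, "Label": "B", "yieldIncreaseA": None, "yieldIncreaseB": 8.0},
    {"Id": 2, "hour": 14, "Label": "B", "yieldIncreaseA": None, "yieldIncreaseB": 9.0},
]
scored = baseline_predict(sample)
print(total_score(scored))
```

With these made-up rows, the first run earns its yield for nutrient A and the second is penalised because the constant guess of A is wrong, which shows why the always-A placeholder scores so poorly.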
Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Partner

Re: RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2

I also managed to get above the baseline. I'll integrate my model into the scoring process and post later after work.


Marius Helf
RM Certified Expert

Re: RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2

I managed to improve mine. Here it is...

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="concurrency:loop_files" compatibility="7.5.000" expanded="true" height="82" name="training" width="90" x="45" y="34">
        <parameter key="directory" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random"/>
        <parameter key="filter_by_glob" value="training*.xlsx"/>
        <parameter key="enable_parallel_execution" value="false"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="7.5.000" expanded="true" height="68" name="Read Excel" width="90" x="246" y="34">
            <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
            <parameter key="imported_cell_range" value="A1:AT38"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Id.true.integer.attribute"/>
              <parameter key="1" value="hour.true.integer.attribute"/>
              <parameter key="2" value="Label.true.polynominal.attribute"/>
              <parameter key="3" value="sensor1.true.integer.attribute"/>
              <parameter key="4" value="sensor2.true.integer.attribute"/>
              <parameter key="5" value="sensor3.true.integer.attribute"/>
              <parameter key="6" value="sensor4.true.integer.attribute"/>
              <parameter key="7" value="sensor5.true.integer.attribute"/>
              <parameter key="8" value="sensor6.true.integer.attribute"/>
              <parameter key="9" value="sensor7.true.integer.attribute"/>
              <parameter key="10" value="sensor8.true.integer.attribute"/>
              <parameter key="11" value="sensor9.true.integer.attribute"/>
              <parameter key="12" value="sensor10.true.integer.attribute"/>
              <parameter key="13" value="sensor11.true.integer.attribute"/>
              <parameter key="14" value="sensor12.true.integer.attribute"/>
              <parameter key="15" value="sensor13.true.integer.attribute"/>
              <parameter key="16" value="sensor14.true.integer.attribute"/>
              <parameter key="17" value="sensor15.true.integer.attribute"/>
              <parameter key="18" value="sensor16.true.integer.attribute"/>
              <parameter key="19" value="sensor17.true.integer.attribute"/>
              <parameter key="20" value="sensor18.true.integer.attribute"/>
              <parameter key="21" value="sensor19.true.integer.attribute"/>
              <parameter key="22" value="sensor20.true.integer.attribute"/>
              <parameter key="23" value="sensor21.true.integer.attribute"/>
              <parameter key="24" value="sensor22.true.integer.attribute"/>
              <parameter key="25" value="sensor23.true.integer.attribute"/>
              <parameter key="26" value="sensor24.true.integer.attribute"/>
              <parameter key="27" value="sensor25.true.integer.attribute"/>
              <parameter key="28" value="sensor26.true.integer.attribute"/>
              <parameter key="29" value="sensor27.true.integer.attribute"/>
              <parameter key="30" value="sensor28.true.integer.attribute"/>
              <parameter key="31" value="sensor29.true.integer.attribute"/>
              <parameter key="32" value="sensor30.true.integer.attribute"/>
              <parameter key="33" value="sensor31.true.integer.attribute"/>
              <parameter key="34" value="sensor32.true.integer.attribute"/>
              <parameter key="35" value="sensor33.true.integer.attribute"/>
              <parameter key="36" value="sensor34.true.integer.attribute"/>
              <parameter key="37" value="sensor35.true.integer.attribute"/>
              <parameter key="38" value="sensor36.true.integer.attribute"/>
              <parameter key="39" value="sensor37.true.integer.attribute"/>
              <parameter key="40" value="sensor38.true.integer.attribute"/>
              <parameter key="41" value="sensor39.true.numeric.attribute"/>
              <parameter key="42" value="sensor40.true.numeric.attribute"/>
              <parameter key="43" value="sensor41.true.numeric.attribute"/>
              <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
              <parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
            </list>
          </operator>
          <connect from_port="file object" to_op="Read Excel" to_port="file"/>
          <connect from_op="Read Excel" from_port="output" to_port="output 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:loop_files" compatibility="7.5.000" expanded="true" height="82" name="test" width="90" x="45" y="391">
        <parameter key="directory" value="D:\RMCompetition\RM_Competition_TestData_random\RM_Competition_TestData_random"/>
        <parameter key="filter_by_glob" value="test*.xlsx"/>
        <parameter key="enable_parallel_execution" value="false"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="7.5.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="246" y="34">
            <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
            <parameter key="imported_cell_range" value="A1:AT38"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Id.true.integer.attribute"/>
              <parameter key="1" value="hour.true.integer.attribute"/>
              <parameter key="2" value="Label.true.polynominal.attribute"/>
              <parameter key="3" value="sensor1.true.integer.attribute"/>
              <parameter key="4" value="sensor2.true.integer.attribute"/>
              <parameter key="5" value="sensor3.true.integer.attribute"/>
              <parameter key="6" value="sensor4.true.integer.attribute"/>
              <parameter key="7" value="sensor5.true.integer.attribute"/>
              <parameter key="8" value="sensor6.true.integer.attribute"/>
              <parameter key="9" value="sensor7.true.integer.attribute"/>
              <parameter key="10" value="sensor8.true.integer.attribute"/>
              <parameter key="11" value="sensor9.true.integer.attribute"/>
              <parameter key="12" value="sensor10.true.integer.attribute"/>
              <parameter key="13" value="sensor11.true.integer.attribute"/>
              <parameter key="14" value="sensor12.true.integer.attribute"/>
              <parameter key="15" value="sensor13.true.integer.attribute"/>
              <parameter key="16" value="sensor14.true.integer.attribute"/>
              <parameter key="17" value="sensor15.true.integer.attribute"/>
              <parameter key="18" value="sensor16.true.integer.attribute"/>
              <parameter key="19" value="sensor17.true.integer.attribute"/>
              <parameter key="20" value="sensor18.true.integer.attribute"/>
              <parameter key="21" value="sensor19.true.integer.attribute"/>
              <parameter key="22" value="sensor20.true.integer.attribute"/>
              <parameter key="23" value="sensor21.true.integer.attribute"/>
              <parameter key="24" value="sensor22.true.integer.attribute"/>
              <parameter key="25" value="sensor23.true.integer.attribute"/>
              <parameter key="26" value="sensor24.true.integer.attribute"/>
              <parameter key="27" value="sensor25.true.integer.attribute"/>
              <parameter key="28" value="sensor26.true.integer.attribute"/>
              <parameter key="29" value="sensor27.true.integer.attribute"/>
              <parameter key="30" value="sensor28.true.integer.attribute"/>
              <parameter key="31" value="sensor29.true.integer.attribute"/>
              <parameter key="32" value="sensor30.true.integer.attribute"/>
              <parameter key="33" value="sensor31.true.integer.attribute"/>
              <parameter key="34" value="sensor32.true.integer.attribute"/>
              <parameter key="35" value="sensor33.true.integer.attribute"/>
              <parameter key="36" value="sensor34.true.integer.attribute"/>
              <parameter key="37" value="sensor35.true.integer.attribute"/>
              <parameter key="38" value="sensor36.true.integer.attribute"/>
              <parameter key="39" value="sensor37.true.integer.attribute"/>
              <parameter key="40" value="sensor38.true.integer.attribute"/>
              <parameter key="41" value="sensor39.true.numeric.attribute"/>
              <parameter key="42" value="sensor40.true.numeric.attribute"/>
              <parameter key="43" value="sensor41.true.numeric.attribute"/>
              <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
              <parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
            </list>
          </operator>
          <connect from_port="file object" to_op="Read Excel (2)" to_port="file"/>
          <connect from_op="Read Excel (2)" from_port="output" to_port="output 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="7.5.000" expanded="true" height="82" name="Append (2)" width="90" x="179" y="391"/>
      <operator activated="true" class="append" compatibility="7.5.000" expanded="true" height="82" name="Append" width="90" x="179" y="34"/>
      <operator activated="true" class="sort" compatibility="7.5.000" expanded="true" height="82" name="Sort" width="90" x="179" y="136">
        <parameter key="attribute_name" value="Id"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="7.5.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB|sensor41"/>
        <parameter key="default" value="value"/>
        <list key="columns"/>
        <parameter key="replenishment_value" value="0"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples (3)" width="90" x="447" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="hour.eq.29"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Label|sensor41|yieldIncreaseA|yieldIncreaseB|sensor9|sensor8|sensor7|sensor6|sensor5|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.5.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="34">
        <parameter key="attribute_name" value="Label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.000" expanded="true" height="145" name="Validation" width="90" x="849" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="enable_parallel_execution" value="false"/>
        <process expanded="true">
          <operator activated="true" class="logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression (SVM)" width="90" x="322" y="34"/>
          <operator activated="false" class="logistic_regression" compatibility="7.5.000" expanded="true" height="103" name="Logistic Regression (3)" width="90" x="313" y="238"/>
          <connect from_port="training set" to_op="Logistic Regression (SVM)" to_port="training set"/>
          <connect from_op="Logistic Regression (SVM)" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.5.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="7.5.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="find_threshold" compatibility="7.5.000" expanded="true" height="82" name="Find Threshold" width="90" x="983" y="34">
        <parameter key="define_labels" value="true"/>
        <parameter key="first_label" value="A"/>
        <parameter key="second_label" value="B"/>
        <parameter key="misclassification_costs_second" value="3.53"/>
        <parameter key="use_example_weights" value="false"/>
        <parameter key="roc_bias" value="neutral"/>
      </operator>
      <operator activated="true" class="sort" compatibility="7.5.000" expanded="true" height="82" name="Sort (2)" width="90" x="179" y="493">
        <parameter key="attribute_name" value="Id"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="7.5.000" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="313" y="391">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB|sensor41"/>
        <parameter key="default" value="value"/>
        <list key="columns"/>
        <parameter key="replenishment_value" value="0"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples (4)" width="90" x="447" y="391">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="hour.eq.29"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes (3)" width="90" x="581" y="391">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Label|sensor41|yieldIncreaseA|yieldIncreaseB|sensor9|sensor8|sensor7|sensor6|sensor5|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.5.000" expanded="true" height="82" name="Set Role (2)" width="90" x="715" y="391">
        <parameter key="attribute_name" value="Label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.5.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="849" y="391">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="apply_threshold" compatibility="7.5.000" expanded="true" height="82" name="Apply Threshold" width="90" x="983" y="391"/>
      <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="1184" y="391">
        <list key="function_descriptions">
          <parameter key="Score" value="if(Label != [prediction(Label)], -100, if(Label == &quot;A&quot;, yieldIncreaseA, yieldIncreaseB))"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="1184" y="493">
        <list key="aggregation_attributes">
          <parameter key="Score" value="sum"/>
        </list>
      </operator>
      <connect from_op="training" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="test" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
      <connect from_op="Append (2)" from_port="merged set" to_op="Sort (2)" to_port="example set input"/>
      <connect from_op="Append" from_port="merged set" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Filter Examples (3)" to_port="example set input"/>
      <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="example set"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="test result set" to_op="Find Threshold" to_port="example set"/>
      <connect from_op="Find Threshold" from_port="threshold" to_op="Apply Threshold" to_port="threshold"/>
      <connect from_op="Sort (2)" from_port="example set output" to_op="Replace Missing Values (2)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Filter Examples (4)" to_port="example set input"/>
      <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
      <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Threshold" to_port="example set"/>
      <connect from_op="Apply Threshold" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="50" resized="true" width="786" x="51" y="594">Enhancement: move this duplicated pre-processing into a single process called via Execute Process, so the training and test branches cannot drift apart and introduce errors.</description>
      <description align="center" color="yellow" colored="false" height="215" resized="true" width="141" x="298" y="158">Naively replace any missing values with 0. For rows where the yield was missing, this means a score of 0 (no yield increase) if classification is correct, and -100 if it is incorrect. Imputation may yield a minor improvement.</description>
      <description align="center" color="yellow" colored="false" height="120" resized="false" width="180" x="74" y="244">Loop through all files, reading each one in and appending it to build a single example set. Sorting is important to get a reproducible result.</description>
      <description align="center" color="yellow" colored="false" height="90" resized="true" width="145" x="502" y="203">Select all attributes at the selected hour - the first hour should not be used.</description>
      <description align="center" color="yellow" colored="false" height="199" resized="true" width="180" x="670" y="132">Build a model that assumes that the selected hour is when the additional nutrient is added. Setting the random seed ensures reproducibility when using Optimize Parameters to find the best hour and threshold that can be used in this process.</description>
      <description align="center" color="yellow" colored="false" height="120" resized="false" width="180" x="1090" y="588">Calculate the final score assuming the yield values at the selected hour are the ones to use with correct classification, and -100 otherwise</description>
      <description align="center" color="yellow" colored="false" height="156" resized="false" width="187" x="1104" y="41">Finding a threshold allows the final score to be optimized along with the best hour using an Optimize Parameters approach. It is vital to specify the explicit labels to get a reproducible result.</description>
    </process>
  </operator>
</process>
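
For anyone who wants to sanity-check the final Score outside RapidMiner, the Generate Attributes and Aggregate steps at the end of the process boil down to roughly the following. This is only a sketch in Python and not part of the submission: the function names and the sample rows are made up for illustration, and missing yields are treated as 0, mirroring the Replace Missing Values step.

```python
from typing import Iterable, Optional, Tuple

# One row: (true label, predicted label, yieldIncreaseA, yieldIncreaseB).
# The names and the sample data below are hypothetical, for illustration only.
Row = Tuple[str, str, Optional[float], Optional[float]]

PENALTY = -100.0  # score for a misclassified row, as in the Generate Attributes expression


def row_score(label: str, prediction: str,
              yield_a: Optional[float], yield_b: Optional[float]) -> float:
    """Mirror of: if(Label != prediction, -100, if(Label == "A", yieldA, yieldB))."""
    if label != prediction:
        return PENALTY
    chosen = yield_a if label == "A" else yield_b
    # Missing yields are replaced with 0, as in Replace Missing Values.
    return 0.0 if chosen is None else float(chosen)


def total_score(rows: Iterable[Row]) -> float:
    """Mirror of the Aggregate operator: sum of the per-row Score."""
    return sum(row_score(*row) for row in rows)


sample_rows = [
    ("A", "A", 12.5, 3.0),    # correct, label A -> 12.5
    ("B", "A", 4.0, 9.0),     # wrong -> -100
    ("B", "B", None, None),   # correct, yield missing -> 0
    ("B", "B", 1.0, 7.5),     # correct, label B -> 7.5
]

if __name__ == "__main__":
    print(total_score(sample_rows))
```

A single misclassification wipes out the gain from many correct rows, which is why the process spends effort on tuning the threshold rather than relying on the default.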

 

Andrew