Prediction Analysis of Equipment Sales

draina1481 Member Posts: 4 Contributor I
edited November 2018 in Help

Hello! I want to predict the sales of Indian construction equipment such as cranes and compactors. I used the Windowing operator and Sliding Window Validation for my prediction and got an accuracy of 59%. My mentor suggested that I read in another Excel file containing unknown values, pass it through a second Windowing operator, and connect that to the Apply Model operator along with the first branch. However, after doing so, I am not getting any values in my example set in Results. No error or warning is shown, yet I am not getting the desired values. Could somebody please guide me on this? Where am I going wrong? I have attached screenshots of my model.

[Screenshot 2017-03-31 (1).png: My Model]
[Screenshot 2017-03-31 (2).png: My example set]

Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    Reading in the unknown values to get a prediction is called "scoring." For this to work, you need a brand new set of input data so the model can predict the label from those input values (i.e., cranes-1, cranes-0). You can't predict your label if you don't have any inputs.

    Take a look at the attached sample and follow the scoring part. You'll see that you need input data without a label to get your predictions.

    <?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="target_function" value="driller oscillation timeseries"/>
    <parameter key="number_examples" value="500"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.4.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
    <parameter key="attribute_name" value="att5"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="7.4.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="34">
    <list key="columns"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="447" y="34">
    <parameter key="window_size" value="10"/>
    <parameter key="create_label" value="true"/>
    <parameter key="label_attribute" value="att5"/>
    </operator>
    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="581" y="34">
    <parameter key="training_window_width" value="10"/>
    <parameter key="test_window_width" value="10"/>
    <process expanded="true">
    <operator activated="true" class="support_vector_machine" compatibility="7.4.000" expanded="true" height="124" name="SVM" width="90" x="179" y="34">
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="0.01"/>
    <parameter key="C" value="1000.0"/>
    </operator>
    <connect from_port="training" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="model"/>
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.4.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <parameter key="horizon" value="1"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="238">
    <parameter key="target_function" value="driller oscillation timeseries"/>
    <parameter key="number_examples" value="500"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="238">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att5"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="7.4.000" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="313" y="238">
    <list key="columns"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="447" y="238">
    <parameter key="window_size" value="10"/>
    <parameter key="label_attribute" value="att5"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.4.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="782" y="187">
    <list key="application_parameters"/>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
    <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values (2)" to_port="example set input"/>
    <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
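
The difference between the training branch and the scoring branch in the process above can also be sketched outside RapidMiner. The following is a minimal Python illustration, not RapidMiner's implementation: `make_windows` is a hypothetical helper that turns a series into fixed-size windows, attaching a label (the value `horizon` steps after the window) only when one is requested. The training side asks for a label; the scoring side does not, but it still needs real input values in every window.

```python
from typing import List, Optional, Tuple


def make_windows(
    series: List[float], window_size: int, horizon: Optional[int] = None
) -> List[Tuple[List[float], Optional[float]]]:
    """Slice a series into overlapping windows (hypothetical helper).

    With a horizon, each window is paired with the value `horizon` steps
    after its end (training data). Without one, the label is None
    (scoring data).
    """
    rows = []
    last_start = len(series) - window_size
    if horizon is not None:
        # Leave room after the window for the label value.
        last_start -= horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = None
        if horizon is not None:
            label = series[start + window_size + horizon - 1]
        rows.append((window, label))
    return rows


# Training: windows of known sales values plus the next value as label.
history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
training_rows = make_windows(history, window_size=3, horizon=1)

# Scoring: windows of known input values, no label attached.
scoring_rows = make_windows(history, window_size=3)
```

Note that the scoring rows carry no label but are not empty: a model can only produce a prediction from a window that contains actual values.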

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    A few things. First, your forecast performance could probably be improved. The training and testing window widths, and the algorithm you use for training, greatly affect how well you forecast the trend. I typically use an Optimize Parameters operator to tune the training and testing window widths together with the gamma and C values of an SVM with an RBF kernel. That usually gives me good results for a time series.
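
The idea of searching over window widths can be sketched in plain Python. This is only an illustration under loud assumptions: it is not the Optimize Parameters operator, and it swaps the SVM for a trivial stand-in model (forecast = mean of the training window) so the example stays self-contained. The names `sliding_window_error` and `grid_search` are hypothetical. With a real SVM, gamma and C would simply be two more axes of the same grid.

```python
from itertools import product
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def sliding_window_error(
    series: List[float], train_width: int, test_width: int
) -> float:
    """Mean absolute error over a sliding train/test split.

    Stand-in model (assumption): forecast every test value with the
    mean of the preceding training window.
    """
    errors = []
    start = 0
    while start + train_width + test_width <= len(series):
        train = series[start:start + train_width]
        test = series[start + train_width:start + train_width + test_width]
        forecast = mean(train)
        errors.extend(abs(value - forecast) for value in test)
        start += test_width  # slide forward by one test window
    if not errors:
        raise ValueError("series is too short for these window widths")
    return mean(errors)


def grid_search(
    series: List[float],
    train_widths: Sequence[int],
    test_widths: Sequence[int],
) -> Optional[Tuple[float, int, int]]:
    """Return (error, train_width, test_width) with the lowest error."""
    best = None
    for train_width, test_width in product(train_widths, test_widths):
        if train_width + test_width > len(series):
            continue  # skip combinations the series cannot support
        error = sliding_window_error(series, train_width, test_width)
        if best is None or error < best[0]:
            best = (error, train_width, test_width)
    return best


# Toy series with a steady upward trend.
toy_series = [float(i) for i in range(20)]
best = grid_search(toy_series, train_widths=[2, 5], test_widths=[1, 2])
```

On a steadily rising toy series like this one, the shortest windows win because the mean of a short window lags the trend the least; real sales data may favor very different widths.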

     

    Second, something's not right with your scoring process (the lower branch of the process). I can't tell exactly what, because you didn't post the process, but the scoring data set should not include an attribute with the "label" role. I see a Set Role operator and wonder if you created a label by mistake. Also, in the second Windowing operator, you shouldn't toggle on the label attribute, because the label is what you're trying to predict.

    Double check this and try again.

  • draina1481 Member Posts: 4 Contributor I

    Hello Thomas_Ott! Thanks for your prompt reply! I did as you suggested. I didn't toggle on the label attribute, nor did I create a label. I used Set Role and assigned the id role to my date attribute. Yet the result is the same: no values are being generated. Please guide me.

    [Screenshot 2017-03-31 (2).png: Example Set]

    [Screenshot 2017-03-31 (3).png: My Set Role variables]

    [Screenshot 2017-03-31 (4).png: My Windowing variables. I didn't toggle the label option in the 2nd Windowing operator.]

    [Screenshot 2017-03-31 (6).png: The validation parameters]


  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Your windowed data is all blank too. I see nothing but "?"s in your input data as well. Your scoring data is not being passed to the model, so check upstream.
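
One way to "check upstream" is to confirm that the scoring file actually contains input values before it reaches the windowing step. The sketch below is a hypothetical Python check, not a RapidMiner feature: it treats `None`, NaN, empty strings, and "?" as missing (an assumption about how blanks appear after import) and lists the columns that have no values at all. The column names in the sample rows are illustrative.

```python
import math
from typing import Any, Dict, List


def is_missing(value: Any) -> bool:
    """Treat None, NaN, empty strings, and '?' as missing (assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() in ("", "?"):
        return True
    return False


def all_missing_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the columns in which every row is missing a value."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    return [
        column
        for column in columns
        if all(is_missing(row.get(column)) for row in rows)
    ]


# Illustrative scoring rows: dates filled in, equipment columns left blank.
scoring_rows = [
    {"date": "2017-01", "Cranes": None, "Compactors": "?"},
    {"date": "2017-02", "Cranes": float("nan"), "Compactors": ""},
]
empty_columns = all_missing_columns(scoring_rows)
```

If every input column shows up in `empty_columns`, the windows built from that file contain only missing values, and the model has nothing to predict from.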

  • draina1481 Member Posts: 4 Contributor I

    I am actually a beginner, and this is my major project for my final year in college, so I might not understand all the terms. My first Excel file contains the equipment sales data from 2011 to 2016. I set Cranes (a type of equipment) as my label and set my horizon to 1. I got a prediction trend accuracy of 65.1%. Here are the screenshots (Fig. 1, 2 and 3):

    [Screenshot 2017-03-31 (8).png: First Model]
    [Screenshot 2017-03-31 (9).png: My prediction accuracy]
    [Screenshot 2017-03-31 (10).png: My Example Set]

    In my label attribute I got the values. Now, my mentor suggested that I read in another Excel file containing unknown values, which are the values we need to predict. For example, say I want to predict the sales for 2017. I set up my dates and do not enter any values in the equipment columns. As you already know, no values are being generated in my example set. I am also attaching screenshots of my complete model. Sorry to bother you again, but could you please explain where I am going wrong?

    [Screenshot 2017-03-31 (4).png: My Windowing variables. I didn't toggle the label option in the 2nd Windowing operator.]
    [Screenshot 2017-03-31 (6).png: The validation parameters]
    [Screenshot 2017-03-31 (2).png: Example Set]
    [Screenshot 2017-03-31 (3).png: My Set Role variables]

    Sorry for any inconvenience. I really need some more guidance on this!

  • draina1481 Member Posts: 4 Contributor I

    Okay! I think I've got it cleared up in my mind. Thanks a ton, sir!

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Awesome! Hope to see you around the Community now!
