Getting started help: predict sales based on several attributes for several products

brandjoe · April 2019

Hi Community

Disclaimer:

First-timer here: data science newbie, unfamiliar with the correct technical terminology. I'm reasonably good with concepts, but not strong in statistics, higher math, or programming. But I try.

As part of my bachelor's degree I do have basic statistics and programming knowledge, but I have been out of practice for years.

Background:

As part of my business information technology studies I'm working on my bachelor's thesis, "Improved future sales forecasting by applying machine learning" (as opposed to simple prediction based on a comparison with last year's figures), together with a company operating convenience stores.

I have access to their BI system to pull historical sales data with several attributes, for example: date, shop, article, number sold.

Data preparation:

To develop a model, I have selected two customer contexts which may trigger a visit to the store to buy very specific goods: "grill party at lake" and "students breakfast".
I then looked at a handful of shops close to lakes ("grill party") and/or universities ("students breakfast") and pulled the BI data of the affected articles (chips, beers, sausages, bagels, coffee, etc.).

I then added several hopefully relevant attributes such as HasLake (is the shop close to a lake), HasUniversity (is the shop close to a university), HasSemester (is the transaction during or in between university semesters), HasHoliday (is it a public holiday) and weather figures (temperature, amount of sunshine, amount of rain).
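As a rough illustration of how such context attributes could be derived per transaction, here is a minimal Python sketch. The shop numbers, semester dates, holiday dates and the helper name `add_context_flags` are all made up for the example and are not taken from the real data.

```python
from datetime import date

# Hypothetical lookup data; real values would come from the BI system
# or from public calendars.
LAKE_SHOPS = {1}
UNIVERSITY_SHOPS = {1, 2}
SEMESTERS = [(date(2018, 9, 17), date(2018, 12, 21))]
PUBLIC_HOLIDAYS = {date(2018, 12, 25)}

def add_context_flags(row):
    """Return a copy of a transaction row with the context flags added."""
    day, shop = row["date"], row["shop"]
    return {
        **row,
        "HasLake": shop in LAKE_SHOPS,
        "HasUniversity": shop in UNIVERSITY_SHOPS,
        "HasSemester": any(start <= day <= end for start, end in SEMESTERS),
        "HasHoliday": day in PUBLIC_HOLIDAYS,
    }

# One made-up transaction: shop 2, beer, 14 units sold on 1 Oct 2018.
example = add_context_flags(
    {"date": date(2018, 10, 1), "shop": 2, "article": "Beer", "sold": 14}
)
```

Keeping the flags tied to shop and date properties, rather than to a shop ID, is what makes them transferable to shops that are not in the example set.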

My current (anonymized simplified) example dataset is attached as Excel.

Trying my luck:

I am now asking for help on how best to proceed.

I remodelled my exampleset several times (articles as rows, articles as columns; more attributes, fewer attributes; ...) and tried to put together a process, but failed horribly every time.

I then went for Auto Model. Deep Learning and Gradient Boosted Trees yielded quite good results, but a) they produce a "black box model" that is difficult to get away with in a bachelor's thesis and b) the automated feature selection seems to primarily target attributes which are not "generic" but highly specific to the exampleset, e.g. a single shop. This makes sense, as in the data one specific shop has very high numbers for beer. But it makes the model inapplicable to other customer contexts in other shops that are not included in the exampleset. There are ~200 shops in total with 3000 articles each, and at least a dozen contexts that apply to some shops but not others; e.g. a high-volume highway petrol station has nothing to do with either a university or a grill party at a lake.

I tried to get inspired by the Auto Models created and reproduce the results to a degree, but they are way too complex for me to properly understand what's happening and why certain parameters are tuned the way they are.

I figured that setting "Shop" to the cluster role and "quarter" or "week" to the batch role (I also tried it the other way round, shop as batch and time period as cluster) should improve feature selection. Apparently not, as the set roles and special attributes are purged when auto modelling. Is Deep Learning or GBT the wrong approach? Should I do something with "forecast" given the exampleset? I'm at a loss.

Could I ask you guys and gals to support me to get off the starting line? Many many thanks in advance!

brandjoe · April 2019

I just had an idea:
Could I loop through the exampleset shop by shop (all transactions, all articles, all dates, one shop only) and create a separate model for each shop? Then adding or removing shops and/or articles in the exampleset wouldn't matter.
Or, by analogy, loop through one article at a time (all transactions, all dates, all shops, one article only)?
Or, more generally speaking, loop through one specific attribute at a time and generate a model fitted to each specific iteration.

That sounds a lot like clustering, but for clustering wouldn't I need to know in advance how many distinct articles there are? That is something I can't know: in terms of the articles available, each shop is (slightly or massively) different from the next one...

Telcontar120 · April 2019

Automodel is definitely a good starting point. For your problem I don't think you want to do clustering at all.
If you used AM then you had the option to try many different ML algorithms. GBT and DL are more powerful but as you say they produce black box models that are hard to interpret, although the "Explain Predictions" operator is helpful in identifying patterns. Only you can decide whether the tradeoff in performance vs simpler methods like Naive Bayes or Decision Trees is worthwhile.
If the shops really are very different, then looping through those and building a separate model would be sensible. But you should compare that to a global model to see whether the performance difference is worth it.
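The comparison between per-shop models and a global model can be sketched outside RapidMiner as well. The following minimal Python example uses a single-feature least-squares line as a stand-in for whatever learner is actually chosen; the rows, shop names and helper functions are invented for illustration only.

```python
from collections import defaultdict
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return a + b * x

# Made-up rows: (shop, temperature, units_sold).
train = [("A", 10, 20), ("A", 20, 40), ("A", 30, 60),
         ("B", 10, 5), ("B", 20, 6), ("B", 30, 7)]
test = [("A", 25, 50), ("B", 25, 6.5)]

# One global model trained on all shops together.
global_model = fit_line([x for _, x, _ in train], [y for _, _, y in train])

# One model per shop, trained only on that shop's rows.
by_shop = defaultdict(list)
for shop, x, y in train:
    by_shop[shop].append((x, y))
shop_models = {shop: fit_line([x for x, _ in rows], [y for _, y in rows])
               for shop, rows in by_shop.items()}

# Mean absolute error of both approaches on the same held-out rows.
global_mae = mean([abs(y - predict(global_model, x)) for _, x, y in test])
per_shop_mae = mean([abs(y - predict(shop_models[s], x)) for s, x, y in test])
```

In this toy data the two shops react very differently to temperature, so the per-shop models win clearly. With real data the gap may be much smaller, and per-shop models cannot predict for a shop they have never seen, which is the trade-off to weigh.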

brandjoe · April 2019

Thanks for your contribution. Based on it, I worked my way forward with GBT and have since approached the problem from many different sides: extensive feature weighting, and hours of parameter optimization on the operator to find optimal parameters for GBT (running into many aborted processes due to null pointer exceptions along the way), but the results are just not satisfactory.

So I took a step back and looked at my problem again. I now think it's not so much a regression problem as a time series forecast (with some regression-like cherries on top). I have since studied several blogs and some papers on the topic, and came up with the idea of building an LSTM (long short-term memory neural network) to tackle the task. But I struggle with the setup, namely the layers and their parameters. My main trigger for the idea was this comment by Jason Brownlee:

Yes. Vanilla LSTMs are poor at time series forecasting. I have a ton of results to show this. More here:

CANT-POST-LINKS-machinelearningmastery.com/suitability-long-short-term-memory-networks-timeseries-forecasting/

CNNs are often better. CNN+LSTM better again, and ConvLSTMs are very good. I cover these here:

CANT-POST-LINKS-machinelearningmastery.com/deep-learning-for-time-series-forecasting/
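To make the time-series framing concrete: sequence models such as LSTMs are usually trained on sliding windows of past values, and the train/validation split is made in chronological order rather than by random sampling. Below is a minimal Python sketch of that preparation step; the weekly sales figures, the lookback of 3 and both helper names are assumptions for illustration, not part of the actual process.

```python
def make_windows(series, lookback):
    """Turn a 1-D series into (past_values, next_value) pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]

def chronological_split(items, train_ratio=0.8):
    """Split without shuffling so that validation data lies after training data."""
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

# Made-up weekly sales of one article in one shop, oldest week first.
weekly_sales = [12, 15, 14, 18, 21, 19, 24, 26, 25, 30, 33, 31]

windows = make_windows(weekly_sales, lookback=3)
train, validation = chronological_split(windows)
```

Each training pair consists of three consecutive weeks as input and the following week as the target. A random split would instead mix future weeks into the training data and make the validation error look better than it really is.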

So, I have installed the Deep Learning extension and gone through the tutorial.
I have looked at the Airline Passengers LSTM sample process.
I have set up a process of my own, but this is where I'm stuck at the moment. Could any of you provide some rough guidance to get me on the right track?
Below is my XML. The example set is very close to the one in the opening post. As the original I am actually working with should not go into the wild, I can provide it by PM.

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Bier Chips Würste &amp; Studiartikel (V1)" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//BachelorThesis/Sales Data/Bier Chips Würste &amp; Studiartikel (V1)"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="136">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Abverkauf|Artikel|Datum Wochenbeginn|HatHochschule|HatSee|IstFerien|IstVorlesung|Jahr|KalWoche|Monat|Niederschlag|Saison|Shop|Sonnenschein|TemperaturMax|TemperaturMin|TemperaturMit|TrxID"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role" width="90" x="45" y="238">
        <parameter key="attribute_name" value="Abverkauf"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="TrxID" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="guess_types" compatibility="9.2.001" expanded="true" height="82" name="Guess Types" width="90" x="45" y="340">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="HatHochschule|HatSee|IstFerien|IstVorlesung"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="decimal_point_character" value="."/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="442">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="all"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Shop.eq.7029"/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="time_series:normalization" compatibility="9.2.001" expanded="true" height="68" name="Normalize (Series)" width="90" x="179" y="442">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Abverkauf|Niederschlag|Sonnenschein|TemperaturMax|TemperaturMin|TemperaturMit"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="overwrite_attributes" value="true"/>
        <parameter key="new_attributes_postfix" value="_normalized"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.2.001" expanded="true" height="103" name="Split Data" width="90" x="313" y="136">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.2"/>
          <parameter key="ratio" value="0.8"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="collect" compatibility="9.2.001" expanded="true" height="82" name="Collect Validation" width="90" x="447" y="34">
        <parameter key="unfold" value="false"/>
      </operator>
      <operator activated="true" class="deeplearning:dl4j_timeseries_converter" compatibility="0.9.000" expanded="true" height="68" name="TimeSeries to Tensor Validation" width="90" x="581" y="34"/>
      <operator activated="true" class="collect" compatibility="9.2.001" expanded="true" height="82" name="Collect Training" width="90" x="447" y="187">
        <parameter key="unfold" value="false"/>
      </operator>
      <operator activated="true" class="deeplearning:dl4j_timeseries_converter" compatibility="0.9.000" expanded="true" height="68" name="TimeSeries to Tensor Training" width="90" x="581" y="187"/>
      <operator activated="true" class="deeplearning:dl4j_tensor_sequential_neural_network" compatibility="0.9.000" expanded="true" height="103" name="Deep Learning (Tensor) Training" width="90" x="715" y="187">
        <parameter key="loss_function" value="Mean Squared Error (Linear Regression)"/>
        <parameter key="epochs" value="10"/>
        <parameter key="use_miniBatch" value="false"/>
        <parameter key="batch_size" value="32"/>
        <parameter key="updater" value="Adam"/>
        <parameter key="learning_rate" value="0.001"/>
        <parameter key="momentum" value="0.9"/>
        <parameter key="rho" value="0.95"/>
        <parameter key="epsilon" value="1.0E-6"/>
        <parameter key="beta1" value="0.9"/>
        <parameter key="beta2" value="0.999"/>
        <parameter key="RMSdecay" value="0.95"/>
        <parameter key="weight_initialization" value="Normal"/>
        <parameter key="bias_initialization" value="0.0"/>
        <parameter key="use_regularization" value="false"/>
        <parameter key="l1_strength" value="0.1"/>
        <parameter key="l2_strength" value="0.1"/>
        <parameter key="optimization_method" value="Stochastic Gradient Descent"/>
        <parameter key="backpropagation" value="Standard"/>
        <parameter key="backpropagation_length" value="50"/>
        <parameter key="infer_input_shape" value="true"/>
        <parameter key="network_type" value="Recurrent with TimeSeries"/>
        <parameter key="log_each_epoch" value="true"/>
        <parameter key="epochs_per_log" value="10"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <process expanded="true">
          <operator activated="true" class="deeplearning:dl4j_lstm_layer" compatibility="0.9.000" expanded="true" height="68" name="Add LSTM Layer" width="90" x="112" y="34">
            <parameter key="neurons" value="8"/>
            <parameter key="gate_activation" value="ReLU (Rectified Linear Unit)"/>
            <parameter key="forget_gate_bias_initialization" value="1.0"/>
          </operator>
          <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer" width="90" x="380" y="34">
            <parameter key="number_of_neurons" value="1"/>
            <parameter key="activation_function" value="Softmax"/>
            <parameter key="use_dropout" value="false"/>
            <parameter key="dropout_rate" value="0.25"/>
            <parameter key="overwrite_networks_weight_initialization" value="false"/>
            <parameter key="weight_initialization" value="Normal"/>
            <parameter key="overwrite_networks_bias_initialization" value="false"/>
            <parameter key="bias_initialization" value="0.0"/>
          </operator>
          <connect from_port="layerArchitecture" to_op="Add LSTM Layer" to_port="layerArchitecture"/>
          <connect from_op="Add LSTM Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer" to_port="layerArchitecture"/>
          <connect from_op="Add Fully-Connected Layer" from_port="layerArchitecture" to_port="layerArchitecture"/>
          <portSpacing port="source_layerArchitecture" spacing="0"/>
          <portSpacing port="sink_layerArchitecture" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="deeplearning:dl4j_apply_tensor_model" compatibility="0.9.000" expanded="true" height="82" name="Apply Model (Tensor)" width="90" x="849" y="34"/>
      <connect from_op="Retrieve Bier Chips Würste &amp; Studiartikel (V1)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Guess Types" to_port="example set input"/>
      <connect from_op="Guess Types" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Normalize (Series)" to_port="example set"/>
      <connect from_op="Normalize (Series)" from_port="example set" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Collect Validation" to_port="input 1"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Collect Training" to_port="input 1"/>
      <connect from_op="Collect Validation" from_port="collection" to_op="TimeSeries to Tensor Validation" to_port="collection"/>
      <connect from_op="TimeSeries to Tensor Validation" from_port="tensor" to_op="Apply Model (Tensor)" to_port="unlabelled tensor"/>
      <connect from_op="Collect Training" from_port="collection" to_op="TimeSeries to Tensor Training" to_port="collection"/>
      <connect from_op="TimeSeries to Tensor Training" from_port="tensor" to_op="Deep Learning (Tensor) Training" to_port="training set"/>
      <connect from_op="Deep Learning (Tensor) Training" from_port="model" to_op="Apply Model (Tensor)" to_port="model"/>
      <connect from_op="Deep Learning (Tensor) Training" from_port="history" to_port="result 2"/>
      <connect from_op="Apply Model (Tensor)" from_port="labeled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
