Prediction for next orders, any ideas?

EJECTY Member Posts: 3 Contributor I
edited November 2018 in Help

Dear Community!

I have a .csv file with 100,000 rows and 439 columns. The spreadsheet represents customers' habits in using a specific service. Each row contains a customer ID followed by that customer's transaction dates, encoded as follows: 1 for Monday, 2 for Tuesday, etc. I need to predict each customer's next transaction date from these past records.

Here's an example of the database format:

customer_id                      transaction1 transaction2 ... transaction438
1                                       1 2 3 4 5 6 7 ... 745 746 747
2                                       2 7 16 20 21 23 28 ... 412
3                                       1 2 3 4 5 6 7 ... 285 322
4                                       5 7 8 12 14 19 21 ... 924 925 926

Any ideas on which model I should use for this prediction to get the best accuracy?

NOTE: The database has many missing values, depending on how frequently each customer orders.
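
To make the layout above concrete, here is a minimal sketch of reading such a file into one list of transaction days per customer, skipping the missing cells. It assumes the file is comma-separated and that missing values appear as empty cells; the sample data and the function name `load_transactions` are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the layout described above; empty cells are missing values.
SAMPLE = """customer_id,transaction1,transaction2,transaction3,transaction4
1,1,2,3,4
2,2,7,16,
3,5,7,,
"""

def load_transactions(handle):
    """Return {customer_id: [day, ...]}, skipping empty cells."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    history = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        customer_id, *cells = row
        history[customer_id] = [int(c) for c in cells if c.strip()]
    return history

history = load_transactions(io.StringIO(SAMPLE))
for customer_id, days in history.items():
    print(customer_id, days)
```

For a real file, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` sample.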

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    This looks like some sort of sales projection analysis. I would look at the process I shared here: http://community.rapidminer.com/t5/RapidMiner-Studio/How-to-get-forecast-values-of-future-from-time-series-data/m-p/37698

    You would need to replace some missing values using the Replace Missing Values operator, and you would need to install the Series extension from our marketplace.  Is there seasonality involved?

  • EJECTY Member Posts: 3 Contributor I

    It's a homework assignment at university; we are learning the basics of RapidMiner. We did similar exercises earlier, but those had a label column in the training data. This time I have no clue how to predict the outcome without that column. I thought about some sort of pattern analysis, or converting the data to a range from 1 to 7 to simplify the problem, but I couldn't get to a real solution.

    I think seasonality doesn't matter, because it's just an example.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    If it's sales, you could sum up the values and do a Total Sales per month or week? You can use the dates as your ID and then the Total Sales as your Label. 

  • EJECTY Member Posts: 3 Contributor I

    The database contains the days of the transactions in a coded format, not quantities, so computing totals isn't possible and wouldn't make sense. 

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    AH! Did you try the Generalized Sequential Patterns operator?
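
On the missing label column raised earlier in the thread: one possible approach is to hold out each customer's last known transaction and treat the gap before it as the label, so the earlier transactions become the features. The sketch below is only an illustration in plain Python, not a RapidMiner process and not the Generalized Sequential Patterns operator. The function names are hypothetical, and the 1 to 7 conversion assumes the values are consecutive day numbers with day 1 falling on a Monday.

```python
from statistics import median

def weekday_code(day):
    """Map a running day number to 1..7, ASSUMING day 1 is a Monday."""
    return (day - 1) % 7 + 1

def make_example(days, n_gaps=3):
    """Hold out the last transaction as the label.

    Returns (features, label), or None when the history is too short.
    Features: the last n_gaps gaps plus the weekday code of the last seen day.
    Label: number of days until the held-out transaction.
    """
    if len(days) < n_gaps + 2:
        return None
    *past, target = days
    gaps = [b - a for a, b in zip(past, past[1:])]
    features = gaps[-n_gaps:] + [weekday_code(past[-1])]
    label = target - past[-1]
    return features, label

def baseline_next_day(days):
    """Naive baseline: last day plus the median gap seen so far."""
    if len(days) < 2:
        return None  # no gap can be computed from a single transaction
    gaps = [b - a for a, b in zip(days, days[1:])]
    return days[-1] + round(median(gaps))

# Customer 2 from the example table in the question (first seven values).
print(make_example([2, 7, 16, 20, 21, 23, 28]))
# Customer 1 from the example table: one transaction every day.
print(baseline_next_day([1, 2, 3, 4, 5, 6, 7]))
```

The rows produced by `make_example` could then go to any learner that expects a label column, and the median-gap baseline gives a reference point to compare a model against.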
