How do I predict a single value for a user if the user has many rows of data

NGRichterNGRichter Member Posts: 1 Newbie

I am doing my bachelor thesis on process mining in a game and I am a bit stuck.
I want to predict the skill level of a certain user based on his/her event log.
The data I have is in this format:

User Level ActionID Action Timestamp Other
User1 1 1 Started 2019-05-22 13:22:55 x
User1 1 2 Modified 2019-05-22 12:22:57 x
User1 1 3 Executed 2019-05-22 12:23:05 x
... ... ... ... ... ...
User1 7 65 Modified 2019-05-22 12:29:50 x
User1 7 66 Modified 2019-05-22 12:29:57 x
User1 7 67 Executed 2019-05-22 12:30:02 x
... ... ... ... ...
User2 5 168 Executed 2019-05-23 15:03:15 x
User2 5 169 Failed 2019-05-23 15:03:15 x
... ... ... ... ... ...

I also have a separate data table containing the User values and the skill level, this can be included in the data table is there is a need.
Can someone help me point me in the right direction as I have no clue how to proceed.
The preferable output would be this table:

User Skill
User1 8.2
User2 6.4
User3 2.9
... ...

The skill values can also be exchanged with Beginner, Intermediate and Expert.
If you need anymore information or data, please ask.




  • Options
    jacobcybulskijacobcybulski Member, University Professor Posts: 391 Unicorn
    I assume that the user skill changes over time? If so, this is a time series problem, i.e. you can attach the skill to every example and create a forecasting model. If the skill does not change over time but is derived from a set number of examples, say 10, then you can copy your predictor attributes 10 times, each row appended as a new set of attributes and then train the model on these new attributes using the skill as your label. I am sure we could be smarter than that, for example you could try to create association or sequence analysis with the skill level somehow embedded in it but this would be quite complex as this is a multivariate case.

  • Options
    jacobcybulskijacobcybulski Member, University Professor Posts: 391 Unicorn
    I did not want to suggest this but if you wanted to go very deep, have a look at the Deep Learning extension, which uses a tensor model of learning and its help features an example, which looks almost like yours. I'd be hesitant to use it unless you are more advanced in machine learning, so your judgment applies here.

  • Options
    Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    In either case, the very first thing you are going to have to do is come up with some kind of a performance definition and a way of measuring it.  In RapidMiner, this is called a "label" and you will need to create/append it to your records and use Set Role operator to define it.  None of the supervised machine learning operators will work without a defined performance label.
    In the case of a game, you might define the skill level simply as something like the win/loss ratio of games played by the user.  But there are many more complex systems available, such as the Elo rating system for chess or similar games, in which case the skill of your opponents is also taken into consideration.  Coming up with the right performance definition to use is going to be critical for solving your problem.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
Sign In or Register to comment.