Getting started with predictions of probabilities

AndySchneyder Member Posts: 5 Contributor II
edited November 2018 in Help
Hello,

I'd like to learn more about data mining and predictions with RM. To that end I created a small scenario to learn with, and I need some help to get started and to understand working with RM a little better:

I want to predict the outcome of a card game that is always played by 3 players.
Each player has certain attributes that should indicate which player has the best chance to win.

My goal is to predict the probability of each player to win:
Player 1:  10%
Player 2:  50%
Player 3:  40%

As a data basis I have a spreadsheet with training and testing data, including all the games played and the attributes of the players in one row:

P1_Name, P1_Att1, P1_Att2, ..., P2_Name, P2_Att1, P2_Att2, ..., P3_Name, P3_Att1, P3_Att2, ..., OUTCOME of the game (1, 2 or 3 wins)

Question 1: How do I declare the attributes right?

So far, my understanding is as follows. The spreadsheets usually have the following structure (attributes/label):
att1, att2, att3, ..., outcome (label)
By the naming of the attributes the machine is able to distinguish them in the training data as well as in the testing data.
Further, all the attributes combined have an impact on the outcome. The impact of each attribute may differ, which is expressed by setting or calculating a weight.

This observation brings me to the following difficulties/questions in my example:
1)
P1_att1, P2_att1 and P3_att1 are the same type of attribute but are seen as different because of the naming. Therefore RM will interpret them differently, which can lead to different outcomes if you switch Player 1 and Player 2 in one game. So each player's att1 should be interpreted the same way regardless of the player's position. Is it possible to declare that in RM?
2)
All the attributes of each player should be analyzed together for that player, because my hypothesis is that only if you consider all the data of one player together can you make a good estimate of the outcome. Is it also possible to declare which attributes belong to which player?

Question 2: How can I generate the 3 probabilities of my testing data?
So far, I only get a confidence for the player predicted to win. Is it possible to determine the probability to win for the second one as well? (The third can be calculated from the first and second.)

I really appreciate any help that will get me started.
greetz, Andy

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Andy,
    I will try to answer your questions, although the answer might not make you happy. Since both questions are dependent on the same problem, I won't answer separately:
    I think you understood very well how to define the attributes. The problem is that you want them to carry more semantics than is possible in the usual matrix-like organisation of values that common learning algorithms need. So we have a problem, but don't worry: that's quite normal! There's hardly any real-world data mining application where you get the data in a form that you can feed straight to your algorithms and then apply the results. If you take a look at RapidMiner's operator list, you will see that learning operators are only a small fraction of the available operations. That's because of exactly this problem: data needs to be preprocessed and transformed into a suitable format. This is where the real difficulty lies, and why learning algorithms aren't applied everywhere.
    To solve this problem, you will have to squeeze your mind and think about possible transformations into a suitable table format that expresses what you want through certain column/value combinations. For example, one solution could be to try a one-vs-one approach and transform each game into 6 examples. This way you could avoid the duplicated attributes with the same semantics.
    You see how this affects your second question? You will have to do some smart postprocessing after applying the learned model to calculate the probabilities from the results...
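A minimal sketch of the one-vs-one idea and the postprocessing step, written in plain Python rather than as a RapidMiner process. Everything in it is an assumption made for illustration: the function names, the choice to label each ordered pair by whether its first player won the game, and the averaging rule used to turn pairwise confidences back into three win probabilities. The attribute values and pairwise confidences are made up.

```python
from itertools import permutations


def game_to_pairwise(players, winner):
    """Turn one 3-player game into 6 ordered-pair examples.

    players: dict mapping player number -> list of attribute values
    winner:  player number that won the game
    Each row holds the attributes of both players in a fixed
    (first, second) layout, so position no longer carries meaning.
    """
    rows = []
    for a, b in permutations(sorted(players), 2):
        rows.append({
            "first": a,
            "second": b,
            "features": list(players[a]) + list(players[b]),
            "first_won": winner == a,  # assumed label definition
        })
    return rows


def combine_confidences(pair_conf):
    """Heuristic postprocessing (an assumption, not a RapidMiner feature).

    pair_conf maps (first, second) -> model confidence that `first` won.
    The confidences are summed per player and normalised to sum to 1.
    """
    totals = {}
    for (a, _b), conf in pair_conf.items():
        totals[a] = totals.get(a, 0.0) + conf
    norm = sum(totals.values())
    return {player: total / norm for player, total in totals.items()}


# Hypothetical game: two attributes per player, player 3 wins.
game = {1: [50, 80], 2: [40, 50], 3: [120, 150]}
rows = game_to_pairwise(game, winner=3)

# Hypothetical confidences a trained model might return for the 6 pairs.
pair_conf = {
    (1, 2): 0.2, (1, 3): 0.1,
    (2, 1): 0.6, (2, 3): 0.4,
    (3, 1): 0.5, (3, 2): 0.2,
}
win_prob = combine_confidences(pair_conf)
```

Other label definitions and combination rules are possible; the point is only that the table a learner sees and the table you want to report are two different things, connected by pre- and postprocessing.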


    So, welcome to the data mining community,
    where you have to solve problems you never even imagined before :)

    Greetings,
      Sebastian
  • AndySchneyder Member Posts: 5 Contributor II
    Hi,

    thx for the welcome and the great explanation. It definitely changed my perspective of data mining.
    Just to see if it changed for the better, I want to lay out my understanding and, of course, some new questions that came up while writing this ;) So please correct me if I am wrong.

    The main mistake I made was to define attributes which refer to the players rather than to the game itself. I therefore have to determine attributes that describe the game and not so much the players. And this is done in the preprocessing. I can change the view to the players and determine their outcome individually, but it is necessary to find attributes that allow a determination of the outcome. So first I have to choose a point of view.

    Afterwards I feed the data into my model. By training the model, RM will find some correlation in the training data which allows it to make assumptions/predictions about the outcome of the testing data. RM then compares the predicted results with the actual results of the testing data, which in turn gives an estimate of the overall confidence of the trained model.

    So next to the predicted outcome, I only get the overall confidence and not the confidence for each prediction, right? Is there any way for RM to calculate/define the confidence/probability for each result? Otherwise the postprocessing seems kind of hard to do.

    My simple conclusion is that the result of the prediction depends on the significance of the attributes, on choosing meaningful training data and, of course, on the best model and its settings. Which brings me to the following questions:

    If I want to do a prediction for each player in the game, can I define a dedicated neuron for each player, so that each neuron is trained with a different data set? If so, is it possible to save the current learning state of the neuron, or do I need to train every neuron before each game? Is such a scenario possible within the RM GUI, or is it better to use Java with RM integration?

    I hope I'm not asking too much… but it's hard to find advanced tutorials or similar examples that I could learn from.

    Thx for ur time!
    Greetz, Andy
  • wessel Member Posts: 537 Maven
    Hmm, I think your problem is more reinforcement learning.

    Maybe this paper can help you:
    R. Cattral, F. Oppacher, D. Deugo. Evolutionary Data Mining with Automatic Rule Generalization. Recent Advances in Computers, Computing and Communications, pp.296-300, WSEAS Press, 2002.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Andy,
    I think you got it right, apart from the fact that RapidMiner won't use just correlations but general dependencies. Correlations are only linear dependencies... :)
    To answer your first question: Each prediction model of RapidMiner will give you more or less reliable confidence values for each possible label per example (which might be a game in your setting). After applying the model there will be confidence attributes like "confidence(player1)" "confidence(player2)" "confidence(player3)" if player1 to player3 are the possible label values.
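To make the shape of such a scored example concrete, here is a small plain-Python sketch. It assumes a scored row has been exported as a dictionary whose confidence columns follow the "confidence(<label>)" naming quoted above; the `game_id` field, the helper names and the numbers are hypothetical (the values simply reuse the 10% / 50% / 40% example from the first post).

```python
import re

# Matches column names such as "confidence(player2)" and captures the label.
CONF_COLUMN = re.compile(r"^confidence\((.+)\)$")


def confidences(example):
    """Collect the per-label confidences of one scored example."""
    found = {}
    for name, value in example.items():
        match = CONF_COLUMN.match(name)
        if match:
            found[match.group(1)] = float(value)
    return found


def most_confident(example):
    """Return the label with the highest confidence."""
    conf = confidences(example)
    return max(conf, key=conf.get)


# Hypothetical scored example for one game.
scored = {
    "game_id": 17,
    "confidence(player1)": 0.10,
    "confidence(player2)": 0.50,
    "confidence(player3)": 0.40,
}
per_player = confidences(scored)
best = most_confident(scored)
```

Read this way, the three confidence columns already are the per-player estimates asked for in the first post; how reliable they are as probabilities depends on the learner.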

    To your second question: You cannot do this within RapidMiner; RapidMiner is not a specialized neural net modeler. But anyway, I'm not sure how you could benefit from such a setting. Usually, doing good and reasonable preprocessing and then applying and tuning standard learning algorithms outperforms a complex neural net setting in speed, ease of use and, most of the time, quality of results. But of course you might extend the neural net modeling in this way. If you are going to extend RapidMiner, I suggest buying the White paper in our shop. It's not too expensive and will ease your life a lot.

    Greetings,
      Sebastian
  • AndySchneyder Member Posts: 5 Contributor II
    Sebastian Land wrote:

    After applying the model there will be confidence attributes like "confidence(player1)" "confidence(player2)" "confidence(player3)" if player1 to player3 are the possible label values.
    I set up an X-Validation with an SVM on the learning data and afterwards applied the model as well as a validator to the testing/scoring data... but I don't get these confidences for the three labels that you mentioned. Am I missing something?
    Are there any tutorials / books or videos that will help me understand that better?

    Thx for the tip... I am gonna look into it once I need to extend RM. 
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the XValidation removes the confidences / predictions after completion. You could use the XVPrediction if you want to keep them.
    There are several videos available. See here for example: http://rapid-i.com/content/view/189/198/

    Greetings,
      Sebastian
  • AndySchneyder Member Posts: 5 Contributor II
    Hi,

    Thank you for the input. It works fine so far. I got to know RM a little better, but I can't find a way to set up my next process.

    I would like to make predictions for different game situations and analyze the outcome of each prediction and its confidences. I therefore extended my data set with the player ID and situation IDs:
    PlayerID; ID1; ID2; Attr1; Attr2; label
    Since RM can't distinguish between these IDs (as far as I know), I thought of splitting the training data by ID, training on each subset and applying the result to the score data set. It should accomplish something like this:

    - Train the model with the data that has the same playerID and do a prediction with confidences
    - Train the model with the data that has the situation ID1 and do a prediction with confidences
    - Train the model with the data that has the situation ID2 and do a prediction with confidences
    As a result I would have 3 different predictions which I can use for analysis and postprocessing.
    I haven't found anything in the RM tutorials. So can this be done in the RM GUI, and if so, what would the setup look like?

    I appreciate any suggestions that will help me figure this out!

    Greetz,
    Andy
  • wessel Member Posts: 537 Maven
    Can you give me an example of your data?

    It's hard to help you without knowing what your data looks like.
  • AndySchneyder Member Posts: 5 Contributor II
    Here is what the data looks like:

    PlayerID  Sit_ID1  Sit_ID2  Attr1  Attr2  Label
    1         1        2        50     80     first
    3         2        4        120    150    second
    2         2        3        40     50     third
    4         1        2        115    98     second
    3         1        2        135    85     first
    2         1        3        79     13     third
    1         2        4        80     69     third
    3         3        3        130    125    second
    4         2        2        65     90     ?
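The three-way split described earlier in the thread (one model per PlayerID, per Sit_ID1 and per Sit_ID2) can be sketched in plain Python on exactly this sample data. This is not a RapidMiner process: a tiny k-nearest-neighbour vote on Attr1/Attr2 stands in for whatever learner is actually used, and the function names and the choice k=3 are assumptions made for illustration.

```python
from collections import Counter
from math import dist

COLUMNS = ("PlayerID", "Sit_ID1", "Sit_ID2", "Attr1", "Attr2", "Label")

# Labeled rows from the sample data above.
TRAIN = [
    (1, 1, 2, 50, 80, "first"),
    (3, 2, 4, 120, 150, "second"),
    (2, 2, 3, 40, 50, "third"),
    (4, 1, 2, 115, 98, "second"),
    (3, 1, 2, 135, 85, "first"),
    (2, 1, 3, 79, 13, "third"),
    (1, 2, 4, 80, 69, "third"),
    (3, 3, 3, 130, 125, "second"),
]
# The unlabeled row ("?") that should be scored.
QUERY = (4, 2, 2, 65, 90)


def knn_confidences(train_rows, attrs, k=3):
    """Share of each label among the k training rows nearest to attrs."""
    nearest = sorted(train_rows, key=lambda row: dist(row[3:5], attrs))[:k]
    votes = Counter(row[5] for row in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


def predict_per_subset(train_rows, query, k=3):
    """One prediction per ID column, trained only on rows sharing that ID."""
    results = {}
    for index, name in enumerate(COLUMNS[:3]):
        subset = [row for row in train_rows if row[index] == query[index]]
        if subset:  # skip an ID value that never occurs in the training data
            results[name] = knn_confidences(subset, query[3:5], k)
    return results


predictions = predict_per_subset(TRAIN, QUERY)
```

With only eight labeled rows, each subset holds one to three examples, so the resulting confidences say more about the mechanics than about the game. The structure (filter by ID, train, score, keep the confidences) is what carries over.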
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this is of course possible. To start with: roles, like id, are flexible. You can assign them to another attribute by first setting the original id attribute's role to regular and then setting another attribute's role to id. This is possible with two Set Role operators.
    Unfortunately your setup is a little too complicated to solve in the few minutes I have for answering in this forum, but it seems some other forum members have already volunteered to tackle this interesting task :)

    Greetings,
      Sebastian
  • wessel Member Posts: 537 Maven
    I have tackled tasks of similar sort.

    But using different techniques.

    There is this online book about reinforcement learning which is very good:
    http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html

    Since you have such nice states in card games, you can do a lot with brute force, Dynamic Programming:
    http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node40.html

    Or less brute force more experience based, TD-lambda, Sarsa:
    http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node64.html

    And of course there are Markov models.

    All these techniques can use machine learning to find important features for the state representation.
    But the wrapper around them allows you to play many games.
    These techniques seem far better suited for your problem.