
Scorecard

cristinapetcu28 Member Posts: 4 Contributor I
edited November 2018 in Help

Hi,

How can I build a scorecard? 

 

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Hi there and welcome to the community!

     

    This is a very general question, so I'm not entirely certain what your definition of "scorecard" might be.  It would help if you could share a bit more detail about your question, such as the industry/domain, or the type of data or prediction problem, etc., that you have in mind.

     

    Nevertheless, without that information, I can still say that RapidMiner is capable of producing predictive models that would satisfy nearly any definition of "scorecard" (in the predictive sense) that is commonly used.  Here are just a couple of examples:

    • building classification models based on logistic regressions for predicting binary outcomes like direct response or credit default
    • building classification models based on decision trees or cascading rulesets for predicting binary or multinomial outcomes
    • building regression-type models for generating continuous outcome predictions
    • even building qualitative scoring systems used for ranking by assigning points to various factors, "expert judgment" style

    Models like these are often output in a format called a "scorecard" that can easily be implemented in other environments like SQL, or even computed manually if necessary.  By contrast, some of the more advanced machine-learning based approaches to modeling (neural nets, SVM, random forest, etc.) typically don't lend themselves to easy implementation in a scorecard fashion.
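To illustrate the "computed manually" point, here is a minimal Python sketch of how a logistic regression reduces to a weighted sum that could be reproduced in SQL or by hand. The intercept, coefficients, and attribute names are hypothetical and are not taken from any fitted model or from RapidMiner output.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
INTERCEPT = -2.0
COEFFICIENTS = {"age": 0.03, "owns_home": 0.8, "has_degree": 0.5}


def log_odds(row):
    """Weighted sum of the attributes: the 'scorecard' form of the model."""
    return INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)


def probability(row):
    """Convert the log-odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(row)))


row = {"age": 40, "owns_home": 1, "has_degree": 0}
print(round(log_odds(row), 3), round(probability(row), 3))
```

Because the model is only an intercept plus one term per attribute, the same arithmetic can be written as a single SQL expression or worked out on paper, which is much harder for a neural net or a random forest.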

     

    In any of these cases, the key to building a scorecard is to first put together a dataset that is suitable for predictive modeling, which means you'll need a defined outcome variable (a "label" in RapidMiner terms), with multiple examples (cases or rows) and multiple attributes (variables or columns) that could be used to predict the outcome.  All of these can be used in RapidMiner to actually build predictive models.  You might want to check out some of the built-in tutorials, such as the one for credit default modeling or the one for churn modeling, to see some example datasets and processes for how this is actually done in more detail inside RapidMiner.

     

    I hope this is helpful.

     

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • cristinapetcu28 Member Posts: 4 Contributor I

    Hi,

    Thank you for your answer!

    My project consists of building a scorecard which contains points depending on some factors e.g:

    • if the age is between 18 and 26 he receives 2 points; age>30 receives 3 points etc;
    • if he has a home gets 5 points;
    • if he has a bachelor's degree he gets 6 points; if he has only high school he gets 2 points

    This should be built with reference to historical data. Afterwards, it should be applied to a new data set, and if the sum of the points is greater than, let's say, 27, then he gets an OK. Otherwise he receives a NO.
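A minimal Python sketch of the point rules described above, outside RapidMiner. The field names are hypothetical, and ages that fall outside the two stated bands are assumed to earn 0 points because the post does not specify them. Note that the example point values add up to at most 14, so the cutoff of 27 is kept as an adjustable parameter.

```python
EDUCATION_POINTS = {"bachelor": 6, "high_school": 2}


def age_points(age):
    # 18-26 -> 2 points and >30 -> 3 points come from the post.
    # Any other age is not specified there; 0 points is an assumption.
    if 18 <= age <= 26:
        return 2
    if age > 30:
        return 3
    return 0


def score(applicant):
    """Sum the points earned for each factor."""
    total = age_points(applicant["age"])
    total += 5 if applicant["owns_home"] else 0
    total += EDUCATION_POINTS.get(applicant["education"], 0)
    return total


def decide(applicant, cutoff=27):
    """Return OK only when the total is strictly greater than the cutoff."""
    return "OK" if score(applicant) > cutoff else "NO"


applicant = {"age": 35, "owns_home": True, "education": "bachelor"}
print(score(applicant), decide(applicant), decide(applicant, cutoff=10))
```

In RapidMiner, each of these rules would correspond to an expression in Generate Attributes, with one more expression for the sum and the final comparison.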

    Can I build something like this in RapidMiner, and if yes, how should I proceed with my project?

    Thank you again!

    Cristina

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist

    Dear Cristina,

     

    you can indeed build such a model in RapidMiner. The key operator is Generate Attributes.

     

    Usually RapidMiner is used to build more advanced models than your scorecard, based on multivariate methods (e.g. a Decision Tree) - I would consider this as an option.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • cristinapetcu28 Member Posts: 4 Contributor I

    Dear Martin,

    Thank you for your solution!

     

    I have one question regarding the Decision Tree. After I apply this operator, should I use the Weight operator to see which attribute is more important, and afterwards set a score with the Generate Attributes operator? I don't see another option if I apply the Decision Tree, because from what I have experimented with in RapidMiner, this operator creates a model, but I am not aware of how I should give a score based on a Decision Tree.

     

    I hope I was clear enough; if not, please let me know so that I can give you some more details.

     

    Thank you again for your answer!

    Cristine

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist

    Dear Cristine,

     

    the quick answer is that you need to apply the decision tree to the data you would like to score by using the Apply Model operator. This will produce new attributes, which can be used to define a score.

     

    Please have a look at: http://docs.rapidminer.com/studio/getting-started/ (especially the sections Creating and Applying a Model) for some more details.

     

    ~martin

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Hi Cristine, 

     

    As Martin says, you will need to apply the "apply model" operator to a new dataset to get the model output.  The new dataset must have all the same data attributes (other than the outcome label) for the "apply model" operator to work properly.  Also remember that a decision tree doesn't output a score in the sense in which you originally used the word (i.e., point values assigned to specific data attribute values), which is more in line with what I called an "expert judgment" scorecard.  Instead, the output of a decision tree model is going to be a prediction of class membership based on your original labels (e.g., default vs not default) and an associated probability.  This should be clear if you look at the attributes that are added after you run the decision tree model.
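To make the difference concrete, here is a small Python stand-in for what applying a tree model returns. The splits, leaf probabilities, and output names are made up for illustration; they do not come from a trained model or from RapidMiner's actual output.

```python
REQUIRED_ATTRIBUTES = {"age", "owns_home"}


def apply_tree(row):
    """Hand-written stand-in for a trained tree (all numbers are hypothetical)."""
    # The new data must carry the same attributes the model was trained on.
    missing = REQUIRED_ATTRIBUTES - set(row)
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")

    # Each branch ends in a leaf holding a class probability, not a point value.
    if row["owns_home"]:
        p_default = 0.10
    elif row["age"] <= 26:
        p_default = 0.60
    else:
        p_default = 0.30

    prediction = "default" if p_default >= 0.5 else "no default"
    return {"prediction": prediction, "confidence_default": p_default}


print(apply_tree({"age": 22, "owns_home": False}))
```

The result is a predicted class plus a probability for each row, so any points you want would still have to be derived from those outputs or from the tree's segments.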

     

    I hope this is helpful.

     

     

    Brian T.
  • cristinapetcu28 Member Posts: 4 Contributor I

    Hi Martin and Brian,

    Thank you both for your answers! 

    I understand what happens with the Decision Tree and Apply Model operators, but I need to make an "expert judgment" scorecard, as Brian calls it. What Apply Model offers me is great, but I basically need those edge labels from the Decision Tree saved in some sort of file, if possible, so that I can give them a score based on their importance in the decision tree. I tried the Generate Attributes operator, but there I need to write the functions myself in order to score the attributes. I need something that does this automatically.

    [Attached image: Capture.PNG]

    I am not sure how clear I am, but please let me know if you need further information.

    Thanks,

    Cristina

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I don't think there is a way to have RapidMiner do what you want completely automatically, but there are ways to get closer to it.

     

    For example, take a look at the "Tree to Rules" subprocess operator, which will turn your tree model into a set of written rules defining all terminal nodes.  You can then use this ruleset pretty easily with "Generate Attributes" to define each segment that you want to assign the score to (which you still need to do manually, of course, but now you have all the segment definitions already written out for you).

     

    Alternatively, you can use the modeling algorithm "Rule Induction" instead of a tree, as that learner directly generates a ruleset.  You'd then do the same thing in terms of turning that ruleset into a set of scored segments using Generate Attributes.
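As a rough sketch of the last step in either approach, the Python below treats each rule as a segment definition and attaches a manually chosen score to it. The rule conditions and point values are hypothetical, not output from "Tree to Rules" or "Rule Induction"; in RapidMiner the equivalent logic would be written as expressions in Generate Attributes.

```python
# Each entry: (segment description, condition, manually assigned points).
# Conditions and points are hypothetical examples.
SEGMENT_RULES = [
    ("owns home", lambda r: r["owns_home"], 10),
    ("no home, age <= 26", lambda r: not r["owns_home"] and r["age"] <= 26, 2),
    ("no home, age > 26", lambda r: not r["owns_home"] and r["age"] > 26, 6),
]


def score_segment(row):
    """Return the first matching segment and its points."""
    for description, condition, points in SEGMENT_RULES:
        if condition(row):
            return description, points
    raise ValueError("no rule matched this row")


new_data = [
    {"age": 22, "owns_home": False},
    {"age": 45, "owns_home": True},
]
for row in new_data:
    print(row, score_segment(row))
```

The rules give you the segment definitions for free; only the points column still has to be filled in by hand.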

     

    [Edit]:  Not sure if you know this too, but you can also see a text view of the tree rules simply by changing the view on the left from "graph" to "description".  But I don't find this presentation of the tree rules as easy to follow because the edge nodes are not clearly defined.

    [Image: rules for trees.PNG, text view of tree]

     

     

     

     

    Brian T.