
Auto Model cost matrix

mattnikiel Member Posts: 7 Contributor I
edited June 2019 in Help

How does Auto Model handle unequal misclassification costs? I don't see an option to specify the relative cost of false positives (FP) and false negatives (FN), or to enter values into a cost matrix. It's a standard feature in most data mining packages.


Answers

  • DocMusher Member Posts: 333 Unicorn

    Hi,

    Did you take a look at the Results in Auto Model? Each model provides a performance screen with the results you are requesting. Please take a look at an example I uploaded recently (near the end of the video).

    https://www.youtube.com/watch?v=i_cfZqPY5Xk

    Cheers

    Sven

  • mattnikiel Member Posts: 7 Contributor I

    Sorry, but all I can see there is a regular confusion matrix. I need to be able to take into account uneven misclassification costs as well as prior probabilities. So what is needed is something like either:

     

    https://docs.orange.biolab.si/3/visual-programming/widgets/evaluation/rocanalysis.html

    or

    https://support.sas.com/resources/papers/proceedings10/113-2010.pdf

    (page 6)

     

    Regards

     

    Matt

     

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you go to the actual RapidMiner process for any given model that you generate using Auto Model, you can simply change the default performance operator to the Performance (Costs) operator, which lets you enter the cost matrix in exactly the way you are describing. You could also select one of the many other performance operators available. It's just not built into the Auto Model interface to do that (yet).

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
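
As a rough illustration of what a cost-based performance measure computes, here is a minimal Python sketch. It is not RapidMiner code and does not reproduce the Performance (Costs) operator; the function names, the example counts, and the matrix orientation (rows = true class, columns = predicted class) are assumptions made for this example, so check the operator's help for the orientation it expects.

```python
from typing import Sequence


def total_cost(confusion: Sequence[Sequence[int]],
               costs: Sequence[Sequence[float]]) -> float:
    """Sum of count * cost over all cells of two equally shaped matrices.

    Assumed orientation: rows are the true class, columns the predicted class.
    """
    if len(confusion) != len(costs):
        raise ValueError("confusion and cost matrices must have the same shape")
    total = 0.0
    for count_row, cost_row in zip(confusion, costs):
        if len(count_row) != len(cost_row):
            raise ValueError("confusion and cost matrices must have the same shape")
        total += sum(n * c for n, c in zip(count_row, cost_row))
    return total


def average_cost(confusion: Sequence[Sequence[int]],
                 costs: Sequence[Sequence[float]]) -> float:
    """Misclassification cost per scored example."""
    n_examples = sum(sum(row) for row in confusion)
    if n_examples == 0:
        raise ValueError("confusion matrix is empty")
    return total_cost(confusion, costs) / n_examples


# Hypothetical numbers; class order is [negative, positive].
confusion = [[90, 10],   # true negative: 90 correct, 10 false positives
             [5, 15]]    # true positive: 5 false negatives, 15 correct
costs = [[0.0, 1.0],     # a false positive costs 1
         [5.0, 0.0]]     # a false negative costs 5

print(total_cost(confusion, costs))
print(average_cost(confusion, costs))
```

With these numbers the 5 false negatives contribute more to the total than the 10 false positives, which is exactly the information a plain confusion matrix does not summarise.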
  • mattnikiel Member Posts: 7 Contributor I

    Can I see some examples of processes involving the Performance (Costs) operator? Also, is there an operator for prior probabilities too?

    Thank you

  • DocMusher Member Posts: 333 Unicorn
    Hi,
    You might also use MetaCost in your actual process from Auto Model: https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/ensembles/metacost.html
    Looking forward to hearing whether this was useful in your case.
    Cheers
    Sven
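
For readers unfamiliar with cost-sensitive classification, the decision rule that methods such as MetaCost build on can be sketched in a few lines of Python. This is only the minimum-expected-cost rule, not the MetaCost algorithm or the RapidMiner operator; `min_cost_class` and the cost values are hypothetical, and the cost matrix is assumed to be indexed as `costs[true][predicted]`.

```python
from typing import Sequence


def min_cost_class(probs: Sequence[float],
                   costs: Sequence[Sequence[float]]) -> int:
    """Return the class index with the lowest expected cost.

    probs[i]    : estimated probability that the true class is i
    costs[i][j] : cost of predicting j when the true class is i (assumed layout)
    """
    n = len(probs)
    expected = [sum(probs[i] * costs[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical costs; class order is [negative, positive].
costs = [[0.0, 1.0],   # false positive costs 1
         [5.0, 0.0]]   # false negative costs 5

# A positive probability of only 0.2 already favours predicting "positive",
# because missing a positive is five times as expensive as a false alarm.
print(min_cost_class([0.8, 0.2], costs))
print(min_cost_class([0.9, 0.1], costs))
```

With a 5:1 cost ratio the break-even point moves from a probability of 0.5 down to 1/6, which is why the first call picks the positive class and the second does not.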
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Most basic operators (including Performance (Costs)) contain tutorial processes built into the help for that operator. See the screenshot below. You can read about the operator in detail and then see a sample process with the operator configured.

    In most cases the selected machine learning algorithm will derive the prior probabilities automatically from your dataset, based on class occurrences. If your sample has been stratified or otherwise artificially constructed and you want to adjust for that, you can always use the Generate Weight operator to do so.

    perf costs.PNG

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
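
The idea of reweighting an artificially balanced sample back to known population priors can be sketched in Python as follows. This is an illustration of the general technique only and makes no claim about how the Generate Weight operator works internally; `prior_weights`, the labels, and the 5% target prior are all made up for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence


def prior_weights(labels: Sequence[Hashable],
                  target_priors: Mapping[Hashable, float]) -> Dict[Hashable, float]:
    """Per-class example weight = target prior / observed class fraction."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: target_priors[cls] / (count / n) for cls, count in counts.items()}


# Hypothetical sample that was stratified to 50/50,
# while the real-world positive rate is assumed to be 5%.
labels = ["pos"] * 50 + ["neg"] * 50
target = {"pos": 0.05, "neg": 0.95}

weights = prior_weights(labels, target)
print(weights)

# Check: the weighted share of positives matches the target prior.
weighted_pos = sum(weights[label] for label in labels if label == "pos")
weighted_all = sum(weights[label] for label in labels)
print(weighted_pos / weighted_all)
```

A learner that honours example weights then sees the classes in their assumed real-world proportions even though the rows themselves are balanced.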
  • mattnikiel Member Posts: 7 Contributor I

    Thanks for the info. Can you tell me which operator I can use to oversample rare events, and when I should consider doing this in the first place? If my prior target class probability is below 20, 10, or 5%? Thank you.

     

    Matt

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist

    Hi,

    SMOTE, from the Operator Toolbox extension, is one good operator for this.

     

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
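
For context, SMOTE creates synthetic minority examples by interpolating between a minority point and one of its nearest minority neighbours. The pure-Python sketch below shows only that core idea; it is not the Operator Toolbox implementation, and `smote_like`, its parameters, and the sample points are hypothetical.

```python
import random
from typing import List, Sequence


def smote_like(minority: Sequence[Sequence[float]], n_new: int,
               k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Each new point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # All other minority points, sorted by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=10)
print(len(new_points))
```

Because every synthetic point is a blend of two existing minority points, it stays inside the region the minority class already occupies rather than being an exact duplicate.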
  • preesalopez Member Posts: 2 Contributor I

    Hello there,  It's a standard feature in most data mining packages.

  • mattnikiel Member Posts: 7 Contributor I

    Hi,

    How do I properly apply k-fold cross-validation with oversampling (SMOTE) in RapidMiner? Can you show me a sample process, please? Also, how do I adjust probabilities after oversampling?

    Thank you

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    So, as @mschmitz explained, the SMOTE operator is part of the Operator Toolbox extension. You will need to use "normal" operators rather than Auto Model to do this.

     

    You can access tutorials on how to use the SMOTE operator by putting the operator in your Design view, going to the Help panel, and then clicking "Jump To Tutorial Process".

     

    Screen Shot 2018-09-17 at 10.48.41 AM.png

     

    Scott

     

     

  • mattnikiel Member Posts: 7 Contributor I

    Sorry, but I can't see such an operator in the toolbox. Where can I find it?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @mattnikiel, if you download the Operator Toolbox from the marketplace, you should be able to see the operator here:

     

    Screen Shot 2018-09-24 at 9.35.55 AM.png

     

    Scott

     

  • mattnikiel Member Posts: 7 Contributor I

    OK, I got it. I'm wondering, is it possible to perform oversampling during cross-validation, just as described here:

    www.marcoaltini.com/blog/dealing-with-imbalanced-data-undersampling-oversampling-and-proper-cross-validation

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @mattnikiel

     

    I suppose it is possible, if I have understood correctly from your link what you want to achieve.

    This way, only the training part of the data will be upsampled:

     

    Screenshot 2018-10-04 17.07.41.png
    Screenshot 2018-10-04 17.08.20.png
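
Since the screenshots are not reproduced here, the same principle can be sketched in Python: oversample inside each cross-validation fold, on the training part only, and leave the held-out part untouched. This is not the RapidMiner process from the screenshots. It uses simple random duplication instead of SMOTE to stay dependency-free, and `k_fold_indices`, `oversample`, `cross_validate`, and the toy `fit`/`score` callables are all hypothetical names for this example.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class is as large as the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def cross_validate(rows, labels, k, fit, score):
    """k-fold CV where only the training part of each fold is oversampled."""
    results = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        # Oversampling happens here, after the split, on training rows only.
        train_rows, train_labels = oversample(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx])
        model = fit(train_rows, train_labels)
        # The held-out rows keep their original class distribution.
        results.append(score(model,
                             [rows[i] for i in test_idx],
                             [labels[i] for i in test_idx]))
    return results


# Toy data: 4 positives among 20 rows. The "model" just records the class
# counts it was trained on, so the effect of oversampling is visible.
rows = [[float(i)] for i in range(20)]
labels = ["pos"] * 4 + ["neg"] * 16
results = cross_validate(
    rows, labels, k=4,
    fit=lambda r, l: Counter(l),
    score=lambda model, r, l: (model, Counter(l)))

for train_counts, test_counts in results:
    print(dict(train_counts), dict(test_counts))
```

Oversampling before splitting would let copies of the same minority row land on both sides of a fold and inflate the performance estimate; doing it inside the loop avoids that leak.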
