Prediction Column out of Binary Machine Learning Classification Problem

summer_helmi Member Posts: 14 Contributor I
edited October 6 in Help
For example, in the case of Logistic Regression, we can get coefficients that can be multiplied by the predictors to get the final output as an attribute in a CSV file or an image. Please let me know if it is scientifically correct to extract the weights/rules from trained SVM, ANN, KNN, and NB models, multiply each predictor by its weight/rule, and take the sum over all predictors. I mean (predictor 1 * its weight + predictor 2 * its weight + predictor 3 * its weight + ...)
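The weighted-sum idea in the question can be sketched as follows. This is a minimal illustration with made-up coefficients (hypothetical values, not from any trained model), showing how a logistic regression turns a predictor*weight sum into a prediction:

```python
import numpy as np

# hypothetical coefficients of a trained logistic regression
# (made-up values, purely for illustration)
weights = np.array([1.5, -2.0, 0.5])
intercept = -0.3
x = np.array([0.2, 1.0, -0.5])  # one example's predictor values

# predictor1*weight1 + predictor2*weight2 + predictor3*weight3 + intercept
score = float(x @ weights) + intercept

# the logistic link turns the linear score into a probability
prob = 1.0 / (1.0 + np.exp(-score))
```

For linear models this weighted sum (plus the link function) is the entire prediction, which is why the coefficients can simply be exported and reused outside the tool.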

Answers

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 417   Unicorn

    At first glance, operating on weights/rules didn't sound logical to me: Decision Trees try to make examples fit into one category or another by treating all data as categorical rather than numerical. Logistic Regression, on the other hand, is performed over numerical data, and operating on the coefficients might make more sense there.

    However, Gradient Boosted Trees work in a similar fashion. That is, giving more weight to classes that are difficult to classify and less weight to the easier ones. It wouldn't hurt to make a quick test and see how predictors behave with your data. The keyword to continue researching is Boosting.

    Hope it helps,

    Rodrigo.
  • summer_helmi Member Posts: 14 Contributor I
    Thank you @rfuentealba for your help. Could you please let me know which classifiers other than DT allow extracting weights/rules?
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,156  RM Data Scientist
    Hi,
    your approach of coefficient*value only works for linear models. The strength of most machine learning models is that they are non-linear; that's the cool part. Breaking down non-linear, multivariate methods into single factors ranges from 'tricky' to 'impossible'.
    Nevertheless, have a look at the wei (weights) ports of the operators and at operators like Tree to Rules (or so?). They may help.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
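The point that coefficient*value only works for linear models can be illustrated with the classic XOR case. This is a hedged sketch (the weight grid below is an arbitrary assumption, not anything RapidMiner-specific) showing that no predictor*weight sum reproduces even a simple non-linear target:

```python
import itertools
import numpy as np

# XOR: a simple non-linear relationship between two predictors and the label
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# brute-force search over an (arbitrary, illustrative) grid of weights and
# intercepts: we look for a w1, w2, b whose weighted sum classifies all
# four points correctly
found = False
for w1, w2, b in itertools.product(np.linspace(-2, 2, 41), repeat=3):
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    if np.array_equal(pred, y):
        found = True
        break

# XOR is not linearly separable, so the search comes up empty
assert not found
```

Any model that captures XOR-like interactions has to encode them somewhere other than per-predictor weights, which is exactly why the breakdown into single factors fails.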
  • summer_helmi Member Posts: 14 Contributor I
    So a linear ML model such as a linear SVM could work this way, but other, non-linear models could not?
  • varunm1 Moderator, Member Posts: 840   Unicorn
    edited October 6
    Hello @summer_helmi

    SVM is one of the linear models, but it can handle non-linear functions using the kernel trick. Non-linear algorithms each work in their own way: for example, a decision tree works based on a split criterion, and a neural network works based on hidden-unit activations.

    So basically, every class of algorithms has its own way of working.

    For your initial question: yes, it is scientifically correct to get feature weights from an algorithm, as the weights are calculated based on proven methods. But it is not always correct to multiply a weight by its feature; that is only correct for the class of linear models (GLMs) that are based on linear equations.
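A minimal sketch of the kernel-trick point, with made-up support vectors and dual coefficients (hypothetical values, not from any real model): with a non-linear kernel, the decision score is a sum over support vectors, so there is no single weight per predictor to extract.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """RBF (Gaussian) kernel between two points."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# hypothetical support vectors and dual coefficients (alpha_i * y_i);
# made-up values for illustration, not taken from a trained model
support = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
dual = np.array([0.8, -0.5, 0.3])
b = 0.1

def decision(x):
    # f(x) = sum_i alpha_i*y_i*K(sv_i, x) + b -- no weight per predictor
    return sum(d * rbf(sv, x) for d, sv in zip(dual, support)) + b

# with a *linear* kernel the same sum collapses to one weight vector,
# w = sum_i alpha_i*y_i*sv_i
w_linear = (dual[:, None] * support).sum(axis=0)
```

That collapse into a single weight vector is exactly why weights can be read off a linear SVM but not an RBF one.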



    Regards,
    Varun
    Rapidminer Wisdom 2020 (User Track): Call for proposals 

    https://www.varunmandalapu.com/