"How to set Weights on Iris Data Set?"

geb_hart Member Posts: 5 Contributor I
edited May 2019 in Help
I tried several different algorithms on the Iris sample data set and get around 96% accuracy.
However, Auto Model gets 100%, and I think this comes from the use of weights.
Unfortunately, I'm not able to reproduce this result from the process that Auto Model opens.

Can someone show me how to implement "Weight by Correlation" for polynominal data?


Best Answer


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @geb_hart,

    My hypothesis is different:
    From my point of view, this 100% accuracy is due to "luck". Indeed, by default the Auto Model tool performs a Split Validation with a training/test ratio of 0.8/0.2. The performance is therefore calculated on 20% of the dataset (30 examples for the Iris dataset). If the sampling is "lucky", all the test examples are correctly classified, which explains this performance.
    To convince yourself, you can:
     - set another "local random seed" for the sampling of the training/test partition. For example, here are the results with local random seed = 1991:

     - decrease the training/test ratio in the Split Data operator (the one that splits off a validation set). In this case there are more test examples, so there is less chance that all of them are correctly classified. Here are the results with a training/test ratio of 0.7/0.3 (and local random seed = 1992):
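
As a rough illustration of the "luck" argument, here is a minimal Python sketch. It assumes a model whose true accuracy is 96% (the figure from the question) and assumes that each test prediction is correct independently of the others; both are simplifying assumptions, not something Auto Model reports. The function name `prob_all_correct` is made up for this example.

```python
def prob_all_correct(true_accuracy: float, n_test: int) -> float:
    """Chance that all n_test examples are classified correctly.

    Assumption: each prediction is independently correct with
    probability true_accuracy (a simple binomial model).
    """
    return true_accuracy ** n_test


# Iris has 150 examples: a 0.8/0.2 split tests on 30, a 0.7/0.3 split on 45.
for test_share, n_test in ((0.2, 30), (0.3, 45)):
    p = prob_all_correct(0.96, n_test)
    print(f"test share {test_share:.0%} ({n_test} examples): "
          f"P(100% accuracy) = {p:.2f}")
```

Under these assumptions, a perfect score on 30 test examples happens in roughly 29% of random splits, and on 45 test examples in roughly 16%, so seeing 100% for some seeds is not surprising.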

    As a beta tester of RapidMiner Studio, I was surprised that Auto Model does not perform Cross Validation (it uses Split Validation instead).
    In principle, with Cross Validation, this kind of "perfect result" should be very unlikely, because every example ends up in a test fold once.
    So is there any reason to perform Split Validation instead of Cross Validation in this tool (computation time, maybe)?
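
To make the difference concrete, here is a minimal Python sketch of how a k-fold split partitions the data. It is only an illustration of the general idea, not RapidMiner's Cross Validation operator: the helper `k_fold_indices` is hypothetical, and the folds are shuffled but not stratified by class.

```python
import random


def k_fold_indices(n_examples: int, k: int, seed: int) -> list[list[int]]:
    """Shuffle the example indices and deal them into k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(150, k=5, seed=1992)

# Each fold serves once as the test set; the other folds form the training set.
print([len(fold) for fold in folds])        # [30, 30, 30, 30, 30]
tested = sorted(i for fold in folds for i in fold)
print(tested == list(range(150)))           # True: every example is tested once
```

Because all 150 examples are tested exactly once, a single favourable 30-example sample can no longer decide the reported accuracy.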

    And to conclude, the moral of this story is that "... in data science (and maybe more generally in life), there are those who are lucky and ...
    the others ..."

    I hope it helps,


  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I agree with everything @lionelderkrikor says about cross validation above.
    As far as your other question goes, there is no (sensible) way to use Weight by Correlation for polynominal data.  You could either look at another weighting approach (such as Weight by Information Gain) or you would have to transform all your data into binominal 0/1 flags and then calculate numerical correlations.  But in neither case will using Weight... operators improve your model performance to 100%!
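
The second option (0/1 flags plus numerical correlation) can be sketched in plain Python as follows. This is a hedged illustration only: the toy `colour` and `label` data are invented, the helpers `to_flags` and `pearson` are hypothetical, and nothing here claims to reproduce what the Weight by Correlation operator does internally.

```python
from math import sqrt


def to_flags(values: list[str]) -> dict[str, list[int]]:
    """Turn one nominal column into one 0/1 flag column per category."""
    return {cat: [1 if v == cat else 0 for v in values]
            for cat in sorted(set(values))}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes neither column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Invented toy data: a nominal attribute and a 0/1 label.
colour = ["red", "blue", "red", "green", "blue", "red"]
label = [1, 0, 1, 0, 0, 1]

# The absolute correlation of each flag with the label acts as a simple weight.
for category, flag in to_flags(colour).items():
    print(category, round(abs(pearson(flag, label)), 2))
```

For a label with more than two classes (as in Iris), the label itself would also need to be turned into flags, one class at a time.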
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    cc'ing @IngoRM about Split vs Cross Validation in Auto Model.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    RM Staff,

    I just updated RapidMiner to the official 9.1 release and quickly tested the Auto Model tool:
    I want to warmly welcome the introduction of Cross Validation in Auto Model, and I must say that a lot of impressive work has gone into this release.



  • geb_hart Member Posts: 5 Contributor I
    I also tested Auto Model on the Iris data in the 9.1 release, and I still get 100% with 3 of the 7 models.

    I still believe that weights play a role in them, but I cannot reproduce this myself.

    Please try it for yourself, and if you could rebuild the process for GLM or SVM, I would like to see it :)

    Thanks for your comments!

  • M_Martin RapidMiner Certified Analyst, Member Posts: 125 Unicorn
    Colleagues: a very interesting conversation. What I find particularly interesting (and also somewhat worrisome) is that RapidMiner's marketing experience seems to indicate that users have a low patience threshold. That is a bottle-of-wine conversation topic in and of itself.  Best wishes, Michael Martin