Problems with Auto Model Cluster Analysis

Terpdog Member, University Professor Posts: 8
edited May 20 in Help
"I am using Auto Model to do a k-means cluster analysis. It works fine for 2 clusters. With 3 or more clusters, one or more clusters have an average distance of ? and a Davies-Bouldin index of infinity. This appeared before, and I thought version 9.6 had fixed it, but apparently not. It also appears in the 9.7 beta. Is there a way around this? Thanks."

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,047 Unicorn
    Hi @Terpdog,

    Can you share your data so that we can reproduce the problem and understand what's going on?

    Regards,

    Lionel
  • Terpdog Member, University Professor Posts: 8
    I am not sure which files are needed, but I have attached the only RapidMiner file I could find and an Excel file of the data. I was using only the first four variables for the cluster analysis.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,047 Unicorn
    Hi @Terpdog,

    Thank you for sharing your data.
    I can reproduce what you observe.


    But there is something strange in Auto-Model itself, because
    when I use your data (only the first four variables) with a k-Means model (with k = 3, 4, etc.) in a classic RapidMiner process,
    the results are correct (i.e., I obtain finite values for the DB index and the average distances).



    Does anyone have an idea of what's going on in Auto-Model (clustering)?

    The attached file contains the classic (working) RapidMiner process.

    Regards,

    Lionel
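For readers who want to see how a Davies-Bouldin index of infinity can arise at all, here is a minimal pure-Python sketch of the standard definition of the index. It is not RapidMiner's implementation, and the data and the function names (`euclidean`, `centroid`, `davies_bouldin`) are made up for illustration. By the definition, each pair of clusters contributes the sum of their scatters divided by the distance between their centroids, so two coincident centroids give a division by zero. Whether this is what actually happens inside Auto-Model is not confirmed in this thread.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Assumption of this sketch: an empty cluster yields NaN, and two
    coincident centroids are treated as an infinite ratio.
    """
    if any(len(c) == 0 for c in clusters):
        return math.nan
    centroids = [centroid(c) for c in clusters]
    # Scatter: average distance of each cluster's points to its centroid.
    scatter = [
        sum(euclidean(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    total = 0.0
    for i in range(len(clusters)):
        worst = 0.0
        for j in range(len(clusters)):
            if i == j:
                continue
            d = euclidean(centroids[i], centroids[j])
            ratio = math.inf if d == 0 else (scatter[i] + scatter[j]) / d
            worst = max(worst, ratio)
        total += worst
    return total / len(clusters)

# Hypothetical data: two well-separated clusters.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
# Hypothetical data: a third cluster whose centroid coincides with the first.
degenerate = well_separated + [[(-1, 1), (1, 1)]]

print(davies_bouldin(well_separated))
print(davies_bouldin(degenerate))
```

The first call gives a small finite value and the second gives `inf`, which shows that a single degenerate cluster is enough to make the whole index infinite.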



  • Terpdog Member, University Professor Posts: 8
    edited May 20
    Thanks, Lionel. I did not think to try the process route. There has to be a bug in the Auto-Model routine. Hopefully that can get fixed. There is still the question of why the distances are negative, which does not make sense.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,047 Unicorn
    @Terpdog,

    The "real" distances are, of course, positive.
    It seems to me that RapidMiner multiplies the distances by minus one (-1) so that it works with negative values, because
    RapidMiner's algorithms try to MAXIMIZE these values (explanation to be confirmed by the RM staff, @sgenzer?).

    Regards,

    Lionel
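The sign-flip idea described above can be illustrated with a small Python sketch. It only demonstrates the general trick of negating a "smaller is better" measure so that a maximizing routine can be reused. It does not show RapidMiner's code, and the distances and the helper name `best_by_maximizing` are hypothetical.

```python
def best_by_maximizing(candidates, score):
    """Generic selector that always prefers the HIGHEST score."""
    return max(candidates, key=score)

# Hypothetical average within-cluster distances for several values of k.
# The real distances are non-negative, and smaller is better.
avg_distance = {2: 1.8, 3: 1.2, 4: 0.9}

# Negating the distance turns "minimize distance" into "maximize score",
# so the reported values are negative even though the distances are not.
reported = {k: -d for k, d in avg_distance.items()}

best_k = best_by_maximizing(avg_distance, lambda k: reported[k])
print(reported)
print(best_k)
```

Under this assumption, a reported value of -0.9 simply corresponds to a real average distance of 0.9.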
  • Terpdog Member, University Professor Posts: 8
    That makes sense. I am continually frustrated at how hard it is to get routine statistics following an analysis in RapidMiner. I am trying to use this in my book which talks about measures of fit in techniques such as cluster analysis, discriminant analysis and logistic regression and I can't get RapidMiner to produce them or it is so difficult it would be of no use to students. I may have to drop the idea of using it. Too bad.