First steps: need help with clustering

Antonios1 Member Posts: 9 Learner I
edited October 2020 in Help


I created a fictitious dataset using Excel's RANDBETWEEN function. The dataset has 18000 rows and two columns. Column A contains IDs with values ranging from 1 to 100. Column B contains a hypothetical expense amount between 0 and 50000 for every ID except ID 100, whose expense range is narrower: between 48000 and 50000.

Let’s suppose I don’t know how the dataset is composed, and I want to find out whether one or more IDs show an anomalous concentration of values (that is, I would like the analysis to spot ID 100, whose expenses are concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means), without success; I probably don’t know the right steps to follow. Could somebody help me?
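For readers who want to reproduce the experiment outside Excel, here is a minimal Python sketch of the dataset described above, using NumPy in place of RANDBETWEEN. The seed and the column names `id` and `amount` are illustrative assumptions, not part of the original spreadsheet.

```python
import numpy as np
import pandas as pd


def make_dataset(n_rows: int = 18000, seed: int = 42) -> pd.DataFrame:
    """Rebuild the synthetic expense dataset (seed is an arbitrary assumption).

    Excel's RANDBETWEEN is inclusive at both ends, while numpy's
    Generator.integers excludes the upper bound, hence the "+ 1".
    """
    rng = np.random.default_rng(seed)
    ids = rng.integers(1, 100 + 1, size=n_rows)        # column A: IDs 1..100
    amounts = rng.integers(0, 50000 + 1, size=n_rows)  # column B: expenses 0..50000
    # ID 100 only gets expenses in the narrow band 48000..50000.
    is_100 = ids == 100
    amounts[is_100] = rng.integers(48000, 50000 + 1, size=int(is_100.sum()))
    # Hypothetical column names standing in for Excel columns A and B.
    return pd.DataFrame({"id": ids, "amount": amounts})


df = make_dataset()
print(df.loc[df["id"] == 100, "amount"].describe())
```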

Best Answer

    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted
    Try some of the operators in the free Anomaly Detection extension. LOF (Local Outlier Factor) might be particularly useful in this kind of context.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
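Outside RapidMiner, the LOF idea can be sketched in Python with scikit-learn's `LocalOutlierFactor`. This is a swapped-in library, not the RapidMiner operator discussed in this thread, and the sketch rests on two assumptions: the data are regenerated with an arbitrary seed, and each ID is first summarised by the mean and standard deviation of its expenses so that LOF scores IDs rather than individual rows.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

# Regenerate the synthetic data from the question (seed 42 is an assumption).
rng = np.random.default_rng(42)
ids = rng.integers(1, 101, size=18000)          # IDs 1..100
amounts = rng.integers(0, 50001, size=18000)    # expenses 0..50000
is_100 = ids == 100
amounts[is_100] = rng.integers(48000, 50001, size=int(is_100.sum()))
df = pd.DataFrame({"id": ids, "amount": amounts})

# Assumption: describe each ID by its expense profile (one row per ID,
# both features in the same currency units, so no rescaling is applied).
profile = df.groupby("id")["amount"].agg(["mean", "std"])

# LOF compares each point's local density with that of its k nearest
# neighbours; scores well above 1 mean "sparser than its neighbours".
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(profile.to_numpy())    # -1 = outlier, 1 = inlier
profile["lof"] = -lof.negative_outlier_factor_  # sklearn stores the negated LOF
profile["outlier"] = labels == -1

print(profile.sort_values("lof", ascending=False).head(5))
most_anomalous_id = int(profile["lof"].idxmax())
print("Most anomalous ID:", most_anomalous_id)
```

Summarising per ID is a design choice for this sketch: in the raw rows the ID would act as a numeric coordinate, which makes neighbouring ID numbers look "close" for no real reason. It is not necessarily what the RapidMiner process in this thread did.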


    Antonios1 Member Posts: 9 Learner I
    Thanks for your help, Brian. I am really new to RapidMiner and AI, so forgive me if I do not use the correct terms. Unfortunately, I have not yet managed to test the LOF operator. I downloaded the Anomaly Detection extension and used the LOF operator: I connected the out port of my data to the exa input port of the LOF operator, and the operator's exa output port to the res port. The process took a very long time without producing any output, so I stopped it after a few hours. I ran it again this morning before going to work, and when I got back at one o'clock I found that the software had crashed. I have launched it again to see how it proceeds; it has now been running for about an hour and is still going. The PC is an i7 with 16 GB of RAM.

    Antonios1 Member Posts: 9 Learner I
    Thank you, Brian. It works. I was able to run the operator on a different PC and it completed correctly. The results also seem quite easy to interpret.