Cluster Sampling in RapidMiner

StefanRei Member Posts: 7 Newbie

I would like to use the cluster sampling method in RapidMiner (see, for example, the Towards Data Science article on sampling techniques).

Do you have any suggestions? 

Thank you very much.



  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You'll have to incorporate this via a Python or R script, since there is no native RapidMiner operator that implements this particular sampling method.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited July 2019
    Hello @StefanRei

    I am not sure whether there is a dedicated operator in RM for this. If it is implemented in Python or R, you can use the script operators to embed it in your RM process.

    One disadvantage, in my view, is that cluster sampling takes all of its data from a few selected clusters, which can over-represent or under-represent parts of the distribution in the data. The consequence is high variance (low precision) in the results. The major advantage is processing time: it is fast because it doesn't go through every example in the dataset. If you would like more precise results, you can go with stratified sampling.
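    To make the trade-off concrete, here is a minimal sketch in plain Python that contrasts the two approaches. The function names, the `store` grouping column, and the toy rows are all made up for illustration; inside a RapidMiner script operator you would apply the same idea to the example set you receive rather than to a list of dictionaries.

```python
import random
from collections import defaultdict


def group_rows(rows, key):
    """Bucket rows by the value of the grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def cluster_sample(rows, key, n_clusters, seed=None):
    """Cluster sampling: pick whole groups at random, keep every row in them."""
    rng = random.Random(seed)
    groups = group_rows(rows, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [row for g in chosen for row in groups[g]]


def stratified_sample(rows, key, fraction, seed=None):
    """Stratified sampling: keep about the same fraction of rows from every group."""
    rng = random.Random(seed)
    sample = []
    for _, members in sorted(group_rows(rows, key).items()):
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy data: 5 stores with 10 rows each.
rows = [{"store": f"S{i % 5}", "sales": i} for i in range(50)]

clustered = cluster_sample(rows, "store", n_clusters=2, seed=42)
stratified = stratified_sample(rows, "store", fraction=0.4, seed=42)

print(sorted({r["store"] for r in clustered}))   # only 2 of the 5 stores appear
print(sorted({r["store"] for r in stratified}))  # all 5 stores appear
```

    Both samples contain 20 rows here, but the cluster sample covers only two stores while the stratified sample covers all five, which is where the difference in precision comes from.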

    Based on the concept, one way to do what you need is to run a clustering algorithm to generate clusters, select a few of those clusters, and then test your process on them to see how it goes. I haven't tried this; it is just an idea based on the concept.
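    A rough sketch of that idea, assuming a tiny hand-written k-means stands in for whatever clustering algorithm you would actually use. Note that this is not classical cluster sampling, which relies on naturally occurring groups; here the groups are derived from the data. All names and the synthetic points are illustrative assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Very small Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def sample_whole_clusters(points, labels, n_clusters, seed=0):
    """Pick n_clusters labels at random and keep every point carrying them."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(set(labels)), n_clusters))
    return [(p, lab) for p, lab in zip(points, labels) if lab in chosen]


# Synthetic data: four loose blobs of 25 points each.
data_rng = random.Random(1)
blob_centers = [(0, 0), (5, 5), (0, 5), (5, 0)]
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(25)]

labels = kmeans(points, k=4)
sample = sample_whole_clusters(points, labels, n_clusters=2)
print(len(sample), "points kept from clusters", sorted({lab for _, lab in sample}))
```

    Because whole clusters are kept or dropped, the sample only reflects the regions of the data that the chosen clusters cover, so it is worth repeating the draw with different seeds before trusting any result.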

    Hope this helps.

    Be Safe. Follow precautions and Maintain Social Distancing
