"meaning of sample ratio in ArffExampleSource"

lotusinsnow Member Posts: 2 Contributor I
edited May 2019 in Help
Dear all,

I have a very large dataset, so clustering in RapidMiner took a long time and did not finish successfully. When I set sample_ratio=0.1 in ArffExampleSource, the process ran successfully. Could you please tell me what sampling mechanism RapidMiner uses here, so I can get an idea of what the data looks like after sampling with sample_ratio?

Many thanks,


  • lotusinsnow Member Posts: 2 Contributor I
    I looked at the code: the sample is chosen randomly according to the ratio.

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Jing,
    You are correct. For more sophisticated sampling algorithms, see the preprocessing/data/sampling group. There we provide operators such as Kennard-Stone sampling and stratifiedSampling. Of course, your data has to fit entirely into memory in order to sample it with these operators.
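
For anyone wondering what "randomly chosen by the ratio" could look like in practice, here is a minimal Python sketch. It is not RapidMiner's code. It assumes that each example is kept independently with probability sample_ratio while the file is read, which is one common way to implement such a parameter. The function name `sample_by_ratio` and the `seed` argument are made up for this illustration.

```python
import random


def sample_by_ratio(rows, sample_ratio, seed=None):
    """Keep each row independently with probability sample_ratio.

    Assumption for illustration only: this mimics a per-example coin flip,
    not necessarily what ArffExampleSource does internally.
    """
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in [0, 1]")
    rng = random.Random(seed)  # local generator so the result is reproducible
    # rng.random() returns a float in [0.0, 1.0), so a ratio of 1.0 keeps
    # every row and a ratio of 0.0 keeps none.
    return [row for row in rows if rng.random() < sample_ratio]


rows = list(range(10_000))
sample = sample_by_ratio(rows, 0.1, seed=42)
print(len(sample))  # close to 1,000, but usually not exactly 1,000
```

Under this assumption the sample keeps the original order of the examples, its size only approximates sample_ratio times the number of rows, and the class distribution is preserved only on average. If exact class proportions matter, a stratified sampling operator is the safer choice.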
