"meaning of sample ratio in ArffExampleSource"

lotusinsnow Member Posts: 2 Contributor I
edited May 2019 in Help
Dear all,

I have a very large dataset, so clustering in RapidMiner takes a long time and does not finish successfully. When I set sample_ratio=0.1 in ArffExampleSource, the process ran successfully. Could you please tell me what kind of sampling mechanism RapidMiner uses, so I can get an idea of what the data looks like after sampling with sample_ratio?

Many thanks,
Jing

Answers

  • lotusinsnow Member Posts: 2 Contributor I
    I looked at the code: the sample is chosen randomly according to the ratio.

    Jing
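
To make "randomly chosen by the ratio" concrete, here is a minimal Python sketch of one common way to do it: keep each row independently with probability sample_ratio while reading. This is an assumption about the mechanism, not RapidMiner's actual code, and the function name `sample_rows` is hypothetical.

```python
import random

def sample_rows(rows, sample_ratio, seed=None):
    """Yield each row independently with probability sample_ratio.

    Assumed mechanism for illustration only; not taken from RapidMiner.
    Rows are streamed, so the full dataset never has to be held in memory.
    """
    rng = random.Random(seed)
    for row in rows:
        # rng.random() is uniform in [0.0, 1.0), so a ratio of 1.0 keeps every row.
        if rng.random() < sample_ratio:
            yield row

rows = range(100_000)  # stand-in for the rows of a large ARFF file
kept = list(sample_rows(rows, sample_ratio=0.1, seed=42))
print(len(kept))  # roughly 10% of the rows; the exact count varies with the seed
```

With this kind of sampling the sample size is only approximately sample_ratio times the number of rows, and rare classes or small clusters can end up under-represented by chance.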
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Jing,
    You are correct. For more sophisticated sampling algorithms, see the preprocessing/data/sampling group. There we provide operators such as Kennard-Stone sampling and stratified sampling. Of course, your data has to fit entirely into memory in order to sample it with these operators.

    Greetings,
      Sebastian
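
As a rough illustration of what stratified sampling does and why it needs the data in memory, the Python sketch below groups rows by label and draws the same fraction from each group. It is not the RapidMiner operator's implementation; `stratified_sample` and `label_of` are hypothetical names chosen for this example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, sample_ratio, seed=None):
    """Draw about sample_ratio of the rows from every label group.

    Illustrative sketch only. All rows are collected per label first,
    which is why the whole dataset must fit into memory.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for group in by_label.values():
        # Keep at least one row per label so small classes do not vanish.
        k = max(1, round(len(group) * sample_ratio))
        sample.extend(rng.sample(group, k))
    return sample

# Toy data: 900 rows labelled "a" and 100 rows labelled "b".
rows = [("a", i) for i in range(900)] + [("b", i) for i in range(100)]
picked = stratified_sample(rows, label_of=lambda r: r[0], sample_ratio=0.1, seed=0)
counts = {label: sum(1 for r in picked if r[0] == label) for label in ("a", "b")}
print(counts)
```

Unlike plain random sampling, the class proportions in the sample match those of the full data (here 9:1), at the cost of holding every row in memory before drawing.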