RapidMiner


meaning of sample ratio in ArffExampleSource

Contributor


Dear all,

I have a very large dataset, so RapidMiner could not finish clustering it successfully, and the run also took a long time. When I set sample_ratio=0.1 in ArffExampleSource, the process executed successfully! Could you please tell me what kind of sampling mechanism RapidMiner uses, so I can get an idea of what the data looks like after sampling with sample_ratio?

Many thanks,
Jing
2 REPLIES
Contributor

Re: meaning of sample ratio in ArffExampleSource

I looked at the code: the sample is chosen randomly according to the ratio.
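
For anyone wondering what "randomly chosen by the ratio" can look like in practice, here is a minimal Python sketch. It is not RapidMiner's actual code. It assumes each example is kept independently with probability sample_ratio, and the function name sample_by_ratio and the seed parameter are made up for illustration.

```python
import random


def sample_by_ratio(rows, sample_ratio, seed=None):
    """Keep each row independently with probability sample_ratio.

    Assumption: this models "random sampling by ratio" as one coin flip
    per row. It is an illustration, not RapidMiner's implementation.
    """
    rng = random.Random(seed)
    # Rows can be consumed one at a time, so the source never has to be
    # held in memory as a whole; only the kept rows are stored.
    return [row for row in rows if rng.random() < sample_ratio]


rows = list(range(10_000))
sample = sample_by_ratio(rows, sample_ratio=0.1, seed=42)
print(len(sample))  # roughly 1000, but usually not exactly 1000
```

Under this assumption the sample size varies slightly from run to run, the original order of the examples is preserved, and rare classes or clusters can end up under-represented by chance.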

Jing
Elite

Re: meaning of sample ratio in ArffExampleSource

Hi Jing,
You are correct. For more sophisticated sampling algorithms, see the preprocessing/data/sampling group. There we provide operators such as Kennard-Stone sampling and stratifiedSampling. Of course, your data has to fit entirely into memory in order to sample it with these operators...
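
To illustrate the idea behind stratified sampling, here is a minimal Python sketch. It is not the code of the RapidMiner operator. The function name stratified_sample, the label_of callback, and the rule of keeping at least one example per class are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, sample_ratio, seed=None):
    """Draw the same fraction from every class so class proportions are kept.

    Assumption: all rows are grouped in memory first, which is why the
    data has to fit into memory for this kind of sampling.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for group in by_label.values():
        # Keep at least one example per class (an illustrative choice).
        k = max(1, round(len(group) * sample_ratio))
        sample.extend(rng.sample(group, k))
    return sample


# 900 examples of class "a" and 100 of class "b"
rows = [("a", i) for i in range(900)] + [("b", i) for i in range(100)]
sample = stratified_sample(rows, label_of=lambda r: r[0], sample_ratio=0.1, seed=0)
counts = {label: sum(1 for r in sample if r[0] == label) for label in ("a", "b")}
print(counts)  # {'a': 90, 'b': 10}
```

Unlike plain random sampling by ratio, the 9:1 class ratio of the full data is reproduced exactly in the sample here.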

Greetings,
  Sebastian
