Sample Operator: Probability
I have to admit, I am having a hard time understanding the output of the Sample operator when I select probability as the sample parameter.
For example, if I use Generate Data to create an ExampleSet with 100 examples and connect it to the Sample operator set to probability with a value of 0.1, I get 7 records.
In short, why is it not 10 records?
Best Answer

jacobcybulski: Here the sample size is not fixed in advance but determined probabilistically: examples are selected at random, so the number returned varies from run to run around the expected value n × p (10 for 100 examples at 0.1), roughly following a normal distribution. I assume this could find its application in repeated resampling, to avoid the bias attached to a fixed sample size.
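To see why 7 records out of 100 at probability 0.1 is unremarkable, here is a minimal Python sketch. It assumes (not confirmed in this thread) that probability mode keeps each example independently with probability p. Under that assumption the sample size follows a binomial distribution with mean n·p = 10 and standard deviation sqrt(n·p·(1-p)) = 3, which is approximately normal for n = 100. The helper names `binomial_pmf` and `simulate_sample_sizes` are hypothetical.

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n examples are kept when each is kept with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_sample_sizes(n: int, p: float, runs: int, seed: int = 0) -> list[int]:
    """Repeat per-example sampling `runs` times and record how many examples were kept each time."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(runs)]


n, p = 100, 0.1
mean = n * p                     # expected sample size
sd = math.sqrt(n * p * (1 - p))  # spread of the sample size around that expectation

print(f"expected size {mean:.1f}, standard deviation {sd:.1f}")
print(f"P(exactly 10 kept) = {binomial_pmf(10, n, p):.3f}")
print(f"P(7 or fewer kept) = {sum(binomial_pmf(k, n, p) for k in range(8)):.3f}")
print("sizes from 10 simulated runs:", simulate_sample_sizes(n, p, runs=10))
```

Under this assumption, getting exactly 10 records has only about a 13% chance, and 7 is just one standard deviation below the expected 10.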
Answers
Thanks for the help, but let me ask this differently: what exactly is the sampling doing under the hood? Is it assigning every record a score from a distribution and only selecting those with a value < .1 or > .9? I am just trying to wrap my head around this approach to sampling. I tend to compare it to R or Python, where I can set either the number of random records or the percentage of records I want. Probabilistic sampling is not something I have come across often.
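The question contrasts fixed-count and fixed-percentage sampling with probability sampling. A minimal Python sketch of all three follows. The probability version implements the per-record scoring idea from the question, with one threshold: draw a uniform score in [0, 1) and keep the record if the score is below p. Keeping both tails (< .1 or > .9) would keep about 20% of records, not 10%. That RapidMiner works this way internally is an assumption, not something taken from its source. The function names are hypothetical.

```python
import random


def sample_absolute(records: list, k: int, rng: random.Random) -> list:
    """Fixed-size sample: exactly k records, drawn without replacement."""
    return rng.sample(records, k)


def sample_relative(records: list, fraction: float, rng: random.Random) -> list:
    """Fixed-fraction sample: round(fraction * len(records)) records, without replacement."""
    return rng.sample(records, round(fraction * len(records)))


def sample_probability(records: list, p: float, rng: random.Random) -> list:
    """Per-record sampling: each record gets a uniform score in [0, 1) and is kept if score < p.

    The number of kept records is random; only its expected value, p * len(records), is fixed.
    """
    return [r for r in records if rng.random() < p]


rng = random.Random(42)
records = list(range(100))
print(len(sample_absolute(records, 10, rng)))      # always 10
print(len(sample_relative(records, 0.1, rng)))     # always 10
print(len(sample_probability(records, 0.1, rng)))  # varies from run to run
```

The first two functions fix the output size and randomize only which records are chosen. The third randomizes both, which is why a 0.1 probability on 100 examples can return 7 records.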
That is perfect. Exactly what I was looking for!