Sample Operator - Probability
I have to admit I am having a hard time understanding the output of the Sample operator when I select probability as the sample parameter.
For example, if I use Generate Data to create an ExampleSet with 100 examples and connect it to the Sample operator set to probability with a value of 0.1, I get 7 records.
In short, why is it not 10 records? I am having a hard time wrapping my head around this.
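Here is a quick way to see why 7 is not a surprising result. Assume the operator keeps each example independently with probability 0.1; this is an assumption about its internals, not something confirmed in this thread. Under that assumption, the number of kept examples follows a binomial distribution with n = 100 and p = 0.1. The sketch below is plain Python, not RapidMiner code, and `binomial_pmf` is a hypothetical helper. It computes the mean and spread of the sample size.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n examples are kept when each is kept with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.1                    # 100 generated examples, sample probability 0.1
mean = n * p                       # expected sample size
std = math.sqrt(n * p * (1 - p))   # standard deviation of the sample size

print(f"expected size: {mean:.1f}, std dev: {std:.1f}")
print(f"P(exactly 10 kept) = {binomial_pmf(10, n, p):.3f}")
print(f"P(7 or fewer kept) = {sum(binomial_pmf(k, n, p) for k in range(8)):.3f}")
```

Under that assumption, the expected size is 10 with a standard deviation of 3. Runs that return 7 or 13 examples are routine and do not mean something is wrong.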
Best Answer
jacobcybulski (Member, University Professor)
Here the sample size is determined probabilistically, from the normal distribution, and then the sample is selected randomly. I assume this could be useful in repeated resampling, to avoid the bias attached to a fixed sample size.
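The behaviour described in this answer can be illustrated with a small simulation. It assumes that each example is kept independently with probability p. The sample size then varies from run to run, and over many runs it clusters roughly normally around n·p. The helper `sample_size_once` and the seed are hypothetical and exist only for this illustration. The code is plain Python and does not call RapidMiner.

```python
import random
import statistics

def sample_size_once(n: int, p: float, rng: random.Random) -> int:
    # Keep each of the n examples independently with probability p; count how many survive.
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(1992)  # hypothetical fixed seed so the simulation is reproducible
sizes = [sample_size_once(100, 0.1, rng) for _ in range(10_000)]

print("mean size:", statistics.mean(sizes))
print("std dev:  ", statistics.pstdev(sizes))
print("share of runs with exactly 10:", sizes.count(10) / len(sizes))
```

Only a minority of runs land on exactly 10. The mean stays close to 10, which is why repeated resampling with this scheme averages out to the requested proportion.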
Answers
Thanks for the help, but let me ask this differently. What exactly is the sampling doing under the hood? Is it assigning every record a score from a distribution and only selecting those with a value < .1 or > .9? I am just trying to wrap my head around this approach to sampling. I tend to compare it to R or Python, where I can set the number of random records or the percentage of records I want. Probabilistic sampling is not something I have come across very often.
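The question's guess is close to how independent (Bernoulli) sampling usually works. Each record gets its own uniform random draw in [0, 1) and is kept when the draw falls below p. This sketch contrasts that scheme with the fixed-count and fixed-percentage sampling familiar from R and Python. It assumes, without confirmation from this thread, that the operator's probability mode follows the Bernoulli scheme. The three function names are hypothetical.

```python
import random

def bernoulli_sample(records, p, rng):
    # Probability-style sampling: one uniform draw per record in [0, 1),
    # keep the record when the draw is below p. The sample size is random.
    return [r for r in records if rng.random() < p]

def fixed_size_sample(records, k, rng):
    # Fixed-count sampling, like Python's random.sample():
    # exactly k records, chosen without replacement.
    return rng.sample(records, k)

def fixed_fraction_sample(records, fraction, rng):
    # Fixed-percentage sampling: a set share of records, rounded to a whole count.
    return rng.sample(records, round(len(records) * fraction))

records = list(range(100))
rng = random.Random(42)
print(len(bernoulli_sample(records, 0.1, rng)))       # varies with the seed
print(len(fixed_size_sample(records, 10, rng)))       # always 10
print(len(fixed_fraction_sample(records, 0.1, rng)))  # always 10
```

The design trade-off is that Bernoulli sampling decides each record on its own, with no need to know the total count in advance. The cost is that you get the requested proportion only on average, not exactly.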
That is perfect. Exactly what I was looking for!