real random sampling on a small dataset

kayman · June 2017

I'm trying to generate random user agent strings by sampling one small example set with OS info and another small example set with browser details. From each example set I want to take one random example, concatenate the two, and use the result for some other processing later on.

I use the Sample operator with absolute = 1, and this does give me a single example from each of my sets every time. Unfortunately it is exactly the same example every time, so there seems to be no randomness involved. I assume randomness only kicks in once you have a bigger set of examples, but I would like to understand how to do this on a really small set as well. Or how many examples are needed to get random results from the Sample operator instead of the same one each time?

I've attached an example showing the problem: the result is always the same, even though in theory it should be random.

MartinLiebig · June 2017

Hi,

use generate macro with

date_millis(date_now())%10000

and use it as a random seed.

That should do it.
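
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same approach: derive a seed from the current time in milliseconds modulo 10000, then use it to pick one row from each small set. This is an analogy, not RapidMiner code; the function names and the sample OS and browser strings are made up for illustration.

```python
import random
import time

# Hypothetical stand-ins for the two small example sets.
OS_ROWS = [
    "Windows NT 10.0; Win64; x64",
    "Macintosh; Intel Mac OS X 10_12_5",
    "X11; Linux x86_64",
]
BROWSER_ROWS = ["Firefox/54.0", "Chrome/59.0.3071.115", "Safari/603.2.4"]


def time_based_seed(modulus=10000):
    """Current epoch time in milliseconds, reduced modulo `modulus`."""
    return int(time.time() * 1000) % modulus


def sample_user_agent(os_rows, browser_rows, seed):
    """Pick one row from each list with a generator seeded by `seed` and join them."""
    rng = random.Random(seed)
    return rng.choice(os_rows) + " " + rng.choice(browser_rows)


if __name__ == "__main__":
    # The seed changes from run to run, so the pick usually changes too.
    print(sample_user_agent(OS_ROWS, BROWSER_ROWS, time_based_seed()))
```

With only three rows per set, the same combination will still come up again by chance, and two runs started within the same millisecond would share a seed.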

jczogalla · June 2017

Hi,

you could also set the random generator of the process to be initialized with the system time. You can achieve that by setting the random seed of the process to -1. If you then don't use a local random seed for the sampling operator, the result will differ every time you start the process.
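
The difference between a fixed seed and a system-initialized one can be sketched in Python. This is only an illustration of the concept: treating -1 as "seed from the system" mirrors the convention described in this post and is not a Python feature, and `make_rng` and `rows` are hypothetical names.

```python
import random


def make_rng(seed):
    """Treat -1 as 'seed from the system'; any other value is a fixed seed."""
    if seed == -1:
        return random.Random()  # seeded from OS entropy or the clock
    return random.Random(seed)


rows = ["a", "b", "c", "d", "e"]

# Fixed seed: the same single pick on every run.
fixed_first = make_rng(1992).sample(rows, 1)
fixed_second = make_rng(1992).sample(rows, 1)

# System seed: the pick can differ from run to run.
varying = make_rng(-1).sample(rows, 1)
```

Here `fixed_first` and `fixed_second` are always equal, while `varying` is free to change between runs.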

Thomas_Ott · June 2017

You're using the system seed for the random generator, so it's the same every time. Try using a different seed.

kayman · June 2017

Thanks @Thomas_Ott, how would that work in practice?

I've tried setting 'use local random seed', but the result still seems to be the same every time. I do get different values when I change the local random seed value, but they are then also the same whenever I rerun the operator. Or am I doing this wrong?

What I would like to achieve is that each time I run the process, a different single example is sampled from my set, totally at random.

I could probably create a macro using the random function and use that as the value for the random seed, but that looks a bit like overcomplicating things. Also, the random function does not give me many options to generate a number between 1 and 1992 (max value allowed).
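
On the range question: a uniform value u in [0, 1) can be scaled to a whole number between 1 and N with floor(u * N) + 1. A minimal Python sketch of that arithmetic follows; `to_seed` is a hypothetical helper, and 1992 is simply the upper bound mentioned above. Whether RapidMiner's expression language offers matching functions is not verified here, so treat this as the formula rather than a ready-made macro.

```python
import math
import random


def to_seed(u, upper=1992):
    """Map a uniform value u in [0, 1) to an integer in 1..upper."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must be in [0, 1)")
    return math.floor(u * upper) + 1


# Draw one uniform value and turn it into a seed in the allowed range.
seed = to_seed(random.random())
```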

Thomas_Ott · June 2017

The reason the result is the same when you set the random seed to something like 1992 is reproducibility: academic researchers need a particular random number sequence to be reproducible for peer review. Let's see if @Edin_Klapic might know whether there is a purely random number generator inside RapidMiner Studio.
