real random sampling on a small dataset

kayman Member Posts: 662 Unicorn
edited December 2018 in Help

I'm trying to generate some random user agent strings by sampling a small exampleset with OS info and another small exampleset with browser details. From each exampleset I want to take one random example, concatenate them, and use the result for some other processing later on.

 

I use the Sample operator with absolute = 1, and this does indeed give me one single example from each of my sets. Unfortunately it is exactly the same example every time, so there seems to be no randomness involved. I assume randomness only kicks in once you have a bigger set of examples, but I would like to understand how to do this on a really small set as well. Or how many examples are needed to get random results from the Sample operator instead of the same one each time?

 

I've attached an example showing the problem: the result is always the same, even though it should be random in theory.

Best Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi,

     

    Use the Generate Macro operator with the expression

    date_millis(date_now())%10000

    and use the resulting macro as the local random seed.


    That should do it.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
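The idea in this answer can be sketched outside RapidMiner. The following is a minimal Python illustration, not a RapidMiner process: it assumes `date_millis(date_now())` yields the current time in milliseconds since the epoch, and the lists `OS_INFO` and `BROWSERS` as well as the function names are hypothetical stand-ins for the two small examplesets.

```python
import random
import time

# Hypothetical stand-ins for the two small examplesets.
OS_INFO = ["Windows NT 10.0; Win64; x64", "Macintosh; Intel Mac OS X 10_14", "X11; Linux x86_64"]
BROWSERS = ["Firefox/64.0", "Chrome/71.0.3578.98", "Safari/605.1.15"]


def time_based_seed(modulus=10000):
    """Current time in milliseconds modulo `modulus`, analogous to
    date_millis(date_now()) % 10000 (assuming epoch milliseconds)."""
    return int(time.time() * 1000) % modulus


def random_user_agent(seed):
    """Pick one entry from each list with a generator seeded by `seed`
    and concatenate them into a single string."""
    rng = random.Random(seed)
    os_part = rng.choice(OS_INFO)
    browser_part = rng.choice(BROWSERS)
    return f"Mozilla/5.0 ({os_part}) {browser_part}"


seed = time_based_seed()
print(seed, random_user_agent(seed))
```

Because the seed changes with the clock, repeated runs will usually pick different examples, while any single run can still be reproduced by reusing the printed seed.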
  • jczogalla Employee, Member Posts: 144 RM Engineering
    Solution Accepted

    Hi,

     

    You could also have the random generator of the process initialized with the system time. You can achieve that by setting the random seed of the process to -1. If you then don't use a local random seed for the sampling operator, the result will differ every time you start the process.
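The "-1 means system time" convention described in this answer can be illustrated with a small Python analogue. This is only a sketch of the convention, not RapidMiner's actual implementation; `make_rng` and `examples` are hypothetical names.

```python
import random
import time


def make_rng(seed):
    """Return a random generator. A seed of -1 is treated as
    'initialize from the system clock'; any other value is used as-is."""
    if seed == -1:
        seed = time.time_ns()  # changes between process starts
    return random.Random(seed)


examples = ["a", "b", "c", "d", "e"]

# Clock-based seed: the chosen example may differ from run to run.
print(make_rng(-1).choice(examples))
```

With only a handful of examples, two consecutive runs can still land on the same one by chance; that is expected from random sampling and does not mean the seed is fixed.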

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You're using the system seed random generator, so it's the same every time. Try using a different seed.

  • kayman Member Posts: 662 Unicorn

    Thanks @Thomas_Ott, how would that work in practice?

     

    I've tried the same with 'use local random seed' enabled, but the result still seems to be the same every time. I do get different values when I change the local random seed value, but those are then also always the same if I rerun the operator. Or am I doing this wrong?

     

    What I would like to achieve is that each time I run the process, a different single example is sampled from my set, totally at random.

     

    I could probably create a macro using the random function and use that as the input for the random seed number, but that looks a bit like overcomplicating things. Also, the random function does not give me many options for generating a number between 1 and 1992 (max value allowed).

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The reason the result is the same every time for a fixed random seed such as 1992 is academic researchers: they need a reproducible set of random numbers for peer review. Let's see if @Edin_Klapic might know whether there is a purely random number generator inside RapidMiner Studio.
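The reproducibility behaviour discussed in this thread is common to seeded pseudo-random generators in general. A minimal Python sketch (an analogy, not RapidMiner code; `sample_one` and `examples` are hypothetical names) shows why rerunning with the same seed keeps returning the same single example:

```python
import random


def sample_one(examples, seed):
    """Draw a single example using a generator seeded with `seed`."""
    return random.Random(seed).sample(examples, 1)[0]


examples = ["Windows", "macOS", "Linux", "Android", "iOS"]

# Ten "reruns" with the same seed always yield the same example.
reruns = {sample_one(examples, 1992) for _ in range(10)}
print(reruns)
```

The set printed at the end contains exactly one element, which mirrors what was observed with a fixed local random seed: changing the seed may change the pick, but each seed on its own is fully deterministic.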
