How to create a 'real' random value?
I am probably overlooking something, but I am struggling to get a random record that actually changes each time. There is an option to get a random sample, but it always gives me the same one, and the same goes for generating a random value with the rand() function: I get the same random number each time, while I want a new one instead.
I simply want to get a single random record out of a recordset, but it should be a different record each time I call the set.
Any ideas?
Best Answers
yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
Hi @kayman,
You are correct. A deterministic computer cannot produce a truly random value on its own. You may already have heard of pseudorandom numbers.
“On a completely deterministic machine you can't generate anything you could really call a random sequence of numbers,” says Ward, “because the machine is following the same algorithm to generate them. Typically, that means it starts with a common 'seed' number and then follows a pattern.”
https://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/
https://engineering.mit.edu/engage/ask-an-engineer/can-a-computer-generate-a-truly-random-number/
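To make the point about seeds concrete, here is a minimal Python sketch (not RapidMiner code). The helper name first_draws is made up for illustration, and the seed 1992 is simply the process seed from the example process in this thread. It shows that a generator started from the same seed repeats the same sequence.

```python
import random


def first_draws(seed, n=3):
    """Return the first n draws from a generator started at the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Same seed, same sequence: a fixed seed keeps repeating itself.
run_a = first_draws(1992)
run_b = first_draws(1992)
print(run_a == run_b)  # True

# A different seed starts a different sequence.
print(first_draws(1993))
```

This is the same behaviour as a process that keeps a fixed random seed: every run replays the same "random" numbers.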
For a more day-to-day example, the computer could rely on atmospheric noise or simply use the exact time you press keys on your keyboard as a source of unpredictable data, or entropy. For example, your computer might notice that you pressed a key at exactly 0.23423523 seconds after 2 p.m.
That is similar to how we use date_now() or the process_start timestamp to generate the seed for a random number.
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="1992"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="generate_macro" compatibility="9.2.001" expanded="true" height="68" name="Generate Macro" width="90" x="112" y="34">
        <list key="function_descriptions">
          <parameter key="seed" value="mod(date_millis(date_now()),10000)"/>
          <parameter key="Pseudorandom_num" value="rand(round(eval(%{seed})))"/>
        </list>
      </operator>
      <operator activated="true" class="generate_data" compatibility="9.2.001" expanded="true" height="68" name="Generate Data" width="90" x="514" y="34">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="1"/>
        <parameter key="number_of_attributes" value="1"/>
        <parameter key="attributes_lower_bound" value="-10.0"/>
        <parameter key="attributes_upper_bound" value="10.0"/>
        <parameter key="gaussian_standard_deviation" value="10.0"/>
        <parameter key="largest_radius" value="10.0"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="%{seed}"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
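For readers who think more easily in Python, the seed macro in the process above can be sketched roughly as follows. This is only an analogy, not what RapidMiner runs internally. The function name time_based_seed is hypothetical, and the modulus 10000 is taken from the macro expression mod(date_millis(date_now()),10000).

```python
import random
import time


def time_based_seed(modulus=10000):
    """Current epoch time in milliseconds, reduced modulo `modulus`."""
    return int(time.time() * 1000) % modulus


seed = time_based_seed()
rng = random.Random(seed)  # generator seeded from the clock
print(seed, rng.random())
```

Note that with a modulus of 10000 there are only 10000 possible seeds, so two runs can occasionally land on the same seed and repeat a value. For picking one record at random that is usually acceptable.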
MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
Hi @kayman,
Try setting the random seed of the main process to -1. This sets the seed to a value derived from the system time, so you get different numbers each time you run it.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Answers
I can of course always use the Python randint function as well, but it feels like a missing option in RapidMiner to me.
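As a rough illustration of the randint route, a sketch along these lines would pick one record per call. The list records and the helper pick_random_record are hypothetical stand-ins for the real example set. Python's module-level generator is seeded from the operating system (or the clock) at start-up, so the pick changes between runs without any explicit seed.

```python
import random

# Hypothetical stand-in for the rows of an example set.
records = ["row_a", "row_b", "row_c", "row_d"]


def pick_random_record(rows):
    """Pick one row by index; randint includes both bounds."""
    index = random.randint(0, len(rows) - 1)
    return rows[index]


print(pick_random_record(records))
```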
David
I'll use the Python randint then, thanks to all.
Whether it is pseudorandom or not doesn't matter, as long as I get a different random number each time.
I would therefore have expected the RapidMiner rand() function to do something similar, such as using the date to emulate some form of randomness, but it appears to be rather static instead.
Anyway, I combined the suggestions given by yy and David and now use a pseudorandom seed value for the shuffle operator. That gives me a different order each time, so I just pick the first record, which is different / random enough for my purposes.
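The shuffle-then-take-first approach described here could look roughly like this in Python. It is a sketch of the idea only, not of the RapidMiner operators. The names random_first_record and records are hypothetical, and the time-based seed reuses the modulo-10000 expression from the process earlier in the thread.

```python
import random
import time


def random_first_record(rows, seed=None):
    """Shuffle a copy of rows with a seeded generator and return the first row."""
    if seed is None:
        # Time-based seed, following the macro expression from the thread.
        seed = int(time.time() * 1000) % 10000
    shuffled = list(rows)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    return shuffled[0]


records = ["row_a", "row_b", "row_c", "row_d"]  # hypothetical example rows
print(random_first_record(records))
```

Passing an explicit seed makes the pick repeatable, which can help when debugging, while leaving it out gives a clock-dependent pick on each call.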