Creating random numbers with a user-defined distribution?

wasperen Member Posts: 16 Contributor II
edited November 2018 in Help

Does anyone have a pointer on how to generate random numbers from a user-specified distribution?



  • wasperen Member Posts: 16 Contributor II
    One option would be to generate a large number of samples outside RapidMiner and then draw from that set with replacement, bootstrap-style. Does that make sense?
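
The resampling idea above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under assumptions: the pool here is filled from a triangular distribution as a stand-in for whatever distribution is actually wanted, and the helper name `resample` is hypothetical.

```python
import random


def resample(pool, n, seed=None):
    """Draw n values from pool with replacement (bootstrap-style)."""
    rng = random.Random(seed)
    return rng.choices(pool, k=n)


# Hypothetical pool: 10,000 pre-generated values from a triangular
# distribution on [0, 10] with its peak at 2. Any externally generated
# sample set could take its place.
pool_rng = random.Random(0)
pool = [pool_rng.triangular(0.0, 10.0, 2.0) for _ in range(10_000)]

# Each draw is one of the pooled values, so the draws follow the pool's
# empirical distribution.
draws = resample(pool, 1_000, seed=1)
```

The draws can never be more varied than the pool itself, so the pool should be large relative to the number of values needed.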
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn

    Maybe the Generate Attributes operator will help?

    Here's an example...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="145" width="212">
          <operator activated="true" class="generate_data" compatibility="5.0.11" expanded="true" height="60" name="Generate Data" width="90" x="112" y="75"/>
          <operator activated="true" class="generate_attributes" compatibility="5.0.11" expanded="true" height="76" name="Generate Attributes" width="90" x="308" y="74">
            <list key="function_descriptions">
              <!-- rand() returns a uniform random number between 0 and 1 -->
              <parameter key="myFirstRandomNumber" value="rand()"/>
              <!-- scaled and shifted: uniform between 4 and 7 -->
              <parameter key="mySecondRandomNumber" value="3*rand()+4"/>
            </list>
            <parameter key="keep_all" value="false"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • wasperen Member Posts: 16 Contributor II
    Hi Andrew,

    Thanks for your attention.

    This will indeed generate random numbers: the first attribute takes values between 0.0 and 1.0, the second between 4 and 7.

    The issue I have is that the rand() function generates numbers with a uniform distribution: every value in the range is equally likely to be returned. I would like a random generator that could, for instance, be "normally" distributed, so that numbers closer to the mean have a higher chance of being picked. You know, the nice bell curve...

    Or, even better, any user-defined curve for that distribution...

    I have not come across such a generator in RapidMiner, but I am sure I'm not the only one who is after one.

    Or am I wrong, anyone?
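
One common way to get "any user-defined curve" from a uniform generator is inverse transform sampling: build the cumulative distribution from the curve, draw a uniform number, and look up where it falls. The Python sketch below assumes the curve is given as relative heights over a set of bins (a piecewise-constant density). The names `make_sampler`, `edges` and `heights` are hypothetical and not part of RapidMiner.

```python
import bisect
import random


def make_sampler(edges, heights, seed=None):
    """Return a draw() function for a piecewise-constant density.

    edges:   sorted bin edges, one more entry than heights
    heights: non-negative relative height of the curve on each bin
    """
    if len(edges) != len(heights) + 1:
        raise ValueError("need exactly one more edge than heights")
    # Probability mass of each bin is height times width.
    areas = [h * (b - a) for h, a, b in zip(heights, edges, edges[1:])]
    total = sum(areas)
    if total <= 0:
        raise ValueError("the curve must have positive area")
    # Cumulative distribution evaluated at the right edge of each bin.
    cdf = []
    acc = 0.0
    for area in areas:
        acc += area
        cdf.append(acc / total)
    cdf[-1] = 1.0  # guard against rounding error in the running sum
    rng = random.Random(seed)

    def draw():
        u = rng.random()                 # uniform in [0, 1)
        i = bisect.bisect_right(cdf, u)  # first bin whose cdf exceeds u
        # The density is flat inside a bin, so place the value uniformly.
        return edges[i] + rng.random() * (edges[i + 1] - edges[i])

    return draw


# Hypothetical curve: nothing between 1 and 2, and the bin from 2 to 3
# three times as likely as the bin from 0 to 1.
edges = [0.0, 1.0, 2.0, 3.0]
heights = [1.0, 0.0, 3.0]
draw = make_sampler(edges, heights, seed=42)
samples = [draw() for _ in range(20_000)]
```

Finer bins approximate a smooth curve more closely; the lookup cost grows only logarithmically with the number of bins.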

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello Willem,

    You can use other mathematical functions in the Generate Attributes operator. There's also a noise generator operator that, I believe, adds Gaussian-distributed noise.
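
As an illustration of combining a uniform generator with other mathematical functions, the Box-Muller transform turns two uniform numbers into one normally distributed number using only a logarithm, a square root and a cosine. The sketch below is plain Python, not a RapidMiner expression. Whether the Generate Attributes expression language offers equivalent functions is an assumption to verify, and the name `box_muller` is hypothetical.

```python
import math
import random


def box_muller(rng, mean=0.0, stddev=1.0):
    """Return one normally distributed value built from two uniform draws."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is defined
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + stddev * z


rng = random.Random(7)
# Hypothetical target: mean 5, standard deviation 2.
values = [box_muller(rng, mean=5.0, stddev=2.0) for _ in range(50_000)]
```

Note that the two uniform numbers must be independent draws; reusing the same draw for both terms does not give a normal distribution.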
