Discover hidden rules among data

lucnap
edited November 2018 in Help
Hello, I'm playing around with RapidMiner and running some experiments.
I would like to know whether it is possible to discover hidden rules in related data.
Here is an example:

P1  P2  A1  A2  RESULT
 2   3   4   5      15
 6   7   8   9      59
10  11  12  13     135
14  15  16  17     243
18  19  20  21       ?

Here the hidden rule is: RESULT = (P1 * P2) + A1 + A2
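To make the puzzle concrete, here is a small Python check that the stated rule reproduces every labelled row and gives a value for the "?" row. It is only a sketch; `hidden_rule` and `rule_fits` are hypothetical helper names, not RapidMiner functions.

```python
# The four labelled rows from the table above: (P1, P2, A1, A2, RESULT).
ROWS = [
    (2, 3, 4, 5, 15),
    (6, 7, 8, 9, 59),
    (10, 11, 12, 13, 135),
    (14, 15, 16, 17, 243),
]
QUERY = (18, 19, 20, 21)  # the row whose RESULT is "?"


def hidden_rule(p1, p2, a1, a2):
    """RESULT = (P1 * P2) + A1 + A2, as stated in the question."""
    return p1 * p2 + a1 + a2


def rule_fits(rows):
    """True if the rule reproduces RESULT on every labelled row."""
    return all(hidden_rule(*row[:4]) == row[4] for row in rows)


if __name__ == "__main__":
    print(rule_fits(ROWS))      # True
    print(hidden_rule(*QUERY))  # 383
```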

How can I achieve this with RapidMiner? Can I discover the rule if it exists?

And if no exact rule exists, how can I predict the RESULT value in the last row (or something close to it) from the previous examples?

  • haddock
    Hi Luciano,

    What an interesting question! Although this doesn't look like standard RM fare, it is, thanks to a lovely little-used operator, Generate Function Set, which creates new attributes by mathematically combining existing ones and remembers how each was constructed. The following process repeatedly generates new attributes from the previous round's survivors and skims off the best (the 40 attributes whose squared correlation with RESULT is highest). If you click at the top right of the meta data view, you can display the constructions :D There is still a small glitch with renaming attributes that have the same construction, but it's a start at least.
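    For readers who want to see the idea outside RapidMiner, here is a rough Python re-creation of the generate-and-select loop in the process posted here. It rests on my own assumptions: each round keeps the current attributes and adds every pairwise sum and product (standing in for `use_plus`/`use_mult`), ranks all columns by squared Pearson correlation with RESULT, and keeps the top k. It is a hypothetical sketch, not RapidMiner's implementation. `search`, `generate_functions`, and the choice of 2 rounds are mine.

```python
from itertools import combinations_with_replacement

# The four labelled rows from the question, one list per attribute.
DATA = {
    "P1": [2, 6, 10, 14],
    "P2": [3, 7, 11, 15],
    "A1": [4, 8, 12, 16],
    "A2": [5, 9, 13, 17],
}
LABEL = [15, 59, 135, 243]  # RESULT


def squared_correlation(xs, ys):
    """Squared Pearson correlation; 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def generate_functions(attrs):
    """Keep the current attributes and add every pairwise sum and product.

    Each new name records how its column was constructed.
    """
    new = dict(attrs)
    for a, b in combinations_with_replacement(sorted(attrs), 2):
        new[f"({a}+{b})"] = [x + y for x, y in zip(attrs[a], attrs[b])]
        new[f"({a}*{b})"] = [x * y for x, y in zip(attrs[a], attrs[b])]
    return new


def search(attrs, label, rounds=2, k=40):
    """Generate, rank by squared correlation, keep the top k, repeat.

    Returns the survivors and the names of any columns equal to the label.
    """
    exact = []
    for _ in range(rounds):
        attrs = generate_functions(attrs)
        exact += [name for name, col in attrs.items()
                  if col == label and name not in exact]
        ranked = sorted(attrs, reverse=True,
                        key=lambda name: squared_correlation(attrs[name], label))
        attrs = {name: attrs[name] for name in ranked[:k]}
    return attrs, exact


if __name__ == "__main__":
    survivors, exact = search(DATA, LABEL)
    print(exact)
```

    Correlation alone cannot separate RESULT from any column of the form a * RESULT + b, because all of them score 1.0. That is why the sketch also records columns that equal RESULT exactly.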

    Doubtless better results could be achieved by tuning things with an optimiser and using more operators, but I'm quite surprised at how well a two-sign zombie (only + and *) does on four examples, and you get to understand the output.

    Anyways, a really nice puzzle, thanks.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="476" width="868">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
            <parameter key="file_name" value="C:\Haddock\luciano.csv"/>
          </operator>
          <operator activated="true" class="parse_numbers" expanded="true" height="76" name="Parse Numbers" width="90" x="45" y="165"/>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="45" y="255">
            <parameter key="name" value="RESULT"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="345">
            <parameter key="condition_class" value="missing_labels"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <operator activated="true" class="remember" expanded="true" height="60" name="Store Examples" width="90" x="179" y="120">
            <parameter key="name" value="stack"/>
            <parameter key="io_object" value="ExampleSet"/>
          </operator>
          <operator activated="true" class="loop" expanded="true" height="76" name="Generate Formulae" width="90" x="313" y="120">
            <parameter key="iterations" value="10"/>
            <process expanded="true" height="399" width="868">
              <operator activated="true" class="recall" expanded="true" height="60" name="Recall" width="90" x="45" y="75">
                <parameter key="name" value="stack"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="remove_from_store" value="false"/>
              </operator>
              <operator activated="true" class="generate_function_set" expanded="true" height="76" name="Generate Function Set" width="90" x="179" y="75">
                <parameter key="use_plus" value="true"/>
                <parameter key="use_mult" value="true"/>
              </operator>
              <operator activated="true" class="weight_by_correlation" expanded="true" height="76" name="Weight by Correlation" width="90" x="313" y="75">
                <parameter key="squared_correlation" value="true"/>
              </operator>
              <operator activated="true" class="select_by_weights" expanded="true" height="94" name="Select by Weights" width="90" x="447" y="75">
                <parameter key="weight_relation" value="top k"/>
                <parameter key="weight" value="0.95"/>
                <parameter key="k" value="40"/>
              </operator>
              <operator activated="true" class="remember" expanded="true" height="60" name="Remember" width="90" x="629" y="80">
                <parameter key="name" value="stack"/>
                <parameter key="io_object" value="ExampleSet"/>
              </operator>
              <connect from_op="Recall" from_port="result" to_op="Generate Function Set" to_port="example set input"/>
              <connect from_op="Generate Function Set" from_port="example set output" to_op="Weight by Correlation" to_port="example set"/>
              <connect from_op="Weight by Correlation" from_port="weights" to_op="Select by Weights" to_port="weights"/>
              <connect from_op="Weight by Correlation" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
              <connect from_op="Select by Weights" from_port="example set output" to_op="Remember" to_port="store"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" breakpoints="after" class="recall" expanded="true" height="60" name="Recover Formulae" width="90" x="447" y="120">
            <parameter key="name" value="stack"/>
            <parameter key="io_object" value="ExampleSet"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Parse Numbers" to_port="example set input"/>
          <connect from_op="Parse Numbers" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Store Examples" to_port="store"/>
          <connect from_op="Store Examples" from_port="stored" to_op="Generate Formulae" to_port="input 1"/>
          <connect from_op="Recover Formulae" from_port="result" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

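    PS. There is a more succinct function that does not use P2:
          RESULT = (P1 * P1) + P1 + A1 + A2
    It fits only because P2 = P1 + 1 in every example, so P1 * P2 = (P1 * P1) + P1; four examples cannot tell the two formulas apart.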
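    The PS claim is easy to check in Python. On the four examples P2 is always P1 + 1, so both formulas agree, but a single row that breaks that pattern separates them. The row (2, 5, 4, 5) is made up for illustration and is not part of the original data.

```python
# The four labelled rows from the question: (P1, P2, A1, A2).
ROWS = [(2, 3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13), (14, 15, 16, 17)]


def full_rule(p1, p2, a1, a2):
    return p1 * p2 + a1 + a2


def short_rule(p1, p2, a1, a2):
    """The PS formula; p2 is accepted but ignored."""
    return p1 * p1 + p1 + a1 + a2


# True: in every example P2 == P1 + 1, so P1*P2 == P1*P1 + P1.
AGREE_ON_EXAMPLES = all(full_rule(*r) == short_rule(*r) for r in ROWS)

if __name__ == "__main__":
    print(AGREE_ON_EXAMPLES)                              # True
    # A made-up row where P2 != P1 + 1 separates the two formulas.
    print(full_rule(2, 5, 4, 5), short_rule(2, 5, 4, 5))  # 19 15
```

    This is also why random inputs, as suggested in the reply below, make the search more trustworthy.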
  • lucnap
    Hi Haddock, thank you so much for your reply. I'm not as experienced an RM user as you are.
    In my example I used only 4 rows of data for simplicity, but you can create as many as you want. Also, the numbers used to calculate the result can (and should) be random.
    Is there a way to solve this with neural nets?
    I learned a lot from your code. If you have other ideas, please post them.
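    This does not answer the neural-net question. It sketches a different, simpler technique, ordinary least squares over candidate terms, in Python. Random rows are generated as suggested above. The attributes are expanded into an intercept, the raw values, and all pairwise products, and a linear model is fitted. When the hidden rule is a weighted sum of those terms, the fitted weights recover it. Everything here is my own assumption rather than RapidMiner functionality: `make_rows`, the 1..20 value range, the candidate terms, and the 1e-3 cutoff.

```python
import random

NAMES = ["P1", "P2", "A1", "A2"]
# Candidate terms: an intercept, the raw attributes, and all pairwise products.
TERMS = (["1"] + NAMES
         + [f"{NAMES[i]}*{NAMES[j]}" for i in range(4) for j in range(i, 4)])


def features(p1, p2, a1, a2):
    raw = [p1, p2, a1, a2]
    prods = [raw[i] * raw[j] for i in range(4) for j in range(i, 4)]
    return [1.0] + raw + prods  # same order as TERMS


def make_rows(n, seed=0):
    """Hypothetical test data: random attributes, RESULT from the hidden rule."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        p1, p2, a1, a2 = (rng.randint(1, 20) for _ in range(4))
        rows.append((p1, p2, a1, a2, p1 * p2 + a1 + a2))
    return rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [features(*r[:4]) for r in rows]
    y = [r[4] for r in rows]
    k = len(TERMS)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def recovered_rule(weights, cutoff=1e-3):
    """Terms whose fitted weight is clearly non-zero (cutoff is an assumption)."""
    return {t: round(w, 6) for t, w in zip(TERMS, weights) if abs(w) > cutoff}


def predict(weights, p1, p2, a1, a2):
    return sum(w * f for w, f in zip(weights, features(p1, p2, a1, a2)))


if __name__ == "__main__":
    w = fit(make_rows(100))
    print(recovered_rule(w))
    print(round(predict(w, 18, 19, 20, 21), 3))
```

    If the true rule used a term outside the candidate list, the fit would only approximate RESULT. That corresponds to the "predict a near value" case in the question.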

  • karlrb
    Very interesting! But I get the message

    Meta data is underspecified. Cannot check precondition.

    when I attempt to run the program. Help needed by this relative newbie to RM.

  • haddock
    Hi there Karl,

    It looks like you've got the data sorted out (I seem to remember filling in the last question mark as 383 and just pasting the table into a CSV file), so in fact you're good to go. I also get that message. It is a warning that RM cannot validate the setup; it does not necessarily mean that the process cannot run.
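    If you would rather generate the CSV than paste it by hand, a minimal Python sketch using the standard `csv` module follows. It assumes a comma-separated file with a header row. The local file name `luciano.csv` is a placeholder; the process in this thread reads `C:\Haddock\luciano.csv`, so adjust the Read CSV path or the output path to match.

```python
import csv
import io

HEADER = ["P1", "P2", "A1", "A2", "RESULT"]
ROWS = [
    [2, 3, 4, 5, 15],
    [6, 7, 8, 9, 59],
    [10, 11, 12, 13, 135],
    [14, 15, 16, 17, 243],
    [18, 19, 20, 21, 383],  # the "?" filled in as 383, as described above
]


def build_csv(rows=ROWS):
    """Return the table as comma-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder file name; point Read CSV at wherever this is written.
    with open("luciano.csv", "w", newline="") as f:
        f.write(build_csv())
```

    Leaving the last RESULT cell empty should also work, because the process's Filter Examples step (`missing_labels`, inverted) appears to drop rows without a label before the search starts.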

    In this process that sounds about right, because the process is dynamic: it recursively generates and tests attributes while it runs. So ignore the warning and press the start button; you have nothing to lose but your sanity :o
