Benford's Law?

gfsi_j Member Posts: 2 Contributor I
Hello,

I'm new to RapidMiner, and I'm looking to develop some processes for fraud detection.  To that end, one thing I'm curious about is whether RapidMiner has any tools to apply Benford's Law to help find possibly fabricated data?  I haven't been able to find any operators for that purpose, but perhaps I am looking in the wrong place.

Thanks!

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,127  RM Data Scientist
    Hi,

    As far as I know, we do not have this as a native operator. There might be a simple way to build it with 3-4 operators.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 574   Unicorn
    Here's the process I use for Benford.  I can't claim credit; I think this might have been one of Tobias' originally.  It's very handy in that it will accept any numerical attribute simply by tweaking the first macro.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="generate_transfer_data" compatibility="6.4.000" expanded="true" height="60" name="Generate Transfer Data" width="90" x="45" y="75"/>
          <operator activated="true" class="set_macro" compatibility="6.4.000" expanded="true" height="76" name="Set Macro" width="90" x="179" y="30">
            <parameter key="macro" value="ATTRIBUTE"/>
            <parameter key="value" value="Amount"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="6.0.003" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="%{ATTRIBUTE}"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="digit" value="cut(%{ATTRIBUTE},0,1)"/>
              <parameter key="digit_complex" value="floor(parse(%{ATTRIBUTE})/pow(10,floor(log(parse(%{ATTRIBUTE})))))"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
            <list key="aggregation_attributes">
              <parameter key="digit" value="count (fractional)"/>
            </list>
            <parameter key="group_by_attributes" value="digit"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="6.4.000" expanded="true" height="94" name="Filter Examples" width="90" x="380" y="120">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="digit.does_not_equal.0"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="514" y="120">
            <list key="function_descriptions">
              <parameter key="benford" value="log(1+1/parse(digit))"/>
            </list>
          </operator>
          <connect from_op="Generate Transfer Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
          <connect from_op="Set Macro" from_port="through 1" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
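
For readers who want to sanity-check the result outside RapidMiner, here is a rough Python sketch of the same comparison: observed leading-digit fractions next to the Benford expectation log10(1 + 1/d). It is not a translation of the process above. The function names `leading_digit` and `benford_table` are made up for illustration, it takes the first significant digit (closer in spirit to the `digit_complex` expression than to `cut`), and it assumes fractions are computed over the non-zero values only.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first significant digit (1-9), or None for zero or non-finite input."""
    value = abs(value)
    if value == 0 or not math.isfinite(value):
        return None
    # Scientific notation always starts with the leading significant digit.
    # 17 decimals is enough to avoid rounding a float such as 9.9999999 up to 10.
    return int(f"{value:.17e}"[0])


def benford_table(values):
    """Return (digit, observed fraction, Benford expectation) for digits 1..9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    table = []
    for d in range(1, 10):
        observed = counts[d] / total if total else 0.0
        expected = math.log10(1 + 1 / d)  # Benford's Law for first digits
        table.append((d, observed, expected))
    return table


if __name__ == "__main__":
    # Hypothetical sample amounts, for illustration only.
    sample = [123, 0.045, -9.9999999, 0, 1500, 19]
    for digit, observed, expected in benford_table(sample):
        print(f"{digit}: observed={observed:.3f} expected={expected:.3f}")
```

A large gap between the observed and expected columns is only a hint that the values deserve a closer look; it does not prove anything by itself.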
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,127  RM Data Scientist
    And I got a new building block!

    Thanks a lot John.
  • gfsi_j Member Posts: 2 Contributor I
    Thanks very much, JEdward!  That's very helpful  :)