Benford's Law?

gfsi_j Member Posts: 2 Contributor I
Hello,

I'm new to RapidMiner, and I'm looking to develop some processes for fraud detection. To that end, does RapidMiner have any tools for applying Benford's Law to help find possibly fabricated data? I haven't been able to find any operators for that purpose, but perhaps I am looking in the wrong place.

Thanks!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    As far as I know, we do not have this as a native operator. There might be a simple way to build it with 3-4 operators.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Here's the process I use for Benford. I can't claim credit; I think this might originally be one of Tobias'. It's very handy in that it will accept any numerical attribute simply by tweaking the first macro.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="generate_transfer_data" compatibility="6.4.000" expanded="true" height="60" name="Generate Transfer Data" width="90" x="45" y="75"/>
          <operator activated="true" class="set_macro" compatibility="6.4.000" expanded="true" height="76" name="Set Macro" width="90" x="179" y="30">
            <parameter key="macro" value="ATTRIBUTE"/>
            <parameter key="value" value="Amount"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="6.0.003" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="%{ATTRIBUTE}"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="digit" value="cut(%{ATTRIBUTE},0,1)"/>
              <parameter key="digit_complex" value="floor(parse(%{ATTRIBUTE})/pow(10,floor(log(parse(%{ATTRIBUTE})))))"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
            <list key="aggregation_attributes">
              <parameter key="digit" value="count (fractional)"/>
            </list>
            <parameter key="group_by_attributes" value="digit"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="6.4.000" expanded="true" height="94" name="Filter Examples" width="90" x="380" y="120">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="digit.does_not_equal.0"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="514" y="120">
            <list key="function_descriptions">
              <parameter key="benford" value="log(1+1/parse(digit))"/>
            </list>
          </operator>
          <connect from_op="Generate Transfer Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
          <connect from_op="Set Macro" from_port="through 1" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    And I got a new building block!

    Thanks a lot John.
  • gfsi_jgfsi_j Member Posts: 2 Contributor I
    Thanks very much, JEdward!  That's very helpful  :)
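
For readers who want to sanity-check the numbers outside RapidMiner, the idea behind the process posted above (take the leading digit of each value, compute the observed share of each digit, and compare it with the Benford expectation log10(1 + 1/d)) can be sketched in a few lines of Python. This is only a minimal sketch, not a port of the process: the function names `leading_digit`, `benford_expected`, and `first_digit_table` are made up for illustration, and it assumes zeros and non-finite values should be skipped before the shares are computed.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first significant digit (1-9) of a number, or None if there is none."""
    value = abs(value)
    if value == 0 or math.isnan(value) or math.isinf(value):
        return None
    # Scientific notation always starts with the first significant digit,
    # so this also works for values below 1 such as 0.00123.
    return int(f"{value:.15e}"[0])


def benford_expected(digit):
    """Expected share of a leading digit d under Benford's Law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)


def first_digit_table(values):
    """Return (digit, observed share, expected share) for the digits 1 to 9."""
    # Assumption: values without a leading digit (zero, NaN, inf) are dropped
    # before the shares are computed.
    digits = [d for d in (leading_digit(v) for v in values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return [
        (d, counts[d] / total if total else 0.0, benford_expected(d))
        for d in range(1, 10)
    ]


if __name__ == "__main__":
    # Illustrative data only: powers of two are a classic Benford-like sequence.
    sample = [2 ** n for n in range(1, 101)]
    for digit, observed, expected in first_digit_table(sample):
        print(f"{digit}: observed={observed:.3f} expected={expected:.3f}")
```

Large gaps between the observed and expected columns are a hint to look closer at the data, not proof of fabrication; a formal goodness-of-fit test would be a separate step.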