Help me to find out the right target role for the attributes

mrs_vendor Member Posts: 3 Contributor I
edited November 2018 in Help

How do I know which target role to use for each attribute?

I have a school project on fraud detection.

I have to use classification with a decision tree. With these tools, I want to find the characteristics of the customers who commit fraud. I know that the attribute "Betrugsfall" (= "fraud") has the target role "label" and that "Kunden_ID" (= "customer_ID") has the target role "id".

But I don't know which target roles the other attributes should have. If I choose the target role "regular" for the other attributes, my decision tree is flawed.

I would be very grateful if I could get help!

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Fraud Detetction Balanced_a2" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Local Repository/data/Fraud Detetction Balanced_a2"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attributes" value="Alter|Betrugsfall|Forderungsbetrag|Kunden_ID|Polizeilicher_Bericht|Geschlecht|Anzahl_vergangener_Tage|Anzahl_Forderungen_seit_zwei_Jahren"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="Kunden_ID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="Betrugsfall" value="label"/>
<parameter key="Alter" value="regular"/>
<parameter key="Geschlecht" value="regular"/>
<parameter key="Anzahl_Forderungen_seit_zwei_Jahren" value="regular"/>
<parameter key="Anzahl_vergangener_Tage" value="regular"/>
<parameter key="Forderungs_ID" value="id"/>
<parameter key="Forderungsbetrag" value="regular"/>
<parameter key="Forderungskategorie" value="regular"/>
<parameter key="Polizeilicher_Bericht" value="regular"/>
</list>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="447" y="34"/>
<connect from_op="Retrieve Fraud Detetction Balanced_a2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

 

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @mrs_vendor,

     

    So that we can reproduce your results, can you share your dataset?

    Also, what do you mean by: my decision tree will be "flawed"?

    Without seeing the results, maybe you can try unchecking the parameters apply pruning and/or apply prepruning of the Decision Tree operator?

     

    Regards,

     

     

    Lionel

  • mrs_vendor Member Posts: 3 Contributor I

    Hello Lionel,

    Thank you very much for your answer.
    I will share my dataset via Dropbox because it is an Excel file.

    https://www.dropbox.com/s/n62qem3smy9s595/Fraud%20Detetction%20Balanced_a2.xlsx?dl=0

    With "my decision tree will be flawed" I meant that the results of the decision tree make no sense, and I think that's because of the wrong target roles.

     

    best regards

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @mrs_vendor

     

    One thing to note: you don't have to explicitly set the role 'regular' for all attributes, because they are all 'regular' by default. Once a dataset is loaded, you only need to specify the 'label' attribute (the one you want to predict) and then exclude the attributes that are not relevant for prediction; in your case I would exclude the ID attribute from modelling.

     

    Otherwise, as @lionelderkrikor has mentioned, it would be beneficial if you could share your dataset here.  
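
    As a minimal sketch of the point about roles (not a definitive setup), the Set Role operator from the original process could be reduced to the label and the customer ID. The attribute names come from the process posted in the question; everything else is left at its default role 'regular'. Forderungs_ID is deliberately not handled here and would still need to be excluded separately, for example with Select Attributes.

    ```xml
    <!-- Sketch only: attributes not listed here keep the default role 'regular'. -->
    <operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
      <!-- The attribute to predict gets the role 'label'. -->
      <parameter key="attribute_name" value="Betrugsfall"/>
      <parameter key="target_role" value="label"/>
      <list key="set_additional_roles">
        <!-- Giving the customer ID the role 'id' keeps it out of the predictors. -->
        <parameter key="Kunden_ID" value="id"/>
      </list>
    </operator>
    ```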

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @mrs_vendor

     

    Please check the attached process; it can be a starting point for your modelling. It reads your Excel file directly into RapidMiner, so make sure you specify the correct path to the file in the 'Read Excel' settings.

     

    Please note that all ID attributes should be excluded from modelling process.

    I also noticed that your dataset has duplicate entries, so it might make sense to remove them before modelling.
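
    A minimal sketch of that duplicate removal, assuming RapidMiner's Remove Duplicates operator (operator key remove_duplicates) left at its default settings. The operator name, its coordinates, and the port names in the two connections are assumptions and may need adjusting in your version. The two connections would replace the direct link from Select Attributes to Split Data in the process below.

    ```xml
    <!-- Sketch only: operator key, layout values, and port names are assumptions. -->
    <operator activated="true" class="remove_duplicates" compatibility="8.1.001" expanded="true" height="103" name="Remove Duplicates" width="90" x="179" y="136"/>
    <!-- Route the data through Remove Duplicates before it is split into training and test sets. -->
    <connect from_op="Select Attributes" from_port="example set output" to_op="Remove Duplicates" to_port="example set input"/>
    <connect from_op="Remove Duplicates" from_port="example set output" to_op="Split Data" to_port="example set"/>
    ```

    Note that in this position Kunden_ID has already been removed, so rows that differ only in the customer ID would count as duplicates; whether that is wanted depends on the data.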

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.1.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="/Users/kypexin/Downloads/Fraud Detetction Balanced_a2.xlsx"/>
    <parameter key="imported_cell_range" value="A1:J1703"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Kunden_ID.false.integer.attribute"/>
    <parameter key="1" value="Forderungs_ID.false.integer.attribute"/>
    <parameter key="2" value="Alter.true.integer.attribute"/>
    <parameter key="3" value="Geschlecht.true.binominal.attribute"/>
    <parameter key="4" value="Anzahl_vergangener_Tage.true.integer.attribute"/>
    <parameter key="5" value="Forderungskategorie.true.polynominal.attribute"/>
    <parameter key="6" value="Polizeilicher_Bericht.true.polynominal.attribute"/>
    <parameter key="7" value="Forderungsbetrag.true.numeric.attribute"/>
    <parameter key="8" value="Anzahl_Forderungen_seit_zwei_Jahren.true.integer.attribute"/>
    <parameter key="9" value="Betrugsfall.true.binominal.label"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Kunden_ID"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="split_data" compatibility="8.1.001" expanded="true" height="103" name="Split Data" width="90" x="313" y="136">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.8"/>
    <parameter key="ratio" value="0.2"/>
    </enumeration>
    </operator>
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="447" y="34">
    <parameter key="criterion" value="information_gain"/>
    <parameter key="apply_pruning" value="false"/>
    <parameter key="apply_prepruning" value="false"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="581" y="136">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_binominal_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="715" y="136">
    <parameter key="AUC" value="true"/>
    <parameter key="precision" value="true"/>
    <parameter key="recall" value="true"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • mrs_vendor Member Posts: 3 Contributor I

    Hello Vladimir,

    thank you very very much. That helps me a lot! :)

     

    best regards
