Can I optimize a custom performance metric?

anaRodrigues Member Posts: 33 Contributor II
edited March 2021 in Help
I want to generate an F-beta score and vary beta according to how much weight I want to put on precision versus recall. Then I would like to add it to a performance vector so I can optimize it. I know there's a Performance to Data operator, but what I need is the reverse. Is there any way to do this?
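For reference, the F-beta score is commonly defined from precision P = TP/(TP+FP) and recall R = TP/(TP+FN). With beta > 1 recall counts more, with beta < 1 precision counts more, and beta = 1 gives the usual F1. This is the standard textbook definition, not something specific to RapidMiner:

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}
        = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP}
```

The second form needs only TP, FP and FN, so the true negatives are not required.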

EDIT: I found a similar question, but no solution:

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,421 RM Data Scientist
    Solution Accepted
    Ehm, isn't the F1-measure already part of the Performance (Binominal Classification) operator?

    Anyway, you can use Performance to Data to get TP, FP, TN and FN and then calculate the F1 score by hand. Then use Extract Performance to turn it back into a performance vector. Attached is an example.


    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.001">
      <operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.8.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.8.001" expanded="true" height="145" name="Validation" width="90" x="246" y="34">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="stratified sampling"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.8.001" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <description align="left" color="green" colored="true" height="80" resized="true" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.8.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="9.8.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
                <parameter key="manually_set_positive_class" value="false"/>
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="false"/>
                <parameter key="classification_error" value="false"/>
                <parameter key="kappa" value="false"/>
                <parameter key="AUC (optimistic)" value="false"/>
                <parameter key="AUC" value="false"/>
                <parameter key="AUC (pessimistic)" value="false"/>
                <parameter key="precision" value="false"/>
                <parameter key="recall" value="false"/>
                <parameter key="lift" value="false"/>
                <parameter key="fallout" value="false"/>
                <parameter key="f_measure" value="false"/>
                <parameter key="false_positive" value="true"/>
                <parameter key="false_negative" value="true"/>
                <parameter key="true_positive" value="true"/>
                <parameter key="true_negative" value="true"/>
                <parameter key="sensitivity" value="false"/>
                <parameter key="specificity" value="false"/>
                <parameter key="youden" value="false"/>
                <parameter key="positive_predictive_value" value="false"/>
                <parameter key="negative_predictive_value" value="false"/>
                <parameter key="psep" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <connect from_op="Performance" from_port="example set" to_port="test set results"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
              <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
            </process>
            <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a decision tree model.</description>
          </operator>
          <operator activated="true" class="performance_to_data" compatibility="9.8.001" expanded="true" height="82" name="Performance to Data" width="90" x="380" y="136"/>
          <operator activated="true" class="set_role" compatibility="9.8.001" expanded="true" height="82" name="Set Role" width="90" x="514" y="136">
            <parameter key="attribute_name" value="Criterion"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.8.001" expanded="true" height="82" name="Transpose" width="90" x="648" y="136"/>
          <operator activated="true" class="filter_example_range" compatibility="9.8.001" expanded="true" height="82" name="Filter Example Range" width="90" x="782" y="136">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="1"/>
            <parameter key="invert_filter" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.8.001" expanded="true" height="82" name="Generate Attributes" width="90" x="916" y="136">
            <list key="function_descriptions">
              <parameter key="F1" value="true_positive/(true_positive+0.5*(false_positive+false_negative))"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="extract_performance" compatibility="9.8.001" expanded="true" height="82" name="Performance (2)" width="90" x="1050" y="136">
            <parameter key="performance_type" value="data_value"/>
            <parameter key="statistics" value="average"/>
            <parameter key="attribute_name" value="F1"/>
            <parameter key="example_index" value="1"/>
            <parameter key="optimization_direction" value="maximize"/>
          </operator>
          <connect from_op="Retrieve Golf" from_port="output" to_op="Validation" to_port="example set"/>
          <connect from_op="Validation" from_port="performance 1" to_op="Performance to Data" to_port="performance vector"/>
          <connect from_op="Performance to Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance (2)" to_port="example set"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
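
    The F1 expression in the Generate Attributes operator, true_positive/(true_positive+0.5*(false_positive+false_negative)), is the usual harmonic mean of precision and recall rewritten in terms of the confusion-matrix counts (standard algebra, shown here only to explain the 0.5 factor):

    ```latex
    \begin{aligned}
    F_1 &= \frac{2PR}{P+R}, \qquad P = \frac{TP}{TP+FP}, \quad R = \frac{TP}{TP+FN} \\
        &= \frac{2\,TP}{2\,TP + FP + FN}
         = \frac{TP}{TP + \tfrac{1}{2}(FP + FN)}
    \end{aligned}
    ```

    Replacing the factor 2 in front of TP with 1 + beta^2 and weighting FN by beta^2 gives the general F-beta expression, so only the formula in Generate Attributes has to change for other beta values.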

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany


  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,421 RM Data Scientist
    The operator you search for is called Extract Performance.

  • anaRodrigues Member Posts: 33 Contributor II
    edited March 2021
    Hi Martin,
    Thank you for your reply, but I don't see how that operator helps me. It only lets me compute four statistics of a single attribute: min, max, average and count. What I want is to calculate the F-measure, which is derived from the confusion matrix values.
    Thank you
  • anaRodrigues Member Posts: 33 Contributor II
    Hi Martin,

    Yes, the F1-measure is available, but I wanted to calculate F2, F3, F0.5, etc., which was not possible before.
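
    A minimal sketch of how the Generate Attributes operator from the accepted example could be adapted for these beta values. It assumes the same transposed attributes true_positive, false_positive and false_negative as in that process, and uses F_beta = (1+beta^2)*TP / ((1+beta^2)*TP + beta^2*FN + FP) with the coefficients written out by hand. The attribute names F2, F3 and F0_5 are illustrative choices, and the fragment has not been tested in RapidMiner:

    ```xml
    <operator activated="true" class="generate_attributes" compatibility="9.8.001" expanded="true" height="82" name="Generate Attributes" width="90" x="916" y="136">
      <list key="function_descriptions">
        <!-- F2: beta = 2, beta^2 = 4, so 1 + beta^2 = 5 (recall weighted more) -->
        <parameter key="F2" value="5*true_positive/(5*true_positive+4*false_negative+false_positive)"/>
        <!-- F3: beta = 3, beta^2 = 9, so 1 + beta^2 = 10 -->
        <parameter key="F3" value="10*true_positive/(10*true_positive+9*false_negative+false_positive)"/>
        <!-- F0_5: beta = 0.5, beta^2 = 0.25, so 1 + beta^2 = 1.25 (precision weighted more) -->
        <parameter key="F0_5" value="1.25*true_positive/(1.25*true_positive+0.25*false_negative+false_positive)"/>
      </list>
      <parameter key="keep_all" value="true"/>
    </operator>
    ```

    To optimize one of them, point the attribute_name parameter of the Extract Performance operator (Performance (2) in the example) at that attribute, e.g. value="F2".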

    Thank you, this solves the problem!