
Log the positive label value of the example set

alejandro_tobon Member Posts: 16 Maven
edited November 2018 in Help
Hi, I want to ask how I can log the positive label value of the example set.
I have a multiclass example set, so I apply the Polynomial by Binomial Classification operator with the option
"1 against all". This option trains one model for each label value it finds in the dataset. Inside it I have a Decision Tree, a model applier and a Log operator that logs AUC and Accuracy, but I cannot find a way to log the positive value of the label.
Please let me know if I made myself clear.

Thanks
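Conceptually, the "1 against all" strategy loops over the label values and builds one binary problem per value, so the positive class is simply the loop variable at each step. The Python sketch below only illustrates that idea and is not RapidMiner code. The function name `one_against_all`, the placeholder value `"other"` for the merged negative class, and the sample CWE labels are hypothetical, and RapidMiner may name the negative class differently internally.

```python
from typing import Iterator, List, Tuple


def one_against_all(labels: List[str], negative: str = "other") -> Iterator[Tuple[str, List[str]]]:
    """Yield (positive_label, binary_labels) once per distinct label value.

    Hypothetical illustration: examples of the current class keep their label,
    all other examples are merged into the placeholder class `negative`.
    Assumes no real class is already called `negative`.
    """
    for positive in sorted(set(labels)):  # sorted only to get a deterministic order
        binary = [y if y == positive else negative for y in labels]
        yield positive, binary


if __name__ == "__main__":
    # Hypothetical label values for the vuln:cwe attribute.
    labels = ["CWE-79", "CWE-89", "CWE-79", "CWE-20"]
    for positive, binary in one_against_all(labels):
        # Here the positive class is known, so it could be written into the
        # same log row as the AUC and accuracy of this binary model.
        print(positive, binary)
```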

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    you made yourself understandable, but your setup doesn't make much sense. Since you apply the model to the training data, the performance doesn't give you any clue about how well the model will perform on unseen data.
    Since the operator is not designed to work this way, there's no hook to get the current positive class.
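Why a score measured on the training data says little about unseen data can be shown with a deliberately overfitting toy model. In this hypothetical Python sketch, `train_memorizer` simply stores every training example and falls back to the majority class for anything it has not seen; the data is made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (feature value, label)
Model = Tuple[Dict[str, str], str]  # (lookup table, majority-class fallback)


def train_memorizer(data: List[Example]) -> Model:
    """A deliberately overfitting 'model': remember each training example's label."""
    table = {x: y for x, y in data}
    majority = Counter(y for _, y in data).most_common(1)[0][0]
    return table, majority


def accuracy(model: Model, data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    table, majority = model
    hits = sum(table.get(x, majority) == y for x, y in data)
    return hits / len(data)


train = [("a", "pos"), ("b", "neg"), ("c", "pos"), ("d", "neg"), ("e", "neg")]
unseen = [("f", "pos"), ("g", "pos"), ("h", "neg")]
model = train_memorizer(train)
print(accuracy(model, train))   # evaluated on its own training data
print(accuracy(model, unseen))  # evaluated on data it has never seen
```

The perfect training score comes purely from memorization; only the held-out score reflects how the model generalizes.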

    Greetings,
      Sebastian
  • alejandro_tobon Member Posts: 16 Maven
    Hi Sebastian.
    It does make sense, because inside it I have a Split Validation operator which separates the data into training and testing parts, so this way I can measure how well the model will perform on unseen data, or at least get an estimate.
    The goal of this process is to find the best parameter values for training my model for each class. It makes a lot of sense for text classification purposes, where you want to know the best parameters for the model that classifies the texts.
    Maybe I am using it in the wrong way, so I attached the process I'm using. If there is a better way to use it, please let me know.

    Thanks

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key/>
            <value/>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="428" width="815">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
            <parameter key="file_name" value="C:\Users\Alejandro\Desktop\ToSinc\Austral\Tesis\Security\ParcerCVSNVD\Data\nvdcve-2.0-2009.csv"/>
            <parameter key="skip_comments" value="false"/>
            <parameter key="use_quotes" value="false"/>
            <parameter key="column_separators" value=",\s*"/>
          </operator>
          <operator activated="false" class="sample_stratified" expanded="true" height="76" name="Sample (Stratified)" width="90" x="45" y="120">
            <parameter key="sample_size" value="10"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text" width="90" x="45" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Summary"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text (2)" width="90" x="179" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="CompromisedSoftware"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="30">
            <parameter key="prune_method" value="percentual"/>
            <list key="specify_weights"/>
            <process expanded="true" height="393" width="633">
              <operator activated="true" class="text:transform_cases" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="75"/>
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="246" y="75"/>
              <operator activated="true" class="text:stem_snowball" expanded="true" height="60" name="Stem (Snowball)" width="90" x="380" y="75"/>
              <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="514" y="75"/>
              <connect from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
              <connect from_op="Stem (Snowball)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="54"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" breakpoints="after" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
            <parameter key="name" value="vuln:cwe"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="optimize_parameters_evolutionary" expanded="true" height="112" name="Optimize Parameters (Evolutionary)" width="90" x="514" y="210">
            <list key="parameters">
              <parameter key="Decision Tree (2).confidence" value="[1.0E-7;0.5]"/>
            </list>
            <process expanded="true" height="365" width="569">
              <operator activated="true" class="polynomial_by_binomial_classification" expanded="true" height="76" name="Polynomial by Binomial Classification" width="90" x="112" y="120">
                <process expanded="true" height="383" width="587">
                  <operator activated="true" breakpoints="after" class="split_validation" expanded="true" height="112" name="Validation (2)" width="90" x="246" y="75">
                    <process expanded="true" height="383" width="279">
                      <operator activated="true" class="decision_tree" expanded="true" height="76" name="Decision Tree (2)" width="90" x="89" y="30">
                        <parameter key="confidence" value="0.11025199457227444"/>
                      </operator>
                      <connect from_port="training" to_op="Decision Tree (2)" to_port="training set"/>
                      <connect from_op="Decision Tree (2)" from_port="model" to_port="model"/>
                      <portSpacing port="source_training" spacing="0"/>
                      <portSpacing port="sink_model" spacing="0"/>
                      <portSpacing port="sink_through 1" spacing="0"/>
                    </process>
                    <process expanded="true" height="383" width="279">
                      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                        <list key="application_parameters"/>
                      </operator>
                      <operator activated="true" breakpoints="after" class="performance_binominal_classification" expanded="true" height="76" name="Performance (2)" width="90" x="45" y="120">
                        <parameter key="main_criterion" value="AUC"/>
                        <parameter key="false_positive" value="true"/>
                        <parameter key="false_negative" value="true"/>
                        <parameter key="true_positive" value="true"/>
                        <parameter key="true_negative" value="true"/>
                      </operator>
                      <operator activated="true" class="log" expanded="true" height="76" name="Log" width="90" x="179" y="75">
                        <parameter key="filename" value="logTest.log"/>
                        <list key="log">
                          <parameter key="confidence" value="operator.Decision Tree (2).parameter.confidence"/>
                          <parameter key="AUC" value="operator.Performance (2).value.AUC"/>
                          <parameter key="FN" value="operator.Performance (2).value.false_negative"/>
                          <parameter key="FP" value="operator.Performance (2).value.false_positive"/>
                          <parameter key="TN" value="operator.Performance (2).value.true_negative"/>
                          <parameter key="TP" value="operator.Performance (2).value.true_positive"/>
                          <parameter key="Accuracy" value="operator.Performance (2).value.accuracy"/>
                        </list>
                        <parameter key="persistent" value="true"/>
                      </operator>
                      <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
                      <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
                      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                      <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
                      <connect from_op="Log" from_port="through 1" to_port="averagable 1"/>
                      <portSpacing port="source_model" spacing="0"/>
                      <portSpacing port="source_test set" spacing="0"/>
                      <portSpacing port="source_through 1" spacing="0"/>
                      <portSpacing port="sink_averagable 1" spacing="0"/>
                      <portSpacing port="sink_averagable 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="training set" to_op="Validation (2)" to_port="training"/>
                  <connect from_op="Validation (2)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="246" y="120">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="447" y="210">
                <list key="class_weights"/>
              </operator>
              <connect from_port="input 1" to_op="Polynomial by Binomial Classification" to_port="training set"/>
              <connect from_op="Polynomial by Binomial Classification" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Polynomial by Binomial Classification" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
              <connect from_op="Performance" from_port="performance" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
          <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Evolutionary)" from_port="result 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
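What the inner Split Validation estimates for each binary model can be sketched in plain Python: shuffle the examples, train on one part, and compute AUC only on the held-out part. The names `split_validation` and `auc`, the 0.7 ratio and the seed are assumptions for illustration. The AUC below is the rank-based (Mann-Whitney) formulation, which is not necessarily identical to how RapidMiner's Performance operator computes it, for example in how ties are handled.

```python
import random
from typing import List, Sequence, Tuple


def split_validation(n: int, train_ratio: float = 0.7, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle example indices and split them into a training part and a held-out test part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local generator, so the split is reproducible
    cut = int(round(n * train_ratio))
    return indices[:cut], indices[cut:]


def auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Rank-based AUC: probability that a random positive example gets a higher
    confidence than a random negative one (ties count as one half)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train_idx, test_idx = split_validation(10)
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
    # Made-up confidences for the positive class on four held-out examples.
    print("AUC:", auc([0.9, 0.4, 0.7, 0.2], [True, False, True, False]))
```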
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    OK, I see why this setup is reasonable, but I still would not log the performance of each fold separately; I would log the aggregated performance of the XValidation instead.
    Besides this, I added three operators (Filter Examples, Extract Macro and Provide Macro as Log Value) that might show you how to extract the current label. Since I don't know your actual data and hence am not familiar with the attribute names, I entered a generic "label" as the label attribute's name. You will have to adapt it accordingly.
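The idea of logging one aggregated number per positive class, instead of one row per fold, can be sketched in Python. This is not how RapidMiner's X-Validation is implemented; the modulo-based fold assignment (no shuffling, no stratification), the names `kfold` and `aggregate`, and the per-fold scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterator, List, Tuple


def kfold(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k folds over n examples.

    Hypothetical fold assignment: example j goes to fold j % k, and every
    fold is used as the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def aggregate(fold_scores: List[float]) -> Tuple[float, float]:
    """Collapse per-fold scores (at least two) into one (mean, standard deviation) pair."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Made-up per-fold AUC values for one positive class.
    fold_auc = [0.81, 0.77, 0.85]
    auc_mean, auc_std = aggregate(fold_auc)
    # One log row per positive class instead of one row per fold.
    print({"positive": "CWE-79", "AUC_mean": round(auc_mean, 3), "AUC_std": round(auc_std, 3)})
```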

    Here's the process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key/>
            <value/>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="428" width="815">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
            <parameter key="file_name" value="C:\Users\Alejandro\Desktop\ToSinc\Austral\Tesis\Security\ParcerCVSNVD\Data\nvdcve-2.0-2009.csv"/>
            <parameter key="skip_comments" value="false"/>
            <parameter key="use_quotes" value="false"/>
            <parameter key="column_separators" value=",\s*"/>
          </operator>
          <operator activated="false" class="sample_stratified" expanded="true" height="76" name="Sample (Stratified)" width="90" x="45" y="120">
            <parameter key="sample_size" value="10"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text" width="90" x="45" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Summary"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text (2)" width="90" x="179" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="CompromisedSoftware"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="30">
            <parameter key="prune_method" value="percentual"/>
            <list key="specify_weights"/>
            <process expanded="true" height="393" width="633">
              <operator activated="true" class="text:transform_cases" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="75"/>
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="246" y="75"/>
              <operator activated="true" class="text:stem_snowball" expanded="true" height="60" name="Stem (Snowball)" width="90" x="380" y="75"/>
              <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="514" y="75"/>
              <connect from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
              <connect from_op="Stem (Snowball)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="54"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" breakpoints="after" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
            <parameter key="name" value="vuln:cwe"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="optimize_parameters_evolutionary" expanded="true" height="112" name="Optimize Parameters (Evolutionary)" width="90" x="514" y="210">
            <list key="parameters">
              <parameter key="Decision Tree (2).confidence" value="[1.0E-7;0.5]"/>
            </list>
            <process expanded="true" height="365" width="569">
              <operator activated="true" class="polynomial_by_binomial_classification" expanded="true" height="76" name="Polynomial by Binomial Classification" width="90" x="112" y="120">
                <process expanded="true" height="383" width="587">
                  <operator activated="true" breakpoints="after" class="split_validation" expanded="true" height="112" name="Validation (2)" width="90" x="246" y="75">
                    <process expanded="true" height="383" width="279">
                      <operator activated="true" class="decision_tree" expanded="true" height="76" name="Decision Tree (2)" width="90" x="89" y="30">
                        <parameter key="confidence" value="0.11025199457227444"/>
                      </operator>
                      <connect from_port="training" to_op="Decision Tree (2)" to_port="training set"/>
                      <connect from_op="Decision Tree (2)" from_port="model" to_port="model"/>
                      <portSpacing port="source_training" spacing="0"/>
                      <portSpacing port="sink_model" spacing="0"/>
                      <portSpacing port="sink_through 1" spacing="0"/>
                    </process>
                    <process expanded="true" height="460" width="426">
                      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                        <list key="application_parameters"/>
                      </operator>
                      <operator activated="true" breakpoints="after" class="performance_binominal_classification" expanded="true" height="76" name="Performance (2)" width="90" x="45" y="120">
                        <parameter key="main_criterion" value="AUC"/>
                        <parameter key="false_positive" value="true"/>
                        <parameter key="false_negative" value="true"/>
                        <parameter key="true_positive" value="true"/>
                        <parameter key="true_negative" value="true"/>
                      </operator>
                      <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="255">
                        <parameter key="condition_class" value="attribute_value_filter"/>
                        <parameter key="parameter_string" value="label!=other"/>
                        <parameter key="invert_filter" value="true"/>
                      </operator>
                      <operator activated="true" class="extract_macro" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="255">
                        <parameter key="macro" value="currentLabel"/>
                        <parameter key="macro_type" value="data_value"/>
                        <parameter key="attribute_name" value="label"/>
                        <parameter key="example_index" value="1"/>
                      </operator>
                      <operator activated="true" class="provide_macro_as_log_value" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="313" y="255">
                        <parameter key="macro_name" value="currentLabel"/>
                      </operator>
                      <operator activated="true" class="log" expanded="true" height="94" name="Log" width="90" x="313" y="75">
                        <parameter key="filename" value="logTest.log"/>
                        <list key="log">
                          <parameter key="confidence" value="operator.Decision Tree (2).parameter.confidence"/>
                          <parameter key="AUC" value="operator.Performance (2).value.AUC"/>
                          <parameter key="FN" value="operator.Performance (2).value.false_negative"/>
                          <parameter key="FP" value="operator.Performance (2).value.false_positive"/>
                          <parameter key="TN" value="operator.Performance (2).value.true_negative"/>
                          <parameter key="TP" value="operator.Performance (2).value.true_positive"/>
                          <parameter key="Accuracy" value="operator.Performance (2).value.accuracy"/>
                          <parameter key="currentLabel" value="operator.Provide Macro as Log Value.value.macro_value"/>
                        </list>
                        <parameter key="persistent" value="true"/>
                      </operator>
                      <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
                      <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
                      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                      <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
                      <connect from_op="Performance (2)" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
                      <connect from_op="Filter Examples" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
                      <connect from_op="Extract Macro" from_port="example set" to_op="Provide Macro as Log Value" to_port="through 1"/>
                      <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 2"/>
                      <connect from_op="Log" from_port="through 1" to_port="averagable 1"/>
                      <portSpacing port="source_model" spacing="0"/>
                      <portSpacing port="source_test set" spacing="0"/>
                      <portSpacing port="source_through 1" spacing="0"/>
                      <portSpacing port="sink_averagable 1" spacing="0"/>
                      <portSpacing port="sink_averagable 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="training set" to_op="Validation (2)" to_port="training"/>
                  <connect from_op="Validation (2)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="246" y="120">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="447" y="210">
                <list key="class_weights"/>
              </operator>
              <connect from_port="input 1" to_op="Polynomial by Binomial Classification" to_port="training set"/>
              <connect from_op="Polynomial by Binomial Classification" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Polynomial by Binomial Classification" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
              <connect from_op="Performance" from_port="performance" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
          <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Evolutionary)" from_port="result 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
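The three added operators form a small pipeline: Filter Examples is meant to keep the examples of the current positive class, Extract Macro reads the label of the first remaining example into the macro `currentLabel`, and Provide Macro as Log Value exposes that macro as an extra Log column. A rough Python equivalent follows. It assumes the filter's intent is to drop the merged negative class and that this class is called `"other"`; both are assumptions, so it is worth checking which value the negative class actually carries and how `invert_filter` combines with the `label!=other` condition. `current_positive_label`, `log_row` and `log_table` are hypothetical names.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

# Hypothetical stand-in for the table the Log operator writes.
log_table: List[Dict[str, object]] = []


def current_positive_label(rows: List[Row], label_attr: str = "label",
                           negative: str = "other") -> Optional[str]:
    """Return the label of the first example that is not in the negative class."""
    # ~ Filter Examples: drop the merged negative class.
    remaining = [row for row in rows if row[label_attr] != negative]
    if not remaining:
        # A small held-out split may contain no positive example at all.
        return None
    # ~ Extract Macro with example_index 1, i.e. the first example
    # (assumed 1-based in RapidMiner; Python lists are 0-based).
    return remaining[0][label_attr]


def log_row(current_label: Optional[str], auc: float, accuracy: float) -> None:
    """~ Log operator: one row per binary model, including the currentLabel column."""
    log_table.append({"currentLabel": current_label, "AUC": auc, "Accuracy": accuracy})


if __name__ == "__main__":
    test_rows = [{"label": "other"}, {"label": "CWE-79"}, {"label": "other"}]
    log_row(current_positive_label(test_rows), auc=0.9, accuracy=0.8)  # made-up metrics
    print(log_table)
```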
    Greetings,
      Sebastian
  • alejandro_tobon Member Posts: 16 Maven
    Sebastian,
    these tools and your team are really amazing. This process really helped me a lot.
    Thank you very much.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Thanks for the kind words :)

    Greetings,
      Sebastian