[SOLVED] discretize by entropy evaluation

makak · May 2014

Hi,

I would like to use the Discretize by Entropy operator with a Naive Bayes classifier. As far as I understand, Discretize by Entropy depends on the class value, so it would not be correct to discretize the whole dataset first and then perform cross-validation. I would like to set up an experiment where, in every fold of the cross-validation, the training set is discretized by entropy and the classifier is evaluated on the test set discretized with the bin intervals from the training fold. Is this possible? In case that was not clear: I simply want to classify new data using a classifier built on discretized data. How should I apply the same discretization intervals to the new data?
Any help or comment would be much appreciated.
Thank you.

Matus

fras · May 2014

The Discretize operators have an additional output port that delivers a preprocessing model. Inside the X-Validation you can group this model with the classifier model, so that the bins learned on the training fold are also applied to the test fold:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.0.003" expanded="true" height="60" name="Sonar" width="90" x="112" y="120">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="6.0.003" expanded="true" height="112" name="Validation" width="90" x="246" y="120">
        <process expanded="true">
          <operator activated="true" class="discretize_by_entropy" compatibility="6.0.003" expanded="true" height="94" name="Discretize" width="90" x="45" y="30">
            <parameter key="attribute" value="Temperature"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="6.0.003" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="165"/>
          <operator activated="true" class="group_models" compatibility="6.0.003" expanded="true" height="94" name="Group Models" width="90" x="246" y="30"/>
          <connect from_port="training" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Discretize" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Group Models" to_port="models in 2"/>
          <connect from_op="Group Models" from_port="model out" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="6.0.003" expanded="true" height="76" name="Apply Model" width="90" x="112" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_binominal_classification" compatibility="6.0.003" expanded="true" height="76" name="Performance" width="90" x="246" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Sonar" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="90"/>
      <portSpacing port="sink_result 2" spacing="54"/>
    </process>
  </operator>
</process>

makak · May 2014

Thank you very much. You saved my *, exactly what I was looking for.

halimprabowo · October 2018

So this means the model (the bins) created by the discretization is applied to the new data, right?

That is, a new "Discretize by Entropy" preprocessing is not run on the new data? I'm sorry if this is confusing; I only want to make sure.

Thank you

MartinLiebig · October 2018

Hi,

Yes. You should apply the preprocessing model to the new data set.

BR,

Martin
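
For readers who want to score new data outside the cross-validation, the following is a minimal sketch of how that could look. It reuses only the operators and port names from the process posted above. The repository entry for the new data is a hypothetical placeholder, the layout attributes (height, width, x, y) are left out on the assumption that RapidMiner arranges the operators itself on import, and the process has not been tested against a particular RapidMiner version.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.003">
  <operator activated="true" class="process" compatibility="6.0.003" expanded="true" name="Process">
    <process expanded="true">
      <!-- Training data: the Sonar sample set, as in the process above -->
      <operator activated="true" class="retrieve" compatibility="6.0.003" expanded="true" name="Sonar">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <!-- ASSUMPTION: hypothetical repository entry; the new data needs the same attributes as the training data -->
      <operator activated="true" class="retrieve" compatibility="6.0.003" expanded="true" name="New Data">
        <parameter key="repository_entry" value="//Local Repository/data/new_data"/>
      </operator>
      <!-- Learn the entropy-based bins on the training data only -->
      <operator activated="true" class="discretize_by_entropy" compatibility="6.0.003" expanded="true" name="Discretize"/>
      <operator activated="true" class="naive_bayes" compatibility="6.0.003" expanded="true" name="Naive Bayes"/>
      <!-- Group the preprocessing model and the classifier so they are applied in this order -->
      <operator activated="true" class="group_models" compatibility="6.0.003" expanded="true" name="Group Models"/>
      <!-- Applying the grouped model first bins the new data with the learned intervals, then classifies it -->
      <operator activated="true" class="apply_model" compatibility="6.0.003" expanded="true" name="Apply Model">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Sonar" from_port="output" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
      <connect from_op="Discretize" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
      <connect from_op="Naive Bayes" from_port="model" to_op="Group Models" to_port="models in 2"/>
      <connect from_op="Group Models" from_port="model out" to_op="Apply Model" to_port="model"/>
      <connect from_op="New Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
    </process>
  </operator>
</process>
```

The point of the design is that Discretize by Entropy only ever sees the training data. The new data passes through the stored preprocessing model, so no new bins are computed for it.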
