RapidMiner

PaREn Extension

Regular Contributor

Dear everyone,

I found that the new RapidMiner update includes the PaREn Extension, which claims to suggest the most suitable classification method for a given dataset. I would very much like to know how to use this extension.

Regards,
Gary
22 REPLIES
Regular Contributor

Re: PaREn Extension


Hi,

Try this

http://madm.dfki.de/rapidminer/wizard

However, some fixes may still be needed: I followed the guidelines in a simple test and could not get it to run to the end.

Regards
Dan
Super Contributor

Re: PaREn Extension

Hello all,

I found that the LandMarking operator doesn't work out of the box, but after deselecting the "Linear Discriminant" check box I got a successful run.

Here's an example in which it predicts that the KNN operator will do best on the Sonar data set and, lo and behold, it seems to, so that's quite cool.


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
    <process expanded="true" height="557" width="614">
      <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="Sonar data set" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.0.10" expanded="true" height="130" name="Multiply" width="90" x="45" y="210"/>
      <operator activated="true" class="x_validation" compatibility="5.0.10" expanded="true" height="112" name="Decision Tree (2)" width="90" x="179" y="390">
        <description>A cross-validation evaluating a decision tree model.</description>
        <process expanded="true" height="549" width="310">
          <operator activated="true" class="decision_tree" compatibility="5.0.10" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30"/>
          <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="549" width="310">
          <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model (3)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.0.10" expanded="true" height="76" name="Performance (Decision Tree)" width="90" x="179" y="30"/>
          <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (Decision Tree)" to_port="labelled data"/>
          <connect from_op="Performance (Decision Tree)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.0.10" expanded="true" height="112" name="Naive Bayes" width="90" x="179" y="255">
        <description>A cross-validation evaluating a kernel naive Bayes model.</description>
        <process expanded="true" height="396" width="301">
          <operator activated="true" class="naive_bayes_kernel" compatibility="5.0.10" expanded="true" height="76" name="Naive Bayes (Kernel)" width="90" x="110" y="30"/>
          <connect from_port="training" to_op="Naive Bayes (Kernel)" to_port="training set"/>
          <connect from_op="Naive Bayes (Kernel)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="396" width="301">
          <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.0.10" expanded="true" height="76" name="Performance (Naive Bayes)" width="90" x="179" y="30"/>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (Naive Bayes)" to_port="labelled data"/>
          <connect from_op="Performance (Naive Bayes)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.0.0" expanded="true" height="112" name="KNN" width="90" x="179" y="120">
        <description>A cross-validation evaluating a k-NN model.</description>
        <process expanded="true" height="654" width="466">
          <operator activated="true" class="k_nn" compatibility="5.0.10" expanded="true" height="76" name="k-NN" width="90" x="179" y="30"/>
          <connect from_port="training" to_op="k-NN" to_port="training set"/>
          <connect from_op="k-NN" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="654" width="466">
          <operator activated="true" class="apply_model" compatibility="5.0.0" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.0.0" expanded="true" height="76" name="Performance (KNN)" width="90" x="179" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (KNN)" to_port="labelled data"/>
          <connect from_op="Performance (KNN)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="paren:landmarking" compatibility="5.0.0" expanded="true" height="60" name="LandMarking" width="90" x="179" y="30">
        <parameter key="Linear Discriminant" value="false"/>
        <parameter key="Cross-validation" value="true"/>
        <parameter key="Normalize Dataset" value="false"/>
      </operator>
      <connect from_op="Sonar data set" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="LandMarking" to_port="exampleset"/>
      <connect from_op="Multiply" from_port="output 2" to_op="KNN" to_port="training"/>
      <connect from_op="Multiply" from_port="output 3" to_op="Naive Bayes" to_port="training"/>
      <connect from_op="Multiply" from_port="output 4" to_op="Decision Tree (2)" to_port="training"/>
      <connect from_op="Decision Tree (2)" from_port="averagable 1" to_port="result 4"/>
      <connect from_op="Naive Bayes" from_port="averagable 1" to_port="result 3"/>
      <connect from_op="KNN" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="LandMarking" from_port="exampleset" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>




Andrew
Regular Contributor

Re: PaREn Extension

Dear Dan,

Thank you! The link is exactly what I need.

Regards,
Gary
Elite

Re: PaREn Extension

Hi,
we are in contact with the people at DFKI who contributed this extension. They found that it runs fine under Linux but fails on Windows machines. We will publish a new version as soon as possible.

Greetings,
  Sebastian

Super Contributor

Re: PaREn Extension

Hi all,

the fix is on the update server.

Best,
Simon
Regular Contributor

Re: PaREn Extension

Ah, so I wasn't the only one crashing this plugin on a Windows machine. Thanks for the quick fix, guys.

Thanks,
Tom

Regular Contributor

Re: PaREn Extension

Hi,

It is a great and very useful initiative to provide such an extension as PaREn. This kind of feature is included in other major DM software, so it was time. Many thanks to the PaREn team!

I have tested this feature again now that it works on Windows machines, and would like to offer some constructive comments that, together with those from other users, will hopefully be useful feedback to the developers for future improvements.

Using a dataset of 1000 rows with a binominal label, the accuracy of a PaREn-optimised classifier based on decision trees was 0.692, which is actually below the 0.726 accuracy of the elementary ZeroR model (which predicts the mode in all cases). Separately, I quickly built a decision tree that gave an accuracy of 0.737, a very small improvement; that model was tested via cross-validation.
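
For anyone who wants to reproduce the trivial baseline outside RapidMiner, here is a minimal Python sketch of it. It assumes a class split of 726 to 274, which is simply what a mode-prediction accuracy of 0.726 on 1000 rows implies; the label names "no" and "yes" are placeholders, not the real values in the dataset.

```python
from collections import Counter


def zero_r_accuracy(labels):
    """Return the majority label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must not be empty")
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Assumed split: 726 majority rows out of 1000 (label names are hypothetical).
labels = ["no"] * 726 + ["yes"] * 274
majority_label, baseline = zero_r_accuracy(labels)
print(majority_label, baseline)
```

Any classifier worth keeping should beat this number on held-out data.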

I am not sure whether the differences between these figures are statistically significant, but in any case one would normally expect the PaREn-optimised classifier to outperform both the decision tree I built afterwards and the trivial model that blindly predicts the most frequent class.
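
As a rough check on significance, here is a sketch of a two-proportion z-test on the 0.692 and 0.726 accuracies in Python. It assumes both figures were measured on 1000 examples and treats them as independent samples, which is not strictly true when both models are scored on the same rows; a paired test such as McNemar's would be more appropriate if per-row predictions were available.

```python
import math


def two_proportion_z_test(acc_a, acc_b, n_a, n_b):
    """Two-sided z-test for the difference between two accuracies."""
    # Pooled accuracy under the null hypothesis of no difference.
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ZeroR baseline versus the PaREn-optimised tree, assuming n = 1000 for both.
z, p_value = two_proportion_z_test(0.726, 0.692, 1000, 1000)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

Under these assumptions the gap does not reach the usual 5% significance level, so the ordering of the figures should be read with some caution.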

Any other guys with comments on their results?

BTW, most probably the answer is yes - but could the PaREn team tell us whether they made use of the ROC analysis implemented in RM, among others, to optimise accuracy? Thanks.

Regards,
Dan
Contributor

Re: PaREn Extension

Hi Dan,


> It is a great and very useful initiative to provide such an extension as PaREn. This kind of feature is included in other major DM software, so it was time. Many thanks to the PaREn team!

Thanks for your encouraging remarks. Can you please point to some DM software that has similar functionality?


> Not sure if the current order, in which the figures are, is statistically significant, but anyway one would normally expect the PaREn optimised classifier to outperform both the subsequent DT and the trivial model blindly predicting the most frequent class.

You are right. Generally, an optimized classifier should perform better than a manually tuned one. However, we are currently doing only a coarse grid search over a few parameters while using default values for the others. In the case of decision trees, the search is limited to the 'confidence' parameter. Any suggestions about which parameters to optimize are welcome.
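
To illustrate what a coarse grid search over a single parameter looks like, here is a generic Python sketch. It is not the PaREn implementation: `toy_accuracy` is a made-up stand-in for a cross-validated accuracy estimate, and the three candidate 'confidence' values are arbitrary.

```python
from itertools import product


def coarse_grid_search(evaluate, grid):
    """Try every parameter combination in grid and keep the best score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(params):
    # Hypothetical stand-in for cross-validated accuracy, peaking at 0.25.
    return 0.75 - (params["confidence"] - 0.25) ** 2


# Only 'confidence' is searched; all other parameters keep their defaults.
grid = {"confidence": [0.05, 0.25, 0.5]}
best_params, best_score = coarse_grid_search(toy_accuracy, grid)
print(best_params, best_score)
```

With so few grid points and only one parameter varied, the search can easily miss a setting that a manually built tree happens to hit, which may explain results like the one reported above.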


> could the PaREn team tell us whether they made use of the ROC analysis implemented in RM, among others, to optimise accuracy?

No, we are simply using classification accuracy as the optimization criterion.

Cheers,
Faisal
Regular Contributor

Re: PaREn Extension

Faisal,

Thanks so much for providing this plugin! It really helps me in my data discovery tasks.

Regards,
Tom
www.neuralmarkettrends.com