PMML

earmijo Member Posts: 270 Unicorn
edited November 2018 in Help
I finally downloaded version 5. What made me do it? PMML.

I have to admit that I'm still in a transition stage. I like many new things in 5.0 (reporting, parallel processing, PMML), but I'm still used to the tree paradigm. There are other things I don't understand (for example, why can't I define labels and ids in the Read CSV operator as before?).

But I'm with version 5.0 from now on. I'm having some problems with the PMML operators. If I use examples that generate data inside RM, I get no errors. For instance:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="668" width="770">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="112" y="75">
        <parameter key="target_function" value="polynomial"/>
      </operator>
      <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="319" y="75"/>
      <operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="549" y="71">
        <parameter key="file" value="c:\linreg.xml"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Write PMML" to_port="model"/>
      <connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
But if I try to read a very simple CSV file (you can download it here: http://dl.dropbox.com/u/5477950/cerveza.csv) using the following code, I run into trouble:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="735" width="985">
      <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
        <parameter key="file_name" value="c:\cerveza.csv"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
        <parameter key="name" value="cerveza"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="478" y="28"/>
      <operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="685" y="28">
        <parameter key="file" value="c:\linreg.xml"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Write PMML" to_port="model"/>
      <connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I get the error message: "The setup does not seem to contain any obvious errors, but you should check the logs..."

What am I doing wrong?


Another problem (even with RM-generated data): when I try to run a logistic regression, I get an error indicating that the class MyKLRModel cannot be exported to PMML.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="735" width="985">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="85" y="48">
        <parameter key="target_function" value="sum classification"/>
      </operator>
      <operator activated="true" class="logistic_regression" expanded="true" height="94" name="Logistic Regression" width="90" x="261" y="46"/>
      <operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="479" y="45">
        <parameter key="file" value="c:\logistic.xml"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Logistic Regression" to_port="training set"/>
      <connect from_op="Logistic Regression" from_port="model" to_op="Write PMML" to_port="model"/>
      <connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
If I try the evolutionary version of LR, I get the same error.

Thanks in advance for any help,

\Ernesto

Answers

  • SebastianLoh Member Posts: 99 Contributor II
    Hi Ernesto,

    First of all, some good news: you still have a tree view in RM5. Just go to the menu View -> Show View -> Tree and you can work like in RM 4.6.

    The CSV Reader does not work the way it is supposed to. That's the reason why we are reimplementing it right now. The new readers will be part of the next RM update.

    The PMML problem you posted seems to be a bug. We are working on it right now and will post a reply when we have a solution.

    Ciao Sebastian
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    In fact, the linear regression problem is a bug, which I have now fixed. The fixed version will be shipped with the next update of the PMML extension.
    The logistic regression export works only if you switch the evolutionary variant to the "dot" kernel, since otherwise PMML is not powerful enough to represent the results.

    Greetings,
      Sebastian
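
    A minimal sketch of the kernel suggestion above, applied to the Logistic Regression operator from the question. The parameter key "kernel_type" is an assumption about how the operator names this setting and is not confirmed anywhere in this thread, so check the operator's parameter panel before relying on it. The class name of the evolutionary operator does not appear in the thread either, which is why the sketch uses the plain Logistic Regression operator.

    ```xml
    <!-- Assumption: the kernel is selected through a parameter named "kernel_type" -->
    <operator activated="true" class="logistic_regression" expanded="true" height="94" name="Logistic Regression" width="90" x="261" y="46">
      <parameter key="kernel_type" value="dot"/>
    </operator>
    ```

    The rest of the process (Generate Data, Write PMML, and the connections) would stay as posted in the question.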
  • earmijo Member Posts: 270 Unicorn
    Thank you very much for your quick answers, Sebastian(s).

    S. Lohr:

    I knew about the Tree view, but I couldn't make it work. I hadn't discovered the Operator Wiring options. Now I can make it work. It is really nice to have both approaches available.

    I look forward to the new CSVReader operator. BTW, I understand you are also working on the Attribute Editor. I miss that one. I like how easy it is now to import files (CSV, Excel) into the Repository. Having the ability to modify an existing dataset in the repository is going to be nice.

    S. Land:

    I look forward to the new PMML version then.

    With respect to your second point, I haven't been successful in producing a PMML file with either of the two LogisticRegression operators (even using your suggestion of kernel type = dot). I always get the same message: "KernelLogisticModel cannot be exported to PMML".

    Thanks again for your time.

    \Ernesto