PMML
I finally downloaded version 5. What made me do it? PMML.
I have to admit that I'm still in a transition stage. I like many new things in 5.0 (reporting, parallel processing, PMML), but I'm still used to the tree paradigm. There are other things I don't understand (why can't I define labels and ids in the ReadCSV operator as before, etc.).
But I'm with version 5.0 from now on. I'm having some problems with the PMML operators. If I use example data generated inside RM, I get no errors. For instance:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="668" width="770">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="112" y="75">
<parameter key="target_function" value="polynomial"/>
</operator>
<operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="319" y="75"/>
<operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="549" y="71">
<parameter key="file" value="c:\linreg.xml"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Write PMML" to_port="model"/>
<connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
But if I try to read a very simple CSV file (you can download it here: http://dl.dropbox.com/u/5477950/cerveza.csv) using the following code, I run into trouble. I get the error message: "The setup does not seem to contain any obvious errors, but you should check the logs..."
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="735" width="985">
<operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
<parameter key="file_name" value="c:\cerveza.csv"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
<parameter key="name" value="cerveza"/>
<parameter key="target_role" value="label"/>
</operator>
<operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="478" y="28"/>
<operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="685" y="28">
<parameter key="file" value="c:\linreg.xml"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Write PMML" to_port="model"/>
<connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
What am I doing wrong?
Another problem (even with RM-generated data): when I try to run a logistic regression, I get an error indicating that the class MyKLRModel cannot be exported to PMML. If I try the evolutionary version of LR, I get the same error.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="735" width="985">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="85" y="48">
<parameter key="target_function" value="sum classification"/>
</operator>
<operator activated="true" class="logistic_regression" expanded="true" height="94" name="Logistic Regression" width="90" x="261" y="46"/>
<operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="479" y="45">
<parameter key="file" value="c:\logistic.xml"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Logistic Regression" to_port="training set"/>
<connect from_op="Logistic Regression" from_port="model" to_op="Write PMML" to_port="model"/>
<connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Thanks in advance for any help,
\Ernesto
Answers
First of all, some good news: you still have a tree view in RM5. Just go to the menu View -> Show View -> Tree and you can work like in RM 4.6.
The CSV Reader does not work the way it is supposed to. That's the reason why we are reimplementing it right now. The new readers will be part of the next RM update.
The PMML problem you posted seems to be a bug. We are working on it right now and will post a reply when we have a solution.
Ciao Sebastian
In fact, the linear regression problem is a bug, which I have now fixed. The fixed version will ship with the next update of the PMML extension.
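Once an export does succeed, the written file can be sanity-checked outside RapidMiner. The following is only a minimal sketch, assuming the file follows the standard PMML layout for a linear regression (a RegressionTable element carrying an intercept attribute, with NumericPredictor children). The function name read_linear_pmml and the trimmed sample document with attributes a1 and a2 are made up for illustration; they are not actual output of the Write PMML operator.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample (not schema-complete); a real export has more elements.
SAMPLE = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="a1" coefficient="2.0"/>
      <NumericPredictor name="a2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_linear_pmml(source):
    """Return (intercept, {field: coefficient}) from the first RegressionTable.

    `source` is a file path or a file-like object.
    """
    root = ET.parse(source).getroot()
    for elem in root.iter():
        if local(elem.tag) == "RegressionTable":
            intercept = float(elem.get("intercept"))
            coefs = {
                child.get("name"): float(child.get("coefficient"))
                for child in elem
                if local(child.tag) == "NumericPredictor"
            }
            return intercept, coefs
    raise ValueError("no RegressionTable element found")


if __name__ == "__main__":
    # With a real export, pass the path instead, e.g. r"c:\linreg.xml".
    print(read_linear_pmml(io.StringIO(SAMPLE)))
```

Matching on the local tag name rather than a fixed namespace keeps the sketch independent of the PMML version the extension writes.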
Logistic regression can be exported only if you switch the evolutionary variant to kernel type "dot", since otherwise PMML is not powerful enough to represent the results.
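One way to see why the "dot" kernel is the exportable case: with a dot-product kernel, a kernel model's score collapses into one coefficient per attribute, which is the form a plain regression table can hold. The sketch below shows only the general math, not RapidMiner's internals; the function names and the numbers (alphas, support, bias) are made up for illustration.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel_score(alphas, support, bias, x):
    """f(x) = sum_i alpha_i * K(x_i, x) + b, with the dot kernel K(u, v) = u . v."""
    return sum(a * dot(s, x) for a, s in zip(alphas, support)) + bias


def collapse_to_weights(alphas, support):
    """w = sum_i alpha_i * x_i, i.e. one coefficient per attribute."""
    n = len(support[0])
    return [sum(a * s[j] for a, s in zip(alphas, support)) for j in range(n)]


def sigmoid(z):
    """Logistic link that turns a score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the identity.
    alphas = [0.5, -1.0, 0.25]
    support = [[1.0, 2.0], [0.0, 1.0], [4.0, 0.0]]
    bias = 0.1
    x = [2.0, 3.0]

    w = collapse_to_weights(alphas, support)
    kernel_form = kernel_score(alphas, support, bias, x)
    linear_form = dot(w, x) + bias
    print(w, sigmoid(kernel_form), sigmoid(linear_form))
```

Both forms give the same probability for any x. For a non-linear kernel such as radial there is no such finite weight vector over the original attributes, so the model cannot be written as a simple coefficient table.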
Greetings,
Sebastian
S. Lohr:
I knew about the Tree view, but I couldn't make it work. I hadn't discovered the Operator Wiring options. Now I can make it work. It is really nice to have both approaches available.
I look forward to the new CSVReader operator. BTW, I understand you are also working on the Attribute Editor. I miss that one. I like how easy it is now to import files (CSV, Excel) into the Repository. Having the ability to modify an existing dataset in the repository is going to be nice.
S. Land:
I look forward to the new PMML version then.
With respect to your second point, I haven't been successful in producing a PMML file with either of the two LogisticRegression operators (even using your suggestion of kernel type "dot"). I always get the same message: "KernelLogisticModel cannot be exported to PMML".
Thanks again for your time.
\Ernesto