Optimizing Set Macro on 7.5
JEdward
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
Is anyone else finding problems optimizing Set Macro in version 7.5 of RapidMiner?
I was trying to optimize a Python model and found that the value parameter of Set Macro doesn't appear in the parameter list of Optimize Parameters (Evolutionary).
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="optimize_parameters_evolutionary" compatibility="6.0.003" expanded="true" height="103" name="Optimize Parameters (Evolutionary)" width="90" x="313" y="34">
<list key="parameters">
<parameter key="nTree.value" value="[1.0;100.0]"/>
</list>
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.5.001" expanded="true" height="82" name="Hyperparameters" width="90" x="112" y="34">
<process expanded="true">
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="nTree" width="90" x="45" y="34">
<parameter key="macro" value="nTree"/>
<parameter key="value" value="200"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="minSizeSplit" width="90" x="246" y="34">
<parameter key="macro" value="minSizeSplit"/>
<parameter key="value" value="4"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="minLeafSize" width="90" x="45" y="289">
<parameter key="macro" value="minLeafSize"/>
<parameter key="value" value="2"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="maxDepth" width="90" x="45" y="391">
<parameter key="macro" value="maxDepth"/>
<parameter key="value" value="20"/>
</operator>
<connect from_port="in 1" to_op="nTree" to_port="through 1"/>
<connect from_op="nTree" from_port="through 1" to_op="minSizeSplit" to_port="through 1"/>
<connect from_op="minSizeSplit" from_port="through 1" to_op="minLeafSize" to_port="through 1"/>
<connect from_op="minLeafSize" from_port="through 1" to_op="maxDepth" to_port="through 1"/>
<connect from_op="maxDepth" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Cross Validation 2" width="90" x="447" y="34">
<parameter key="use_local_random_seed" value="true"/>
<process expanded="true">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="BDT (sklearn)" width="90" x="112" y="34">
<parameter key="script" value=" import pandas as pd from sklearn.ensemble import GradientBoostingClassifier from sklearn.ensemble import RandomForestClassifier #use RandomForestRegressor for regression problem # This script creates a RandomForestClassifier from SKLearn on RM data # It can be used as a generic template for other sklearn classifiers or regressors def rm_main(data): metadata = data.rm_metadata # Get the list of regular attributes and the label df = pd.DataFrame(metadata).T label = df[df[1]=="label"].index.values regular = df[df[1] != df[1]].index.values # === RandomForest === # # Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset # Create Random Forest object model= RandomForestClassifier(n_estimators = %{nTree} , max_depth = %{maxDepth} , min_samples_split = %{minSizeSplit} # The minimum number of samples required to split an internal node , min_samples_leaf = %{minLeafSize} # The minimum number of samples required to be at a leaf node ) # Train the model using the training sets and check score # model.fit(X, y) model.fit(data[regular], data[label]) # Predict Output # predicted = model.predict(x_test) return (model,regular,label[0]), data"/>
</operator>
<connect from_port="training set" to_op="BDT (sklearn)" to_port="input 1"/>
<connect from_op="BDT (sklearn)" from_port="output 1" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="103" name="Apply Model (2)" width="90" x="112" y="34">
<parameter key="script" value="import pandas as pd # rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none) def rm_main(rfinfo, data): rf = rfinfo[0] regular = rfinfo[1] label = rfinfo[2] meta = data.rm_metadata predictions = rf.predict(data[regular]) confidences = rf.predict_proba(data[regular]) predictions = pd.DataFrame(predictions, columns=["prediction("+label+")"]) confidences = pd.DataFrame(confidences, columns=["confidence(" + str(c) + ")" for c in rf.classes_]) data = data.join(predictions) data = data.join(confidences) data.rm_metadata = meta data.rm_metadata["prediction("+label+")"] = ("nominal","prediction") for c in rf.classes_: data.rm_metadata["confidence("+str(c)+")"] = ("numerical","confidence_"+str(c)) return data, rf"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="7.5.001" expanded="true" height="82" name="Python" width="90" x="246" y="34">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="input 1"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="input 2"/>
<connect from_op="Apply Model (2)" from_port="output 1" to_op="Python" to_port="labelled data"/>
<connect from_op="Python" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Python</description>
</operator>
<operator activated="true" class="log" compatibility="7.5.001" expanded="true" height="82" name="Log" width="90" x="715" y="30">
<list key="log">
<parameter key="Count" value="operator.Apply Model (2).value.applycount"/>
<parameter key=" Testing Error" value="operator.Cross Validation 2.value.performance 1"/>
<parameter key="Training StdDev" value="operator.Cross Validation 2.value.std deviation 1"/>
<parameter key="nTree" value="operator.nTree.parameter.value"/>
<parameter key="maxDepth" value="operator.maxDepth.parameter.value"/>
<parameter key="minLeafSize" value="operator.minLeafSize.parameter.value"/>
<parameter key="minSizeSplit" value="operator.minSizeSplit.parameter.value"/>
</list>
</operator>
<connect from_port="input 1" to_op="Hyperparameters" to_port="in 1"/>
<connect from_op="Hyperparameters" from_port="out 1" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Cross Validation 2" to_port="example set"/>
<connect from_op="Cross Validation 2" from_port="performance 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="result 1"/>
<connect from_op="Optimize Parameters (Evolutionary)" from_port="parameter" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
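For readability (the script is flattened onto a single line inside the XML above), here is a rough sketch of what the training script in the BDT (sklearn) operator boils down to. The %{nTree}, %{maxDepth}, %{minSizeSplit} and %{minLeafSize} tokens in the original are RapidMiner macros that are substituted as plain text before Python runs, so this sketch hard-codes the values from the Set Macro operators and notes the macro names in comments; treat it as an illustration, not the exact embedded code.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rm_main(data):
    # RapidMiner attaches attribute roles in data.rm_metadata; transposing it
    # gives one row per attribute, with the role in column 1.
    meta = pd.DataFrame(data.rm_metadata).T
    label = meta[meta[1] == "label"].index.values[0]
    # Everything that is not the label counts as a regular attribute here
    # (the id attribute is removed upstream by Select Attributes).
    regular = meta[meta[1] != "label"].index.values

    # In the embedded script these four arguments are written as
    # %{nTree}, %{maxDepth}, %{minSizeSplit} and %{minLeafSize};
    # RapidMiner replaces the macros with their current values before
    # executing the script, which is why the thread is about optimizing
    # the Set Macro operators.
    model = RandomForestClassifier(
        n_estimators=200,      # %{nTree}
        max_depth=20,          # %{maxDepth}
        min_samples_split=4,   # %{minSizeSplit}
        min_samples_leaf=2,    # %{minLeafSize}
    )
    model.fit(data[regular], data[label])

    # Hand the model plus the attribute/label names to the scoring side of
    # the cross validation, together with the training data.
    return (model, regular, label), data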
Answers
I think this isn't a bug, because the Evolutionary optimizer uses the genetic parameters to 'randomly' assign values, so you can't take a Grid approach to this. Did you try this in a regular Grid optimizer?
If it's not a bug then it's a missing feature. I've managed to create a workaround that works, but it's clearly not the most efficient. Let's move this thread into feature requests.
Edit: realised my process didn't display properly.
As you can see, the workaround uses RM modelling operators to represent the values that I want to change in the Python code. So the feature I'd like is an operator that Optimize Parameters (Evolutionary) can access, allowing values to be set and then used as macros.