Traditional approaches to symbolic regression often suffered from a phenomenon called feature bloat, which is why they are hardly used today. They have been replaced by a combination of linear regression (for assigning coefficients) and automatic feature generation. In RapidMiner, you would combine the operators Generalized Linear Models and Automatic Feature Engineering for this. The multi-objective optimization approach of the feature engineering step keeps feature bloat in check and therefore reduces the risk of overfitting. I have attached a small demo process below.
I gave a presentation in London last week which also covered this to some degree. For that talk I used data similar to the data in the example process mentioned above. I attached a couple of relevant slides showing a simple linear regression model, a decision tree model, a GBT model, and a model consisting of linear regression combined with automatic feature engineering. As in symbolic regression, the resulting formula can be read off directly (in this case it was prediction(y) = 10,550 * |x| + 7,565 * x * |x|^2 + 705 / |x| + 17,394).
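The idea of pairing generated features with linear regression can be sketched outside RapidMiner in a few lines of Python. This is only a minimal illustration under stated assumptions: the candidate features are hand-picked (hypothetical) rather than searched for, there is no multi-objective optimization, the data is synthetic, and none of the function names below come from RapidMiner.

```python
import numpy as np

def candidate_features(x):
    """Hypothetical, hand-picked candidates; a real generator would search for them."""
    ax = np.abs(x)
    return {
        "x": x,
        "|x|": ax,
        "x * |x|^2": x * ax**2,
        "1 / |x|": 1.0 / ax,
    }

def fit_formula(x, y):
    """Assign coefficients to the candidate features with ordinary least squares."""
    feats = candidate_features(x)
    names = list(feats)
    # Design matrix: one column per feature plus a constant column for the intercept.
    A = np.column_stack([feats[n] for n in names] + [np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return names, coef

def format_formula(names, coef):
    """Render the fitted model as a readable equation."""
    terms = [f"{c:.3f} * {n}" for n, c in zip(names, coef[:-1])]
    return "prediction(y) = " + " + ".join(terms) + f" + {coef[-1]:.3f}"

# Synthetic data (an assumption for this sketch): x stays away from zero
# so that 1 / |x| is well defined.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 3.0, 200) * rng.choice([-1.0, 1.0], 200)
y = 2.0 * np.abs(x) + 0.5 * x * np.abs(x) ** 2 + 1.0 + rng.normal(0.0, 0.01, 200)

names, coef = fit_formula(x, y)
print(format_formula(names, coef))
```

The coefficients of the features that do not appear in the generating function come out close to zero. A feature engineering step with a complexity objective would drop such terms instead of keeping them in the formula.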
mzn (Member, University Professor; Posts: 10)
Thanks a lot, Ingo. I am interested in the following:
1. I have a set of data points (x1, x2, x3...) with a corresponding output (y1).
2. I need to derive a relation (in the form of an equation) that links x1, x2, x3 to y1 so that I can predict the output for any input variables.
3. Can I do this in RM? If yes, is there a simple example that I and my graduate students can follow?
4. Your YouTube videos are very helpful! Thanks!
IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor; Posts: 1,751; RM Founder)
The process above is a cool example, but maybe not simple enough. Pretty much all machine learning models in RapidMiner can be used for this task, but I would probably start with a simple linear regression. The process below shows a simple example of this. If you use the Model Simulator as I do in this example, the students can even play around with some of the inputs and see how the model reacts. You can see the Simulator in this video (around minute 6:40): https://academy.rapidminer.com/learn/video/auto-model-classification
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="9.2.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
        <parameter key="target_function" value="sum"/>
        <parameter key="number_examples" value="1000"/>
        <parameter key="number_of_attributes" value="5"/>
        <parameter key="attributes_lower_bound" value="-10.0"/>
        <parameter key="attributes_upper_bound" value="10.0"/>
        <parameter key="gaussian_standard_deviation" value="10.0"/>
        <parameter key="largest_radius" value="10.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="add_noise" compatibility="9.2.000" expanded="true" height="103" name="Add Noise" width="90" x="179" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="random_attributes" value="5"/>
        <parameter key="label_noise" value="0.05"/>
        <parameter key="default_attribute_noise" value="0.0"/>
        <list key="noise"/>
        <parameter key="offset" value="0.0"/>
        <parameter key="linear_factor" value="1.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.2.000" expanded="true" height="103" name="Split Data" width="90" x="313" y="187">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="linear_regression" compatibility="9.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="447" y="34">
        <parameter key="feature_selection" value="none"/>
        <parameter key="alpha" value="0.05"/>
        <parameter key="max_iterations" value="10"/>
        <parameter key="forward_alpha" value="0.05"/>
        <parameter key="backward_alpha" value="0.05"/>
        <parameter key="eliminate_colinear_features" value="true"/>
        <parameter key="min_tolerance" value="0.05"/>
        <parameter key="use_bias" value="true"/>
        <parameter key="ridge" value="1.0E-8"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="238">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="model_simulator:model_simulator" compatibility="9.2.000" expanded="true" height="103" name="Model Simulator" width="90" x="782" y="136"/>
      <connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
      <connect from_op="Add Noise" from_port="example set output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Linear Regression" from_port="exampleSet" to_op="Model Simulator" to_port="training data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Model Simulator" to_port="test data"/>
      <connect from_op="Apply Model" from_port="model" to_op="Model Simulator" to_port="model"/>
      <connect from_op="Model Simulator" from_port="simulator output" to_port="result 1"/>
      <connect from_op="Model Simulator" from_port="model output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="105"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
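For readers who want to see the same steps outside RapidMiner, the process above can be roughly mirrored in Python with NumPy: generate 1000 examples with 5 attributes in [-10, 10], split them 70/30, fit a linear regression on the training part, and print the resulting equation. This is only a sketch under assumptions: the label is taken to be the plain sum of the attributes, the Gaussian noise level of 0.5 is an arbitrary stand-in for the Add Noise operator, the five extra random attributes are omitted, and `fit_linear` and `predict` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(2001)

# Generate data: 1000 examples, 5 attributes drawn uniformly from [-10, 10].
n_examples, n_attributes = 1000, 5
X = rng.uniform(-10.0, 10.0, size=(n_examples, n_attributes))

# Assumption: the label is the sum of all attributes, plus Gaussian noise
# whose standard deviation (0.5) is chosen arbitrarily for this sketch.
y = X.sum(axis=1) + rng.normal(0.0, 0.5, size=n_examples)

# Split data: 70% for training, 30% for testing, after shuffling.
idx = rng.permutation(n_examples)
n_train = n_examples * 7 // 10
train, test = idx[:n_train], idx[n_train:]

def fit_linear(X, y):
    """Ordinary least squares with an intercept (bias) term."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[:-1], w[-1]

def predict(X, coef, intercept):
    """Apply the fitted equation to new input rows."""
    return X @ coef + intercept

coef, intercept = fit_linear(X[train], y[train])

# The fitted model is the equation linking x1..x5 to y.
equation = " + ".join(f"{c:.3f} * x{i + 1}" for i, c in enumerate(coef))
print(f"y = {equation} + {intercept:.3f}")

# Check the equation on the held-out 30%.
rmse = float(np.sqrt(np.mean((predict(X[test], coef, intercept) - y[test]) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

Calling `predict` with a new row of inputs plays the role of the sliders in the Model Simulator: change one input and see how the predicted output moves.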