Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.
Using mod output to isolate variables
I am not sure I am using the correct terms here, I will try to be descriptive.
I want to isolate (or control for) selected variables. That is, from a dataset (x,y,z,result) to create a model and then plot (x,result) considering y,z to be fixed. The final plot will be extrapolated to a wider range of x values.
I am at the point of having the model created (I used linear regression for testing purposes). The remaining steps, which I can't find out how to perform, are to create the appropriate dataset and apply the model.
Is there a data generator suitable for that or should I manually create the tables like (1,0,0),(2,0,0),(3,0,0),(4,0,0)... ?
After data is entered, or generated, how do I use the "mod" output to predict the result value?
Finally, am I using a totally wrong approach for the task I am trying to achieve? Is there a better way to visualize that kind of dependancy than this one?
Thank you all in advance.
I want to isolate (or control for) selected variables. That is, from a dataset (x,y,z,result) to create a model and then plot (x,result) considering y,z to be fixed. The final plot will be extrapolated to a wider range of x values.
I am at the point of having the model created (I used linear regression for testing purposes). The remaining steps, which I can't find out how to perform, are to create the appropriate dataset and apply the model.
Is there a data generator suitable for that or should I manually create the tables like (1,0,0),(2,0,0),(3,0,0),(4,0,0)... ?
After data is entered, or generated, how do I use the "mod" output to predict the result value?
Finally, am I using a totally wrong approach for the task I am trying to achieve? Is there a better way to visualize that kind of dependancy than this one?
Thank you all in advance.
0
Answers
Can you post a few rows of example data?
Best regards,
Wessel
I mostly need to visualize mes=f(res), for a given set (t,r).
Physical modeling of the problem says I should expect mes=a*res^2+b*res+c, but given the effect t and the fact that the order of magnitude is so different it is quite difficult. The values of a and b are not independent of t and r.
I thought of getting a number of examples, large enough to have a lot of cases with the same t,r but that seems impossible.
I'm sure I'm missing something.
I generated an attribute res^2 and ran linear regression.
And then after made a scatter plot.
I used 0,1-normalization to make it all fit.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
<process expanded="true" height="409" width="840">
<operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve" width="90" x="153" y="146">
<parameter key="repository_entry" value="//RS/A"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.1.008" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
<parameter key="name" value="mes"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.1.008" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
<list key="function_descriptions">
<parameter key="res^2" value="res^2"/>
</list>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.1.008" expanded="true" height="94" name="Linear Regression" width="90" x="447" y="30">
<parameter key="feature_selection" value="none"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="581" y="120">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.1.008" expanded="true" height="94" name="Normalize" width="90" x="715" y="30">
<parameter key="include_special_attributes" value="true"/>
<parameter key="method" value="range transformation"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Linear Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Normalize" to_port="example set input"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<connect from_op="Normalize" from_port="example set output" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="162"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
The problem is that with mes=f(res)=a*res^2 + b*res + c the values of a,b,c are not independent of t,r.
Doing that kind of regression would only acounf for c=g(t,r).
What I want to do is find a and b, given the values of t,r.
To put it in a proper form, the function is:
mes(res, t, r) = a(t,r)*res^2 + b(t,r)*res + c(t,r)
The quest is to find a(t,r), b(t,r) and c(t,r).
This does not seem like a problem suitable for Rapid Miner.
You can use a fuzzy neural network, or a genetic algorithm to solve this problem.
But you will have to write your own Java code.
Best regards,
Wessel
A visual representation of mes vs res for 4-5 pairs of (r,t) would be enough.
Still, is it reasonable to ask for enough data values with the same t and r? Gathering 100 examples for each pair will take around 8 months.
And then I could train the model 5 times and get the required 5 values for a,b,c.
I was hoping I could find a workaround and work with randomly collected values but it seems quite difficult.