Parameter classification in Material Science
Hello,
My task is object classification in the field of material science. RapidMiner is new to me, so I would like to draw on the broad knowledge of the community. I would be grateful for any response to my problem.
My problem (preliminary work excluded):
I have built an Excel list as follows.
- First row: Object ID (running number) – approx. 4500
- Following rows: Object parameters (e.g. Area, Perimeter, …) – approx. 26
- Last row: belonging/label
Originally I had a separate Excel list for each label/class (12 in total). I merged the data from all of them into one Excel list.
My goals:
- To find a classification method (e.g. SVM) to train a model that can then be applied to unknown/unclassified objects to predict their class.
- To find out which of the parameters are relevant for the model (Optimize Selection)
For now, I have imported the master Excel file with all objects into RapidMiner (version 5.3) as follows:
- Object ID (running number): Integer; ID
- Object parameters: Real; Attribute
- Class (total of 12): Text; Label
Based on my own research, I started as follows (the process XML can be found further down):
- Main Process
o Retrieve data (Excel file from the repository)
o Optimize Selection (Evolutionary)
- Evaluation Process
o Validation
- Training
o SVM (Linear)
- Testing
o Apply Model
o Performance
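For readers who think more easily in code, the nesting above (attribute selection on the outside, cross-validation on the inside, train and test within each fold) can be sketched roughly as follows. This is only an illustration under loud assumptions: it is not what RapidMiner runs internally, a nearest-centroid classifier stands in for SVM (Linear), greedy forward selection stands in for the evolutionary search, the data are synthetic, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_centroids(rows, labels, cols):
    """Training step (stand-in for SVM): one mean vector per class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append([row[c] for c in cols])
    return {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in by_class.items()}


def predict(model, row, cols):
    """Apply Model step: pick the class with the closest centroid."""
    point = [row[c] for c in cols]
    return min(model, key=lambda lab: sum((p - q) ** 2 for p, q in zip(point, model[lab])))


def cross_validated_accuracy(rows, labels, cols, folds=5, seed=0):
    """Validation step: k-fold cross-validation for one attribute subset."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    scores = []
    for k in range(folds):
        test = order[k::folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train], cols)
        hits = sum(predict(model, rows[i], cols) == labels[i] for i in test)
        scores.append(hits / len(test))  # Performance step for this fold
    return mean(scores)


def forward_selection(rows, labels, n_attributes):
    """Outer loop (stand-in for the evolutionary search): greedily add
    the attribute that improves cross-validated accuracy the most."""
    selected, best = [], 0.0
    while len(selected) < n_attributes:
        candidates = [c for c in range(n_attributes) if c not in selected]
        score, col = max(
            (cross_validated_accuracy(rows, labels, selected + [c]), c) for c in candidates
        )
        if score <= best:
            break  # no remaining attribute helps
        selected.append(col)
        best = score
    return selected, best


# Synthetic toy data (NOT the real measurements): attribute 0 separates
# the two classes, attribute 1 is pure noise.
rng = random.Random(1)
rows, labels = [], []
for label, centre in (("A", 0.0), ("B", 5.0)):
    for _ in range(30):
        rows.append([centre + rng.gauss(0, 0.5), rng.gauss(0, 1.0)])
        labels.append(label)

selected, accuracy = forward_selection(rows, labels, n_attributes=2)
print("selected attributes:", selected, "cv accuracy:", accuracy)
```

The point of the sketch is only the structure: each candidate attribute subset is scored by a full cross-validation, so the selection step never sees a performance value computed on the data the model was trained on within a fold.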
Is my approach correct? How would you structure the process to solve this problem?
If more information is needed, I will provide it.
Thanks for any help and responses.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve MasterExcel" width="90" x="45" y="75">
<parameter key="repository_entry" value="../MasterExcel/MasterExcel"/>
</operator>
<operator activated="true" class="optimize_selection_evolutionary" compatibility="5.3.015" expanded="true" height="94" name="Optimize Selection (Evolutionary)" width="90" x="246" y="75">
<process expanded="true">
<operator activated="true" class="x_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
<process expanded="true">
<operator activated="true" class="support_vector_machine_linear" compatibility="5.3.015" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
<connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
<connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="example set" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve MasterExcel" from_port="output" to_op="Optimize Selection (Evolutionary)" to_port="example set in"/>
<connect from_op="Optimize Selection (Evolutionary)" from_port="weights" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>