Parameter classification in Material Science


My task involves object classification in the field of materials science. I am new to RapidMiner, so I would like to draw on the broad knowledge of the community. I would be grateful for any response to my problem.

My problem (preliminary work excluded):
I have built an Excel list structured as follows.

- First column: Object ID (running number), approx. 4500 objects
- Following columns: Object parameters (e.g. Area, Perimeter, …), approx. 26
- Last column: class membership (label)

Originally I had a separate Excel list for each label/class (12 in total). I then merged the data from all of them into one Excel list.

My goals:
- To find a classification method (e.g. SVM), train a model with it, and apply that model to unknown/unclassified objects to obtain their class.
- To find out which of the parameters are relevant for the model (optimize selection).

For now, I imported the master Excel file with all objects into RapidMiner (version 5.3) as follows:
- Object ID (running number): Integer; ID
- Object parameters: Real, Attributes
- Class (total of 12): Text; Label
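
For readers who want to check the same layout outside RapidMiner, the role assignment above (one ID, real-valued attributes, one text label) can be sketched in plain Python. This is only an illustration under assumptions: the column headers `ObjectID` and `Class`, the class names and the two sample rows are made up, and the data is assumed to be exported as CSV.

```python
import csv
import io

# Hypothetical two-row export; the real file has approx. 4500 objects and 26 parameters.
SAMPLE = """ObjectID,Area,Perimeter,Class
1,12.5,14.2,ClassA
2,30.1,22.8,ClassB
"""

def split_by_role(text, id_column="ObjectID", label_column="Class"):
    """Split a CSV table into IDs (integer), attributes (real) and labels (text)."""
    ids, attributes, labels = [], [], []
    for record in csv.DictReader(io.StringIO(text)):
        ids.append(int(record.pop(id_column)))      # role: ID
        labels.append(record.pop(label_column))     # role: label
        # Every remaining column is treated as a real-valued attribute.
        attributes.append({name: float(value) for name, value in record.items()})
    return ids, attributes, labels

ids, attributes, labels = split_by_role(SAMPLE)
print(ids, labels, attributes)
```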

Based on my own research, I started as follows (the process code can be found further down):
- Main Process
  o Retrieve Data (Excel file from the repository)
  o Optimize Selection
- Evaluation Process
  o Validation
- Training
  o SVM Linear
- Testing
  o Apply Model
  o Performance
Is my approach correct? How would you build up the process structure in order to solve the problem?
If more information is needed I will provide it.

Thanks for any help and responses.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve MasterExcel" width="90" x="45" y="75">
        <parameter key="repository_entry" value="../MasterExcel/MasterExcel"/>
      </operator>
      <operator activated="true" class="optimize_selection_evolutionary" compatibility="5.3.015" expanded="true" height="94" name="Optimize Selection (Evolutionary)" width="90" x="246" y="75">
        <process expanded="true">
          <operator activated="true" class="x_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.015" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve MasterExcel" from_port="output" to_op="Optimize Selection (Evolutionary)" to_port="example set in"/>
      <connect from_op="Optimize Selection (Evolutionary)" from_port="weights" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>