Parameter classification in Material Science

nihaniha Member Posts: 4 Contributor I
edited August 2019 in Help

My task is object classification in the field of materials science. RapidMiner is new to me, so I would like to draw on the broad knowledge of the community. I would be grateful for any response to my problem.

My problem (preliminary work excluded):
I have built an Excel list structured as follows.

- First column: Object ID (running number), approx. 4500 objects
- Following columns: Object parameters (e.g. Area, Perimeter, …), approx. 26
- Last column: class membership (label)

Originally I had a separate Excel list for each label/class (12 in total). I merged all the data from those lists into one Excel list.

My goals:
- To find a classification method (e.g. SVM) that can be trained on this data, so that the model can then be applied to unknown/unclassified objects to predict their class.
- To find out which of the parameters are relevant for the model (Optimize Selection).

For now, I have imported the master Excel file with all objects into RapidMiner (version 5.3) as follows:
- Object ID (running number): Integer; ID
- Object parameters: Real; Attribute
- Class (total of 12): Text; Label
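
To make the role assignment above concrete outside RapidMiner, here is a minimal Python sketch that splits each row into an ID, a list of real-valued parameters, and a text label. The sample data, the column names and the helper `load_objects` are invented for illustration and assume a CSV export of the Excel list; they are not part of my actual file.

```python
import csv
import io

# Hypothetical sample in the described layout: ID, parameters, label.
# Column names and values are invented for illustration only.
SAMPLE = """ObjectID,Area,Perimeter,Class
1,12.5,14.2,ClassA
2,30.1,22.8,ClassB
3,11.9,13.7,ClassA
"""

def load_objects(text):
    """Split each row into (ID, real-valued parameters, label)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    feature_names = header[1:-1]  # everything between ID and label
    ids, features, labels = [], [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        ids.append(int(row[0]))                         # Integer; ID
        features.append([float(v) for v in row[1:-1]])  # Real; Attribute
        labels.append(row[-1])                          # Text; Label
    return feature_names, ids, features, labels

feature_names, ids, features, labels = load_objects(SAMPLE)
```

With the real data, `features` would hold roughly 4500 rows of about 26 values each.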

Based on my own research, I started as follows (the process XML can be found further down):
- Main Process
  o Retrieve Data → Excel file from Repository
  o Optimize Selection
- Evaluation Process
  o Validation
- Training
  o SVM Linear
- Testing
  o Apply Model
  o Performance
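
The nesting listed above (a feature selection that wraps a cross-validation, which in turn trains and tests a model) can be sketched in plain Python as follows. This is only a rough illustration under loud assumptions: a nearest-centroid classifier stands in for the SVM, a greedy forward search stands in for the evolutionary search, and the demo data is synthetic. All function names are hypothetical, and none of this is RapidMiner code.

```python
import random

def nearest_centroid_fit(X, y):
    """Stand-in learner (not an SVM): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def nearest_centroid_predict(model, row):
    """Return the class whose centroid is closest to `row`."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda c: dist2(model[c]))

def cross_val_accuracy(X, y, cols, k=5, seed=0):
    """k-fold cross-validation accuracy using only the columns in `cols`."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Training: fit the model on the training folds only.
        model = nearest_centroid_fit(
            [[X[i][c] for c in cols] for i in train],
            [y[i] for i in train])
        # Testing: apply the model to the held-out fold and count hits.
        correct += sum(
            nearest_centroid_predict(model, [X[i][c] for c in cols]) == y[i]
            for i in test)
    return correct / len(X)

def forward_selection(X, y, k=5):
    """Greedy search: keep adding the column that most improves CV accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(cross_val_accuracy(X, y, selected + [c], k), c)
                  for c in remaining]
        score, col = max(scored)
        if score <= best:
            break  # no remaining column improves the estimate
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Synthetic demo data (invented): column 0 separates the classes, column 1 is noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) + (5 if i % 2 else 0), rng.gauss(0, 1)] for i in range(100)]
y = ["B" if i % 2 else "A" for i in range(100)]
selected, accuracy = forward_selection(X, y)
```

The point of the sketch is the structure: every candidate attribute subset is scored by a full cross-validation, so the selection step only ever sees performance measured on held-out data.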

Is my approach correct? How would you structure the process in order to solve the problem?
If more information is needed, I will gladly provide it.

Thanks for any help and responses.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve MasterExcel" width="90" x="45" y="75">
        <parameter key="repository_entry" value="../MasterExcel/MasterExcel"/>
      </operator>
      <operator activated="true" class="optimize_selection_evolutionary" compatibility="5.3.015" expanded="true" height="94" name="Optimize Selection (Evolutionary)" width="90" x="246" y="75">
        <process expanded="true">
          <operator activated="true" class="x_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.015" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve MasterExcel" from_port="output" to_op="Optimize Selection (Evolutionary)" to_port="example set in"/>
      <connect from_op="Optimize Selection (Evolutionary)" from_port="weights" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
