Validate data using historical data

aiguti · July 2014

Dear fellows,

I have historical data on part weight (peso bruto, i.e. gross weight) and I would like to check whether a sample of parts is within the expected weight (peso).
I do not know which model is the right one to use. I tried LDA, Naive Bayes, and others, but they did not work.

Here is the process XML:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\aiguti\Documents\kdd\peso - training.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Ecode.true.polynominal.attribute"/>
          <parameter key="1" value="Peso Bruto.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="Read CSV (2)" width="90" x="45" y="255">
        <parameter key="csv_file" value="C:\Users\aiguti\Documents\kdd\peso - scoring.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="ecode.true.polynominal.attribute"/>
          <parameter key="1" value="Peso.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
        <parameter key="name" value="Peso Bruto"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="vector_linear_regression" compatibility="5.3.005" expanded="true" height="76" name="Vector Linear Regression" width="90" x="380" y="30"/>
      <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="648" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Read CSV (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Vector Linear Regression" to_port="training set"/>
      <connect from_op="Vector Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Thank you

homburg · August 2014

Hi aiguti,

With your process you train a model on a dataset called "peso - training" and later apply it to "peso - scoring". So far this looks like a typical holdout strategy; you only need to add a "Performance" operator to compute some performance values. To recommend a suitable learner, it would help if you could tell me more about your data and what exactly you want to achieve.
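To make the holdout idea concrete outside of RapidMiner, here is a minimal Python sketch. It is only an illustration under stated assumptions: the rows are made-up toy values (not from your CSV files), the "learner" is a simple mean weight per part code standing in for Vector Linear Regression, and RMSE is just one possible performance value. The names `fit_mean_per_code` and `rmse` are hypothetical helpers.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy rows of (part code, weight); NOT taken from the thread's CSV files.
training = [("A", 100), ("A", 104), ("A", 96), ("B", 250), ("B", 254), ("B", 246)]
scoring = [("A", 102), ("B", 240)]


def fit_mean_per_code(rows):
    """'Train' a trivial stand-in model: the mean historical weight per part code."""
    by_code = {}
    for code, weight in rows:
        by_code.setdefault(code, []).append(weight)
    return {code: mean(weights) for code, weights in by_code.items()}


def rmse(model, rows):
    """Root mean squared error of the model's predictions on held-out rows.

    Assumes every code in `rows` was seen during training; an unseen code
    raises KeyError.
    """
    squared_errors = [(weight - model[code]) ** 2 for code, weight in rows]
    return sqrt(mean(squared_errors))


# Holdout: fit on the training set only, then measure error on the scoring set.
model = fit_mean_per_code(training)
holdout_rmse = rmse(model, scoring)
print(model)
print(holdout_rmse)
```

The point is the split: the model never sees the scoring rows while it is being fitted, so the error computed on them plays the role of what the "Performance" operator would report.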

Cheers,
Helge
