Validate data using historical data

aiguti Member Posts: 1 Contributor I
edited November 2018 in Help
Dear fellows,

I have historical data on part weight (peso bruto, i.e. gross weight) and I would like to validate whether a sample of parts is within the expected value (peso, i.e. weight).
I do not know which model is the right one to use. I tried LDA, Naive Bayes, and others, but they did not work.

Here is the XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\aiguti\Documents\kdd\peso - training.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Ecode.true.polynominal.attribute"/>
          <parameter key="1" value="Peso Bruto.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="Read CSV (2)" width="90" x="45" y="255">
        <parameter key="csv_file" value="C:\Users\aiguti\Documents\kdd\peso - scoring.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="ecode.true.polynominal.attribute"/>
          <parameter key="1" value="Peso.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
        <parameter key="name" value="Peso Bruto"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="vector_linear_regression" compatibility="5.3.005" expanded="true" height="76" name="Vector Linear Regression" width="90" x="380" y="30"/>
      <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="648" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Read CSV (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Vector Linear Regression" to_port="training set"/>
      <connect from_op="Vector Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Thank you

Answers

  • homburg Moderator, Employee, Member Posts: 114  RM Data Scientist
    Hi aiguti,

    With your process you train a model on a dataset called peso-training and later apply it to peso-scoring. So far this looks like a typical holdout strategy; you only need to add a "Performance" operator to compute some performance values. To recommend a suitable learner, it may be helpful if you could tell me more about your data and what exactly you want to achieve.

    Cheers,
    Helge
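
The suggestion to add a "Performance" operator could look roughly like the fragment below. This is only a sketch, not a tested process: it assumes that the "Peso" column in the scoring file holds the true weights and can therefore serve as the label, and the operator name "Set Role (2)" and the x/y positions are made up for illustration. The two connections starting at "Read CSV (2)" and "Set Role (2)" would replace the original connection from "Read CSV (2)" to "Apply Model", and the connections through "Performance" would replace the original connection from "Apply Model" to "result 1". It may also be worth checking that the attribute names match in both files ("Ecode" versus "ecode"), since a model is normally applied by attribute name.

```xml
<!-- Assumption: "Peso" in the scoring data is the true weight, so it is given the label role. -->
<operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role (2)" width="90" x="179" y="255">
  <parameter key="name" value="Peso"/>
  <parameter key="target_role" value="label"/>
  <list key="set_additional_roles"/>
</operator>
<!-- Performance compares the label with the prediction created by Apply Model. -->
<operator activated="true" class="performance" compatibility="5.3.005" expanded="true" height="76" name="Performance" width="90" x="782" y="30"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<connect from_op="Performance" from_port="example set" to_port="result 3"/>
```

With this wiring, "result 1" would show the performance values, "result 2" would still show the model, and "result 3" would show the scored examples with their predictions.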