
Validate data using historical data

aiguti Member Posts: 1 Learner III
edited November 2018 in Help
Dear fellows,

I have historical data on part weights (peso bruto), and I would like to validate whether a sample of parts is within the expected weight (peso).
I do not know which model to use. I tried LDA, Naive Bayes and others, but none of them worked.
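The goal here is to check whether new parts fall inside the weight range seen historically, not to assign them to classes. For that, a simple per-code tolerance band may be closer to what is needed than a classifier such as LDA or Naive Bayes. The following is a minimal sketch of that swapped-in statistical approach, not a RapidMiner operator. It makes several assumptions: the column names (`Ecode`/`Peso Bruto` for the history, `ecode`/`Peso` for the sample) are taken from the process XML; the data values are hypothetical; the band is mean ± k·standard deviation per code, with k = 3 chosen arbitrarily; and the weights for each code are assumed to be roughly normally distributed.

```python
import csv
import io
import statistics

# Hypothetical data in the layout the posted process reads:
# a part code column and an integer weight column.
HISTORY_CSV = """Ecode,Peso Bruto
A100,502
A100,498
A100,501
A100,499
B200,750
B200,755
B200,745
"""

SAMPLE_CSV = """ecode,Peso
A100,500
A100,530
B200,748
C300,100
"""


def load(text, code_col, weight_col):
    """Return a list of (code, weight) pairs from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row[code_col], float(row[weight_col])) for row in reader]


def fit_limits(history, k=3.0):
    """Per-code acceptance band: mean +/- k * sample standard deviation.

    k = 3.0 is an assumed default, not a value from the original post.
    """
    by_code = {}
    for code, weight in history:
        by_code.setdefault(code, []).append(weight)
    limits = {}
    for code, weights in by_code.items():
        mean = statistics.mean(weights)
        # A single observation has no spread, so its band collapses to the mean.
        sd = statistics.stdev(weights) if len(weights) > 1 else 0.0
        limits[code] = (mean - k * sd, mean + k * sd)
    return limits


def validate(sample, limits):
    """Tag each sampled part as 'ok', 'out of range', or 'unknown code'."""
    results = []
    for code, weight in sample:
        if code not in limits:
            results.append((code, weight, "unknown code"))
            continue
        low, high = limits[code]
        status = "ok" if low <= weight <= high else "out of range"
        results.append((code, weight, status))
    return results


if __name__ == "__main__":
    limits = fit_limits(load(HISTORY_CSV, "Ecode", "Peso Bruto"))
    for code, weight, status in validate(load(SAMPLE_CSV, "ecode", "Peso"), limits):
        print(code, weight, status)
```

Because the band is computed separately for each code, a part is judged only against the history of its own code. Codes with no history are reported as unknown instead of being forced into a prediction.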

Here is the XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\aiguti\Documents\kdd\peso - training.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Ecode.true.polynominal.attribute"/>
          <parameter key="1" value="Peso Bruto.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="Read CSV (2)" width="90" x="45" y="255">
        <parameter key="csv_file" value="C:\Users\aiguti\Documents\kdd\peso - scoring.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="ecode.true.polynominal.attribute"/>
          <parameter key="1" value="Peso.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
        <parameter key="name" value="Peso Bruto"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="vector_linear_regression" compatibility="5.3.005" expanded="true" height="76" name="Vector Linear Regression" width="90" x="380" y="30"/>
      <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="648" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Read CSV (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Vector Linear Regression" to_port="training set"/>
      <connect from_op="Vector Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Thank you

Answers

  • homburg Employee, Member Posts: 114 RM Data Scientist
    Hi aiguti,

    With your process, you train a model on a dataset called "peso - training" and later apply it to "peso - scoring". So far this looks like a typical holdout strategy. You only need to add a "Performance" operator to compute performance values. To recommend a suitable learner, it would help if you could tell me more about your data and what exactly you want to achieve.
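    The holdout strategy described above has three steps: train on one dataset, apply the model to a separate one, then compare predictions with the observed values. A minimal Python sketch of those steps follows, under several assumptions. The scoring data must carry the observed weight, which is what a performance check compares against; the posted process reads it as `Peso` rather than `Peso Bruto`. The "model" is a hypothetical stand-in that predicts each code's mean training weight (a least-squares fit on a one-hot-encoded code gives the same predictions), not RapidMiner's Vector Linear Regression. The data values are invented. RMSE and MAE are shown as typical regression error measures.

```python
import math


def fit_code_means(rows):
    """Training step: learn each code's mean weight, plus an overall mean."""
    sums, counts = {}, {}
    for code, weight in rows:
        sums[code] = sums.get(code, 0.0) + weight
        counts[code] = counts.get(code, 0) + 1
    overall = sum(sums.values()) / sum(counts.values())
    means = {code: sums[code] / counts[code] for code in sums}
    return means, overall


def apply_model(model, rows):
    """Apply step: attach a prediction to each scoring row.

    Codes never seen in training fall back to the overall mean.
    """
    means, overall = model
    return [(code, actual, means.get(code, overall)) for code, actual in rows]


def performance(labelled):
    """Performance step: error between observed and predicted weight."""
    errors = [actual - predicted for _, actual, predicted in labelled]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}


# Hypothetical (code, weight) pairs standing in for the two CSV files.
training = [("A100", 502.0), ("A100", 498.0), ("B200", 750.0), ("B200", 754.0)]
scoring = [("A100", 503.0), ("B200", 749.0)]

model = fit_code_means(training)
print(performance(apply_model(model, scoring)))  # {'rmse': 3.0, 'mae': 3.0}
```

    Keeping training and scoring data strictly separate is what makes the error figures an honest estimate of how the model behaves on new parts.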

    Cheers,
    Helge