RapidMiner changes my values...

Jorge Member Posts: 19 Maven
edited November 2018 in Help
Hi,

I'm working with RapidMiner 4.3 on the following process:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ArffExampleSource" class="ArffExampleSource" breakpoints="after">
        <parameter key="data_file" value="C:\Input.arff"/>
        <parameter key="id_attribute" value="1"/>
        <parameter key="label_attribute" value="example6"/>
    </operator>
    <operator name="InteractiveAttributeWeighting" class="InteractiveAttributeWeighting">
    </operator>
    <operator name="Learn" class="OperatorChain" expanded="yes">
        <operator name="W-NaiveBayesUpdateable" class="W-NaiveBayesUpdateable">
        </operator>
        <operator name="ModelWriter" class="ModelWriter">
            <parameter key="model_file" value="C:\model.mod"/>
            <parameter key="output_type" value="XML"/>
        </operator>
    </operator>
    <operator name="ArffExampleSource (2)" class="ArffExampleSource" breakpoints="after">
        <parameter key="data_file" value="C:\Prediction.arff"/>
        <parameter key="id_attribute" value="1"/>
        <parameter key="label_attribute" value="example6"/>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="C:\model.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
    </operator>
</operator>
with Input.arff:

@RELATION Input

@ATTRIBUTE Id numeric
@ATTRIBUTE example1 string
@ATTRIBUTE example2 string
@ATTRIBUTE example3 string
@ATTRIBUTE example4 string
@ATTRIBUTE example5 string
@ATTRIBUTE example6 string

@DATA
'1','ex1','hello4','hw1','false','1000k','slow'
'2','ex1','hello6','hw2','true','4000k','slow'
'3','ex1','hello2','hw3','false','500k','slow'
'4','ex1','hello3','hw3','true','2000k','slow'
'5','ex2','hello2','hw2','true','500k','slow'
'6','ex2','hello5','hw1','true','1000k','mid'
'7','ex2','hello2','hw3','false','4000k','fast'
'8','ex3','hello','hw1','true','2000k','mid'
'9','ex3','hello','hw2','true','4000k','fast'
'10','ex3','hello','hw3','false','2000k','slow'
'11','ex3','hello','hw1','false','500k','mid'
and Prediction.arff:

@RELATION Prediction

@ATTRIBUTE Id numeric
@ATTRIBUTE example1 string
@ATTRIBUTE example2 string
@ATTRIBUTE example3 string
@ATTRIBUTE example4 string
@ATTRIBUTE example5 string


@DATA
'100','ex1','hello','hw1','false','1000k'
'101','ex1','hello2','hw2','true','4000k'
'102','ex1','hello','hw2','true','4000k'
'103','ex1','hello2','hw3','true','500k'
'104','ex1','hello','hw2','true','2000k'
'105','ex1','hello2','hw1','true','4000k'
'106','ex2','hello3','hw1','false','500k'
'107','ex3','hello3','hw2','true','4000k'
'108','ex3','hello4','hw3','true','500k'
'109','ex3','hello5','hw3','false','500k'
'110','ex3','hello6','hw2','true','500k'
'111','ex3','hello2','hw1','false','500k'
'112','ex3','hello6','hw1','true','500k'
When I execute the process and click on "Data View" of the "Data Table" in the results, the values in the column "example1" are different from the example1 attribute in Prediction.arff.

Can anyone help me?
Is it only a display error, or does it also affect the learning operator?

Thanks in advance.

Cheers,
Jorge

Answers

  • steffen Member Posts: 347 Maven
    Hello Jorge

    I got this warning message:

    [Warning] W-NaiveBayesUpdateable: The internal nominal mappings are not the same between training and application for attribute 'example2'. This will probably lead to wrong results during model application.
    RapidMiner stores an internal mapping for nominal values, and the model depends on it. As a workaround, I suggest:
    -> Load both files and add an attribute marking each example as train or prediction (AttributeConstruction and ChangeAttributeRole)
    -> Merge (ExampleSetMerge)
    -> Save as an example set

    Now you can run the process you posted, either by loading the saved set twice and applying ExampleFilter each time, or by using a combination of ExampleFilter and IOMultiplier.
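
The mapping mismatch behind the warning can be illustrated outside RapidMiner. The following is only a minimal Python sketch, assuming that each nominal value gets an integer index in order of first appearance and that each file builds its own mapping when loaded on its own. `build_mapping` is a hypothetical helper, not a RapidMiner API. The values are the `example2` column from the two ARFF files in the question.

```python
def build_mapping(values):
    """Assign each distinct nominal value an index in order of first appearance (assumed behaviour)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# 'example2' column of Input.arff (training) and Prediction.arff (application)
train_col = ["hello4", "hello6", "hello2", "hello3", "hello2", "hello5",
             "hello2", "hello", "hello", "hello", "hello"]
pred_col = ["hello", "hello2", "hello", "hello2", "hello", "hello2",
            "hello3", "hello3", "hello4", "hello5", "hello6", "hello2", "hello6"]

train_map = build_mapping(train_col)  # mapping the model was trained with
pred_map = build_mapping(pred_col)    # mapping built when loading the second file

# The same string ends up with a different index in each mapping.
print(train_map["hello"], pred_map["hello"])

# Reading the prediction indices through the training mapping yields other values.
train_inverse = {index: value for value, index in train_map.items()}
decoded = [train_inverse[pred_map[value]] for value in pred_col]
print(list(zip(pred_col, decoded))[:2])
```

Under these assumptions, a model that works on indices sees "hello" in the prediction file as if it were "hello4", which is why a single shared mapping (for example from one merged example set) avoids the problem.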

    hope this was helpful

    regards,

    Steffen
  • Jorge Member Posts: 19 Maven
    Thanks a lot, Steffen.

    It works perfectly now  :)
  • pathros Member Posts: 9 Contributor II
    Steffen, I have the same problem, but in RapidMiner 5.0. I apply a model obtained from the "optimize selection evolutionary" process and I get the same
    warning:
    "WARNING: SimpleDistribution: The internal nominal mappings are not the same between training and application for attribute 'carrera'. This will probably lead to wrong results during model application."
    The predictions are not the same as the results of the split validation, which tells me that this warning does lead to wrong results.

    But I can't find the operators you mention here:
    RM stores a mapping for nominal values which somehow affects the models. I suggest as workaround:
    -> Load both files, add an attribute marking it as train /prediction (AttributeConstruction and ChangeAttributeRole)
    -> Merge (ExampleSetMerge)
    -> save as exampleset

    How can I do this in RapidMiner 5.0?


    I also ran it without the optimizer; my XML looks like this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
        <process expanded="true" height="528" width="619">
          <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="vm_socdem_e_Xchanged" width="90" x="45" y="120">
            <parameter key="repository_entry" value="vm_socdem_e_Xchanged"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="SET ID" width="90" x="160" y="127">
            <parameter key="name" value="cuenta"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role" width="90" x="281" y="136">
            <parameter key="name" value="aprob_c"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes (2)" width="90" x="380" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="turno|raz_elec|no_unam|ingr_fi|esc_m|edad|carrera|a_ov|sost_ec|alg|geo_e|geo_a|qui|elec|X_sec|X_bach|bach|ENP|transp|dur_bach|trastes|refri|c_agua|tv_cable|horno_m|cel|inter|comp|auto_p|p_serv"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.0.10" expanded="true" height="76" name="Naive Bayes" width="90" x="447" y="165"/>
          <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="vm_socdem_e_Xchanged_prueba" width="90" x="45" y="300">
            <parameter key="repository_entry" value="vm_socdem_e_Xchanged_prueba"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="vm_socdem_prueba" width="90" x="112" y="435">
            <parameter key="name" value="cuenta"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="300">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="carrera|no_unam|edad|turno|raz_elec|ingr_fi|esc_m|a_ov|sost_ec|alg|geo_a|geo_e|elec|qui|X_sec|X_bach|bach|ENP|dur_bach|transp|refri|trastes|c_agua|cel|tv_cable|horno_m|comp|inter|auto_p|p_serv"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model" width="90" x="492" y="288">
            <list key="application_parameters"/>
            <parameter key="create_view" value="true"/>
          </operator>
          <connect from_op="vm_socdem_e_Xchanged" from_port="output" to_op="SET ID" to_port="example set input"/>
          <connect from_op="SET ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="vm_socdem_e_Xchanged_prueba" from_port="output" to_op="vm_socdem_prueba" to_port="example set input"/>
          <connect from_op="vm_socdem_prueba" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="216"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the Merge operator is now called Append.

    Greetings,
      Sebastian
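
The tag, append, and split workaround discussed in this thread can be sketched in plain Python. This is not RapidMiner code: `build_mapping` and the `"train"` / `"prediction"` tags are hypothetical names, the values are shortened sample data, and the first-appearance indexing is an assumption about how nominal mappings are built.

```python
def build_mapping(values):
    """Assign each distinct nominal value an index in order of first appearance (assumed behaviour)."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return mapping


train = ["hello4", "hello6", "hello2", "hello"]
prediction = ["hello", "hello2", "hello3"]

# Step 1: tag every example with its origin and append both sets.
combined = [("train", v) for v in train] + [("prediction", v) for v in prediction]

# Step 2: build ONE mapping over the appended data.
shared = build_mapping(v for _, v in combined)

# Step 3: split by tag again; both parts are encoded with the same mapping.
train_encoded = [shared[v] for tag, v in combined if tag == "train"]
prediction_encoded = [shared[v] for tag, v in combined if tag == "prediction"]

print(train_encoded)
print(prediction_encoded)
```

Because both subsets come from the same appended set, "hello" and "hello2" carry the same index during training and application, and values seen only in the prediction data (here "hello3") simply get a new index.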