Write Model

marcel (Member, Contributor I, Posts: 8)
edited November 2018 in Help
Hi,

I have a problem saving a Neural Net model with the "Write Model" operator.

RapidMiner stops with a severe error but gives no further information about the problem. I have tried all of the available file formats.

Process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="5.3.015" expanded="true" height="60" name="Open File (2)" width="90" x="45" y="255">
        <parameter key="filename" value="/home/ubuntu/test.csv"/>
      </operator>
      <operator activated="true" class="read_csv" compatibility="5.3.015" expanded="true" height="60" name="Read CSV (2)" width="90" x="45" y="165">
        <parameter key="csv_file" value="/home/ubuntu/test.csv"/>
        <parameter key="trim_lines" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="remove_useless_attributes" compatibility="5.3.015" expanded="true" height="76" name="Remove Useless Attributes" width="90" x="45" y="30"/>
      <operator activated="true" class="remove_correlated_attributes" compatibility="5.3.015" expanded="true" height="76" name="Remove Correlated Attributes" width="90" x="179" y="30"/>
      <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="210">
        <parameter key="condition_class" value="missing_attributes"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.015" expanded="true" height="76" name="Set Role (2)" width="90" x="447" y="255">
        <parameter key="attribute_name" value="label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="380" y="120">
        <parameter key="condition_class" value="no_missing_attributes"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.3.015" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
        <parameter key="attribute_name" value="label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="neural_net" compatibility="5.3.015" expanded="true" height="76" name="Neural Net" width="90" x="581" y="75">
        <list key="hidden_layers"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="715" y="120">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="write_model" compatibility="5.3.015" expanded="true" height="60" name="Write Model" width="90" x="782" y="255">
        <parameter key="model_file" value="/home/ubuntu/models/model.xml"/>
        <parameter key="output_type" value="XML"/>
      </operator>
      <operator activated="false" class="weka:W-SMO" compatibility="5.3.001" expanded="true" height="76" name="W-SMO" width="90" x="179" y="390"/>
      <connect from_op="Open File (2)" from_port="file" to_op="Read CSV (2)" to_port="file"/>
      <connect from_op="Read CSV (2)" from_port="output" to_op="Remove Useless Attributes" to_port="example set input"/>
      <connect from_op="Remove Useless Attributes" from_port="example set output" to_op="Remove Correlated Attributes" to_port="example set input"/>
      <connect from_op="Remove Correlated Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Neural Net" to_port="training set"/>
      <connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • marcel (Member, Contributor I, Posts: 8)
    After extensive research, I think the number of attributes is too high.

    In my example, more than 1000 attributes are left as input for the Neural Net learner. The Neural Net computes a correct model, but the Write Model operator cannot save it, no matter which format I use (XML or binary). All attempts with the "Store" and "Write" operators failed, too.

    I think data mining software should be able to handle more than 1000 attributes. It is unreasonable to rerun the Neural Net operator for 30 to 60 minutes every time I want a model to classify new test data.

    Unfortunately, RapidMiner has a variety of strange errors that make it unsuitable for practical use.
  • marcel (Member, Contributor I, Posts: 8)
    When I reduce the number of attributes to at most 700, I can save my Neural Net model.

    Unfortunately, the problem still occurs when I read the saved model back in to classify new test data (see the process below):
    Error: "Process failed" (with no further information)

    Another interesting fact: it depends on the test data. Sometimes the process fails, sometimes it does not.
    The error is thrown by the "Read Model" operator, which loads the model built from the training data, yet the problem seems to lie in the test data!

    This is annoying. I have invested a lot in RapidMiner, only to find that the software is unusable for large data sets.
    Somehow, no one here seems to have an idea.

    Process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop_files" compatibility="5.3.015" expanded="true" height="76" name="Loop Files" width="90" x="313" y="75">
            <parameter key="directory" value="/home/marcel/models"/>
            <process expanded="true">
              <operator activated="true" class="open_file" compatibility="5.3.015" expanded="true" height="60" name="Open File (2)" width="90" x="45" y="390">
                <parameter key="filename" value="/home/marcel/minercluster.csv"/>
              </operator>
              <operator activated="true" class="read_csv" compatibility="5.3.015" expanded="true" height="60" name="Read CSV (2)" width="90" x="45" y="255">
                <parameter key="csv_file" value="/home/marcel/Projekte/minercluster.csv"/>
                <parameter key="trim_lines" value="true"/>
                <parameter key="use_quotes" value="false"/>
                <parameter key="first_row_as_names" value="false"/>
                <list key="annotations">
                  <parameter key="0" value="Name"/>
                </list>
                <parameter key="encoding" value="UTF-8"/>
                <list key="data_set_meta_data_information"/>
              </operator>
              <operator activated="true" class="remove_useless_attributes" compatibility="5.3.015" expanded="true" height="76" name="Remove Useless Attributes" width="90" x="112" y="120"/>
              <operator activated="true" class="read_model" compatibility="5.3.015" expanded="true" height="60" name="Read Model" width="90" x="112" y="30">
                <parameter key="model_file" value="%{file_path}"/>
              </operator>
              <operator activated="true" class="remove_correlated_attributes" compatibility="5.3.015" expanded="true" height="76" name="Remove Correlated Attributes" width="90" x="246" y="120">
                <parameter key="attribute_order" value="random"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="380" y="255">
                <parameter key="condition_class" value="missing_attributes"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.3.015" expanded="true" height="76" name="Set Role (2)" width="90" x="514" y="255">
                <parameter key="attribute_name" value="label"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles">
                  <parameter key="id" value="id"/>
                </list>
              </operator>
              <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes" width="90" x="715" y="120">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="predicted"/>
              </operator>
              <operator activated="true" class="write_csv" compatibility="5.3.015" expanded="true" height="76" name="Write CSV" width="90" x="782" y="255">
                <parameter key="csv_file" value="/home/marcel/a_%{file_name}.csv"/>
              </operator>
              <connect from_op="Open File (2)" from_port="file" to_op="Read CSV (2)" to_port="file"/>
              <connect from_op="Read CSV (2)" from_port="output" to_op="Remove Useless Attributes" to_port="example set input"/>
              <connect from_op="Remove Useless Attributes" from_port="example set output" to_op="Remove Correlated Attributes" to_port="example set input"/>
              <connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
              <connect from_op="Remove Correlated Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
              <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
              <connect from_op="Write CSV" from_port="through" to_port="out 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop Files" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Marco_Boeck (Team Lead Software Engineering, RM Engineering; Administrator, Moderator, Employee, Member, University Professor; Posts: 1,908)
    Edit: Sorry, I was mistaken. See below.
  • marcel (Member, Contributor I, Posts: 8)
    Hi Marco,
    I used the "Store" and "Retrieve" operators, too.
    But the problem is the same.
  • Marco_Boeck (Team Lead Software Engineering, RM Engineering; Administrator, Moderator, Employee, Member, University Professor; Posts: 1,908)
    Hi,

    that is indeed true :(
    I have updated the issue.

    There is one workaround: manually increase the Java stack size. To do so, pass the JVM parameter "-Xss16m" when starting RapidMiner via the RapidMinerGUI.sh/RapidMinerGUI.bat script in the RapidMiner/scripts folder. This does not work if you start RapidMiner via RapidMiner.exe.

    FYI, the "16m" means 16 MB per thread stack. You could increase it further if that is not enough, but be aware that this will eat your memory for breakfast.

    Regards,
    Marco
  • marcel (Member, Contributor I, Posts: 8)
    Thanks, works fine!

    Regards,
    Marcel
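Why a larger stack helps can be illustrated with a small Java sketch. It assumes (not verified against RapidMiner's source) that writing and reading a model walks the object graph recursively, as default Java serialization does for a linked structure, so a model with many nested objects needs a deep stack on both the "Write Model" and the "Read Model" side. The class StackSizeDemo, the Node chain standing in for a large Neural Net model, the chain length and both stack sizes are hypothetical. Instead of -Xss, the sketch passes an explicit stackSize to the Thread constructor so both cases can be compared in one JVM; -Xss sets the default for threads created without an explicit size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicBoolean;

public class StackSizeDemo {

    // Hypothetical stand-in for a large model: a long chain of nested objects.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;
        final Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Builds the chain iteratively, so construction itself needs no deep stack.
    static Node buildChain(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) {
            head = new Node(i, head);
        }
        return head;
    }

    // Default Java serialization descends one recursion level per nested Node.
    static byte[] serialize(Object obj) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserialization recurses the same way, so reading can overflow too.
    static Object deserialize(byte[] data) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    interface Task {
        void run() throws Exception;
    }

    // Runs the task on a new thread with the requested stack size (bytes) and
    // returns true if it completed, false if it hit a StackOverflowError.
    static boolean runWithStack(long stackBytes, Task task) throws InterruptedException {
        AtomicBoolean completed = new AtomicBoolean(false);
        Thread worker = new Thread(null, () -> {
            try {
                task.run();
                completed.set(true);
            } catch (StackOverflowError e) {
                completed.set(false); // assumed to be what surfaces as a bare "severe error"
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, "worker", stackBytes);
        worker.start();
        worker.join(); // join() makes the worker's writes visible to this thread
        return completed.get();
    }

    // Returns {write small, write large, read small, read large}.
    static boolean[] demo(int chainLength) throws InterruptedException {
        Node model = buildChain(chainLength);
        long small = 256L * 1024;        // 256 KB
        long large = 64L * 1024 * 1024;  // 64 MB, comparable to -Xss64m
        byte[][] saved = new byte[1][];

        boolean writeSmall = runWithStack(small, () -> serialize(model));
        boolean writeLarge = runWithStack(large, () -> saved[0] = serialize(model));
        boolean readSmall = runWithStack(small, () -> deserialize(saved[0]));
        boolean readLarge = runWithStack(large, () -> deserialize(saved[0]));
        return new boolean[] {writeSmall, writeLarge, readSmall, readLarge};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo(10_000);
        System.out.println("write, 256 KB stack: " + r[0]);
        System.out.println("write, 64 MB stack:  " + r[1]);
        System.out.println("read,  256 KB stack: " + r[2]);
        System.out.println("read,  64 MB stack:  " + r[3]);
    }
}
```

On a HotSpot JVM, the two 256 KB runs should report false and the two 64 MB runs true. The Thread Javadoc notes that some platforms may ignore the stackSize hint, in which case results can differ. The trade-off Marco mentions follows from the same mechanism: -Xss raises the reservation for every thread started without an explicit size, not just the one doing the serialization.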