Validation process hangs up

qwertzqwertz Member Posts: 130 Contributor II
edited November 2018 in Help

Dear all,

I created a simple validation process that is meant to be run several times in order to find the optimal number of training cycles (run with 10 cycles, run with 20 cycles, run with 200 cycles, then compare the performances).
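
A rough sketch of how these runs could be combined into one process, assuming the Optimize Parameters (Grid) operator is available. The operator class, the `Neural Net.training_cycles` key format and the comma-separated value list are assumptions and should be checked in the operator's parameter dialog; the wiring is omitted.

```xml
<operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" name="Optimize Parameters (Grid)">
  <list key="parameters">
    <!-- assumption: a comma-separated list enumerates the values to try -->
    <parameter key="Neural Net.training_cycles" value="10,20,200"/>
  </list>
  <process expanded="true">
    <!-- the Validation operator with its Neural Net would go here, -->
    <!-- with its performance output connected to the inner performance port -->
  </process>
</operator>
```

Each value in the list would trigger one complete validation run, so the performances can be compared from a single execution.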

With 10 training cycles of the neural net this process works fine. But once set to 200, the process hangs. It keeps running and running. Eventually the "send bug report" dialog appears.

To determine the root cause I tried modifying several things:
- Set training cycles to a lower number --> works only at very small numbers of training cycles
- Additionally, I tried to find out whether this behaviour is related to a certain column or value. I deleted some columns and ran the process (--> works). Then I took only the previously deleted columns and ran the process (--> works). Same observation with selected rows. So apparently the error is not related to the data values themselves. But with the original file it doesn't work. --> overall result: works sometimes
- Use Generate Data with the same number of rows and columns instead of my Excel sheet --> works

To reproduce this error, I uploaded my Excel sample file here

Any kind of help appreciated...  :-\


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="447" width="622">
      <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
        <parameter key="excel_file" value="C:\sample.xls"/>
        <parameter key="imported_cell_range" value="A1:AR74"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="date_format" value="dd.MM.yyyy"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="a.true.real.attribute"/>
          <parameter key="1" value="b.true.real.attribute"/>
          <parameter key="2" value="c.true.real.attribute"/>
          <parameter key="3" value="d.true.real.attribute"/>
          <parameter key="4" value="e.true.real.attribute"/>
          <parameter key="5" value="f.true.real.attribute"/>
          <parameter key="6" value="g.true.real.attribute"/>
          <parameter key="7" value="h.true.real.attribute"/>
          <parameter key="8" value="i.true.real.attribute"/>
          <parameter key="9" value="j.true.real.attribute"/>
          <parameter key="10" value="k.true.real.attribute"/>
          <parameter key="11" value="l.true.real.attribute"/>
          <parameter key="12" value="m.true.real.attribute"/>
          <parameter key="13" value="n.true.real.attribute"/>
          <parameter key="14" value="o.true.real.attribute"/>
          <parameter key="15" value="p.true.real.attribute"/>
          <parameter key="16" value="q.true.real.attribute"/>
          <parameter key="17" value="r.true.real.attribute"/>
          <parameter key="18" value="s.true.real.attribute"/>
          <parameter key="19" value="t.true.real.attribute"/>
          <parameter key="20" value="u.true.real.attribute"/>
          <parameter key="21" value="v.true.real.attribute"/>
          <parameter key="22" value="w.true.real.attribute"/>
          <parameter key="23" value="x.true.real.attribute"/>
          <parameter key="24" value="y.true.real.attribute"/>
          <parameter key="25" value="z.true.real.attribute"/>
          <parameter key="26" value="aa.true.real.attribute"/>
          <parameter key="27" value="bb.true.real.attribute"/>
          <parameter key="28" value="cc.true.real.attribute"/>
          <parameter key="29" value="dd.true.real.attribute"/>
          <parameter key="30" value="ee.true.real.attribute"/>
          <parameter key="31" value="ff.true.real.attribute"/>
          <parameter key="32" value="gg.true.real.attribute"/>
          <parameter key="33" value="hh.true.real.attribute"/>
          <parameter key="34" value="ii.true.real.attribute"/>
          <parameter key="35" value="jj.true.real.attribute"/>
          <parameter key="36" value="kk.true.real.attribute"/>
          <parameter key="37" value="ll.true.real.attribute"/>
          <parameter key="38" value="mm.true.real.attribute"/>
          <parameter key="39" value="nn.true.real.attribute"/>
          <parameter key="40" value="oo.true.real.attribute"/>
          <parameter key="41" value="pp.true.real.attribute"/>
          <parameter key="42" value="label.true.real.label"/>
          <parameter key="43" value="ID.true.real.id"/>
        </list>
      </operator>
      <operator activated="false" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="74"/>
        <parameter key="number_of_attributes" value="44"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="150.0"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="246" y="30">
        <parameter key="training_window_width" value="20"/>
        <parameter key="training_window_step_size" value="10"/>
        <parameter key="test_window_width" value="20"/>
        <process expanded="true" height="465" width="295">
          <operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" height="76" name="Neural Net" width="90" x="102" y="30">
            <list key="hidden_layers"/>
            <parameter key="training_cycles" value="200"/>
          </operator>
          <connect from_port="training" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Neural Net" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="465" width="295">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="170" y="30">
            <parameter key="horizon" value="1"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


  • qwertzqwertz Member Posts: 130 Contributor II

    Moreover, I found that after pressing the stop button the process still keeps running and CPU usage remains high...
  • haddockhaddock Member Posts: 849 Maven

    I looked at your data: some columns have missing values, and one consists only of missing values! If you take them out, the problem disappears. This glitch has already been reported here: http://rapid-i.com/rapidforum/index.php/topic,5104.msg18296.html#msg18296 .
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <process expanded="true" height="447" width="622">
         <operator activated="true" class="read_excel" compatibility="5.2.003" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
           <parameter key="excel_file" value="/home/cjfpainter/Downloads/sample.xls"/>
           <parameter key="imported_cell_range" value="A1:AR74"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="date_format" value="dd.MM.yyyy"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="a.true.real.attribute"/>
             <parameter key="1" value="b.true.real.attribute"/>
             <parameter key="2" value="c.true.real.attribute"/>
             <parameter key="3" value="d.true.real.attribute"/>
             <parameter key="4" value="e.true.real.attribute"/>
             <parameter key="5" value="f.true.real.attribute"/>
             <parameter key="6" value="g.true.real.attribute"/>
             <parameter key="7" value="h.true.real.attribute"/>
             <parameter key="8" value="i.true.real.attribute"/>
             <parameter key="9" value="j.true.real.attribute"/>
             <parameter key="10" value="k.true.real.attribute"/>
             <parameter key="11" value="l.true.real.attribute"/>
             <parameter key="12" value="m.true.real.attribute"/>
             <parameter key="13" value="n.true.real.attribute"/>
             <parameter key="14" value="o.true.real.attribute"/>
             <parameter key="15" value="p.true.real.attribute"/>
             <parameter key="16" value="q.true.real.attribute"/>
             <parameter key="17" value="r.true.real.attribute"/>
             <parameter key="18" value="s.true.real.attribute"/>
             <parameter key="19" value="t.true.real.attribute"/>
             <parameter key="20" value="u.true.real.attribute"/>
             <parameter key="21" value="v.true.real.attribute"/>
             <parameter key="22" value="w.true.real.attribute"/>
             <parameter key="23" value="x.true.real.attribute"/>
             <parameter key="24" value="y.true.real.attribute"/>
             <parameter key="25" value="z.true.real.attribute"/>
             <parameter key="26" value="aa.true.real.attribute"/>
             <parameter key="27" value="bb.true.real.attribute"/>
             <parameter key="28" value="cc.true.real.attribute"/>
             <parameter key="29" value="dd.true.real.attribute"/>
             <parameter key="30" value="ee.true.real.attribute"/>
             <parameter key="31" value="ff.true.real.attribute"/>
             <parameter key="32" value="gg.true.real.attribute"/>
             <parameter key="33" value="hh.true.real.attribute"/>
             <parameter key="34" value="ii.true.real.attribute"/>
             <parameter key="35" value="jj.true.real.attribute"/>
             <parameter key="36" value="kk.true.real.attribute"/>
             <parameter key="37" value="ll.true.real.attribute"/>
             <parameter key="38" value="mm.true.real.attribute"/>
             <parameter key="39" value="nn.true.real.attribute"/>
             <parameter key="40" value="oo.true.real.attribute"/>
             <parameter key="41" value="pp.true.real.attribute"/>
             <parameter key="42" value="label.true.real.label"/>
             <parameter key="43" value="ID.true.real.id"/>
           </list>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.2.003" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
           <parameter key="attribute_filter_type" value="regular_expression"/>
           <parameter key="regular_expression" value="c|h|n|u"/>
           <parameter key="invert_selection" value="true"/>
         </operator>
         <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="380" y="30">
           <parameter key="training_window_width" value="20"/>
           <parameter key="test_window_width" value="20"/>
           <process expanded="true" height="465" width="295">
             <operator activated="true" class="neural_net" compatibility="5.2.003" expanded="true" height="76" name="Neural Net" width="90" x="102" y="30">
               <list key="hidden_layers"/>
               <parameter key="training_cycles" value="2000"/>
               <parameter key="normalize" value="false"/>
             </operator>
             <connect from_port="training" to_op="Neural Net" to_port="training set"/>
             <connect from_op="Neural Net" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="465" width="295">
             <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="series:forecasting_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance" width="90" x="170" y="30">
               <parameter key="horizon" value="1"/>
             </operator>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Read Excel" from_port="output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
  • qwertzqwertz Member Posts: 130 Contributor II

    Hi haddock,

    thanks for having a look at my issue. I have to admit that I hadn't found the other post...

    OK, it seems clear that this error is related to missing values then. (Stupid me! I posted the wrong file... In the end it showed the same behaviour, but the original one didn't have empty columns, just some missing values.)

    Nevertheless, I wonder if there is a rule of thumb for how many missing values a neural net can handle. I had a generated data set where I deleted only a few single values, just to see whether a neural net can handle this: success. However, it tends to throw errors if either the number of missing values or the number of training cycles increases.

    My real data is indeed similar to what I've posted. If I filter out all the examples with missing values, there won't be much left to build a model with... Any ideas on that?

    (And there is still the issue that pressing the stop button doesn't stop the process... though this becomes moot once the model works fine.)

    Bye & take care
  • haddockhaddock Member Posts: 849 Maven
    Hi there Qwertz,

    I don't use neuros, but my guess is that you'll need to replace missing values with something, perhaps the average of the given values for the attribute, or at least a value that doesn't distort too much. It's not just neuros that choke on missing values, and incomplete data inevitably invites bias, so you have my sympathy!
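
    To make the averaging idea concrete, here is a minimal sketch of a step that could sit between Read Excel and Validation. It assumes the Replace Missing Values operator (class `replace_missing_values`) with `default` set to `average`; the compatibility version and coordinates are simply copied from the processes in this thread, and the port names are assumed to match those of Select Attributes, so please check all of them against your own installation.

    ```xml
    <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="179" y="120">
      <!-- assumption: "average" fills each gap with the mean of that attribute -->
      <parameter key="default" value="average"/>
      <list key="columns"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Validation" to_port="training"/>
    ```

    A column that consists only of missing values has no average to fall back on, so it would presumably still have to be removed, for example with Select Attributes.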

    Best wishes.
  • qwertzqwertz Member Posts: 130 Contributor II

    OK... I have been testing a new process for hours now and I have reached the point where I am convinced that something is wrong with the neural net.

    My investigation revealed that:
    - I can run the attached process with a certain number of attributes
    - One more attribute and the neural net hangs up (doesn't matter which one)
    - If I take another arbitrary attribute out then it works again
    - If I shorten the attribute names then I can run the neural net with more attributes (still not with all)
    - The higher the number of training cycles, the lower the number of attributes that can be handled

    To reproduce this error try the following:
    - Run the process as provided here --> ok
    - Add attribute "s" to the subset in the "Select Attributes" operator --> fail
    - Change number of training cycles to 10 --> ok again

    I uploaded another sample (this time there are no missing values, etc.)
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
     <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
       <process expanded="true" height="407" width="865">
         <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
           <parameter key="excel_file" value="C:\sample2.xls"/>
           <parameter key="imported_cell_range" value="A1:AX235"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="date_format" value="dd.MM.yyyy"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="id.true.integer.id"/>
             <parameter key="1" value="a.true.real.attribute"/>
             <parameter key="2" value="b.true.real.attribute"/>
             <parameter key="3" value="c.true.real.attribute"/>
             <parameter key="4" value="d.true.real.attribute"/>
             <parameter key="5" value="e.true.real.attribute"/>
             <parameter key="6" value="f.true.real.attribute"/>
             <parameter key="7" value="g.true.real.attribute"/>
             <parameter key="8" value="h.true.real.attribute"/>
             <parameter key="9" value="i.true.real.attribute"/>
             <parameter key="10" value="j.true.real.attribute"/>
             <parameter key="11" value="k.true.real.attribute"/>
             <parameter key="12" value="l.true.real.attribute"/>
             <parameter key="13" value="m.true.real.attribute"/>
             <parameter key="14" value="n.true.real.attribute"/>
             <parameter key="15" value="o.true.real.attribute"/>
             <parameter key="16" value="p.true.real.attribute"/>
             <parameter key="17" value="q.true.real.attribute"/>
             <parameter key="18" value="r.true.real.attribute"/>
             <parameter key="19" value="s.true.real.attribute"/>
             <parameter key="20" value="t.true.real.attribute"/>
             <parameter key="21" value="u.true.real.attribute"/>
             <parameter key="22" value="v.true.real.attribute"/>
             <parameter key="23" value="w.true.real.attribute"/>
             <parameter key="24" value="x.true.real.attribute"/>
             <parameter key="25" value="y.true.real.attribute"/>
             <parameter key="26" value="z.true.real.attribute"/>
             <parameter key="27" value="aa.true.real.attribute"/>
             <parameter key="28" value="ab.true.real.attribute"/>
             <parameter key="29" value="ac.true.real.attribute"/>
             <parameter key="30" value="ad.true.real.attribute"/>
             <parameter key="31" value="ae.true.real.attribute"/>
             <parameter key="32" value="af.true.real.attribute"/>
             <parameter key="33" value="ag.true.real.attribute"/>
             <parameter key="34" value="ah.true.real.attribute"/>
             <parameter key="35" value="ai.true.real.attribute"/>
             <parameter key="36" value="aj.true.real.attribute"/>
             <parameter key="37" value="ak.true.real.attribute"/>
             <parameter key="38" value="al.true.real.attribute"/>
             <parameter key="39" value="am.true.real.attribute"/>
             <parameter key="40" value="an.true.real.attribute"/>
             <parameter key="41" value="ao.true.real.attribute"/>
             <parameter key="42" value="ap.true.real.attribute"/>
             <parameter key="43" value="aq.true.real.attribute"/>
             <parameter key="44" value="ar.true.real.attribute"/>
             <parameter key="45" value="as.true.real.attribute"/>
             <parameter key="46" value="at.true.real.attribute"/>
             <parameter key="47" value="au.true.real.attribute"/>
             <parameter key="48" value="av.true.real.attribute"/>
             <parameter key="49" value="aw.true.real.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes (3)" width="90" x="179" y="120">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="id|a|aa|ab|ac|ad|ae|af|ag|ah|ai|aj|ak|al|am|an|ao|ap|aq|ar|as|at|au|av|aw|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r"/>
         </operator>
         <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="313" y="120"/>
         <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="447" y="30">
           <parameter key="horizon" value="1"/>
           <parameter key="window_size" value="1"/>
           <parameter key="create_label" value="true"/>
           <parameter key="label_attribute" value="a"/>
         </operator>
         <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="585" y="30">
           <parameter key="training_window_width" value="20"/>
           <parameter key="training_window_step_size" value="10"/>
           <parameter key="test_window_width" value="20"/>
           <process expanded="true" height="407" width="346">
             <operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" height="76" name="Neural Net" width="90" x="128" y="30">
               <list key="hidden_layers"/>
               <parameter key="training_cycles" value="255"/>
               <parameter key="learning_rate" value="0.25"/>
               <parameter key="momentum" value="0.05"/>
             </operator>
             <connect from_port="training" to_op="Neural Net" to_port="training set"/>
             <connect from_op="Neural Net" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="407" width="346">
             <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="series:forecasting_performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
               <parameter key="horizon" value="1"/>
             </operator>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing (2)" width="90" x="447" y="210">
           <parameter key="window_size" value="1"/>
           <parameter key="create_label" value="true"/>
           <parameter key="label_attribute" value="a"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="715" y="165">
           <list key="application_parameters"/>
         </operator>
         <connect from_op="Read Excel" from_port="output" to_op="Select Attributes (3)" to_port="example set input"/>
         <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Windowing" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Windowing (2)" to_port="example set input"/>
         <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
         <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
         <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="126"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>

    I am completely at a loss... :(
    Looking forward to hearing from you

  • haddockhaddock Member Posts: 849 Maven

    Sorry, I cannot confirm this; everything works OK on my Linux 16GB box.

  • qwertzqwertz Member Posts: 130 Contributor II
    Hi haddock,

    sorry, there was a typo in my post [now modified]: attribute "r" is already in the list. I meant: please add attribute "s" and the process will hang.

    PS: I run RapidMiner on Windows 7 with 4GB RAM.

    Have a nice day
  • haddockhaddock Member Posts: 849 Maven
    Bonjour Qwertz!

    Indeed, this does hang, so we're left to ponder why. Here we hit the reason that I don't use neuros, namely that convergence to a solution cannot be guaranteed. You can hit local minima which trap the search and so hang the machine. Apologies if you're fully aware of this; otherwise just imagine black holes in your search space. Support vector machines do not have this property.

    There could be something wrong with the implementation, but there is something frail in the whole neuro approach.

    Sorry not to be more definite.