Time to Complete Expectation Maximization Analysis?

jonmrich Member Posts: 2 Contributor I
edited November 2018 in Help
Hello. I'm running EM clustering on a data set of 7 columns and about 2000 rows. So far, the process has been running for more than 4 1/2 hours. I'm on a MacBook Pro with 16GB of RAM, so it's not a slow machine. Is it normal for this type of analysis to take this long with this much data?

Thanks,
JMR
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
        <parameter key="csv_file" value="/Users/richman/Downloads/amendedmc3.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="MacRoman"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="StoreCode.false.integer.attribute"/>
          <parameter key="1" value="Gender.true.binominal.attribute"/>
          <parameter key="2" value="FavFood.true.polynominal.attribute"/>
          <parameter key="3" value="FavBev.true.polynominal.attribute"/>
          <parameter key="4" value="Exp.true.polynominal.attribute"/>
          <parameter key="5" value="DiningHabits.true.polynominal.attribute"/>
          <parameter key="6" value="Chldndrtwlv.true.binominal.attribute"/>
          <parameter key="7" value="Age.true.polynominal.attribute"/>
          <parameter key="8" value="Education.true.polynominal.attribute"/>
          <parameter key="9" value="AnnualIncome.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.3.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="179" y="255">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="380" y="165">
        <list key="comparison_groups"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.3.008" expanded="true" height="76" name="Write CSV (2)" width="90" x="581" y="75"/>
      <operator activated="true" class="expectation_maximization_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering" width="90" x="447" y="300"/>
      <connect from_op="Read CSV" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    Are you sure it's still doing something? The process setup you provided does not link anything to any output: neither the Write CSV operator nor any of the output ports of the clustering operator is connected.
    With the data size you mentioned, your process should not take much longer than half a minute, and certainly not hours (it took me 20 seconds with randomly generated data).
    While a process is running, the bottom left of the RapidMiner window shows the name of the currently executing operator and the time elapsed so far.

    Regards,
    Marco
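
As a minimal sketch of the wiring the answer describes as missing, the following `<connect>` elements could be added next to the existing ones inside the inner `<process>` element. The port names `cluster model` and `clustered set` are assumptions based on how clustering operators are usually labelled in RapidMiner 5.x, so check them against the port tooltips in your version. The target `result 1` matches the `sink_result 1` port already present in the posted process; `result 2` is assumed to be created once the first result port is in use.

```xml
<!-- Sketch only: "cluster model" and "clustered set" are assumed port names; verify them in the GUI. -->
<!-- Deliver the fitted EM cluster model to the first result port of the process. -->
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<!-- Deliver the example set with cluster assignments to the second result port. -->
<connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
```

Dragging the clustering operator's output ports to the result ports on the right edge of the process panel should produce equivalent connections without editing the XML by hand.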