X-Means Error

ScottJ · October 2012

Hello,

I have a list of messages and I am trying to work out which ones belong together so I can analyze them later. I am grouping them with X-Means after creating a binomial word vector, but I get an error at runtime: SEVERE: java.lang.ArrayIndexOutOfBoundsException: 237. Is there a preprocessing step I am missing before passing the example set to the X-Means operator?

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <parameter key="logverbosity" value="all"/>
    <parameter key="logfile" value="C:\LogToolsGUI\MachineLearning\LogMessages.log"/>
    <parameter key="resultfile" value="C:\LogToolsGUI\MachineLearning\MessageResults.res"/>
    <process expanded="true" height="566" width="815">
      <operator activated="true" class="read_csv" compatibility="5.2.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="75">
        <parameter key="csv_file" value="C:\LogToolsGUI\TestSet.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="parse_numbers" value="false"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.false.polynominal.attribute"/>
          <parameter key="1" value="att2.true.text.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="112" y="165">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <parameter key="add_meta_information" value="false"/>
        <parameter key="keep_text" value="true"/>
        <parameter key="prune_method" value="absolute"/>
        <parameter key="prune_below_absolute" value="5"/>
        <parameter key="prune_above_absolute" value="9999"/>
        <list key="specify_weights"/>
        <process expanded="true" height="686" width="802">
          <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="45" y="120"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.2.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="45" y="210"/>
          <operator activated="true" class="text:filter_by_length" compatibility="5.2.004" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="246" y="210">
            <parameter key="min_chars" value="2"/>
          </operator>
          <operator activated="false" class="text:generate_n_grams_terms" compatibility="5.2.004" expanded="true" height="60" name="Generate n-Grams (Terms)" width="90" x="246" y="120"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="x_means" compatibility="5.2.008" expanded="true" height="76" name="X-Means" width="90" x="380" y="165"/>
      <connect from_op="Read CSV" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="X-Means" to_port="example set"/>
      <connect from_op="X-Means" from_port="cluster model" to_port="result 1"/>
      <connect from_op="X-Means" from_port="clustered set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

EDIT: When I shrank the input to 300 rows, the process started running. With anything larger, it fails.
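
For anyone who wants to reproduce the size threshold described in the edit above, one option is to cut the CSV down to a fixed number of rows outside RapidMiner and point Read CSV at the smaller file. The following is only a minimal sketch: the helper name `write_first_rows` and the output file name are made up for illustration, and it assumes the file has no header row and uses windows-1252 encoding, as configured in the Read CSV operator of the process above.

```python
import csv


def write_first_rows(src_path, dst_path, n_rows, encoding="windows-1252"):
    """Copy the first n_rows CSV records from src_path to dst_path.

    Hypothetical helper, not part of RapidMiner. Returns the number of
    records actually written (fewer than n_rows if the source is shorter).
    """
    written = 0
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        writer = csv.writer(dst)
        # csv.reader keeps quoted fields with embedded commas or
        # line breaks together as a single record.
        for row in csv.reader(src):
            if written >= n_rows:
                break
            writer.writerow(row)
            written += 1
    return written


if __name__ == "__main__":
    # Assumed file names: adjust to the real location of the data.
    count = write_first_rows("TestSet.csv", "TestSet_300.csv", 300)
    print(f"wrote {count} rows")
```

Running this with different values of `n_rows` (for example 300, then 600, then 450) narrows down the smallest input that triggers the exception, which makes a useful attachment for a bug report.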

MariusHelf · October 2012

Hi,

It seems you have found a bug. If the error message offers a "submit bug report" option, please use it. The report will contain additional, valuable information that will help us locate the bug.

Best, Marius
