RapidMiner

NullPointerException

Contributor II

Hi,

I am a newbie to RapidMiner. I am trying to use Expectation Maximization to cluster some data. I have around 500,000 data rows in a .csv file. My process is "Read CSV" -> Normalize -> Replace Missing Values -> Clustering.
However, I always get a NullPointerException when the clustering runs.
Am I doing something wrong here?

Thanks in advance
Darme
Super Contributor

Re: NullPointerException

Do you get an error dialog that allows you to submit a bug report? If so, please use the corresponding button.
If there is no such dialog, please post your process setup and give us a detailed description of your data (number and types of attributes, and any peculiarities).

Best regards,
Marius
Contributor II

Re: NullPointerException

Hi Marius,

Thank you for your prompt reply. The following is the error message I get:

The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem.

The log contains the following:

          subprocess 'Main Process'
            +- Read CSV[1] (Read CSV)
            +- Normalize[1] (Normalize)
            +- Replace Missing Values[1] (Replace Missing Values)
      ==>  +- Clustering[1] (Expectation Maximization Clustering)
Apr 23, 2013 4:49:13 PM SEVERE: java.lang.NullPointerException

The data has 11 attributes, which are of types text, number and date. In the Normalize operator I have set the value type to numeric.
In the clustering operator I have set the initial distribution to randomly assigned examples.
In Replace Missing Values I have set the attribute filter type to all and the default to average.

Do you need any more information? Please let me know.

Thanks again
Darme
Super Contributor

Re: NullPointerException

Hi,

It seems that you also have missing values in your nominal and/or date attributes. You should remove or replace all missing values before applying Expectation Maximization Clustering.

Best regards,
Marius
Contributor II

Re: NullPointerException

Hi again,

I added two Replace Missing Values steps to the process described above. One has the attribute filter type "value_type" with the value type set to text, the default set to value, and the replenishment value set to "extra".

The other has the value type "date" and the replenishment value 23/4/2013.

I still get the same error. Am I still on the wrong path? Please help.

Thank you very much
Darme
Super Contributor

Re: NullPointerException

Can you please post your process setup as described in the post linked in my signature?

Additionally, try to set a breakpoint before the clustering operator and inspect the metadata for missing values.
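For reference, a breakpoint is also visible in the process XML as an attribute on the operator element. The fragment below is a minimal sketch, assuming RapidMiner 5 stores it as `breakpoints="before"`; the operator name Clustering is the one shown in the log above, and the single `k` parameter is only illustrative.

```xml
<!-- Sketch: breakpoints="before" is assumed to pause the process just before
     this operator runs, so the metadata of its input can be inspected -->
<operator activated="true" breakpoints="before" class="expectation_maximization_clustering" compatibility="5.1.009" expanded="true" name="Clustering">
  <parameter key="k" value="3"/>
</operator>
```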

Best regards,
Marius
Contributor II

Re: NullPointerException

Hi Marius,

Once again, thank you for your advice.
I have attached the code of the process I am using, and I believe all the required information is there.

Since I have a very large data set, if a breakpoint is set for clustering, I think I would need to step through each row of data one by one.
Is there a way to stop when a value is missing, similar to setting conditions on breakpoints?

Thanks and Regards
Darrshan

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.009">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
    <process expanded="true" height="494" width="709">
      <operator activated="true" class="read_csv" compatibility="5.1.009" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\yahoo\Desktop\CSEtemp.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="StockCode.true.text.attribute"/>
          <parameter key="1" value="SectorKey.true.text.attribute"/>
          <parameter key="2" value="TimeKey.true.date.attribute"/>
          <parameter key="3" value="OpenPrice.true.real.attribute"/>
          <parameter key="4" value="ClosePrice.true.real.attribute"/>
          <parameter key="5" value="NetChange.true.real.attribute"/>
          <parameter key="6" value="ChangePercentage.true.real.attribute"/>
          <parameter key="7" value="Highest.true.real.attribute"/>
          <parameter key="8" value="Lowest.true.real.attribute"/>
          <parameter key="9" value="Volume.true.integer.attribute"/>
          <parameter key="10" value="TotalValue.true.real.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.1.009" expanded="true" height="94" name="Normalize" width="90" x="45" y="255">
        <parameter key="attribute_filter_type" value="value_type"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values (3)" width="90" x="179" y="345">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values" width="90" x="313" y="345">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="text"/>
        <parameter key="default" value="value"/>
        <list key="columns">
          <parameter key="SectorKey" value="value"/>
          <parameter key="StockCode" value="value"/>
          <parameter key="TimeKey" value="value"/>
        </list>
        <parameter key="replenishment_value" value="extra"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values (2)" width="90" x="447" y="345">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="date"/>
        <parameter key="default" value="value"/>
        <list key="columns"/>
        <parameter key="replenishment_value" value="23/4/2013"/>
      </operator>
      <operator activated="true" class="expectation_maximization_clustering" compatibility="5.1.009" expanded="true" height="76" name="Clustering" width="90" x="514" y="75">
        <parameter key="k" value="3"/>
        <parameter key="add_as_label" value="true"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="inital_distribution" value="randomly assigned examples"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Replace Missing Values (3)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (3)" from_port="original" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Replace Missing Values (2)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Super Contributor

Re: NullPointerException

No, you don't need to check each row one by one: just switch to the metadata view in the results perspective, and for each attribute you'll see the number of missing values.

Anyway, my suspicion is that in the second Replace Missing Values operator you should select value_type nominal, polynominal or binominal instead of text (text is a special data type used only in the Text Processing extension).
Experiment with that setting, *and* check the result with a breakpoint.
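As a minimal sketch of that suggestion, here is the second Replace Missing Values operator from the posted process with only `value_type` changed from `text` to `nominal`; every other parameter is copied unchanged, and whether polynominal or binominal fits better depends on how the attributes are typed in Read CSV.

```xml
<!-- Second Replace Missing Values operator: value_type switched from "text"
     to "nominal"; all other parameters copied from the posted process -->
<operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values" width="90" x="313" y="345">
  <parameter key="attribute_filter_type" value="value_type"/>
  <parameter key="value_type" value="nominal"/>
  <parameter key="default" value="value"/>
  <list key="columns">
    <parameter key="SectorKey" value="value"/>
    <parameter key="StockCode" value="value"/>
    <parameter key="TimeKey" value="value"/>
  </list>
  <parameter key="replenishment_value" value="extra"/>
</operator>
```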

Best regards,
Marius
Contributor II

Re: NullPointerException

Hi,

As you advised, I changed the settings of the Replace Missing Values operator and also changed the data types in the Read CSV operator accordingly.
I am still getting the same result.

I also set a breakpoint before clustering, and in the metadata view the "Missing value" column shows only "?". I also set breakpoints at each step and looked at the metadata, and the result was the same.

Furthermore, I created the same schema on an MS SQL Server evaluation edition and ran a query to retrieve null values in the data set. The query returned no null values.

Do you think something else has gone wrong? Is any more information needed?

Thanks again
Darme
Regular Contributor

Re: NullPointerException

I have tried to reproduce your error with my own data (including missing values), but your process runs without an error. Your process XML shows that you are still using quite an old version (5.1). Could you update RapidMiner to 5.3.8 and check again?