Normalization time

surfing35 · August 2009

First, as a new user, I want to compliment you on the nice design and GUI you have developed for RapidMiner.

I used the Normalization operator with the z-transformation on a data set of 1700 examples and 5000 features (sparse format). The attributes are all integers, stored in the sparse-float-array format. The normalization works fine but took a very long time, around 20 minutes; the next stage, training an SVM, took only a few minutes.
Is there any way to speed up the normalization? I want to apply it to larger data sets in the near future, around 50,000 examples.

Thanks,

Bill

haddock · August 2009

Hi,

It's about the sparse format. The following, which uses a dense (non-sparse) representation of the same size, takes 8 seconds for me:

<operator name="Root" class="Process" expanded="yes">
    <operator name="MassiveDataGenerator" class="MassiveDataGenerator">
        <parameter key="number_examples" value="1700"/>
        <parameter key="number_attributes" value="5000"/>
        <parameter key="sparse_representation" value="false"/>
    </operator>
    <operator name="Normalization" class="Normalization">
    </operator>
</operator>
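
One possible contributing factor, which is not confirmed anywhere in this thread, is that a z-transformation maps every zero to a non-zero value whenever the attribute mean is non-zero, so a sparse row stops being sparse once it is normalized. The small Python sketch below only illustrates that effect on a made-up column; it is not RapidMiner code, and the use of the population standard deviation is an assumption.

```python
import statistics


def z_transform(column):
    """Return (x - mean) / std for each value; a constant column maps to 0.0."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # assumption: population standard deviation
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# A made-up, mostly-zero integer column, similar in spirit to sparse data.
column = [0, 0, 0, 0, 0, 0, 0, 0, 3, 7]
scaled = z_transform(column)

# Count how many entries a sparse format would have to store explicitly.
nonzero_before = sum(1 for x in column if x != 0)
nonzero_after = sum(1 for x in scaled if x != 0)

print(nonzero_before, nonzero_after)  # prints: 2 10
```

Here only 2 of 10 values need explicit storage before the transformation, but all 10 do afterwards, because each former zero becomes -mean/std.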

Altair RapidMiner Community