Normalization time

surfing35
edited November 2018 in Help
First, as a new user I want to compliment the nice design and GUI you have developed for RapidMiner.

I used the Normalization operator with the z-transformation on a data set of 1700 examples and 5000 features in sparse format. The attributes are all integers, stored in a sparse-float-array. The normalization works, but it took a very long time (about 20 minutes), whereas the next stage, training an SVM, took only a few minutes.
Is there any way to speed up the normalization? I want to apply it to larger data sets in the near future (about 50,000 examples).
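For reference, the z-transformation maps each value x of an attribute to (x - mean) / std, computed per attribute. The sketch below is a minimal dense illustration in plain Python, not RapidMiner's implementation; the function name `z_transform` is hypothetical, and using the population standard deviation (and mapping zero-variance attributes to 0.0) are assumptions.

```python
import math

def z_transform(rows):
    """Z-transform each column of a dense row-major matrix: (x - mean) / std.

    Assumption: population standard deviation; a column with zero
    variance is mapped to 0.0 to avoid dividing by zero.
    """
    n = len(rows)
    m = len(rows[0])
    # First pass: per-column means.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # Second pass: per-column standard deviations.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    # Third pass: write the normalized values.
    return [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
        for r in rows
    ]

data = [[1, 0], [3, 0], [5, 6]]
print(z_transform(data))
```

Each pass touches every cell once, so the dense cost grows linearly with examples × attributes (8.5 million cells for 1700 × 5000).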

Thanks,

Bill

Answers

    haddock
    Hi,

    The sparse format is the problem. The following process, with sparse_representation set to false, takes 8 seconds for me:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="MassiveDataGenerator" class="MassiveDataGenerator">
            <parameter key="number_examples" value="1700"/>
            <parameter key="number_attributes" value="5000"/>
            <parameter key="sparse_representation" value="false"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
    </operator>
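One plausible reason the sparse case is so much slower (an assumption, not confirmed in this thread): the z-transformation maps every implicit zero to -mean/std, which is generally non-zero, so a normalized sparse attribute ends up storing an entry for almost every example, and filling a sparse structure entry by entry costs more than writing to a dense array. The sketch below shows the fill-in effect using a hypothetical dict-based sparse column (`{row_index: value}`); it is a stand-in, not RapidMiner's sparse-float-array.

```python
import math

def z_transform_sparse_column(entries, n):
    """Z-transform one attribute stored sparsely as {row_index: value}.

    Rows missing from `entries` are implicit zeros. Returns the new
    column in the same sparse form plus its fill fraction (stored / n).
    Assumption: population standard deviation.
    """
    mean = sum(entries.values()) / n
    # Each implicit zero contributes (0 - mean)^2 to the variance.
    zeros = n - len(entries)
    var = (sum((v - mean) ** 2 for v in entries.values()) + zeros * mean ** 2) / n
    std = math.sqrt(var)
    if std == 0:
        # Constant column: every z-score is 0, so nothing needs storing.
        return {}, 0.0
    out = {}
    for i in range(n):
        z = (entries.get(i, 0.0) - mean) / std
        if z != 0.0:  # implicit zeros almost always become non-zero here
            out[i] = z
    return out, len(out) / n

col = {3: 4.0, 7: 2.0}  # 2 non-zeros out of 10 rows
new_col, fill = z_transform_sparse_column(col, 10)
print(len(col) / 10, fill)  # prints: 0.2 1.0
```

The column goes from 20% filled to fully filled, which is why a dense representation, as in the process above, is the more natural fit for z-normalized data.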