"Generalized Hebbian Algorithm (GHA)"

s-a-s-h Member Posts: 4 Contributor I
edited May 2019 in Help
Hello,

My setup is the ExampleSetGenerator (ESG) followed by the GHA. The ESG generates 30 examples with 500 attributes. As long as I keep the default parameters for the GHA, everything works fine; the only problem is the text output of the PCs, which takes relatively long. My first question is whether this output can be disabled.

When I change number_of_component of the GHA from -1 to 10 the error message "Process failed ! The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem" appears. The log screen shows: "ArrayIndexOutOfBoundsException occured in 1st application of GHA (GHA)".

No matter which settings I change, I always get the same error message. Does anybody have advice?

Thank you,
Sascha
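
For context on the operator under discussion: the Generalized Hebbian Algorithm is commonly formulated as Sanger's rule, which learns the first k principal directions one example at a time. The sketch below is a plain-Python illustration of that textbook rule, not RapidMiner's GHA operator. The names `gha_update` and `gha_fit`, the learning rate, the epoch count and the random initialisation are assumptions chosen for illustration, and the examples are assumed to be mean-centered already.

```python
import random


def gha_update(weights, x, learning_rate):
    """One Sanger's-rule step for a single centered example x.

    weights holds k rows (one per component), each of length d.
    The step costs O(k * d); no d x d covariance matrix is formed.
    """
    outputs = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    reconstruction = [0.0] * len(x)
    updated = []
    for w, y in zip(weights, outputs):
        # Component i is corrected only by what components 0..i reconstruct,
        # using the weights from before this step.
        reconstruction = [r + y * w_j for r, w_j in zip(reconstruction, w)]
        updated.append([w_j + learning_rate * y * (x_j - r)
                        for w_j, x_j, r in zip(w, x, reconstruction)])
    return updated


def gha_fit(examples, n_components, learning_rate=0.01, epochs=50, seed=0):
    """Run repeated passes over the examples (hypothetical helper)."""
    rng = random.Random(seed)
    d = len(examples[0])
    # Small random start values; an assumption, other initialisations work too.
    weights = [[rng.gauss(0.0, 0.01) for _ in range(d)]
               for _ in range(n_components)]
    for _ in range(epochs):
        for x in examples:
            weights = gha_update(weights, x, learning_rate)
    return weights
```

Each update touches k × d weights, so limiting the number of components keeps the cost per example proportional to the number of attributes. How many passes are needed for a given accuracy depends on the data and the learning rate, so this says nothing definite about total runtime.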

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Sascha,
    you could use the PCA instead; this will work for up to 6000 attributes if your RAM is large enough.
    Of course this is only a temporary solution until we have fixed the bug.

    Greetings,
      Sebastian
  • s-a-s-h Member Posts: 4 Contributor I
    Hello Sebastian,

    thanks for your answer.

    How long do you think it will take to fix the bug?

    For how many attributes do you think the GHA will work? For an almost unlimited number? Will it work, e.g., for 100,000 attributes if it is restricted to calculating only the first 100 PCs?

    How long do you think these calculations will take on a standard dual-processor machine with 2 GHz? I know that you cannot give me an exact answer, but can you estimate whether it will take a coffee break, an extended lunch, the whole afternoon, or more than a day? I assume that the desired accuracy will also play a big part.

    The reason I ask is that the GHA is an interesting feature and would help a lot in the application I'd like to use it for. The only question is whether it will deliver results in a reasonable amount of time for the given problem. What do you think?

    Thank you for your answer.

    Sascha
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Sascha,
    I don't know how long this will take. We have a lot to do at the moment and, I am somewhat ashamed to say, paying customers insist on being given preference.
    Since the developer who programmed the GHA left our project long ago and I have never used it, I can't say how long it will take. The standard PCA needs about 3 hours for 4000 attributes, but by then it has calculated all PCs.
    If you are working with gene expression data, you probably have many more attributes than examples. You might then try the KernelPCA, which not only enables you to analyse the variance in higher dimensions, but also has a runtime that is only linear in the number of attributes, though quadratic in the number of examples.

    Greetings,
      Sebastian
  • Legacy User Member Posts: 0 Newbie
    Hello Sebastian,

    thank you for your helpful answer. I've tried the KernelPCA and, as it seems, it is fast enough for me. The only remaining question for me is: does the KernelPCA also deliver results in the sense of complete eigenvectors and eigenvalues?

    The following code delivers PCs and eigenvalues of the PCA:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="number_examples" value="30"/>
            <parameter key="number_of_attributes" value="500"/>
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="PCA" class="PCA">
        </operator>
        <operator name="FastICA" class="FastICA" activated="no">
            <parameter key="number_of_components" value="10"/>
        </operator>
        <operator name="KernelPCA" class="KernelPCA" activated="no">
        </operator>
    </operator>

    This code does not deliver any useful information:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="number_examples" value="30"/>
            <parameter key="number_of_attributes" value="500"/>
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="PCA" class="PCA" activated="no">
        </operator>
        <operator name="FastICA" class="FastICA" activated="no">
            <parameter key="number_of_components" value="10"/>
        </operator>
        <operator name="KernelPCA" class="KernelPCA">
        </operator>
    </operator>

    The only output is:
    "KernelPCA
    com.rapidminer.operat[email protected]"

    Is there a chance to get the actual calculated information?

    Thank you for your answer,

    Sascha
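
The runtime remark earlier in the thread (linear in the number of attributes, quadratic in the number of examples) and the question about eigenvectors and eigenvalues can be illustrated for the special case of a linear kernel. The sketch below is plain Python and is not RapidMiner's KernelPCA operator; the function names and the use of power iteration are assumptions made for illustration. It builds the n × n matrix of dot products between examples, finds its leading eigenpair, and maps that back to a direction in attribute space.

```python
import random


def center_columns(rows):
    """Subtract each attribute's mean so every column has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def gram_matrix(rows):
    """n x n matrix of pairwise dot products (linear kernel): O(n^2 * d)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def top_eigenpair(matrix, iterations=3000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by power iteration (adequate for a sketch, not for production)."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v


def first_component(rows):
    """Variance and unit direction of the first principal component,
    recovered from the Gram matrix instead of the d x d covariance."""
    x = center_columns(rows)
    eigenvalue, v = top_eigenpair(gram_matrix(x))
    scale = eigenvalue ** 0.5
    d = len(x[0])
    direction = [sum(v[i] * x[i][j] for i in range(len(x))) / scale
                 for j in range(d)]
    variance = eigenvalue / (len(x) - 1)  # sample variance along direction
    return variance, direction


if __name__ == "__main__":
    random.seed(0)
    # Same shape as the process in this thread: 30 examples, 500 attributes.
    data = [[random.gauss(0.0, 1.0) for _ in range(500)] for _ in range(30)]
    variance, direction = first_component(data)
    print(len(direction), variance)
```

With a linear kernel the attribute-space direction can be written out explicitly, as above. With a nonlinear kernel the components live in the kernel's feature space, so what can be reported is the eigenvalue and the coefficients over the training examples rather than a vector with one entry per attribute.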