Problem with feature evaluation using YAGGA2

andyknownasabu Member Posts: 3 Contributor I
edited November 2018 in Help
Dear all,

I've managed to set up my first working RM chain for feature evaluation, at least I think so, because I see the following error message on the console:
G Feb 4, 2009 9:15:11 AM: [Warning] Cannot generate test attribute: No such attribute: corr. We just keep both attributes for sure...
Last message repeated 2 times.
The chain looks as follows:
<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.3">

  <operator name="Root" class="Process" expanded="yes">
      <operator name="Data Source" class="ArffExampleSource">
          <parameter key="data_file" value="all_subjects.arff"/>
          <parameter key="id_attribute" value="id"/>
          <parameter key="label_attribute" value="label"/>
      </operator>
      <operator name="YAGGA2" class="YAGGA2" expanded="yes">
          <parameter key="use_diff" value="true"/>
          <parameter key="use_max" value="true"/>
          <parameter key="use_min" value="true"/>
          <parameter key="use_sin" value="false"/>
          <parameter key="use_square_roots" value="true"/>
          <operator name="SimpleValidation" class="SimpleValidation" expanded="yes">
              <parameter key="create_complete_model" value="true"/>
              <operator name="DecisionTree" class="DecisionTree">
                  <parameter key="criterion" value="gini_index"/>
                  <parameter key="maximal_depth" value="5"/>
              </operator>
              <operator name="Applier Chain" class="OperatorChain" expanded="yes">
                  <operator name="Test" class="ModelApplier">
                      <list key="application_parameters">
                      </list>
                      <parameter key="keep_model" value="true"/>
                  </operator>
                  <operator name="ClassificationPerformance" class="ClassificationPerformance">
                      <parameter key="keep_example_set" value="true"/>
                      <parameter key="root_mean_squared_error" value="true"/>
                      <parameter key="root_relative_squared_error" value="true"/>
                      <parameter key="weighted_mean_precision" value="true"/>
                      <parameter key="weighted_mean_recall" value="true"/>
                  </operator>
              </operator>
          </operator>
          <operator name="ProcessLog" class="ProcessLog">
              <parameter key="filename" value="process_log.txt"/>
              <list key="log">
                  <parameter key="Generation" value="operator.YAGGA2.value.generation"/>
                  <parameter key="Recall" value="operator.ClassificationPerformance.value.weighted_mean_recall"/>
                  <parameter key="Precision" value="operator.ClassificationPerformance.value.weighted_mean_precision"/>
              </list>
          </operator>
      </operator>
      <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter">
          <parameter key="attribute_weights_file" value="attribute.wgt"/>
      </operator>
      <operator name="PerformanceWriter" class="PerformanceWriter">
          <parameter key="performance_file" value="performance.per"/>
      </operator>
      <operator name="AttributeConstructionsWriter" class="AttributeConstructionsWriter">
          <parameter key="attribute_constructions_file" value="attribute.cst"/>
      </operator>
  </operator>

</process>
Can anybody explain why this error occurs, what it means, how to fix it (if possible), and whether the above scheme makes sense at all? I'd really appreciate hearing about your experience and any suggestions on how to improve the above process.

Thank you very much and best regards!

Answers

  • BAMBAMBAM Member Posts: 20 Maven
    I am having the same problem with YAGGA2 (but not with YAGGA). I get these errors thousands of times:

    G Aug 17, 2009 9:00:00 PM: [Warning] exp: Infinite value generated, replaced by NaN.
    G Aug 17, 2009 9:00:00 PM: [Warning] exp: NaN generated.
    G Aug 17, 2009 9:00:00 PM: [Warning] 1/: NaN generated.
    Last message repeated 5 times.
    G Aug 17, 2009 9:00:00 PM: [Warning] /: Infinite value generated.
    Last message repeated 105 times.
    ....
    G Aug 17, 2009 9:02:13 PM: [Warning] Cannot generate test attribute: No such attribute: BB202CBas / LRAll1CUpr2. We just keep both attributes for sure...
    Last message repeated 20880 times.

    I think it is a bug in YAGGA2, since I don't have the problem when I replace YAGGA2 with YAGGA. It seems that when a generated attribute causes a problem (a NaN or infinity error), the code doesn't handle the situation gracefully.


    This is my XML:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="Minutes.csv"/>
            <parameter key="label_name" value="RRRatio"/>
            <parameter key="id_name" value="id"/>
            <parameter key="sample_ratio" value="0.05"/>
        </operator>
        <operator name="AttributeFilter" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value="Symbol|Bar|BarDate|BarTime|HighestHighAfter|HighestLowAfter|LowestHighAfter|LowestLowAfter|VWAvgHighAfter|VWAvgLowAfter|MaxLongLoss|MaxShortLoss|LongAvgProfit|ShortAvgProfit"/>
            <parameter key="invert_filter" value="true"/>
        </operator>
        <operator name="YAGGA2" class="YAGGA2" expanded="yes">
            <parameter key="population_size" value="100"/>
            <parameter key="maximum_number_of_generations" value="1000"/>
            <parameter key="generations_without_improval" value="10"/>
            <parameter key="p_initialize" value="0.1"/>
            <parameter key="use_plus" value="false"/>
            <parameter key="use_diff" value="true"/>
            <parameter key="use_div" value="true"/>
            <parameter key="use_square_roots" value="true"/>
            <parameter key="use_sin" value="false"/>
            <parameter key="use_log" value="true"/>
            <parameter key="use_absolute_values" value="false"/>
            <parameter key="constant_generation_prob" value="0.0"/>
            <operator name="SimpleValidation" class="SimpleValidation" expanded="yes">
                <parameter key="local_random_seed" value="10"/>
                <operator name="W-REPTree" class="W-REPTree">
                    <parameter key="M" value="1000.0"/>
                </operator>
                <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                    <operator name="Applier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="RegressionPerformance" class="RegressionPerformance">
                        <parameter key="keep_example_set" value="true"/>
                        <parameter key="spearman_rho" value="true"/>
                        <parameter key="use_example_weights" value="false"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="generation" value="operator.YAGGA2.value.generation"/>
                  <parameter key="performance" value="operator.YAGGA2.value.performance"/>
                  <parameter key="best" value="operator.YAGGA2.value.best"/>
                </list>
            </operator>
        </operator>
        <operator name="AttributeConstructionsWriter" class="AttributeConstructionsWriter" breakpoints="after">
            <parameter key="attribute_constructions_file" value="MinuteYagga100.att"/>
        </operator>
        <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter" breakpoints="after">
            <parameter key="attribute_weights_file" value="MinuteYagga100.wgt"/>
        </operator>
    </operator>
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    there is no way to "deal" gracefully with dividing by zero or exceeding the maximum range of a double value. At least the code is graceful enough to say what the problem is: in your case, exp, 1/ and / cause these errors because your dataset contains very large numbers and zeros. So if you turn off these generating functions in the parameters, the problem will vanish.
    The problem does not occur in YAGGA because it simply does not allow constructing such attributes...
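    To make the suggestion concrete, here is a sketch of the YAGGA2 operator with the three offending generators switched off. For reference, exp(x) exceeds the largest double for x above about 709.78, and dividing a nonzero value by zero gives an infinite value (0/0 gives NaN). Only use_div appears in the processes in this thread; the keys reciprocal_value and use_exp are assumptions recalled from the 4.x operator reference, so please check the exact names in the YAGGA2 parameter list of your version:

    ```xml
    <operator name="YAGGA2" class="YAGGA2" expanded="yes">
        <!-- "/" : division by a zero value yields an infinite value, or NaN for 0/0 -->
        <parameter key="use_div" value="false"/>
        <!-- "1/" : reciprocals of zeros; key name is an assumption, verify in the GUI -->
        <parameter key="reciprocal_value" value="false"/>
        <!-- "exp" : overflows a double for large inputs; key name is an assumption -->
        <parameter key="use_exp" value="false"/>
        <!-- remaining parameters, SimpleValidation and ProcessLog unchanged -->
    </operator>
    ```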

    Greetings,
      Sebastian
  • BAMBAMBAM Member Posts: 20 Maven
    I see that it does deal gracefully with the situation, because it doesn't crash :)

    However, the operator seems to repeatedly create the same "dangerous" features, and therefore the entire process gets bogged down in outputting tens of thousands of warning messages. Perhaps it could remember which feature combinations were dangerous?

    Also, when I use YAGGA, it has never generated a "new" attribute, even though I have all the boxes checked (addition, division, reciprocal, etc.). I've had it create roughly 10,000 new feature combinations, but I've never seen a "gensym" attribute in the output (not even when I stop the process and examine the attributes currently being evaluated). I haven't seen any other postings on the forums about this problem, and I just can't figure out why I'd be having it. All I do is switch back and forth between YAGGA and YAGGA2 using the "replace operator" GUI command, so the XML doesn't really change. Any ideas? I'd post the XML, but it is the same as what I posted before with "YAGGA2" replaced by "YAGGA".

    Best regards,
    John
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi John,
    it will probably speed up your process if you select a higher logverbosity in the root operator, so that these thousands of warnings are no longer displayed.
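    In the process XML this is a single parameter on the Root operator. A minimal sketch, assuming the 4.x Process operator accepts the level name error for logverbosity so that messages below that level (including these warnings) are suppressed; the exact level names are an assumption, so pick the value from the drop-down in the root operator's parameters:

    ```xml
    <operator name="Root" class="Process" expanded="yes">
        <!-- log only messages at level "error" or above; value name is assumed -->
        <parameter key="logverbosity" value="error"/>
        <!-- CSVExampleSource, AttributeFilter, YAGGA2 and the writers unchanged -->
    </operator>
    ```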

    Did you set breakpoints before the validation inside YAGGA to check whether any new attributes were generated?
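    Such a breakpoint is just an attribute on the operator, in the same way as the breakpoints="after" already set on the two writers in your process. A sketch, assuming "before" is accepted as the counterpart of "after":

    ```xml
    <operator name="YAGGA2" class="YAGGA2" expanded="yes">
        <!-- pause before each validation run so the current example set,
             including any generated attributes, can be inspected -->
        <operator name="SimpleValidation" class="SimpleValidation" expanded="yes" breakpoints="before">
            <!-- W-REPTree and ApplierChain unchanged -->
        </operator>
        <!-- ProcessLog unchanged -->
    </operator>
    ```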

    Greetings,
      Sebastian