Create Objective Function for Evolutionary Algorithm (Parameter Optimization)

knnbayaz Member Posts: 9 Contributor II
edited November 2018 in Help
Dear Community,

I have been working on price forecasting with support vector regression for my thesis. I derived relevant features from the prices and built a process that performs feature selection and parameter optimization together. An evolutionary algorithm selects the best "k" attributes (ranked by the SVM attribute weighting operator) and the support vector regression parameters (nu, gamma, C). I also combine the performance vectors with the Combine Performances operator, using root mean squared error and number of attributes as criteria, with weights of 0.7 and 0.3 respectively. The process works well: it finds the best parameters and the minimum number of features.

My problem is adding correlation to Combine Performances as a third criterion. Root mean squared error and number of attributes are minimized, but correlation must be maximized. How can I combine these three criteria in the Combine Performances operator? I looked at the create formula operator. Could it help, and if so, how would it work? I look forward to your help.

My other question: the evolutionary optimization stops before reaching the maximum number of generations. Max generations is set to 100, but it generally stops around the 50th generation. Do you know why?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <description>Sunday:
Weight By PCA (5), Gauss,Tour (0.25), Prob(0.9),Keep Best
Parameter set:

Performance:
PerformanceVector [
*****root_mean_squared_error: 15.123132 +/- 1.558278 (mikro: 15.203202 +/- 0.000000)
-----absolute_error: 10.749156 +/- 1.021879 (mikro: 10.749156 +/- 10.751419)
-----relative_error: 8.83% +/- 0.96% (mikro: 8.83% +/- 11.87%)
-----correlation: 0.893216 +/- 0.021480 (mikro: 0.892177)
-----spearman_rho: 0.897900 +/- 0.023013 (mikro: 5.387402)
-----kendall_tau: 0.730874 +/- 0.031571 (mikro: 4.385243)
]
SVM.C = 349.34275575370486
SVM.nu = 0.4741072199572831
SVM.gamma = 1.0E-5
Select by Weights.k = 33
C:349.33742656664316 Nu:0.4291679723296585 Gamma:1.0E-5 RMSE:13.069646621784996 MAPE:0.07480695227635964 MAE:9.443566849050331 Corr:0.9176969415924828 K:26.0

Sunday:
Weight By Relief (20) (NonNormalized, Gauss,Tour (0.25), Prob(0.9)
C:324.82805462669523 NU:0.46831863984298205 Gamma:7.099485940907746E-5 RMSE:11.20861632389257 MAPE:0.06294480347697497 MAE:7.949018244089598 Corr:0.9374834724799084 K:33.0


Saturday
Weight By Relief (20) (NonNormalized, Gauss,Boltzmann, Prob(0.9)
PerformanceVector [
*****root_mean_squared_error: 12.903715 +/- 1.111219 (mikro: 12.951474 +/- 0.000000)
-----absolute_error: 9.139935 +/- 0.653655 (mikro: 9.139935 +/- 9.176180)
-----relative_error: 7.02% +/- 0.54% (mikro: 7.02% +/- 8.97%)
-----correlation: 0.929155 +/- 0.012758 (mikro: 0.929126)
-----spearman_rho: 0.924755 +/- 0.011501 (mikro: 5.548529)
-----kendall_tau: 0.772453 +/- 0.017273 (mikro: 4.634719)
]
SVM.C = 324.82805462669523
SVM.nu = 0.46831863984298205
SVM.gamma = 7.099485940907746E-5
Select by Weights.k = 33



Monday
Weight By Relief (20) (Non-Normalized, Switch,Boltzmann, Prob(0.9)
PerformanceVector [
*****root_mean_squared_error: 14.406567 +/- 1.108196 (mikro: 14.450276 +/- 0.000000)
-----absolute_error: 10.157388 +/- 0.753901 (mikro: 10.158396 +/- 10.277037)
-----relative_error: 7.63% +/- 0.95% (mikro: 7.63% +/- 9.78%)
-----correlation: 0.886775 +/- 0.013296 (mikro: 0.888370)
-----spearman_rho: 0.899418 +/- 0.012317 (mikro: 5.396508)
-----kendall_tau: 0.736149 +/- 0.015988 (mikro: 4.416893)
]
SVM.C = 324.82805462669523
SVM.nu = 0.4746549118743857
SVM.gamma = 7.099485940907746E-5
Select by Weights.k = 33

C:324.82805462669523 Nu:0.4746549118743857 Gamma:7.099485940907746E-5 RMSE:14.273522402331189 MAPE:0.07110594344955053 MAE:9.718228530010594 Corr:0.8747126747273638 K:33.0

C:324.82805462669523 Nu:0.4746549118743857 Gamma:7.099485940907746E-5 RMSE:14.13840447755943 MAPE:0.07282235225443726 MAE:9.781834957950943 Corr:0.8722168856716528 K:25.0


Weekdays
Weight By Relief (20) (Non-Normalized, Switch,Boltzmann, Prob(0.9)

Performance:
PerformanceVector [
*****root_mean_squared_error: 11.759630 +/- 1.530881 (mikro: 11.858857 +/- 0.000000)
-----absolute_error: 7.690999 +/- 0.924004 (mikro: 7.690999 +/- 9.026684)
-----relative_error: 5.49% +/- 0.70% (mikro: 5.49% +/- 7.37%)
-----correlation: 0.931027 +/- 0.013617 (mikro: 0.930025)
-----spearman_rho: 0.935731 +/- 0.008928 (mikro: 5.614387)
-----kendall_tau: 0.794053 +/- 0.019052 (mikro: 4.764317)
]
SVM.C = 324.82805462669523
SVM.nu = 0.4746549118743857
SVM.gamma = 1.9062962726105077E-4
Select by Weights.k = 36

C:324.82805462669523 NU:0.4746549118743857 Gamma:1.9062962726105077E-4 RMSE:11.007134982553742 MAPE:0.05390369973896053 MAE:7.495131592979047 Corr:0.9348047744384642 K:36.0

C:324.82805462669523 NU:0.4746549118743857 Gamma:7.099485940907746E-5 RMSE:11.4593106153097 MAPE:0.05537831381509461 MAE:7.888946409682697 Corr:0.9337350092766873 K:33.0

C:188.2015332981262 NU:0.29386413755875573 Gamma:1.9062962726105077E-4 RMSE:11.526658754110203 MAPE:0.05801719898871299 MAE:7.9792785705712825 Corr:0.9325221843509023 K:36.0</description>
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="341" width="435">
      <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
        <parameter key="excel_file" value="C:\Users\KenanB\Desktop\TEZ\PTF\TrainDataClusByDay.xlsx"/>
        <parameter key="sheet_number" value="2"/>
        <parameter key="imported_cell_range" value="A1:AL1045"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Tarih.true.date_time.id"/>
          <parameter key="1" value="SGOF.true.numeric.label"/>
          <parameter key="2" value="1DayLag.true.numeric.attribute"/>
          <parameter key="3" value="2DayLag.true.numeric.attribute"/>
          <parameter key="4" value="3DayLag.true.numeric.attribute"/>
          <parameter key="5" value="1WeekLag.true.numeric.attribute"/>
          <parameter key="6" value="Volatilite.true.real.attribute"/>
          <parameter key="7" value="MACD.true.real.attribute"/>
          <parameter key="8" value="1HaftaSaatlikOrtalama.true.real.attribute"/>
          <parameter key="9" value="1HftSaOrtEksiSapma.true.real.attribute"/>
          <parameter key="10" value="1HftSaOrtArtıSapma.true.real.attribute"/>
          <parameter key="11" value="4HaftaOrtalama.true.numeric.attribute"/>
          <parameter key="12" value="2HaftaOrtalama.true.numeric.attribute"/>
          <parameter key="13" value="KGUP-BilateralAggreements.true.numeric.attribute"/>
          <parameter key="14" value="Saat0.true.integer.attribute"/>
          <parameter key="15" value="Saat1.true.integer.attribute"/>
          <parameter key="16" value="Saat2.true.integer.attribute"/>
          <parameter key="17" value="Saat3.true.integer.attribute"/>
          <parameter key="18" value="Saat4.true.integer.attribute"/>
          <parameter key="19" value="Saat5.true.integer.attribute"/>
          <parameter key="20" value="Saat6.true.integer.attribute"/>
          <parameter key="21" value="Saat7.true.integer.attribute"/>
          <parameter key="22" value="Saat8.true.integer.attribute"/>
          <parameter key="23" value="Saat9.true.integer.attribute"/>
          <parameter key="24" value="Saat10.true.integer.attribute"/>
          <parameter key="25" value="Saat11.true.integer.attribute"/>
          <parameter key="26" value="Saat12.true.integer.attribute"/>
          <parameter key="27" value="Saat13.true.integer.attribute"/>
          <parameter key="28" value="Saat14.true.integer.attribute"/>
          <parameter key="29" value="Saat15.true.integer.attribute"/>
          <parameter key="30" value="Saat16.true.integer.attribute"/>
          <parameter key="31" value="Saat17.true.integer.attribute"/>
          <parameter key="32" value="Saat18.true.integer.attribute"/>
          <parameter key="33" value="Saat19.true.integer.attribute"/>
          <parameter key="34" value="Saat20.true.integer.attribute"/>
          <parameter key="35" value="Saat21.true.integer.attribute"/>
          <parameter key="36" value="Saat22.true.integer.attribute"/>
          <parameter key="37" value="Saat23.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="weight_by_svm" compatibility="5.2.008" expanded="true" height="76" name="Weight by SVM" width="90" x="101" y="121">
        <parameter key="normalize_weights" value="false"/>
        <parameter key="C" value="300.0"/>
      </operator>
      <operator activated="true" class="parallel:optimize_parameters_evolutionary_parallel" compatibility="5.1.000" expanded="true" height="130" name="Optimize Parameters (Evolutionary)" width="90" x="246" y="30">
        <list key="parameters">
          <parameter key="SVM.C" value="[100;500]"/>
          <parameter key="SVM.nu" value="[0.01;0.5]"/>
          <parameter key="SVM.gamma" value="[0.000001;0.01]"/>
          <parameter key="Select by Weights.k" value="[1;36]"/>
        </list>
        <parameter key="max_generations" value="100"/>
        <parameter key="population_size" value="8"/>
        <parameter key="keep_best" value="false"/>
        <parameter key="selection_type" value="roulette wheel"/>
        <parameter key="crossover_prob" value="0.5"/>
        <parameter key="number_of_threads" value="8"/>
        <parameter key="parallelize_optimization_process" value="true"/>
        <process expanded="true" height="296" width="524">
          <operator activated="true" class="select_by_weights" compatibility="5.2.008" expanded="true" height="94" name="Select by Weights" width="90" x="45" y="30">
            <parameter key="weight_relation" value="top k"/>
            <parameter key="k" value="3"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
            <parameter key="number_of_validations" value="5"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <parameter key="parallelize_training" value="true"/>
            <parameter key="parallelize_testing" value="true"/>
            <process expanded="true" height="332" width="330">
              <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.008" expanded="true" height="76" name="SVM" width="90" x="120" y="30">
                <parameter key="svm_type" value="nu-SVR"/>
                <parameter key="gamma" value="0.005646641766942668"/>
                <parameter key="C" value="349.3487713713193"/>
                <parameter key="nu" value="0.43304461037008907"/>
                <parameter key="cache_size" value="240"/>
                <list key="class_weights"/>
                <parameter key="calculate_confidences" value="true"/>
              </operator>
              <connect from_port="training" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="332" width="330">
              <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="187" y="30">
                <parameter key="main_criterion" value="root_mean_squared_error"/>
                <parameter key="absolute_error" value="true"/>
                <parameter key="relative_error" value="true"/>
                <parameter key="correlation" value="true"/>
                <parameter key="spearman_rho" value="true"/>
                <parameter key="kendall_tau" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="performance_attribute_count" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="94" y="201"/>
          <operator activated="true" class="combine_performances" compatibility="5.2.008" expanded="true" height="60" name="Performance (3)" width="90" x="232" y="194">
            <list key="criteria_weights">
              <parameter key="root_mean_squared_error" value="0.7"/>
              <parameter key="number_of_attributes" value="0.3"/>
            </list>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="380" y="165">
            <parameter key="filename" value="C:\Users\KenanB\Desktop\TEZ\log\log1.txt"/>
            <list key="log">
              <parameter key="App" value="operator.Validation.value.applycount"/>
              <parameter key="C" value="operator.SVM.parameter.C"/>
              <parameter key="Nu" value="operator.SVM.parameter.nu"/>
              <parameter key="Gamma" value="operator.SVM.parameter.gamma"/>
              <parameter key="RMSE" value="operator.Performance.value.root_mean_squared_error"/>
              <parameter key="MAPE" value="operator.Performance.value.relative_error"/>
              <parameter key="MAE" value="operator.Performance.value.absolute_error"/>
              <parameter key="Corr" value="operator.Performance.value.correlation"/>
              <parameter key="K" value="operator.Select by Weights.parameter.k"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Select by Weights" to_port="weights"/>
          <connect from_port="input 2" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="Select by Weights" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="training" to_op="Performance (2)" to_port="example set"/>
          <connect from_op="Validation" from_port="averagable 1" to_op="Performance (2)" to_port="performance"/>
          <connect from_op="Performance (2)" from_port="performance" to_op="Performance (3)" to_port="performance"/>
          <connect from_op="Performance (3)" from_port="performance" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Weight by SVM" to_port="example set"/>
      <connect from_op="Weight by SVM" from_port="weights" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
      <connect from_op="Weight by SVM" from_port="example set" to_op="Optimize Parameters (Evolutionary)" to_port="input 2"/>
      <connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="result 1"/>
      <connect from_op="Optimize Parameters (Evolutionary)" from_port="parameter" to_port="result 3"/>
      <connect from_op="Optimize Parameters (Evolutionary)" from_port="result 1" to_port="result 2"/>
      <connect from_op="Optimize Parameters (Evolutionary)" from_port="result 2" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    the Combine Performances operator should "know" the direction of the criteria. The new documentation states:

    "It should be noted that some criteria values are considered positive by this operator e.g. accuracy. On the other hand some criteria values (usually error related) are considered negative by this operator e.g. relative error."

    Have you checked whether that holds for your problem? You could test it in a small example process.

    Best regards,
    Marius
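
The idea of combining criteria with different optimization directions can be illustrated outside RapidMiner. The following is a minimal Python sketch, not the internal logic of Combine Performances: it assumes a plain weighted average in which minimized criteria are negated, so that a larger combined value is always better. The names `Criterion`, `combined_fitness` and `candidate` are hypothetical, and the weights 0.6 / 0.2 / 0.2 are made up for illustration (the original process uses 0.7 and 0.3 for two criteria only).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Criterion:
    name: str
    value: float
    weight: float    # hypothetical weight, chosen for illustration only
    maximize: bool   # True for correlation, False for error-like criteria


def combined_fitness(criteria: Iterable[Criterion]) -> float:
    """Weighted average in which a larger result is always better."""
    items = list(criteria)
    total_weight = sum(c.weight for c in items)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Negate minimized criteria so that every term points in the same direction.
    signed = sum(c.weight * (c.value if c.maximize else -c.value) for c in items)
    return signed / total_weight


def candidate(rmse: float, k: int, corr: float) -> List[Criterion]:
    """Bundle the three criteria discussed in the thread (weights are assumed)."""
    return [
        Criterion("root_mean_squared_error", rmse, 0.6, maximize=False),
        Criterion("number_of_attributes", float(k), 0.2, maximize=False),
        Criterion("correlation", corr, 0.2, maximize=True),
    ]


if __name__ == "__main__":
    # Illustrative values, roughly in the range of the logged results above.
    a = candidate(rmse=11.0, k=36, corr=0.93)
    b = candidate(rmse=14.0, k=25, corr=0.87)
    print("candidate a:", combined_fitness(a))
    print("candidate b:", combined_fitness(b))
```

Note that in this sketch RMSE (around 11 to 15 here) and the attribute count (up to 36) are much larger in magnitude than correlation (at most 1), so correlation barely moves the total unless the criteria are rescaled or the weights are adjusted accordingly.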