stock prediction model problem

ahanazi · June 2009

gent;
i hope you are doing well.

i'm using SVM - LibSVMLearner

i gut this result:
class precision : pred. Sell = 95%
: pred. buy= 90%
when i try to apply the model in another dataset it give me all rediction is :Buy!!!!!
how can i solve this??

***here is the XML *****
<operator name="XLU Prediction with a SVM" class="Process" expanded="yes">
<parameter key="resultfile" value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\result.res"/>
<operator name="Load Data from Spreadsheet" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\data\index-weekly-from-1-1-2003-T0-28-2-2009.xls"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="true"/>
<parameter key="label_column" value="11"/>
<parameter key="create_id" value="true"/>
<parameter key="id_column" value="2"/>
</operator>
<operator name="Normalize the Data" class="Normalization">
<parameter key="return_preprocessing_model" value="true"/>
<parameter key="create_view" value="true"/>
<parameter key="min" value="0.1"/>
<parameter key="max" value="0.9"/>
</operator>
<operator name="DataStatistics" class="DataStatistics">
</operator>
<operator name="Cross Validate" class="XValidation" expanded="yes">
<parameter key="keep_example_set" value="true"/>
<parameter key="create_complete_model" value="true"/>
<operator name="Train the SVM" class="LibSVMLearner">
<parameter key="keep_example_set" value="true"/>
<parameter key="degree" value="5"/>
<parameter key="gamma" value="0.8976"/>
<parameter key="C" value="19.0"/>
<list key="class_weights">
</list>
<parameter key="calculate_confidences" value="true"/>
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\realModelFile\SVM ahmed model.mod"/>
<parameter key="overwrite_existing_file" value="false"/>
<parameter key="output_type" value="XML"/>
</operator>
</operator>
<operator name="Test the SVM's Performance" class="OperatorChain" expanded="yes">
<operator name="Apply the SVM to Test Data" class="ModelApplier">
<parameter key="keep_model" value="true"/>
<list key="application_parameters">
</list>
<parameter key="create_view" value="true"/>
</operator>
<operator name="Give Performance Stats" class="ClassificationPerformance">
<parameter key="keep_example_set" value="true"/>
<parameter key="accuracy" value="true"/>
<parameter key="weighted_mean_recall" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<parameter key="correlation" value="true"/>
<parameter key="margin" value="true"/>
<parameter key="logistic_loss" value="true"/>
<list key="class_weights">
</list>
</operator>
</operator>
</operator>
***************************************************end***************************************************
Best Regards for all.

land · June 2009

Hi,
your are estimating the performance on the train set. Since SVMs easily overfit the data (especially with rbf kernel), the performance on the train set might be very good (possibly 100%) but will fail on new examples.
I would suggest you are take a look on the sample processes for the xvalidation, since you are using it in a strange way: You are learning a model for each fold and write each of them into the same file, overwriting the previous one. Originally XValidation is build for performance estimation. This would be achieved by a setup like the following:

<operator name="XLU Prediction with a SVM" class="Process" expanded="yes">
    <parameter key="resultfile"	value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\result.res"/>
    <operator name="Load Data from Spreadsheet" class="ExcelExampleSource">
        <parameter key="excel_file"	value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\data\index-weekly-from-1-1-2003-T0-28-2-2009.xls"/>
        <parameter key="first_row_as_names"	value="true"/>
        <parameter key="create_label"	value="true"/>
        <parameter key="label_column"	value="11"/>
        <parameter key="create_id"	value="true"/>
        <parameter key="id_column"	value="2"/>
    </operator>
    <operator name="Normalize the Data" class="Normalization">
        <parameter key="return_preprocessing_model"	value="true"/>
        <parameter key="create_view"	value="true"/>
        <parameter key="min"	value="0.1"/>
        <parameter key="max"	value="0.9"/>
    </operator>
    <operator name="DataStatistics" class="DataStatistics">
    </operator>
    <operator name="Cross Validate" class="XValidation" expanded="yes">
        <parameter key="keep_example_set"	value="true"/>
        <parameter key="create_complete_model"	value="true"/>
        <operator name="Train the SVM" class="LibSVMLearner">
            <parameter key="keep_example_set"	value="true"/>
            <parameter key="degree"	value="5"/>
            <parameter key="gamma"	value="0.8976"/>
            <parameter key="C"	value="19.0"/>
            <list key="class_weights">
            </list>
            <parameter key="calculate_confidences"	value="true"/>
        </operator>
        <operator name="Test the SVM's Performance" class="OperatorChain" expanded="no">
            <operator name="Apply the SVM to Test Data" class="ModelApplier">
                <parameter key="keep_model"	value="true"/>
                <list key="application_parameters">
                </list>
                <parameter key="create_view"	value="true"/>
            </operator>
            <operator name="Give Performance Stats" class="ClassificationPerformance">
                <parameter key="keep_example_set"	value="true"/>
                <parameter key="accuracy"	value="true"/>
                <parameter key="weighted_mean_recall"	value="true"/>
                <parameter key="weighted_mean_precision"	value="true"/>
                <parameter key="correlation"	value="true"/>
                <parameter key="margin"	value="true"/>
                <parameter key="logistic_loss"	value="true"/>
                <list key="class_weights">
                </list>
            </operator>
        </operator>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file"	value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\realModelFile\SVM ahmed model.mod"/>
        <parameter key="overwrite_existing_file"	value="false"/>
        <parameter key="output_type"	value="XML"/>
    </operator>
</operator>

Greetings,
Sebastian

emolano · June 2009

so the model writer should be after the cross validate chain? always?
does it mean that the example on http://www.neuralmarkettrends.com/2007/05/09/building-an-ai-financial-model-lesson-iv/ is wrong ?
Please look at the way they do it at http://www.neuralmarkettrends.com/wp-content/uploads/2007/05/preformance-pref.JPG - they write the model after every iteration. Maybe land got the code from them.
Thanks
e

haddock · June 2009

Hi there,

Sebastian is right to say that writing out the model each time makes no sense at all. I'd like to add that you should look at sliding window validation as well, and avoid stratified sampling. This latter issue has already been covered at...

http://rapid-i.com/rapidforum/index.php/topic,908.msg3395.html#msg3395

Otherwise you will have some very nasty surprises when you trade!

Good weekend to all!

ahanazi · June 2009

thaks a lot.
actually i'm new for rapidminer...
how can i finad find ready made models? which i can learn from and make development ?

my objective is to create models for trading stock . i lost a lot of money....now i back with rapidminer as last chance ;D.

haddock · June 2009

Hi there Ahanazi,

Reading your last post I felt a lot of sympathy, but also a need to try and warn you about what you can achieve with RM, and what you cannot. I've been in the field of artificial intelligence/machine learning and finance for twenty years, I've made mistakes and I've tried to learn from them, some are obvious while others sneak up on you when you aren't paying attention. So here are my two cents...

Whatever you do, never trade a system you don't understand, so don't get one off the shelf, because you will never understand how to set your stop losses and profit targets, and without money management only doom awaits. Also understand that markets evolve, which means that there is not a model that will always work, unless of course it can evolve as well. We are talking about human activity, where the rules of the game change, not copper sulphate, whose properties are constant.

The good news is that by being here you are on the right track, but you will need to put in a lot of work. RM is in my experience simply the best environment to build and test models in a rigorous way. That means that you can build and test extensive systems rapidly, just like it says on the box. I use it to identify markets where models work best, and to understand what performance I can reasonably expect; generally if you ask people why they trade market X rather than market Y they have no idea. No wonder most traders lose their money in less than two years. So, in a nutshell, you can use RM to identify markets and trading horizons. If you press on the globe underneath my ludicrous picture you'll get an idea of what I mean. I'll post up more details of my methods there if folks are interested.

So you can reasonably expect to build and test systems whose limitations you will understand, what you will then need is a way of simulating their behaviour with various money management techniques, by integrating with Tradestation or the like. I cannot stress too much the need to match the performance of whatever you make with your ability to handle the hits that the real world will dole up to any model. Risk too much and die quickly, risk too little and die from boredom!

If you have managed to stay awake through all of the above you probably have the stamina to see it through. Start with the examples, and don't even think about markets until you can explain all of those examples. Like most things, it will take many many hours of hard work, but the rewards are there. No point in doing otherwise, as poor is not fun!

" I'm so poor I can't even pay attention." ~Ron Kittle, 1987

ahanazi · June 2009

Mr. Haddock
Good morning;

i read what you have written carefuly, and i would like to comments:
i like to talk to expert people like you.
2- in my working in RM, i like to acheive tow things, i want to create model give me percentage of propapilty of high return with percentage of the risk of buying this stock (or group of stocks), i beleve that RM will help in this. i have used Amibroker a lot, but i feel will not help me in this... that how and why i'm using RM.
i visit your website of risk, and i like to learn a lot about if you can post examples created by RM.

my question: from where can i get lot usefull examples of RM?.

again thanks a lot for your valuable advice.
BR

wessel · June 2009

I have a problem very similar to this:
Regression problem: 2 numerical attributes, 1 numerical class attribute.

I would like to evaluate the performance of 2 different "machine learning algorithms"
- WEKA REP-TREE
- WEKA Linear Regression

I'm able to do this in WEKA, but I can't figure out the correct setup in RapidMinder.
I would be greatful for some xml examples on how to compare different "machine learning algorithms".

This is what I do in WEKA:
TRAIN / TEST percentage split, random order using seed
do this 5 times, using a different seed

Dataset:
v u label
-------------------
v0 u0 label0
v1 u1 label1
v2 u2 label2
... ...
v99 u99 label99

performance := <abs error, rel error>

Output example::
seed, performance(Linear Regression), performance(REP-TREE)
0, <5, 100%>,

, 60%>
1, <5, 100%>, <2, 40%>
2, <5, 100%>,

, 60%>
3, <5, 100%>, <2, 40%>
4, <5, 100%>,

, 60%>

significant_difference(REP-TREE, LinearRegression) == True

haddock · June 2009

Hi Folks!

Quick replies, as food beckons and stomach makes funny noises...

@ ahanazi - Sorry, when I said "examples" I should have said "samples", my apologies. So I mean all you can open if you go File/Open/Samples and pick your subject area. These are the files that the tutorial ( Help/Rapidminer Tutorial ) selects from.

@ wessel - If you cast a glance over the following code you'll soon see that it is just a wrapper applied around one of those validation samples, and some more learners bashed in, just a few minutes to produce more or less infinite combos. Given it grinds on random junk it is only a junk muncher, but it might point you in interesting directions, hope so.

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function"	value="random"/>
        <parameter key="number_of_attributes"	value="2"/>
    </operator>
    <operator name="For Leaner = 1 to 3" class="IteratingOperatorChain" expanded="yes">
        <parameter key="iterations"	value="3"/>
        <operator name="Set Learner Number" class="SingleMacroDefinition">
            <parameter key="macro"	value="Learner"/>
            <parameter key="value"	value="%{a}"/>
        </operator>
        <operator name="Train and Test" class="XValidation" expanded="yes">
            <parameter key="keep_example_set"	value="true"/>
            <parameter key="sampling_type"	value="shuffled sampling"/>
            <operator name="Create Model Using Learner Number" class="OperatorSelector" expanded="yes">
                <operator name="1 Training" class="LibSVMLearner">
                    <parameter key="svm_type"	value="epsilon-SVR"/>
                    <parameter key="kernel_type"	value="poly"/>
                    <parameter key="C"	value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="2 LinearRegression" class="LinearRegression">
                </operator>
                <operator name="3 GPLearner" class="GPLearner">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error"	value="true"/>
                    <parameter key="absolute_error"	value="true"/>
                    <parameter key="relative_error"	value="true"/>
                    <parameter key="normalized_absolute_error"	value="true"/>
                    <parameter key="root_relative_squared_error"	value="true"/>
                    <parameter key="squared_error"	value="true"/>
                    <parameter key="correlation"	value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Map results to a table" class="ProcessLog">
            <list key="log">
              <parameter key="Learner"	value="operator.Set Learner Number.value.macro_value"/>
              <parameter key="Performance"	value="operator.Train and Test.value.performance"/>
            </list>
        </operator>
    </operator>
    <operator name="Cleanup and view results" class="IOConsumer">
        <parameter key="io_object"	value="ExampleSet"/>
        <parameter key="deletion_type"	value="delete_one"/>
    </operator>
</operator>

wessel · June 2009

This looks really nice. Thanks a lot ;D!

I think I can play with it and make it perform 10 X validations, all with a different seed.

I'm not sure how to modify the "Map results to a table" though.
When I double click on it, and then click "log: Edit List (2)...", it doesn't let me add columns .
Your table currently the looks like this:
Learner Performance
1 0.306
2 0.297
3 0.300

Something like this would be more informative:
Seed Absolute_Error_Learner1 Absolute_Error_Learner2 Absolute_Error_Learner3
1 0.306 0.297 0.300
2 0.326 0.227 0.320
3 0.333 0.233 0.333
4 0.346 0.247 0.340
5 0.305 0.295 0.305
...
10 0.306 0.297 0.300

With a table like this its easy to calculate if their is a significant difference by hand.

haddock · June 2009

Hi Wessel,

Not sure about the edit list problem you encountered, sometimes you need to add one by one, perhaps you could elaborate? Anyway here is a rework to expand along the lines you suggest, I've used the parameter iteration operator, as it is neater than nested loops, and I've converted the results log to an example set, so that the results can be aggregated. Have fun tinkering with it!

<operator name="Root" class="Process" expanded="yes">
    <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
        <list key="parameters">
          <parameter key="ExampleSetGenerator.local_random_seed"	value="[1.0;5.0;5;linear]"/>
          <parameter key="Create Model Using Learner Number.select_which"	value="[1.0;3.0;3;linear]"/>
        </list>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function"	value="random"/>
            <parameter key="number_of_attributes"	value="2"/>
            <parameter key="local_random_seed"	value="5"/>
        </operator>
        <operator name="Train and Test" class="XValidation" expanded="yes">
            <parameter key="keep_example_set"	value="true"/>
            <parameter key="sampling_type"	value="shuffled sampling"/>
            <operator name="Create Model Using Learner Number" class="OperatorSelector" expanded="yes">
                <parameter key="select_which"	value="3"/>
                <operator name="1 Training" class="LibSVMLearner">
                    <parameter key="svm_type"	value="epsilon-SVR"/>
                    <parameter key="kernel_type"	value="poly"/>
                    <parameter key="C"	value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="2 LinearRegression" class="LinearRegression">
                </operator>
                <operator name="3 GPLearner" class="GPLearner">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error"	value="true"/>
                    <parameter key="absolute_error"	value="true"/>
                    <parameter key="relative_error"	value="true"/>
                    <parameter key="normalized_absolute_error"	value="true"/>
                    <parameter key="root_relative_squared_error"	value="true"/>
                    <parameter key="squared_error"	value="true"/>
                    <parameter key="correlation"	value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Map results to a table" class="ProcessLog">
            <list key="log">
              <parameter key="Seed"	value="operator.ExampleSetGenerator.parameter.local_random_seed"/>
              <parameter key="Learner"	value="operator.Create Model Using Learner Number.parameter.select_which"/>
              <parameter key="Performance"	value="operator.Evaluation.value.absolute_error"/>
            </list>
        </operator>
    </operator>
    <operator name="Convert log for aggregation" class="ProcessLog2ExampleSet">
    </operator>
    <operator name="Make model number nominal" class="AttributeSubsetPreprocessing" expanded="no">
        <parameter key="condition_class"	value="attribute_name_filter"/>
        <parameter key="attribute_name_regex"	value="Learner"/>
        <operator name="Numerical2FormattedNominal" class="Numerical2FormattedNominal">
        </operator>
    </operator>
    <operator name="Do stat by leaner" class="Aggregation">
        <list key="aggregation_attributes">
          <parameter key="Performance"	value="average"/>
          <parameter key="Performance"	value="variance"/>
          <parameter key="Performance"	value="standard_deviation"/>
        </list>
        <parameter key="group_by_attributes"	value="Learner"/>
    </operator>
</operator>

wessel · June 2009

What version of RapidMiner are you using?
When I load the xml I get XMLException: unknown operator: Numerical2FormattedNominal

I installed the value series plugin, but still gives me the same error.
I have version 4.4

When I google:
Numerical2FormattedNominal + site:http://rapid-i.com/
it only finds this post ???

haddock · June 2009

Hi Wessel,

I'm on 4.4 enterprise, and note the operator is not on the community edition, not sure why. You can use numerical2polynominal instead, like this...

<operator name="Root" class="Process" expanded="yes">
    <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
        <list key="parameters">
          <parameter key="ExampleSetGenerator.local_random_seed"	value="[1.0;5.0;5;linear]"/>
          <parameter key="Create Model Using Learner Number.select_which"	value="[1.0;3.0;3;linear]"/>
        </list>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function"	value="random"/>
            <parameter key="number_of_attributes"	value="2"/>
            <parameter key="local_random_seed"	value="5"/>
        </operator>
        <operator name="Train and Test" class="XValidation" expanded="yes">
            <parameter key="keep_example_set"	value="true"/>
            <parameter key="sampling_type"	value="shuffled sampling"/>
            <operator name="Create Model Using Learner Number" class="OperatorSelector" expanded="yes">
                <parameter key="select_which"	value="3"/>
                <operator name="1 Training" class="LibSVMLearner">
                    <parameter key="svm_type"	value="epsilon-SVR"/>
                    <parameter key="kernel_type"	value="poly"/>
                    <parameter key="C"	value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="2 LinearRegression" class="LinearRegression">
                </operator>
                <operator name="3 GPLearner" class="GPLearner">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error"	value="true"/>
                    <parameter key="absolute_error"	value="true"/>
                    <parameter key="relative_error"	value="true"/>
                    <parameter key="normalized_absolute_error"	value="true"/>
                    <parameter key="root_relative_squared_error"	value="true"/>
                    <parameter key="squared_error"	value="true"/>
                    <parameter key="correlation"	value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Map results to a table" class="ProcessLog">
            <list key="log">
              <parameter key="Seed"	value="operator.ExampleSetGenerator.parameter.local_random_seed"/>
              <parameter key="Learner"	value="operator.Create Model Using Learner Number.parameter.select_which"/>
              <parameter key="Performance"	value="operator.Evaluation.value.absolute_error"/>
            </list>
        </operator>
    </operator>
    <operator name="Convert log for aggregation" class="ProcessLog2ExampleSet">
    </operator>
    <operator name="Make model number nominal" class="AttributeSubsetPreprocessing" expanded="yes">
        <parameter key="condition_class"	value="attribute_name_filter"/>
        <parameter key="attribute_name_regex"	value="Learner"/>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
    </operator>
    <operator name="Do stat by leaner" class="Aggregation">
        <list key="aggregation_attributes">
          <parameter key="Performance"	value="average"/>
          <parameter key="Performance"	value="variance"/>
          <parameter key="Performance"	value="standard_deviation"/>
        </list>
        <parameter key="group_by_attributes"	value="Learner"/>
    </operator>
</operator>

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

stock prediction model problem

Answers