What values am I supposed to use with ARIMA PDQ?

SkyTrader Member Posts: 88 Contributor I
edited September 6 in Help
Hi there,

My p and q values are not zero.

I have tried various values but cannot get rid of this warning.



Cheers for any help,

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,636 RM Data Scientist
    Hi,
    please change the parameters in Optimize Grid; it's overriding the ones you set in ARIMA.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • SkyTrader Member Posts: 88 Contributor I
    Hi Martin, @mschmitz
    I'm looking at the p and q values in the Optimize Grid of your APPL ARIMA process, but I don't see any p or q setting equal to 0 that would cause the "0 is not allowed" warnings. The minimums are set to start from 1, not 0. What am I meant to change, and to what values, please?

    Cheers,



  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,636 RM Data Scientist
    You can try 0 if you want to. I think it's allowed.
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • SkyTrader Member Posts: 88 Contributor I
    edited September 11
    But the warning says that p = 0 and q = 0 are not allowed.
    Anyway, I changed p and q from 1 to 0, but it made no difference: I still got the same warning.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,087 Unicorn
    edited September 11
    Hi @SkyTrader, Hi @mschmitz  

    Yes, I confirm that RM raises an error if p = 0 AND q = 0. (I used the "Automized ARIMA on US - Consumption data" process from the time series templates.)

    There is an easy workaround:

    In the Optimize Parameters operator, set the parameter error handling = ignore error

    Hope this helps,

    Regards,

    Lionel

    PS : if this solution does not fix your error, please share your process and your data.

    EDIT:

    For your use case, I think you should restrict your parameter search to these values:

    p = 1,2,3,4,5
    d = 0,1,2
    q = 0,1,2,3,4,5
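
The grid above can be enumerated outside RapidMiner to see how many combinations the optimisation has to try. This is a minimal Python sketch, not RapidMiner code; the helper name `arima_orders` is hypothetical, and the only rule it encodes is the one reported in this thread (p = 0 together with q = 0 is rejected).

```python
from itertools import product


def arima_orders(p_values, d_values, q_values):
    """Return every (p, d, q) combination except those with p == 0 and q == 0."""
    return [
        (p, d, q)
        for p, d, q in product(p_values, d_values, q_values)
        if not (p == 0 and q == 0)
    ]


# Search space suggested in this thread: p = 1..5, d = 0..2, q = 0..5.
orders = arima_orders(range(1, 6), range(0, 3), range(0, 6))
print(len(orders))  # 5 * 3 * 6 = 90 candidate orders
```

Ninety candidate orders is a sizeable search, so narrowing the ranges is one way to shorten a slow optimisation run.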


  • SkyTrader Member Posts: 88 Contributor I
    Hi @lionelderkrikor, @mschmitz,

    Thanks very much. I set "error handling" to "ignore" and this eliminated the p, d, q warning, but the process now stops at 84% every time I run it and my MacBook shows the coloured wheel (it's a large Dow Jones data file with 5000 rows plus many technical indicators). I don't get any error message, and the only way to resolve the issue is to force quit RM.





  • hughesfleming68 Member Posts: 313 Unicorn
    Hi SkyTrader, how much data are you using for your prediction? If you are using a sliding window validation to see how your model performed over many years, that is one thing. If you are making a prediction, then you need much less data. Nothing that happened 10 years ago is relevant to what is happening now. Not in financial data.
  • SkyTrader Member Posts: 88 Contributor I
    Hi @hughesfleming68
    Thanks. In this particular run I am using daily data from 2000 to 2020 (which includes the 2003, 2007 and 2020 crashes). I have a 20-day window size and a 5-day horizon.

    It seems to be the optimisation process that is causing the freeze, as ARIMA works with this data set when not optimising.
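
To get a rough feel for the amount of work involved, the number of sliding windows in a series of this size can be counted as follows. This is a Python sketch under stated assumptions: a step of 1 row, and each window made of 20 training rows followed by 5 horizon rows. The function name `count_windows` is hypothetical, and the way RapidMiner actually schedules its windowing and optimisation may differ.

```python
def count_windows(n_rows, window_size, horizon, step=1):
    """Number of sliding windows (training rows plus horizon rows) that fit into n_rows."""
    usable = n_rows - window_size - horizon
    if usable < 0:
        return 0
    return usable // step + 1


# Figures from this thread: 5000 rows, 20-day window, 5-day horizon.
windows = count_windows(n_rows=5000, window_size=20, horizon=5)
print(windows)  # 4976
```

If every combination in a 5 × 3 × 6 = 90 grid were refit on every window (an assumption, not confirmed behaviour), that would come to 4976 × 90 = 447,840 ARIMA fits, which may explain a very long or apparently frozen run.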

  • hughesfleming68 Member Posts: 313 Unicorn
    edited September 23
    Hi SkyTrader,

    In my opinion it is too much data. There is very little signal in price data alone. It is better to think about the problem along the lines of what is driving price now. Certainly nothing that happened in 2008.

    Perhaps it is better to not think about predicting price but just positive or negative bias over the period you are interested in. You might want to look at this as a classification problem.

    I think prediction has a place in algorithmic trading, but only over very short time frames. Certainly not days, and not all the time. The signal-to-noise ratio is very variable; a lot of the time, your prediction will be random. You will have to be ready to be wrong a lot.

    I also think that ARIMA is the wrong tool for this particular job. Price action is too irregular.
  • SkyTrader Member Posts: 88 Contributor I
    edited September 24
    Cheers for the input Alex, @hughesfleming68
    I take your point about the amount of data, but I've seen people discuss this issue and read research papers where a couple of years of data (and more) was used. I've not heard of using time frames shorter than a few days.

    Re: ARIMA, I was under the impression, encouraged by Fabian's "Elaborate Your Time Series" video, that it was possible to get good predictions.

    I'm also working with a Random Forest process but just can't get it to predict any point beyond my data set using the Apply Model operator that Martin had suggested, along with the Lag operator.

    I'd be interested to know which algorithms you found worked best for this type of task, and whether you aimed at predictions based on classification or on actual price targeting.

    Many thanks,

  • hughesfleming68 Member Posts: 313 Unicorn
    edited September 24
    Hi SkyTrader,

    I think you need to review which machine learning operators can extrapolate. Random Forest won't be able to do that. I would also be cautious with academic papers. I have looked at many that were flawed in one way or another or had results that were not reproducible. In the end, you are going to have to test everything yourself. Keep in mind that even if you solve your prediction problems, you still need to transfer your prediction to the real world. Your prediction may look good after validation but still lose money when you try to implement it. There are two parts to this problem: the prediction and the strategy to execute it. Don't underestimate how difficult the second part is.
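
The extrapolation point can be illustrated with a toy example. A regression tree predicts the average target of the training rows that fall in a leaf, so its output cannot exceed the largest target seen in training. The Python sketch below is not Random Forest: it uses a deliberately simple piecewise-constant model (equal-size leaves over a single input) as a stand-in, and the names `fit_leaves` and `predict` are hypothetical.

```python
def fit_leaves(xs, ys, n_leaves=4):
    """Split the sorted training data into equal-size leaves; store (upper_x, mean_y) per leaf."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_leaves
    leaves = []
    for i in range(n_leaves):
        start = i * size
        end = len(pairs) if i == n_leaves - 1 else start + size
        chunk = pairs[start:end]
        upper_x = chunk[-1][0]
        mean_y = sum(y for _, y in chunk) / len(chunk)
        leaves.append((upper_x, mean_y))
    return leaves


def predict(leaves, x):
    """Return the mean of the first leaf whose upper bound covers x, else the last leaf's mean."""
    for upper_x, mean_y in leaves:
        if x <= upper_x:
            return mean_y
    return leaves[-1][1]


# Training data: a steadily rising series, y = 2 * x for x = 0..99.
train_x = list(range(100))
train_y = [2 * x for x in train_x]
leaves = fit_leaves(train_x, train_y)

# x = 150 lies beyond the training range; following the trend would give 300.
print(predict(leaves, 150))  # 174.0, the mean of the last leaf
print(max(train_y))          # 198, an upper bound on any prediction
```

However far beyond the training range the input goes, the prediction stays at the last leaf's average, which is why a model of this kind cannot continue a trend past the data it was trained on.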

    I still think classification is a better approach. This can be a pure classification task, where you label your data and then predict the class, or you can take a regression and turn it into a classification problem just by averaging the slope.
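
One possible reading of "averaging the slope" is to label each row by the sign of the average one-step price change over the following horizon. The Python sketch below shows that idea on made-up prices; the function name `label_by_average_slope`, the three class names, and the sample series are all assumptions for illustration, not something taken from a RapidMiner process.

```python
def label_by_average_slope(prices, horizon=5):
    """Label each row 'up', 'down' or 'flat' from the average slope over the next `horizon` steps."""
    labels = []
    for t in range(len(prices) - horizon):
        # Mean of the one-step changes over the horizon; this equals
        # (prices[t + horizon] - prices[t]) / horizon.
        steps = [prices[i + 1] - prices[i] for i in range(t, t + horizon)]
        avg_slope = sum(steps) / horizon
        if avg_slope > 0:
            labels.append("up")
        elif avg_slope < 0:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


# Made-up closing prices, for illustration only.
prices = [100, 101, 103, 102, 104, 106, 105, 103, 101, 100]
print(label_by_average_slope(prices, horizon=5))
# ['up', 'up', 'flat', 'down', 'down']
```

The last `horizon` rows get no label because their future prices are not yet known, so they would be the rows to predict rather than rows to train on.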

    Your attributes will determine whether your model has any chance of predicting anything. If you select attributes that are basically random, then your prediction will be random as well. This goes for almost all, if not all, technical indicators that are commonly found in trading software. There is no edge there, and very little from price momentum in general. If it were that easy to use a few indicators as attributes, plug in a random forest and all of a sudden make money, then everyone would do it. You will have to dig deeper to find your edge.

    Random Forest can work well with the right data set, but my preference will always be to go directly to neural nets for this kind of job. These can span from extremely simple to extremely complex, and getting results from them is not plug and play.

    You should read three books:

    1. Advances in Financial Machine Learning by Marcos Lopez de Prado

    2. Machine Learning for Algorithmic Trading by Stefan Jansen

    3. Any decent book on probability

    There are many things you can achieve with machine learning and finance. The problem is that price prediction is at the bottom of the list and might not even be necessary. Trading is a reactionary business where regimes change constantly. Look to build a process that helps you identify what is happening at any one time. You will have better results in the long run.

    regards,

    Alex

     