Prediction with SVM

mines Member Posts: 12 Learner I
Does anyone know how to make a prediction for the next ten days with the SVM algorithm in RapidMiner?

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    Do you want to make a prediction for each of the next ten days, or just for the tenth day?

    In the first case you would build a loop with ten iterations, filtering your data accordingly. Essentially, you build a data structure where the value of the selected day is the target variable (label), and you make sure to only use data from at least 10 days before that as inputs, for example different averages (7-day, 30-day, a year ago, ...) that capture different aspects of the data.

    The "tenth day prediction" is just a special case of this without the loop.

    Note: this is what you have to do if you insist on using SVM. There are multiple more or less automatic time series prediction algorithms that do exactly what you want with a lot less effort. 

    Regards,
    Balázs
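
The data structure described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_training_pairs`, the choice of three lagged values as inputs, and the case counts are all made up for the example and are not part of the thread or of any RapidMiner operator.

```python
from typing import List, Tuple


def make_training_pairs(
    series: List[float], horizon: int, n_lags: int = 3
) -> List[Tuple[List[float], float]]:
    """Pair the n_lags values known at position t with the value at t + horizon."""
    pairs = []
    # t is the index of the newest value that is "known" when predicting.
    for t in range(n_lags - 1, len(series) - horizon):
        features = series[t - n_lags + 1 : t + 1]  # values up to and including t
        label = series[t + horizon]                # value `horizon` days later
        pairs.append((features, label))
    return pairs


# Hypothetical daily case counts, oldest first.
cases = [13, 12, 15, 14, 18, 17, 20, 22, 21, 25, 24, 27, 30, 29, 33]

# One training set (and later one model) per horizon: 1 day ahead ... 10 days ahead.
training_sets = {h: make_training_pairs(cases, h) for h in range(1, 11)}
for h, pairs in training_sets.items():
    print(f"horizon {h:2d}: {len(pairs)} training rows")
```

Each iteration of the loop corresponds to one horizon. The "tenth day prediction" is the single case `horizon=10`. Longer horizons leave fewer usable training rows, because the newest values can serve as labels but not yet as inputs.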

  • mines Member Posts: 12 Learner I
    Hello @BalazsBarany!
    I want to make a prediction for each of the next ten days. Can you explain how to create a loop in RapidMiner, or is there any information about that?
    Regards
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    if you look at the operators under Utility/Process Control/Loops, you'll see a lot of different ones. 
    For this use case I would use Loop Values. It takes an example set with the nominal values (these would be your dates in a textual representation). The current value is available as a macro inside the loop, so you can easily select the data according to it. 

    Regards,
    Balázs
  • mines Member Posts: 12 Learner I
    @BalazsBarany But should I do that after Apply Model, or inside the cross validation?
    Thank you.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    filtering the data for building the models happens before you build the model. You then apply the model to today's data.

    E.g. if you want a prediction for the 7th day from now, you would filter out data from the last 6 or 7 days (depending on when you get the value for the current day) and build the model from the remaining data, with "today" being the target (label). This model can then be applied to the unfiltered data up until today, and it gives you the prediction for today + 7 days.

    The point is to throw away data that you can't know yet for your prediction. You know the history and possibly today's value (maybe only in the afternoon, depending on the use case). You don't know tomorrow or the day after tomorrow, but you'd like to predict a future value. So you build the model from what you *can* know at the time of the model application, and you do that by filtering the past data accordingly.

    Regards,

    Balázs
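
As a small worked example of the date arithmetic in this reply, the sketch below computes the two dates involved for a 7-day horizon. The helper `horizon_dates` and the example date are assumptions made for illustration, and the sketch assumes today's value is already known (the "7 days" variant of the "6 or 7 days" mentioned above).

```python
from datetime import date, timedelta


def horizon_dates(today: date, horizon: int):
    """Return (newest input date usable for training, date being predicted)."""
    # Training rows use "today" (or an earlier day) as the label, so their
    # inputs must be at least `horizon` days older than that label.
    newest_training_input = today - timedelta(days=horizon)
    # Applying the model to inputs up to today predicts `horizon` days ahead.
    predicted_date = today + timedelta(days=horizon)
    return newest_training_input, predicted_date


today = date(2021, 5, 14)  # example date, an assumption for this sketch
train_cutoff, target = horizon_dates(today, 7)
print("train on inputs up to:", train_cutoff)
print("applying to today predicts:", target)
```

With these inputs the training cutoff is 2021-05-07 and the predicted date is 2021-05-21.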
  • mines Member Posts: 12 Learner I
    @BalazsBarany thank you for your help. When I use Loop Values, should I select the date column (which holds all my dates) or the column that I want to predict? My goal is to make a prediction with the SVM algorithm: I want to predict the number of cases of a disease for the next 10 days.
    Best regards
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    usually you would use the time series operators to build columns from the data history.

    You probably have something like this:

    Date | Cases 
    2021-05-13 | 13
    2021-05-14 | 12
    ...

    With the time series operators you can build moving averages over 3, 7, 14, 30 etc. days, or take the value from 10 days earlier, etc. If there is seasonality in the data, you would also look at the values from 1 or 2 years before, though probably not with a new disease. Combinations of these values (such as differences) are also useful for capturing a trend.

    So the modeling dataset would be something like this:

    Date | Cases date-1 | Cases date-2 | Avg 7 days | Avg 14 days | Avg14 - Avg7 | etc.
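
For readers who want to see such a table built explicitly, here is a minimal Python sketch of the same columns. It is not how RapidMiner's time series operators work internally. The function `build_feature_table`, the column names, and the made-up counts are assumptions, as is the choice to let each moving average end on the day before the row's date.

```python
from datetime import date, timedelta
from statistics import mean


def build_feature_table(rows):
    """rows: list of (date, cases) tuples, one per day, oldest first."""
    values = [cases for _, cases in rows]
    table = []
    # Start at the first row that has 14 earlier days available.
    for i in range(14, len(rows)):
        avg7 = mean(values[i - 7 : i])    # the 7 days before this date
        avg14 = mean(values[i - 14 : i])  # the 14 days before this date
        table.append({
            "date": rows[i][0],
            "cases": values[i],
            "cases_date_minus_1": values[i - 1],
            "cases_date_minus_2": values[i - 2],
            "avg_7": avg7,
            "avg_14": avg14,
            "avg14_minus_avg7": avg14 - avg7,
        })
    return table


# Made-up example data: 20 consecutive days with steadily rising counts.
start = date(2021, 5, 1)
rows = [(start + timedelta(days=d), 10 + d) for d in range(20)]

table = build_feature_table(rows)
print(len(rows), "input rows ->", len(table), "feature rows")
print(table[0])
```

In this sketch the first 14 days produce no output row because their windows are incomplete, so 20 input rows yield only 6 feature rows.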

    You would then use the loop to filter the data in the way I described: for the 10-day prediction you would use the most recent data as the label, but all the data that go into the model are filtered 10 days back in time.

    Cheers,
    Balázs

  • mines Member Posts: 12 Learner I
    But I need to use the SVM algorithm. Can I use both for the prediction?
    Best regards,
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Yes, SVM works well with a large number of attributes.

    I described the preprocessing necessary for creating the data structure that you use for modeling and validation. The modeling algorithm is your choice.

    Regards,
    Balázs
  • mines Member Posts: 12 Learner I
    I built a model, used grid optimization, and then applied the model again. My dataset has 136 rows, but some of the data is missing from the final output. I don't understand why. Can you help me, @BalazsBarany?
    Best regards,
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    you can set breakpoints (after or before execution) on operators to see what goes into them and what comes out of them. That way you can easily see where you lose data.

    Regards,
    Balázs
  • timothy_rij Member Posts: 3 Contributor I
    @mines, did you end up getting this to work? I am trying to do something similar but there are no tutorial videos on using loops or setting up a similar process. 