Tell k-NN (and possibly other models) to ignore training data dated after the unlabeled record's timestamp

The01Geek (Member)
I have a large database of news records with their publication timestamps. I'm currently experimenting with k-NN to classify a company's stock behavior by comparing each news item to similar cases from the past. Naturally, I don't want the model to use any news published AFTER the item in question, since that information would not have been available at prediction time.

Is there a way to implement this in RapidMiner (RM)? Currently, I filter the data into "News published before 2021-05-03" and "News published on 2021-05-03" and feed the two streams into the training and unlabeled inputs, respectively.
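The requirement can also be expressed as a per-record filter: for each unlabeled news item, k-NN only considers labeled records published strictly before it. Below is a minimal plain-Python sketch of that idea, not a RapidMiner feature. The `NewsRecord` type, its numeric `features` vector and `knn_predict_as_of` are hypothetical names, and the news text is assumed to be already converted into numeric vectors.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class NewsRecord:
    published: datetime              # publication timestamp
    features: Tuple[float, ...]      # hypothetical numeric representation of the news text
    label: Optional[str] = None      # e.g. "up" / "down"; None for unlabeled records


def knn_predict_as_of(history, query, k=3):
    """Classify `query` with k-NN, using only labeled records published strictly before it."""
    # Drop everything published at or after the query's timestamp, so no "future" news leaks in.
    past = [r for r in history if r.published < query.published and r.label is not None]
    if not past:
        return None  # no earlier news to compare against
    # Rank the remaining records by Euclidean distance to the query.
    nearest = sorted(past, key=lambda r: math.dist(r.features, query.features))[:k]
    # Majority vote among the k nearest past records.
    return Counter(r.label for r in nearest).most_common(1)[0][0]


history = [
    NewsRecord(datetime(2021, 5, 1, 9), (0.1, 0.9), "up"),
    NewsRecord(datetime(2021, 5, 2, 9), (0.2, 0.8), "up"),
    NewsRecord(datetime(2021, 5, 2, 15), (0.9, 0.1), "down"),
    # Identical features but published later: without the time filter it would be
    # the nearest neighbour and the prediction would become "down".
    NewsRecord(datetime(2021, 5, 4, 9), (0.12, 0.88), "down"),
]
query = NewsRecord(datetime(2021, 5, 3, 10), (0.12, 0.88))
print(knn_predict_as_of(history, query, k=1))  # up
```

Unlike the day-based filter, this per-record version lets a news item use news published earlier on the same day; whether that is acceptable depends on how the stock label is defined. It also rescans the whole history for every query, which is fine for a sketch but slow on a large database.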



As you can imagine, this is not an efficient solution, since it only gives me the performance for a single day. To get the performance over 7 days, I'd have to adjust both filters 7 times, rerun the process each time, and record the accuracy manually.

Is there a better way to do this?


Thanks

Answers

  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi,

    Your process looks right: you are cleanly separating the training and validation data.

    Familiarize yourself with loops and macros in RapidMiner: https://academy.rapidminer.com/catalog?query=loop

    A loop over the 7 days you'd like to evaluate, with a macro holding the current day and used in both filters, will make your process do what it should.
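Roughly, the loop replaces the hard-coded date in both filters with a value that changes on each iteration (in RapidMiner, a macro set by the loop operator). A plain-Python sketch of the same walk-forward evaluation follows, assuming each record is a `(published, features, label)` tuple. `walk_forward_days`, `evaluate` and the majority-vote `majority` stand-in are hypothetical; a real k-NN classifier would be passed in its place.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def walk_forward_days(records, first_day, n_days):
    """Yield (day, train, test) for n_days consecutive days.

    Train = everything published before the day; test = everything published on it.
    """
    records = list(records)
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        train = [r for r in records if r[0].date() < day]
        test = [r for r in records if r[0].date() == day]
        yield day, train, test


def evaluate(records, first_day, n_days, fit_predict):
    """Return {day: accuracy}, where fit_predict(train, features) returns a predicted label."""
    results = {}
    for day, train, test in walk_forward_days(records, first_day, n_days):
        if not train or not test:
            continue  # no history yet, or no news on that day
        correct = sum(fit_predict(train, feats) == label for _, feats, label in test)
        results[day] = correct / len(test)
    return results


def majority(train, _features):
    """Hypothetical stand-in classifier: always predicts the most frequent past label."""
    return Counter(label for _, _, label in train).most_common(1)[0][0]


records = [
    (datetime(2021, 5, 1, 9), (0.1,), "up"),
    (datetime(2021, 5, 2, 9), (0.2,), "up"),
    (datetime(2021, 5, 3, 9), (0.3,), "down"),
    (datetime(2021, 5, 3, 14), (0.4,), "up"),
]
for day, acc in evaluate(records, date(2021, 5, 2), 2, majority).items():
    print(day, acc)
# 2021-05-02 1.0
# 2021-05-03 0.5
```

Collecting the per-day accuracies in one result replaces the manual step of adjusting both filters and noting the outcome for each day.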

    Regards,
    Balázs
  • The01Geek (Member)
    Thanks, BalazsBarany.
    I recently found the Sliding Window Validation operator.

    Do you think this operator will address what I need, or should I build a custom loop?
  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi @The01Geek,

    If you just want to validate your prediction process, Sliding Window Validation is the way to go.
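To illustrate what sliding-window validation does, here is a sketch of how such train/test splits can be generated over rows sorted by publication time. This is not RapidMiner's implementation, and the parameter names are hypothetical, although the operator exposes comparable settings such as window widths and step size. As far as I understand, the windows are counted in examples rather than calendar days, so the data must be sorted by timestamp first, and a window of N examples only matches "one day" if every day has the same number of news items.

```python
def sliding_window_splits(n_rows, train_width, test_width, step, cumulative=False):
    """Yield (train_indices, test_indices) over n_rows rows sorted by time.

    The test window always starts right after the training window, so no
    training row is newer than any test row. With cumulative=True the
    training window keeps growing from row 0 instead of sliding.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    start = 0
    while True:
        train_start = 0 if cumulative else start
        train_end = start + train_width          # exclusive
        test_end = train_end + test_width        # exclusive
        if test_end > n_rows:
            break
        yield list(range(train_start, train_end)), list(range(train_end, test_end))
        start += step


for train, test in sliding_window_splits(6, train_width=3, test_width=1, step=1):
    print(train, test)
# [0, 1, 2] [3]
# [1, 2, 3] [4]
# [2, 3, 4] [5]
```

The cumulative variant is closer to the original setup, where the training data is "all news before the test day" rather than a fixed-size recent window.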

    If you need a reusable process for making future predictions, you'll have to build that process yourself.

    Regards,
    Balázs