Tell k-NN (and possibly other models) to ignore training data dated past the Unlabeled record's time
I have a large database of news records and their published timestamps. I'm currently experimenting with using k-NN to classify the company's stock behavior by comparing each news item to similar cases that have occurred in the past. Naturally, I don't want the model to use any news that was published AFTER the news in question, as that would not be a realistic approach.
I'm wondering if there's a way to implement this in RM? Currently, I filter the data into "News before 2021-05-03" and "News published on 2021-05-03" and feed the two streams to the training and unlabeled inputs respectively.
As you can imagine, this is not a very efficient solution as it only gives me the performance results for one day. To get the performance results of 7 days, I'd have to adjust both filters 7 times, run the process and manually record the accuracy outcome.
I feel like there has got to be a better way to do this?
Thanks
Answers
Your process looks right: you are cleanly separating the training data from the validation data.
Familiarize yourself with loops and macros in RapidMiner. https://academy.rapidminer.com/catalog?query=loop
A loop over the 7 days you want to evaluate will run the same filter, train and score steps once per day, so you no longer have to adjust the filters by hand.
Regards,
Balázs
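For readers who want to see the loop idea outside RapidMiner, here is a minimal Python sketch of the day-by-day evaluation discussed in this thread. It is not RapidMiner code and not the poster's actual process: the record layout (`published`, `x`, `label`), the helper names `knn_predict` and `daily_accuracy`, and the toy data are all assumptions made for illustration, and the small k-NN is hand-written rather than taken from any library.

```python
from collections import Counter
from datetime import date, timedelta
import math


def knn_predict(train, features, k=3):
    """Return the majority label among the k training rows nearest to `features`."""
    nearest = sorted(train, key=lambda row: math.dist(row["x"], features))[:k]
    return Counter(row["label"] for row in nearest).most_common(1)[0][0]


def daily_accuracy(records, first_day, n_days, k=3):
    """For each day, train only on records published strictly before that day."""
    results = {}
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        train = [r for r in records if r["published"] < day]
        test = [r for r in records if r["published"] == day]
        if not train or not test:
            continue  # nothing to learn from, or nothing to score
        hits = sum(knn_predict(train, r["x"], k) == r["label"] for r in test)
        results[day] = hits / len(test)
    return results


# Hypothetical toy data: one numeric feature per news record.
records = [
    {"published": date(2021, 5, 1), "x": (0.0,), "label": "up"},
    {"published": date(2021, 5, 1), "x": (1.0,), "label": "down"},
    {"published": date(2021, 5, 2), "x": (0.1,), "label": "up"},
    {"published": date(2021, 5, 3), "x": (0.9,), "label": "up"},
    {"published": date(2021, 5, 3), "x": (0.2,), "label": "up"},
]

for day, acc in daily_accuracy(records, date(2021, 5, 1), 3, k=1).items():
    print(day, acc)
```

The sketch assumes `published` has day granularity. With full timestamps, a stricter variant would compare each test record against only the training records with an earlier timestamp, which is closer to what the original question asks for.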
I recently found the Sliding Window Validation operator.
Do you think this operator is going to address what I need, or should I create a custom loop?
If you just want to validate your prediction process, Sliding Window Validation is the way to go.
If you need a reusable process for future predictions, you'll have to build it manually.
Regards,
Balázs
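As a rough illustration of the general sliding-window idea (not the internals of the RapidMiner operator, whose parameters and behavior may differ), the Python sketch below generates train/test index ranges over rows that are assumed to be sorted by publication time, oldest first. The function name `sliding_windows` and its parameters `train_width`, `test_width` and `step` are hypothetical.

```python
def sliding_windows(n_rows, train_width, test_width, step):
    """Yield (train_idx, test_idx) ranges over rows sorted oldest first.

    Each test window starts right after its training window, so a model
    is never scored on rows older than the ones it was trained on.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    start = 0
    while start + train_width + test_width <= n_rows:
        train_end = start + train_width
        yield range(start, train_end), range(train_end, train_end + test_width)
        start += step


# Example: 10 time-ordered rows, train on 4, test on the next 2, move by 2.
for train_idx, test_idx in sliding_windows(10, 4, 2, 2):
    print(list(train_idx), list(test_idx))
```

Unlike the day-by-day filter in the original question, which trains on all earlier news, this variant keeps the training window at a fixed width, so older records drop out as the window moves forward.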