Algorithms are "cheating" and copying the right label from other instances
Hi everyone,
I have a problem with my model. It should predict a monthly product volume from some given attributes.
My training data covers roughly 60 past months. Each instance in the dataset represents one day. Two of the attributes are "month" and "year". The label is the product volume at the end of the month, so every instance of a given month (~30 days per month, so ~30 instances) has the same label. When I train the model (Cross Validation with Deep Learning) and look at the performance measure (relative_error), it seems like the algorithm looks at the attributes "month" and "year" and takes the label value from another row with the same month and year as its prediction for this instance.
I hope you can follow my description. If there is something you don't understand feel free to ask.
I would be very thankful if someone could tell me whether my guess is right and how I can avoid this mistake.
Now I am trying to avoid this by just having the month as an attribute, not month+year.
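The behaviour described above would be consistent with a row-level random split: days of the same month end up in both the training and the test fold, and month plus year identify the label exactly. Below is a minimal sketch in plain Python (not a RapidMiner process) of splitting by month instead, so that all days of one month stay on the same side. The function name `grouped_folds`, the field names and the toy rows are assumptions made up for illustration.

```python
from collections import defaultdict


def grouped_folds(rows, n_folds=5):
    """Yield (train_idx, test_idx) pairs in which all rows sharing a
    (year, month) key land on the same side of the split."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[(row["year"], row["month"])].append(i)
    keys = sorted(groups)
    for fold in range(n_folds):
        # Every n_folds-th month goes to the test side of this fold.
        test_keys = set(keys[fold::n_folds])
        test_idx = [i for k in keys if k in test_keys for i in groups[k]]
        train_idx = [i for k in keys if k not in test_keys for i in groups[k]]
        yield train_idx, test_idx


# Hypothetical toy data: 6 months, 3 "days" each, same label within a month.
rows = [
    {"year": 2020, "month": m, "day": d, "label": 100 * m}
    for m in range(1, 7)
    for d in range(1, 4)
]

folds = list(grouped_folds(rows, n_folds=3))
for train_idx, test_idx in folds:
    train_months = {(rows[i]["year"], rows[i]["month"]) for i in train_idx}
    test_months = {(rows[i]["year"], rows[i]["month"]) for i in test_idx}
    # No month appears on both sides, so its label cannot be looked up.
    assert train_months.isdisjoint(test_months)
```

With a split like this, the error measured on the test months should be closer to what the model achieves on a month it has never seen, whether or not "year" stays in the attribute set.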
Thanks for your replies,
Sebastian
Answers
Dortmund, Germany
Thank you very much for your answer. I guess this validation method could help me a lot in estimating the performance of my current model!
However, I think I have to create a new process with a modified dataset (without year and month as attributes, maybe only month) to have a valid solution for my problem.
Regards,
Sebastian
Dortmund, Germany
I tried to apply "Sliding Window Validation" to my model, but it seems like this type of validation is only applicable to time series data.
I know that my data is "some kind of" time series data, but I am trying to solve the problem with a regression using Neural Networks (Deep Learning).
So I cannot use Sliding Window Validation, right?
I tried to apply time series models (ARIMA) to my data (period=day, period=month), but the results were very bad (I guess I do not have enough historical data, just 60 months).
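The idea behind a sliding or expanding window does not depend on the model being a time series model: train only on earlier months and test on a later one. Below is a rough sketch of that idea in plain Python, which could be combined with any regression learner. It is not the RapidMiner operator; `expanding_month_splits`, `min_train_months` and the toy rows are hypothetical names and values chosen for illustration.

```python
def expanding_month_splits(rows, min_train_months=3):
    """Yield (train_idx, test_idx) pairs where the test rows all belong
    to one month and the training rows come only from earlier months."""
    months = sorted({(r["year"], r["month"]) for r in rows})
    for k in range(min_train_months, len(months)):
        train_months = set(months[:k])
        test_month = months[k]
        train_idx = [i for i, r in enumerate(rows)
                     if (r["year"], r["month"]) in train_months]
        test_idx = [i for i, r in enumerate(rows)
                    if (r["year"], r["month"]) == test_month]
        yield train_idx, test_idx


# Hypothetical toy data: 5 consecutive months across a year change,
# 2 "days" per month.
calendar = [(2020, 11), (2020, 12), (2021, 1), (2021, 2), (2021, 3)]
rows = [
    {"year": y, "month": m, "day": d}
    for (y, m) in calendar
    for d in (1, 2)
]

splits = list(expanding_month_splits(rows, min_train_months=3))
for train_idx, test_idx in splits:
    latest_train = max((rows[i]["year"], rows[i]["month"]) for i in train_idx)
    earliest_test = min((rows[i]["year"], rows[i]["month"]) for i in test_idx)
    # The model never sees the test month or anything after it.
    assert latest_train < earliest_test
```

Each split would then be used to fit the regression on `train_idx` and compute relative_error on `test_idx`. With about 60 months, this gives one error value per held-out month rather than per day.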
Regards,
Sebastian