How can I remove heteroskedasticity from a multiple regression in the context of forecasting?
florianherrmann
Member Posts: 3 Learner II
Hey guys, this is Florian writing,
I'm currently facing an issue with a multiple regression where I'm pretty much stuck. The modelling context is a multivariate forecast.
Long story short:
I have done a residual analysis for the multiple regression, as the squared correlation and the forecast results themselves indicated a poor prediction. The residual analysis showed heteroskedasticity and residuals that are not randomly distributed.
After some research I figured out that the systematic lack of fit and the heteroskedasticity can be addressed by transforming variables (e.g. Box-Cox transformation). Unfortunately RapidMiner doesn't provide the Box-Cox transformation. As a result I'm stuck with my research and in need of some expert knowledge.
Is there any other way to address heteroskedasticity and systematic lack of fit within RapidMiner without completely restructuring my model?
Appreciate your help guys!
Answers
Did you check whether this phenomenon is caused by outliers? If you have outliers, they have to be taken care of first, as they can make the residuals look like this.
If you have no outliers, I think one way to implement it is to use the Execute Python operator in RapidMiner and apply the power transformation from scikit-learn. I don't think RM has Box-Cox yet.
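A minimal sketch of what such an Execute Python script could look like, assuming the target attribute is called `label` (a hypothetical name, adjust it to your data) and that all its values are strictly positive, which Box-Cox requires. It uses scikit-learn's `PowerTransformer` and returns the transformed data plus the fitted lambda on a second output port:

```python
import pandas as pd
from sklearn.preprocessing import PowerTransformer

LABEL = "label"  # hypothetical column name; replace with your target attribute


def rm_main(data: pd.DataFrame):
    """Entry point that the Execute Python operator calls with the input data."""
    values = data[[LABEL]].to_numpy(dtype=float)
    if (values <= 0).any():
        raise ValueError("Box-Cox requires strictly positive values")

    # standardize=False keeps the output on the plain Box-Cox scale
    transformer = PowerTransformer(method="box-cox", standardize=False)

    out = data.copy()
    out[LABEL + "_boxcox"] = transformer.fit_transform(values).ravel()

    # Expose the fitted lambda so the transform can be reproduced or inverted later
    lam = pd.DataFrame({"boxcox_lambda": [float(transformer.lambdas_[0])]})
    return out, lam
```

If the regression is then trained on the transformed label, its predictions are on the Box-Cox scale as well and would need to be mapped back (for example with `inverse_transform` on the same fitted transformer) before comparing them with the original values.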
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Unfortunately the Box-Cox transformation has not (yet) been added to RapidMiner. It is on our roadmap, though.
For now, I just have two ideas:
- You can include the Box-Cox transformation from Python (or R) by using the Python (R) extension, which allows you to integrate Python scripts into your workflow
- You can also try to smooth your data beforehand, which may help as well, for example with Exponential Smoothing or the Moving Average Filter (I would recommend the binom filter here)
Hope this helps, and best wishes with your research
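To illustrate the smoothing idea outside of the RapidMiner operators, here is a rough sketch of a binomial-weighted moving average in plain Python. The function names and the edge handling (re-normalizing the weights where the window runs past the series) are my own assumptions, not necessarily what the Moving Average Filter does internally:

```python
from math import comb


def binomial_weights(window: int) -> list[float]:
    """Normalized binomial coefficients for a window of the given size."""
    n = window - 1
    total = 2 ** n
    return [comb(n, k) / total for k in range(window)]


def binomial_smooth(series: list[float], window: int = 5) -> list[float]:
    """Centered weighted moving average with binomial weights.

    Near the edges only the points inside the series are used, and the
    weights are re-normalized so they still sum to one.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    weights = binomial_weights(window)
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        acc = 0.0
        weight_sum = 0.0
        for offset, w in enumerate(weights):
            j = i + offset - half
            if 0 <= j < len(series):
                acc += w * series[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed
```

Note that a centered window looks at values after the current time step, so for a forecasting setup the smoothed series should be checked for look-ahead leakage before it is used as an input attribute.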
Fabian
@tftemme what I have figured out so far is that my time series prediction always lags the label by one step. As there are two pretty high peaks due to seasonal patterns, this might be the reason for the heteroskedasticity.
For the prediction I use some external attributes and a lagged value (-1) of the label attribute itself. Do you have an idea how to remove the lagging prediction?
Appreciate your help
Florian
Keep in mind that your data may contain no pattern that predicts the future. In that case the last value may simply be the best guess, and you may not get a better prediction.
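One way to check this is to score the model against the plain "last value" forecast on the same time steps. The sketch below is only an illustration with made-up function names; it assumes `label` and `model_pred` are aligned lists where `model_pred[t]` is the forecast for step `t`:

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error over paired values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def compare_to_persistence(label, model_pred):
    """Compare a model with the naive forecast that repeats the previous value.

    The first time step is dropped because it has no previous observation.
    """
    actual = label[1:]
    naive = label[:-1]  # forecast for step t is the observation at t-1
    model = model_pred[1:]

    model_rmse = rmse(actual, model)
    naive_rmse = rmse(actual, naive)
    # The ratio is undefined when the naive forecast is perfect (constant series)
    ratio = model_rmse / naive_rmse if naive_rmse > 0 else None
    return {"model_rmse": model_rmse, "naive_rmse": naive_rmse, "ratio": ratio}
```

A ratio close to 1 would suggest the model is mostly echoing the lagged label, which matches a prediction that trails the label by one step; a ratio clearly below 1 would mean the external attributes add information beyond the last value.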
That is what my research is actually about: trying to figure out whether a multivariate forecast based on linear regression is at least as good as a univariate time series prediction (in my case the Function and Seasonal Component Forecast).
I have now incorporated your advice into the model.
I highly appreciate your help :)
Kind regards
Florian