Multivariate stepwise robust linear regression learner
michaelhecht
Member Posts: 89 Maven
Since M5P, which is in the Weka part of RapidMiner, doesn't perform acceptably (in my opinion),
I would really like to see a learner for piecewise multivariate robust linear regression comparable
to the SRT approach proposed by HUANG and TOWNSHEND in
http://www.landcover.org/pdf/ijrs24_p75.pdf
In my opinion this would be a very good way to approximate arbitrary measured numerical data.
Although HUANG and TOWNSHEND didn't apply robust regression, it should definitely be
implemented to keep outliers from having too strong an influence. In any case, the proposed SRT approach
has the big advantage of producing continuous functions, in contrast to M5P.
If that is not possible, I would also appreciate a multivariate spline approximation of numerical data.
This should definitely be available.
Answers
It seems that something comparable (even if it takes a different approach) is MARS, i.e.
Multivariate Adaptive Regression Splines. I don't think that such a technique is currently part
of RapidMiner.
This or the already mentioned SRT model should be part of RapidMiner.
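To illustrate why MARS gives continuous piecewise linear fits, here is a minimal one-dimensional sketch in plain Python. It is not the real MARS procedure (no multivariate forward pass, no pruning), only the core idea: a pair of hinge functions max(0, x - t) and max(0, t - x) around a knot t, with the knot chosen by trying every interior data point and keeping the one with the smallest squared error. All function names are made up for this example.

```python
def hinge(x, knot, sign):
    """Hinge basis: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(rows, ys):
    """Ordinary least squares via the normal equations (fine for three basis functions)."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(x, knot, coef):
    """Evaluate b0 + b1*max(0, x - knot) + b2*max(0, knot - x)."""
    return coef[0] + coef[1] * hinge(x, knot, +1) + coef[2] * hinge(x, knot, -1)


def fit_single_knot(xs, ys):
    """Try every interior x value as knot and keep the best (sse, knot, coef)."""
    best = None
    for knot in sorted(set(xs))[1:-1]:
        rows = [[1.0, hinge(x, knot, +1), hinge(x, knot, -1)] for x in xs]
        coef = fit_least_squares(rows, ys)
        sse = sum((y - predict(x, knot, coef)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, knot, coef)
    return best


# Toy data with a kink at x = 4 (made up for the example).
xs = [float(i) for i in range(11)]
ys = [1.0 + 2.0 * max(0.0, x - 4.0) + 1.0 * max(0.0, 4.0 - x) for x in xs]
sse, knot, coef = fit_single_knot(xs, ys)
print(knot, coef, sse)
```

Because both hinge terms are zero at the knot, the two linear pieces always meet there, which is the continuity property M5P lacks. A robust variant would have to replace the plain least squares step, which this sketch does not attempt.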
Curious if the RM development team has evaluated MARS and what they think of it as a possible future addition to RM?
Keith
I also found the R implementation called "earth", and I would be really happy if more of the data mining and statistics
functions of programs like R (or MATLAB or Scilab) were included in RM to avoid frequent switching between the programs. Usually one has to do a certain amount of statistics before applying data mining.
Nevertheless, I hope that a MARS-like implementation is part of the next RM.
We are aware that MARS exists, and I have some todo lists here, dating from two years ago, where MARS is one of the open points... I really like this algorithm, because it's linear in the number of examples and the number of attributes and seems to perform promisingly.
We have some really great improvements for RM 6, which make the work much easier but still need a huge amount of developer time. So I can't promise that we will get this feature into the first version. But I'm glad to re-add it to my current todo list.
Greetings,
Sebastian
I'm really happy that you waited for exactly this request
Furthermore, even if I don't want to look greedy, I think that a LOESS
algorithm for locally weighted polynomial regression is the other
important approach that is missing in RM. Since there are algorithms
that use kd-trees to speed it up, it is strongly
related to data mining. Even if this approach is lazy, I think it
should be available, since there are almost no demands on the
structure of the fitted data. As for all algorithms, I would prefer a
robust approach ::)
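To make the LOESS request concrete, here is a minimal one-dimensional sketch in plain Python, assuming local linear fits (degree 1) with tricube weights and a nearest-neighbour bandwidth. It uses a plain linear scan for the neighbours (no kd-tree) and leaves out the robustness iterations, so it only shows the non-robust core. The function names and the frac parameter are my own choices for the example.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(xs, ys, x0, k):
    """Fit a weighted straight line around x0 and return its value at x0."""
    # Bandwidth = distance to the k-th nearest point (linear scan, lazy learner).
    h = max(sorted(abs(x - x0) for x in xs)[k - 1], 1e-12)
    ws = [tricube((x - x0) / h) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x0 - mx)


def loess(xs, ys, frac=0.5):
    """Smooth ys at every x in xs, using a fraction `frac` of the points per local fit."""
    k = max(3, int(frac * len(xs)))
    return [local_linear(xs, ys, x0, k) for x0 in xs]


# Toy data (made up): a curve that no single global line describes well.
xs = [float(i) for i in range(21)]
ys = [0.1 * (x - 10.0) ** 2 for x in xs]
print(loess(xs, ys, frac=0.5))
```

Nothing is fitted in advance; all the work happens at prediction time, which is why the neighbour search matters for speed. A robust version would refit a few times with weights that shrink for points with large residuals.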
I will note LOESS, too, but I doubt that we will get everything into the next major release. There's a lot of work to accomplish until then..
We evaluated the usage of KD-trees and ball trees a year ago and didn't find it worth the effort. Especially in high dimensions, they lose every performance advantage over a linear search. Perhaps it's because of my implementation, but I don't think so... rather, I suspect it's another variant of the curse of dimensionality...
Greetings,
Sebastian
If it isn't in the next release, I can live with that - time doesn't matter ...
By the way, are there any plans to make the polynomial regression
a robust regression? Since we have a lot of "real" industrial data
with a certain amount of outliers, all methods, or at least the regression,
should be outlier resistant.
I have read a lot of data mining literature, but there seem to be no robust
methods (except for regression). So the main focus is "only" on data preparation.
But how do you decide which data points are outliers and which are not?
There should be a method as follows:
1. Mark the outliers in a data set automatically e.g. with a robust LOESS/MARS method
2. Apply a "classical" data mining method with decreased weighting on the outliers
This is my request for RM 6 ;D
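The two steps above could look roughly like this in plain Python. This is only a sketch under my own assumptions: for step 1 I swapped in a Theil-Sen line (median of pairwise slopes) as the robust fit instead of a robust LOESS/MARS, and flag points whose residual exceeds three MAD-based standard deviations. For step 2 the "classical" method is an ordinary weighted least squares line. The cutoff of 3 and the outlier weight of 0.1 are arbitrary example values.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Robust line: median of all pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


def mark_outliers(xs, ys, cutoff=3.0):
    """Step 1: flag points whose robust residual is large relative to the MAD."""
    slope, intercept = theil_sen(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    # If the scale is zero (more than half the residuals identical), flag nothing.
    return [scale > 0.0 and abs(r - center) > cutoff * scale for r in residuals]


def weighted_line(xs, ys, ws):
    """Step 2: classical weighted least squares line, returns (slope, intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def robust_then_classical(xs, ys, outlier_weight=0.1):
    """Mark outliers first, then fit with decreased weight on the marked points."""
    flags = mark_outliers(xs, ys)
    ws = [outlier_weight if flagged else 1.0 for flagged in flags]
    return flags, weighted_line(xs, ys, ws)


# Toy data (made up): y = 2x + 1 with small noise and one gross outlier at x = 5.
xs = [float(i) for i in range(11)]
ys = [2.0 * x + 1.0 + 0.5 * ((i % 3) - 1) for i, x in enumerate(xs)]
ys[5] = 50.0
flags, (slope, intercept) = robust_then_classical(xs, ys)
print(flags, slope, intercept)
```

On this toy data only the point at x = 5 gets marked, and the weighted fit ends up much closer to the underlying line than an unweighted fit does. The same weights could be handed to any learner that accepts example weights.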