Help on Optimizing Attribute weights for k-NN modeling
I'm having trouble doing something that seems pretty basic:
I have a set of around 50000 examples with about 20 features.
I'd like to find good weights for those features that optimize k-NN predictions.
Intuitively (and this may be wrong), the most natural way to use attribute weights to improve a k-NN prediction model is to scale the (normalized) data by the weight of the attribute. This way two points are considered further apart in feature space if they differ on a more important attribute than if they differ on a less important one.
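Something like this, just to illustrate what I mean (a rough Python / scikit-learn sketch with made-up data and weights, not a RapidMiner process):

```python
# Rough illustration of the idea: normalize each attribute, multiply it by its
# weight, then run k-NN on the scaled data. Data and weight values are made up.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))            # stand-in for the real example set
y = X @ np.array([2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=1000)

weights = np.array([1.0, 0.3, 0.05])      # hypothetical attribute weights

X_norm = MinMaxScaler().fit_transform(X)  # normalize first
X_scaled = X_norm * weights               # scale each column by its weight

# Euclidean distance on the scaled data now treats a difference on a heavily
# weighted attribute as a larger separation than one on a lightly weighted attribute.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_scaled, y)
print(knn.predict(X_scaled[:5]))
```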
Problem: I don't see how to do this in RapidMiner, because the "Optimize Weights" operators do not provide direct access to the weight vector being tested, and without that access I don't see how to use the weights to affect the k-NN performance in the subprocess.
Unfortunately, since my data uses numeric labels, there are not many weight-generation schemes available for making some dimensions more important than others in my prediction efforts.
Any help is appreciated!
Answers
the "Optimize Weights" operators perform internal validations using different weight vectors depending on the algorithm they use for optimization. The attributes are scaled / selected / deselected according to the current vector and the modified example set is the piped to the inner learner. The process is repeated several times (again depending on the selected method and parameters) and at the end you will get your example set scaled using those weights that worked best during the internal validation. You may also read those weights from the corresponding output port of the "Optimize Weight" operator. In the help view you can scroll down to get a link to some sample processes showing how the usage of those operators is meant to be. You may also store those weights and use to scale other example sets using "Scale by Weights".
Cheers,
Helge