SOM Reduction
vijaypshah
Hi,
When we do SOM reduction, do we get a reduced feature set (as with PCA or SVD reduction)? I thought SOM is an unsupervised classification scheme. Am I missing something?
The ExampleSet returned does not retain the ID number of the input ExampleSet. Is there a way to retain it?
Regards,
Vijay
Answers
The plotter produces the reduced dimensions together with the map (clustering). The operator SOMDimensionalityReduction, as the name indicates, only produces the transformation and can therefore be used like other dimensionality reductions (PCA etc.). It is also able to produce a preprocessing model so that the transformation can also be applied to test data sets.
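To make the difference from the plotter concrete, here is a minimal Python sketch of what a SOM-based dimensionality reduction does, assuming a plain online SOM on a 2-D grid. This is not RapidMiner's SOMDimensionalityReduction code; the functions train_som, best_matching_unit and transform and their parameters are hypothetical. Each example is replaced by the grid coordinates of its best-matching node, and the trained map plays the role of the preprocessing model that is reused on test data.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Grid coordinate of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(nodes, key=lambda c: sum((w - v) ** 2 for w, v in zip(nodes[c], x)))


def train_som(data, net_size, epochs=30, seed=0):
    """Train a net_size x net_size SOM on equal-length numeric vectors (online updates)."""
    rng = random.Random(seed)  # fixed seed -> repeatable initialization and sample order
    # Initialize every node with a copy of a randomly chosen training example.
    nodes = {(i, j): list(rng.choice(data)) for i in range(net_size) for j in range(net_size)}
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = 0.5 * (1 - frac)  # learning rate decays linearly towards 0
            radius = max(net_size / 2 * (1 - frac), 0.5)  # neighbourhood shrinks over time
            bi, bj = best_matching_unit(nodes, x)
            for (i, j), w in nodes.items():
                # Gaussian neighbourhood on the grid around the best-matching unit.
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return nodes


def transform(nodes, data):
    """Reduce each example to the 2-D grid coordinates of its best-matching unit."""
    return [best_matching_unit(nodes, x) for x in data]


# Fit on training data, then reuse the same trained map on unseen test data.
train = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.9, 1.0, 0.9], [1.0, 0.9, 1.0]]
test = [[0.05, 0.05, 0.05], [0.95, 0.95, 0.95]]
model = train_som(train, net_size=3)
print(transform(model, train))
print(transform(model, test))
```

The point of keeping the trained map around is the same as with a PCA model: the test data is projected with the transformation learned on the training data instead of fitting a new map.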
Well, I just tried it and the example set still contains the ID. Here is an example:
Could you please post your process (XML) so I can check why the ID might get lost? Please note that the plotters do not show the ID as a column but as a tooltip that appears when you move the mouse over the plot points.
Cheers,
Ingo
I may have forgotten to include the ID before; I am not sure what I had done. But it does work now. Sorry for the false alarm.
Regards,
Vijay
Not a problem at all. We really appreciate each report of a possible bug. Better a false alarm than not knowing that there is something wrong...

The net size is only the size in one dimension. If you reduce the data to, let's say, two dimensions, then you will get 30 values in each dimension, resulting in a total of 900 clusters. If you use two dimensions and want to come up with 7-8 clusters, I would suggest a net size of 3. Using more dimensions allows you to keep more of the original value variance. If you want to come up with a single cluster number, you could use the AttributeMerge operator like in this example:
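The arithmetic behind the net size advice can be sketched in a few lines of Python. The helper names are hypothetical, and merge_coordinates only mimics the idea of combining the per-dimension coordinates into one cluster number; it is not the AttributeMerge operator itself.

```python
def som_node_count(net_size, dimensions):
    """A SOM grid has net_size positions per dimension, so net_size ** dimensions nodes."""
    return net_size ** dimensions


def net_size_for_clusters(target_clusters, dimensions):
    """Smallest net size whose grid offers at least target_clusters nodes."""
    size = 1
    while size ** dimensions < target_clusters:
        size += 1
    return size


def merge_coordinates(coords, net_size):
    """Fold per-dimension grid coordinates (each 0..net_size-1) into one cluster number."""
    cluster = 0
    for c in coords:
        cluster = cluster * net_size + c
    return cluster


print(som_node_count(30, 2))         # 900 nodes for net size 30 in two dimensions
print(net_size_for_clusters(8, 2))   # 3, i.e. a 3 x 3 grid with 9 nodes
print(merge_coordinates((2, 1), 3))  # 7
```

With net size 3 in two dimensions the grid has 9 nodes, and since some nodes may stay empty you can end up with fewer distinct clusters, which fits the 7-8 cluster suggestion. Joining the coordinates as text (for example "2_1") would be an equally valid single cluster label.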
Cheers,
Ingo
I copied the XML code into the XML tab and then pressed the run button. But I had to save the process before I could run it, so I saved it as test.xml...
However, it did not create the operator tree. Even the saved file was empty (NULL), because I quit the application without clicking anywhere else.
If I click a neighboring tab like new_operator, parameter or comment, the operator tree is generated. I have tried this 3-4 times now.
Shouldn't the tree be generated when I press save?
Regards,
Vijay
Thanks for pointing this out. This behaviour is indeed not intended: the operator tree should be generated not only when you switch to the parameter or comment tab but also when you run or save the process.
I have added this to our to-do list. Unfortunately, there is currently plenty of work on our to-do list, so we will not be able to fix this issue in the next few days. Until then, please use the workaround you described and make the new XML process representation take effect by clicking on another tab.
Regards,
Tobias
As I understand it, SOM uses a kind of neural network to find a good representation of your high-dimensional data.
There is a paper that suggests using SOM for input dimension reduction by excluding certain parameters after comparing SOM component planes:
de Abajo, N.: ANN quality diagnostic models for packaging manufacturing: an industrial data mining case study. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
"Particularly insightful are the so called component planes represented in figure 5, that provide us with a big picture of the input values distribution. Similar maps show an analogous behavior and, therefore, a redundancy in the information."
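One way to compare component planes, assuming the trained map is available as a dictionary from grid coordinates to weight vectors (a hypothetical representation, not something RapidMiner exports in this form), is to correlate each attribute's plane with every other attribute's plane and flag pairs whose planes look alike. This is a minimal Python sketch of that idea, not the method of the cited paper; the 0.9 threshold is an arbitrary assumption.

```python
import math


def component_planes(nodes, n_attributes):
    """One plane per input attribute: that attribute's weight at every node, in fixed grid order."""
    cells = sorted(nodes)
    return [[nodes[c][k] for c in cells] for k in range(n_attributes)]


def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either list is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0  # a flat plane carries no pattern to compare


def redundant_pairs(nodes, n_attributes, threshold=0.9):
    """Attribute pairs whose component planes are strongly (anti-)correlated."""
    planes = component_planes(nodes, n_attributes)
    pairs = []
    for p in range(n_attributes):
        for q in range(p + 1, n_attributes):
            r = pearson(planes[p], planes[q])
            if abs(r) >= threshold:
                pairs.append((p, q, r))
    return pairs


# Toy 2 x 2 map with 3 attributes: attribute 1 is exactly twice attribute 0.
grid = {
    (0, 0): [0.0, 0.0, 1.0],
    (0, 1): [1.0, 2.0, 0.0],
    (1, 0): [2.0, 4.0, 1.0],
    (1, 1): [3.0, 6.0, 0.0],
}
print(redundant_pairs(grid, 3))
```

In the toy grid only the pair (0, 1) is reported, because their planes have the same shape while attribute 2 follows a different pattern.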
So my question is: can RapidMiner provide this functionality too?
Also, if I press the calculate button in the "SOM chart view", the chart changes its appearance completely. If that is the correct behaviour, what information can I get from that chart?
Without having looked into the paper, I am assuming that the different attributes are used as "label" for the SOM creation and then a (probably matrix-based) image comparison is calculated to build the attribute groups containing redundant information, right?
So the answer is quite easy: no. It is currently not possible to calculate a feature "similarity" based on SOM graph visualizations. The basic algorithms are all there (SOM creation, matrix comparisons, clustering), but you would probably have to create your own operator for this (or let us do this for you). I would like to mention, though, that I am not completely convinced that this calculation is very meaningful, but that's another question...

Yes, this behaviour is desired. Since the SOM calculation is based on random initializations, the result is different for each repeated calculation of the SOM. And as you have probably noticed, the results can sometimes look completely different. This is exactly the reason why I am not convinced that a SOM-based feature similarity calculation is a good idea: the result would depend too much on the initialization. To get around this problem, you would have to repeat the SOM creation several times, using the same initializations for all variables, which would hardly be feasible for large data sets since SOMs are not exactly what I would call a very fast algorithm. Just my 2 cents.
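The dependence on the random initialization can be checked empirically. The Python sketch below assumes a simple online SOM rather than RapidMiner's implementation; it trains one map per random seed on all attributes at once (so the planes compared within one run share the same initialization) and reports the mean and spread of a component-plane correlation across seeds. The function names, seeds, net size and synthetic data are illustrative assumptions.

```python
import math
import random
import statistics


def train_som(data, net_size, seed, epochs=20):
    """Compact online SOM; the seed fixes both the random initialization and the sample order."""
    rng = random.Random(seed)
    nodes = {(i, j): list(rng.choice(data)) for i in range(net_size) for j in range(net_size)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr, radius = 0.5 * (1 - frac), max(net_size / 2 * (1 - frac), 0.5)
            bi, bj = min(nodes, key=lambda c: sum((w - v) ** 2 for w, v in zip(nodes[c], x)))
            for (i, j), w in nodes.items():
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return nodes


def plane_correlation(nodes, p, q):
    """Pearson correlation between the component planes of attributes p and q."""
    cells = sorted(nodes)
    a, b = [nodes[c][p] for c in cells], [nodes[c][q] for c in cells]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def repeated_plane_correlation(data, p, q, seeds, net_size=4):
    """Train one SOM per seed; return mean and population std of the plane correlation."""
    values = [plane_correlation(train_som(data, net_size, s), p, q) for s in seeds]
    return statistics.fmean(values), statistics.pstdev(values)


# Synthetic data: attribute 1 is a noisy copy of attribute 0, attribute 2 is unrelated.
rng = random.Random(42)
data = []
for _ in range(60):
    x0 = rng.random()
    data.append([x0, 2 * x0 + rng.uniform(-0.05, 0.05), rng.random()])

print(repeated_plane_correlation(data, 0, 1, seeds=range(5)))  # near-duplicate attributes
print(repeated_plane_correlation(data, 0, 2, seeds=range(5)))  # unrelated attributes
```

A large spread across seeds would support the concern above. Fixing the seed makes a single run repeatable, but it does not remove the dependence on the initialization itself, and the cost grows with every repetition.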
Cheers,
Ingo