ReadCSV and K-Means Operator
I don't know if I should file this as a bug... It isn't. It would fall under the category of "things I miss about RM 4.6".
1) Centroid Plot (output from Kmeans or Kmedoids)
This is a parallel coordinate plot of the centroids. It was very useful for summarizing the clusters found. Previously, the lines were colored by the cluster attribute, which made it very easy to see the differences between clusters, especially when k is large. Now the color is gone from the graph.
2) Read CSV
I work mainly with CSV files. Before it was possible to declare which columns were ID or LABEL. Now we have to add a "Set Role" operator.
3) Attribute Editor
Thanks in advance for any answers and happy holidays to all of you,
Ernesto
Answers
Did you really say "Things I miss about RM 4.6"? If you really mean this, please, please update to 5.1. Many of the things you miss were added there a long time ago...
Greetings,
Sebastian
My two other comments, however, still apply to version 5.1.
1) Centroid Plot in K-means and K-medoids is still without colors.
It is the absence of colors for the different clusters that I'm commenting on. This was a feature that I had seen only in RapidMiner and that I found really useful. If you have many clusters, the colors help you see what distinguishes one cluster from another.
2) Meta-data editor.
I called it Attribute Editor, but what I mean is a Meta-data editor. Now that Repositories are so important, it would be nice to be able to edit the meta data of a dataset. I'm thinking, for instance, about changing the roles played by the different variables without using the operators (Exchange Roles or Set Role).
I read that you guys are thinking about creating a new type of role, "Ignore". I think that will be very useful. With a Meta-data editor you could change which variables to ignore or include. Again, I understand that you can do that presently using operators, but it'd be nice to be able to do it before you start any process.
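To illustrate the kind of up-front role assignment I have in mind, here is a minimal sketch in plain Python. It is not RapidMiner code or its API: the `ROLES` table, the `split_by_role` helper, the column names, and the sample data are all made up for illustration.

```python
import csv
import io

# Hypothetical role table: column name -> role.
# Columns that are not listed are treated as "regular" attributes.
ROLES = {"customer_id": "id", "churned": "label", "notes": "ignore"}


def split_by_role(rows, roles):
    """Group each row's values by role, dropping columns marked "ignore"."""
    result = []
    for row in rows:
        grouped = {"id": {}, "label": {}, "regular": {}}
        for name, value in row.items():
            role = roles.get(name, "regular")
            if role == "ignore":
                continue  # ignored columns are left out entirely
            grouped.setdefault(role, {})[name] = value
        result.append(grouped)
    return result


# Made-up sample standing in for a CSV file on disk.
SAMPLE = "customer_id,age,notes,churned\n1,34,call back,yes\n2,51,,no\n"
examples = split_by_role(csv.DictReader(io.StringIO(SAMPLE)), ROLES)
print(examples[0])
```

The point is only that the roles live in one small table that can be edited before anything runs, instead of being set by extra steps inside the process.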
Regards,
E.
What colors do you want in the plot? Should the line of each cluster be drawn in its own unique color? Or should the color change along the line, depending on the values in each dimension?
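To make the first option concrete (one color per cluster), here is a minimal sketch in plain Python that writes a parallel coordinate plot as an SVG string. It is not RapidMiner's plotter: the function names `cluster_colors` and `centroid_plot_svg` and the centroid values are assumptions chosen for illustration.

```python
import colorsys


def cluster_colors(k):
    """Return k evenly spaced hues as hex strings, one per cluster."""
    colors = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))
        )
    return colors


def centroid_plot_svg(centroids, width=400, height=200):
    """Draw each centroid as one colored polyline across the attribute axes."""
    n_dims = len(centroids[0])
    # Scale every attribute to [0, 1] so all axes share the same height.
    lows = [min(c[d] for c in centroids) for d in range(n_dims)]
    highs = [max(c[d] for c in centroids) for d in range(n_dims)]
    lines = []
    for centroid, color in zip(centroids, cluster_colors(len(centroids))):
        points = []
        for d, value in enumerate(centroid):
            span = (highs[d] - lows[d]) or 1.0  # avoid dividing by zero
            x = width * d / max(n_dims - 1, 1)
            y = height * (1 - (value - lows[d]) / span)
            points.append("{:.1f},{:.1f}".format(x, y))
        lines.append(
            '<polyline fill="none" stroke="{}" points="{}"/>'.format(
                color, " ".join(points)
            )
        )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="{}" height="{}">{}</svg>'
    ).format(width, height, "".join(lines))


# Made-up centroids: three clusters, three attributes each.
CENTROIDS = [[5.0, 3.4, 1.5], [5.9, 2.8, 4.3], [6.6, 3.0, 5.6]]
svg = centroid_plot_svg(CENTROIDS)
```

The second option, where the color varies along the line, would need a color per segment rather than one per polyline.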
To your second question: you can ignore columns in the import wizard; they simply won't be imported. Regarding the Meta-data editor: we are currently thinking of making it an enterprise feature.
Greetings,
Sebastian
We just had a short discussion, since I had also missed the different colors for the clusters in the parallel plot. A bug was introduced in one of the recent versions, and we have now fixed it. The new version will be available tomorrow on the SourceForge mirror or with the next update delivered via the update server. Thanks for pointing this out (and bringing it back to my mind again).
Cheers,
Ingo