ReadCSV and K-Means Operator

earmijo · December 2010

I don't know if I should file this as a bug... It isn't. It would fall under the category of "things I miss about RM 4.6".

1) Centroid Plot (output from Kmeans or Kmedoids)

This is a parallel coordinate plot of the centroids. It was very useful for summarizing the clusters found. Before, the lines were colored by the cluster attribute, which made it very easy to see the differences between clusters, especially when k is large. Now the color is gone from the graph.
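To make the "one color per cluster" idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner's plotter code: the function names, the `cluster_N` labels, and the sample centroids are all assumptions made up for illustration. It only shows how each centroid could be turned into a polyline carrying its own distinct color.

```python
import colorsys

def cluster_colors(k):
    """Return k distinct RGB colors, one per cluster."""
    # Spread the hues evenly around the color wheel; saturation and value are fixed.
    return [colorsys.hsv_to_rgb(i / k, 0.8, 0.9) for i in range(k)]

def centroid_polylines(centroids, attribute_names):
    """Build one colored polyline per centroid for a parallel coordinate plot."""
    colors = cluster_colors(len(centroids))
    lines = []
    for cluster_id, (centroid, color) in enumerate(zip(centroids, colors)):
        # Each point pairs an attribute axis with the centroid's value on that axis.
        points = list(zip(attribute_names, centroid))
        lines.append({"cluster": f"cluster_{cluster_id}", "color": color, "points": points})
    return lines

# Hypothetical centroids for k = 3 over three attributes
centroids = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.8], [0.5, 0.5, 0.1]]
lines = centroid_polylines(centroids, ["a1", "a2", "a3"])
```

Each entry in `lines` could then be handed to whatever plotting library is at hand, drawing `points` in the entry's `color`.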

2) Read CSV

I work mainly with CSV files. Before, it was possible to declare which columns were ID or LABEL right in the reader. Now we have to add a "Set Role" operator.
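As a rough illustration of what declaring roles at read time amounts to, here is a small sketch in plain Python. It does not use any RapidMiner API: the column names, the `roles` mapping, and the `split_by_role` helper are hypothetical. Columns without an explicit role are treated as regular attributes.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file
raw = "customer_id,age,income,churned\n1,34,52000,no\n2,51,61000,yes\n"

# Hypothetical role assignment; unlisted columns count as regular attributes
roles = {"customer_id": "id", "churned": "label"}

def split_by_role(row, roles):
    """Separate one CSV row into id, label, and regular attribute values."""
    out = {"id": None, "label": None, "regular": {}}
    for name, value in row.items():
        role = roles.get(name, "regular")
        if role == "regular":
            out["regular"][name] = value
        else:
            out[role] = value
    return out

# Read the CSV and apply the roles in a single pass
examples = [split_by_role(row, roles) for row in csv.DictReader(io.StringIO(raw))]
```

Doing this while reading is the convenience being asked for; the "Set Role" operator does the same job as a separate step afterwards.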

3) Attribute Editor

Thanks in advance for any answers and happy holidays to all of you,

Ernesto

land · December 2010

Hi,
did you really say "Things I miss about RM 4.6"? If you really mean this, pleeeease pleeease update to 5.1. Many of the things you miss were added there a long time ago...

Greetings,
Sebastian

earmijo · December 2010

Thanks Sebastian. I upgraded to 5.1. I see what you mean about the new wizard for reading data in CSV, XLS, and other formats. It is very nice.

My two other comments, however, still apply to version 5.1.

1) Centroid Plot in K-means and K-medoids is still without colors.

It is the absence of colors for the different clusters that I'm commenting on. This was a feature that I had seen only in RapidMiner and which I found really useful. If you have many clusters, the colors help you see what distinguishes one cluster from another.

2) Meta-data editor.

I called it Attribute Editor, but what I mean is really a Meta-data editor. Now that Repositories are so important, it would be nice to be able to edit the meta data of a dataset. I'm thinking, for instance, about changing the roles played by the different variables without using the operators (Exchange Roles or Set Role).

I read that you guys are thinking about creating a new type of role, "Ignore". I think that will be very useful. With a Meta-data editor you could change which variables to ignore or include. Again, I understand that you can do that at present using operators, but it'd be nice to be able to do it before you start any process.

Regards,

E.

land · January 2011

Hi,
what colors do you want in the plot? Should each line of each cluster be colored in a unique color? Or should the color change over the line, depending on the values in each dimension?

To your second question: You can ignore columns in the import wizard, in which case they simply won't be imported. Regarding the MetaData Editor: We are currently thinking of making it an enterprise feature.

Greetings,
Sebastian

IngoRM · January 2011

Hi,

we just had a short discussion since I also missed the different colors for the clusters in the parallel plot. There was a bug introduced in one of the last versions which we have fixed now. The new version will be available tomorrow on the SourceForge mirror or with the next update delivered via the update server. Thanks for pointing this out (and bringing it back to my mind again).

Cheers,
Ingo

earmijo · January 2011

Fantastic. I'll look forward to the new version. Thanks a lot.
