Just starting with RapidMiner - Some Basic Design Questions

nulspace · September 2011

Hi there,

I've just begun using RapidMiner for a course in university - I'm certainly rusty with this type of software, so I'm facing a steep learning curve!

I understand the idea of the program. I've created a repository and read in a .csv file, and now I'm trying to glean some information from my data.

Here is a breakdown of my .csv file:

10 columns of generic real attributes which I've called "att1" to "att10". Their roles are "regular" attributes.
2 columns for 3-class and 4-class labellings of the data, which I've called "label1" and "label2". I set their role to "label" in the import wizard.

Question 1
When I create the dataSet in the repository and double click it, bringing up the Meta Data view of my "exampleSet", it shows me the 10 attribute columns and only one label column (the 4-class one). Why is that? Will a process only look at one column of labels in a dataset?

Question 2
This seems like it should be a simple question to answer, but I'm absolutely stuck: how would I go about calculating the mean and standard deviation of each real attribute? The mean I can actually see when I look at the data set metadata, but I'm stumped on how to find or display the standard deviation for each attribute. Any help on this would be greatly appreciated.

Lastly, what are the best resources for learning these basic skills?

Thanks,

nul

nulspace · September 2011

Arrrgh! Well, I've solved one of my problems. Quite a silly error on my part.

I've noticed (now) that in the meta data, it displays the STDev beside the avg. :-[ Oops.
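
For anyone who wants to double-check the figures shown in the meta data view outside RapidMiner, here is a minimal Python sketch using only the standard library. The sample data and the `column_stats` helper are made up for illustration, and it computes the sample standard deviation (n - 1 in the denominator); whether RapidMiner uses that convention or the population one is not something this thread confirms.

```python
import csv
import io
import statistics


def column_stats(csv_text, columns):
    """Return {column: (mean, sample standard deviation)} for the named numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    stats = {}
    for name in columns:
        values = [float(row[name]) for row in rows]
        # statistics.stdev is the sample standard deviation (divides by n - 1)
        stats[name] = (statistics.mean(values), statistics.stdev(values))
    return stats


# Tiny made-up sample with two attributes and one label column
sample = "att1,att2,label1\n1.0,10.0,a\n2.0,20.0,b\n3.0,30.0,c\n"
result = column_stats(sample, ["att1", "att2"])
for name, (mean, stdev) in result.items():
    print(name, mean, stdev)
```

To run it on a real file, read the file's contents into a string and pass the attribute names ("att1" to "att10" in this case).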

However, I still don't understand why only 1 of 2 label columns is displayed. Can anybody help me with that?

Thanks very much,

nul

fritmore · September 2011

You can have only one label.
If you are doing some clustering or classification, then use the Cluster type.

hope it helps
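
One way to picture the "only one label" rule is as two views of the same rows, each keeping a single label column. The Python sketch below is only an illustration of that idea and is not RapidMiner code; the `single_label_view` helper and the sample rows are hypothetical. Inside RapidMiner the equivalent would presumably be done by changing attribute roles (for example with the Set Role operator) rather than by copying data.

```python
def single_label_view(rows, label, all_labels):
    """Keep the regular attributes plus one chosen label column; drop the other labels."""
    others = set(all_labels) - {label}
    return [{k: v for k, v in row.items() if k not in others} for row in rows]


# Made-up rows with two attributes and both label columns
rows = [
    {"att1": 0.5, "att2": 1.5, "label1": "a", "label2": "x"},
    {"att1": 0.7, "att2": 1.1, "label1": "b", "label2": "y"},
]
labels = ["label1", "label2"]

view1 = single_label_view(rows, "label1", labels)  # for the 3-class problem
view2 = single_label_view(rows, "label2", labels)  # for the 4-class problem
```

Each view can then be fed to a classifier separately, one run per label.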

nulspace · September 2011

Thank you, that makes sense.

By classifying those two data groups as "Clusters", can I still perform the same operators on them as if they were "labels" (Naive Bayes, for example)?
