Just starting with RapidMiner - Some Basic Design Questions
Hi there,
I've just begun using RapidMiner for a course in university - I'm certainly rusty with this type of software, so I'm facing a steep learning curve!
I understand the idea of the program, I've created a repository and have read in a .csv file, and now I'm trying to glean some information from my data.
Here is a breakdown of my .csv file:
10 columns of generic real attributes which I've called "att1" to "att10". Their roles are "regular" attributes.
2 columns for 3-class and 4-class labellings of the data, which I called label1 and label2. I also chose their role in the wizard to be "label".
Question 1
When I create the data set in the repository and double click it, bringing up the Meta Data view of my "exampleSet", it shows me the 10 attribute columns and only one label column (the 4-class one). Why is that? Will a process only look at one column of labels in a data set?
Question 2
This seems like it should be a simple question to answer, but I'm absolutely stuck: how would I go about calculating the mean and standard deviation of each real attribute? The mean I can actually see when I look at the data set metadata, but I'm stumped on how to find or display the standard deviation for each attribute. Any help on this would be greatly appreciated.
Lastly, what are the best resources for learning these basic skills?
Thanks,
nul
Answers
I've noticed (now) that in the meta data, it displays the STDev beside the avg. :-[ Oops.
However, I still don't understand why only 1 of 2 label columns is displayed. Can anybody help me with that?
Thanks very much,
nul
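For anyone who wants to cross-check the average and standard deviation shown in the meta data outside RapidMiner, here is a minimal Python sketch using only the standard library. It is not RapidMiner code. The function name `column_stats` and the tiny inline sample are made up for illustration; the column names follow the "att1", "att2", ... naming from the post. It computes the sample standard deviation (n - 1 in the denominator). Whether the meta data view uses the sample or the population formula is not stated in this thread, so swap in `statistics.pstdev` if the numbers differ slightly.

```python
import csv
import io
import statistics


def column_stats(csv_file, columns):
    """Return {column: (mean, sample standard deviation)} for the named numeric columns."""
    reader = csv.DictReader(csv_file)
    values = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            values[name].append(float(row[name]))
    return {
        name: (statistics.mean(col), statistics.stdev(col))
        for name, col in values.items()
    }


# Hypothetical three-row sample standing in for the real .csv file.
sample = io.StringIO(
    "att1,att2,label1\n"
    "1.0,10.0,a\n"
    "2.0,20.0,b\n"
    "3.0,30.0,a\n"
)

stats = column_stats(sample, ["att1", "att2"])
for name, (mean, std) in stats.items():
    print(f"{name}: mean={mean:.3f}, stdev={std:.3f}")
```

To run it on a real file, pass an open file object instead of the in-memory sample, for example `open("data.csv", newline="")`, where the file name is a placeholder.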
You can have only one label.
If you are doing some clustering or classification, then use the Cluster type.
Hope it helps.
By classifying those two data groups as "Clusters", can I still perform the same operators on them as if they were "labels" (Naive Bayes, for example)?