Difference between various data types
Hi all,
A newbie question: in RM5, what is the difference between "numeric" and "real" when defining metadata? Where can I find quick help on topics like these? Neither "rapidminer-5.0-manual-english_v1.0.pdf" nor "rapidminer-4.6-tutorial.pdf" covers these simple subjects.
Thanks,
Lucian
Answers
Actually, numeric is the supertype of real and integer.
In the same way, nominal is the supertype of polynominal, text and binominal.
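The hierarchy described here can be sketched in a few lines of Python. This is only an illustration of the supertype relation; the `PARENT` table and the `is_a` helper are hypothetical names and not part of RapidMiner.

```python
# Hypothetical model of the value-type hierarchy described above;
# this is not RapidMiner's own API.
PARENT = {
    "numeric": None,
    "real": "numeric",
    "integer": "numeric",
    "nominal": None,
    "polynominal": "nominal",
    "binominal": "nominal",
    "text": "nominal",
}


def is_a(value_type, ancestor):
    """Return True if value_type equals ancestor or is one of its subtypes."""
    current = value_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


print(is_a("real", "numeric"))       # a real attribute is also numeric
print(is_a("numeric", "real"))       # the reverse does not hold
print(is_a("binominal", "nominal"))  # binominal is a kind of nominal
```

An operator that accepts "numeric" would, under this model, also accept "real" and "integer", while an operator that insists on "real" would reject a plain "numeric" attribute.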
Greetings,
Sebastian
Lucian
Unless you want to take a look into the manual, which would be a perfect idea. Anyway, I would think reading the manual up to page 12 is easier...
http://sourceforge.net/projects/rapidminer/files/1.%20RapidMiner/5.0/rapidminer-5.0-manual-english_v1.0.pdf/download
Greetings,
Sebastian
I have read page 12 of the manual, and I can't see a difference between nominal and polynominal, because both can handle categorical values. I mean, if you have the variable "color" (red, green and blue), you have a categorical variable and therefore a nominal variable. Isn't the "poly" prefix redundant?
What would be the difference between them, algorithmically speaking? (The same question applies to the numerical types.)
Thanks in advance.
Pablo.
You are right: from an algorithmic point of view there is currently no difference between "polynominal" and "nominal". As far as I know, every operator that can handle one of them can automatically handle both (somebody please correct me if I forgot an operator where this does make a difference). But who knows: maybe a future operator will need such a distinction, and the current ontology can be seen as preparation for that. In today's practical processes, however, you will be perfectly fine using either option, as long as you make sure that all operators are happy.
The same is true for numerical value types, although I think there actually is (or at least was) some algorithm that relied on its input being "real" rather than "numerical"...
Cheers,
Ingo
Actually, I doubt this, because each polynominal attribute is a nominal attribute.
I think you are trying to compute the distance between "mule" and "donkey". What is the distance? There's only one sane answer: 1. And what's the distance between "mule" and "horse"? Yes, 1. "mule" and "mule" would be zero, if you haven't already guessed...
RapidMiner currently provides only this distance measure between nominal values. So I doubt that a process comparing word lists per row makes any sense at all.
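As a minimal sketch, the 0/1 distance described here looks like this in Python. The function name `nominal_distance` is made up for illustration and is not a RapidMiner identifier.

```python
def nominal_distance(a, b):
    """Return 0 for identical nominal values and 1 for any two different ones."""
    return 0 if a == b else 1


print(nominal_distance("mule", "donkey"))  # different values
print(nominal_distance("mule", "horse"))   # different values
print(nominal_distance("mule", "mule"))    # identical values
```

Because every pair of different values gets the same distance, the measure carries no notion of "mule" being closer to "donkey" than to "horse".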
Greetings,
Sebastian
best regards, andre
Hi Ingo,
I did notice an operator where the distinction between nominal and polynominal makes a difference. I often build web mining processes where extracted data is incrementally written to a database (appended to a table). The same process is repeated after a few days to collect data that was missed during the first run (timeouts etc.) as well as recently added content.
To find only those examples, I import the relevant URLs (Read Excel) and load the already collected items from the database (Read Database). Both operators are followed by "Set Role" to set the IDs. Finally, the "Set Minus" operator builds the desired example set. The attribute obtained from the database is usually nominal, and the one from the Excel file is of type polynominal. Process execution is interrupted because the "Set Minus" operator complains about incompatible types and requests an attribute of type polynominal. Since there is no convenient way of changing the attribute obtained from the database from nominal to polynominal, I always set nominal instead of polynominal in the "Read Excel" operator. It doesn't cause me much trouble, but it shows a case where there is a difference between the two types. I don't know whether the check is necessary there...
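The situation can be imitated with a small Python sketch. It assumes a strict equality check on the ID value type, which matches the behaviour reported above but is not taken from RapidMiner's source; the dict layout, the `set_minus` function and the example.org URLs are all hypothetical.

```python
# Sketch only: the dict layout and the strict type check are assumptions
# that mimic the reported behaviour, not RapidMiner's implementation.

def set_minus(left, right):
    """Return the IDs of `left` that do not occur in `right`."""
    if left["id_type"] != right["id_type"]:
        raise TypeError(
            f"incompatible ID types: {left['id_type']} vs {right['id_type']}"
        )
    already_collected = set(right["ids"])
    return [url for url in left["ids"] if url not in already_collected]


# Hypothetical data: URLs to crawl (Excel) and URLs already stored (database).
from_excel = {"id_type": "polynominal",
              "ids": ["http://example.org/a", "http://example.org/b"]}
from_database = {"id_type": "nominal", "ids": ["http://example.org/a"]}

try:
    set_minus(from_excel, from_database)
except TypeError as error:
    print("rejected:", error)

# Workaround from the post: declare the Excel column as nominal as well.
from_excel["id_type"] = "nominal"
remaining = set_minus(from_excel, from_database)
print(remaining)
```

A check based on the supertype relation instead of strict equality would accept both inputs, since every polynominal attribute is also nominal.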
Regards
Matthias
If you want to compare lists that contain subsets of each other and count the entries they have in common, you can use set operations on the example sets: remove everything that is not in both (Intersect) and count the number of remaining examples. You can also extract the number of examples as a macro or performance value.
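Outside RapidMiner, the same idea (intersect, then count) can be sketched with Python sets. The function `count_common` and the sample lists are invented for illustration.

```python
def count_common(entries_a, entries_b):
    """Return the number of distinct entries that occur in both lists."""
    return len(set(entries_a) & set(entries_b))


# Hypothetical word lists to compare.
list_a = ["mule", "donkey", "horse"]
list_b = ["horse", "zebra", "mule"]

print(count_common(list_a, list_b))  # "mule" and "horse" are shared
```

Like the Intersect approach, this counts exact matches only and ignores duplicates within a list.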
Greetings,
Sebastian