Handling multiple nominal values in one category
Hello,
Can I somehow handle (for instance, with a decision tree model) data with multiple nominal values (separated, let's say, by commas) under one category? For example, category name: tags; values: rapid, miner, datamining, etc.
Thank you for your help.
Answers
Sorry, but I don't understand your question. Could you give an example? What do you mean by category?
Greetings,
 Sebastian
Let's say I have some files described by some attributes, like "name", "category", "location" and "tags".
I want to know if I can somehow handle this last attribute, "tags", so that it takes more than one nominal value.
For instance:
name: article1, category: sport, location: New York, tags: knicks, basketball, celtics
Is it clear enough now? I'm a beginner in data mining and may not express myself clearly.
You have several options, and which one is best depends entirely on what you plan to do with the data:
- In general, you could use the operators "Split" and "Merge" to handle those multiple nominal values for one attribute.
- Sometimes it might be better to give this attribute the value type "text" and use the text processing operators, e.g. in order to determine how often certain tags are used.
- In some cases, you might simply want to keep the tag collection as it is (maybe sorted) in order to calculate similarities etc. (although even in that case I would probably go for a text processing approach).
- ...
Which option is best depends on your goal, but in general you can handle this setting with "Split" and "Merge" and define a separating character like '#' or something else that does not occur in your tags.
Hope that helps at least a bit. Cheers,
Ingo
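For readers who want to see the splitting idea outside the tool, here is a minimal sketch in plain Python. It is not the "Split" operator itself, only the same preprocessing idea under some assumptions: the tags arrive as one comma-separated string, and each distinct tag becomes its own 0/1 column so that a decision tree or any other learner can use it. The function names and the second sample row (article2) are hypothetical and only there for illustration. The last lines also count how often each tag is used, which is the simplest version of the text processing option.

```python
from collections import Counter


def split_tags(raw, sep=","):
    """Split a delimited tag string into a list of trimmed, lowercase tags."""
    return [t.strip().lower() for t in raw.split(sep) if t.strip()]


def tags_to_indicators(rows, tag_field="tags", sep=","):
    """Replace the multi-valued tag field with one 0/1 column per distinct tag."""
    tag_lists = [split_tags(row[tag_field], sep) for row in rows]
    # The vocabulary is every distinct tag seen in the data, in sorted order.
    vocabulary = sorted({t for tags in tag_lists for t in tags})
    expanded = []
    for row, tags in zip(rows, tag_lists):
        # Keep all other attributes, drop the original multi-valued one.
        new_row = {k: v for k, v in row.items() if k != tag_field}
        for tag in vocabulary:
            new_row[f"tag_{tag}"] = 1 if tag in tags else 0
        expanded.append(new_row)
    return expanded, vocabulary


# Illustrative data: the first row is the example from the thread,
# the second row is made up for this sketch.
rows = [
    {"name": "article1", "category": "sport", "location": "New York",
     "tags": "knicks, basketball, celtics"},
    {"name": "article2", "category": "sport", "location": "Boston",
     "tags": "celtics, basketball"},
]

expanded, vocabulary = tags_to_indicators(rows)

# How often each tag is used across all rows.
tag_counts = Counter(t for row in rows for t in split_tags(row["tags"]))

for row in expanded:
    print(row)
print(tag_counts.most_common())
```

If the tags themselves can contain commas, pass a different `sep` (for example '#'), in line with the advice above about picking a separator that does not occur in the tags.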
Well, what's the difference between a classification scheme that can handle this itself and preprocessing the data so that all classification schemes can handle it? Right: with the latter, the more modular option, you have many more options to choose from. So I would always go for well-thought-out preprocessing combined with a powerful, already existing classification method.
Cheers,
Ingo