Text Mining - Name Collision with special and regular attributes
text_miner
Hi,
Since RapidMiner requires all attribute names to be unique, I've noticed a potential naming conflict when doing text mining. If a special attribute with name X exists, then a regular attribute with the same name cannot also exist (or the regular attribute gets removed when the special attribute is created). For example, the special attributes "id" and "label" are relatively common terms that may also appear in text documents.
Is there any way to specify a prefix or suffix for all special attributes (e.g., metadata_ or specattr_) so that name collisions are less likely to occur? If not, could an option be added to the configuration settings or to the root Process node to provide this functionality?
Thanks!
Answers
The Document processing operators make sure that no attribute name is used twice. If words like "label" or "id" occur, they are assigned the attribute names label_0 (or label_1 if label_0 already exists). This mapping is remembered in the word list, so the attributes are named identically when the word list is applied later.
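To illustrate the renaming scheme described above, here is a minimal Python sketch. It is not RapidMiner's actual (Java) code: the function names build_word_list and apply_word_list are hypothetical, and the assumption that a word keeps its own name when there is no collision is inferred from the description. It picks the first free suffix _0, _1, ... for a colliding word and stores the mapping in a word list, so that applying it to new documents produces the same attribute names.

```python
from collections import Counter


def unique_attribute_name(word, used_names):
    """Return word unchanged if free, else the first unused word_0, word_1, ..."""
    if word not in used_names:
        return word
    i = 0
    while f"{word}_{i}" in used_names:
        i += 1
    return f"{word}_{i}"


def build_word_list(tokens, special_names):
    """Map each distinct token to a unique attribute name (hypothetical helper).

    special_names are names already taken, e.g. special attributes like "id" and "label".
    """
    used = set(special_names)
    word_list = {}
    for token in tokens:
        if token in word_list:
            continue  # already mapped; reuse the stored name
        name = unique_attribute_name(token, used)
        used.add(name)
        word_list[token] = name
    return word_list


def apply_word_list(tokens, word_list):
    """Count tokens of a new document using the stored names; unknown words are ignored."""
    counts = Counter(word_list[t] for t in tokens if t in word_list)
    return {name: counts.get(name, 0) for name in word_list.values()}


if __name__ == "__main__":
    wl = build_word_list(["label", "text", "id", "label"], special_names=["id", "label"])
    print(wl)
    print(apply_word_list(["label", "label", "unseen"], wl))
```

Because the word list stores the token-to-name mapping instead of recomputing it, the same word always ends up in the same attribute (here label_0), even if a later document contains a different set of words.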
Greetings,
Sebastian