Cleanup of metadata in the process
Hi guys,
I'll try to be short and clear: after "dummifying" a nominal attribute, I see that the system generates columns even for values that have previously been filtered out. The image below shows that I first filter out all countries except Italy, Germany and Spain, and then apply the "Nominal to Numerical" operator to dummify the Country column.
[Image: the whole process]
In the result, I can still see dummy columns generated for the countries that are no longer in the data.
[Image: result]
Now, from my understanding, the system keeps some kind of "metadata" (such as all the possible values of Country, in this example) from the initial state of the data. The same thing happens when I open the statistics, so it is clear that this is a desired (and helpful) behaviour.
[Image: statistics]
HERE IS THE QUESTION: is it possible to clean up this information during the execution of the process, so that after the dummy operation I would have just 3 columns?
One solution would be to write the data somewhere and load it back, but I was hoping for something more elegant. Any suggestions?
Thank you in advance!
Matteo
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
just use the Remove Unused Values operator first, before Nominal to Numerical.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
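
For readers who want to see why the extra dummy columns appear, here is a minimal conceptual sketch in plain Python. It is not RapidMiner code and does not claim to reproduce its internals. It only assumes, as the question describes, that a nominal column keeps a list of all known values separately from the rows. The `NominalColumn` class, its method names, and the `Country = <value>` column naming are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NominalColumn:
    """Hypothetical nominal column: rows store indices into a value mapping."""

    mapping: list  # every known value (the "metadata" from the question)
    codes: list    # one index into `mapping` per row

    @classmethod
    def from_values(cls, values):
        # Unique values in first-seen order, then one index per row.
        mapping = list(dict.fromkeys(values))
        index = {value: i for i, value in enumerate(mapping)}
        return cls(mapping, [index[value] for value in values])

    def filter(self, keep):
        # A row filter drops rows but leaves the value mapping untouched.
        kept = [code for code in self.codes if self.mapping[code] in keep]
        return NominalColumn(self.mapping, kept)

    def remove_unused_values(self):
        # Rebuild the mapping from the values still present in the rows.
        return NominalColumn.from_values([self.mapping[c] for c in self.codes])

    def dummy_columns(self):
        # One 0/1 column per value in the mapping, whether it is used or not.
        return {
            f"Country = {value}": [int(self.mapping[c] == value) for c in self.codes]
            for value in self.mapping
        }


countries = NominalColumn.from_values(
    ["Italy", "France", "Germany", "Spain", "Italy", "Poland"]
)
filtered = countries.filter({"Italy", "Germany", "Spain"})

before = filtered.dummy_columns()                        # mapping still has 5 values
after = filtered.remove_unused_values().dummy_columns()  # mapping rebuilt from rows

print(len(before), len(after))
```

In this sketch, filtering alone leaves five dummy columns, two of them all zeros, while rebuilding the mapping first leaves only the three countries that remain in the data. That is the effect the cleanup step suggested in the answer is meant to achieve before the dummy coding runs.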