DatabaseExampleSource: Metadata
Hi there,
I'm quite new to RapidMiner, but have some experience with SAS Enterprise Miner. I already set up a Decision Tree learning process and got some exciting results (well, at least judging by how they look).
But what I'm missing now is the following:
In SAS Enterprise Miner you can tweak the variable roles on every node - like setting it to "ignore", "target" or whatever. I use a DatabaseExampleSource and when using the Database wizard I can set up the roles once - but after that I don't manage to get back to change those roles without doing the whole wizard again. What's the right approach to change the roles of variables from a DatabaseExampleSource? And is there a node type which can be used to do this?
Thanks again in advance and for this great piece of software.
Answers
Thanks for your kind words. Sure (is there anything in SAS that is not possible with RM? :P)
The node ("operator" in RapidMiner terminology) you are searching for is called "ChangeAttributeRole" - you can search for this in the Add Operator dialog or in the search field at the bottom of the New Operator tab. With this operator you can change regular attributes (used as input variables for the mining method) to labels, ids, weights etc. and back again.
Cheers,
Ingo
thank you very much for your fast and helpful answer.
I wondered if it is possible to access and change the metadata that is set in the DatabaseExampleSource. The problem I have with the ChangeAttributeRole is that it only changes one variable at a time.
If I have an input set of 400 or more variables and for a certain run I want to select those which should be taken into account, it's not really possible to apply 200 of those operators. How would you suggest to deal with that situation?
In SAS EM (sorry that I keep coming back to it, but it's my reference because I know it) you simply change the metadata of the InputSource node. In the DatabaseExampleSource there is a similar table when you run the wizard, but I didn't find a way to access this table again to change it. I don't know - tell me!
Thanks again for your fast answer, especially at this time
Jörg
You are right, 200 operators of the same kind are not an option. Fortunately, RM offers several ways to avoid that, but which one fits depends on the attributes you want to deselect for analysis. Are these attributes simply chosen at random, or do they share a common characteristic (e.g. a common name stem or a common attribute type)? One solution would be to use an [tt]AttributeFilter[/tt] and simply filter them out by setting the condition appropriately.
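To illustrate the idea of filtering by a common name stem outside of RapidMiner, here is a minimal Python sketch. It is not the [tt]AttributeFilter[/tt] operator itself; the function name, the attribute names and the "tmp_" stem are all made up for the example.

```python
import re

def filter_attributes(names, pattern, drop_matches=True):
    """Drop (or keep) every attribute whose name matches the regular expression."""
    regex = re.compile(pattern)
    if drop_matches:
        return [n for n in names if not regex.search(n)]
    return [n for n in names if regex.search(n)]

# Hypothetical attribute list; the "tmp_" stem marks columns to exclude.
attributes = ["cust_id", "age", "income", "tmp_score_1", "tmp_score_2", "churn"]
selected = filter_attributes(attributes, r"^tmp_")
print(selected)
```

A single pattern like this replaces one role-changing step per attribute, which is the point when there are several hundred of them.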
Regards,
Tobias
thanks for your answer.
My approach is the following: I have a set of many variables and I would like to use only some of them when e.g. learning a decision tree for a certain target. Instead of changing the input set every time it would be more convenient to see a list of variables in which you can simply adjust the role, e.g. set to "ignore" for not using this variable or set it to "label".
RapidMiner already has something similar in the DatabaseExampleSource wizard, but I can't access it without going through the whole wizard again, which means re-applying all the settings made before (and is clumsy as well). So is there a way to get to this table, where all variables are shown and their roles can be adjusted, without having to redo the wizard? Especially at the start, it takes a lot of back and forth until meaningful and valid variables are selected. And then you might want to examine different aspects of one dataset. So there should be some easy means to adjust the variable selection. What is your suggestion for that? (I hope I explained it well enough.)
I looked at the AttributeFilter, which is useful, but renaming the variables is not an option (then I would have to know up-front which variables are valid and which are not, but that's determined exploratively by running the mining process). I could attach a certain role "ignore" to the variables I don't want to include and filter them with an AttributeFilter - but then again it only offers built-in attribute roles. Can it be expanded to user defined roles?
Sorry for bothering you continuously with this topic, but I'm seriously considering using RapidMiner and I know some people who are interested in the results, so I somehow have to get past these obstacles.
Thanks and have a nice day,
Jörg
ok, let's see if I got you right:
You want to exploratively select attributes and learn a decision tree, meaning you repeatedly load the data, select some attributes which you think are relevant (or which you think should be included in the tree), then learn the tree and finally investigate whether the attributes seem relevant by looking at the tree. Is that correct?
Well, you can achieve something like that by using an [tt]InteractiveAttributeWeighting[/tt] operator, setting the weights of the attributes you wish to deselect to zero, and then applying the weights to the data with the [tt]AttributeWeightsApplier[/tt] operator. You could also put these operators (and the learner) inside an [tt]IteratingOperatorChain[/tt], which prevents the data from being read multiple times. You then have to set a breakpoint after the learner. However, with more than 400 attributes this is of course still tedious work.
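The "zero weight means deselected" idea can be sketched in a few lines of plain Python. This only illustrates the concept and is not how the RapidMiner operators are implemented; the function, the sample rows and the convention that attributes without a weight entry are kept are assumptions made for the example.

```python
def apply_weights(rows, weights):
    """Return the rows without any attribute whose weight is exactly zero.

    Attributes that have no entry in `weights` are kept (assumed convention).
    """
    def keep(name):
        return weights.get(name, 1.0) != 0.0
    return [{k: v for k, v in row.items() if keep(k)} for row in rows]

# Hypothetical data: "zip" gets weight 0 and is therefore removed.
rows = [
    {"age": 34, "income": 52000, "zip": "10115", "churn": "no"},
    {"age": 51, "income": 31000, "zip": "80331", "churn": "yes"},
]
weights = {"age": 1.0, "income": 1.0, "zip": 0.0}
reduced = apply_weights(rows, weights)
print(reduced[0])
```

Keeping the weights in one table means a different selection only requires editing that table, not the data source.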
Have you thought about inspecting the results of an attribute weighting scheme like [tt]InfoGainWeighting[/tt] which gives you a hint of how important attributes are for the learning problem? Or letting RM identify an optimal feature set automatically through its feature selection operators? That might save you a lot of work ...
Additionally, it should be mentioned that a decision tree algorithm also selects the more important attributes itself and does not bother with the less important ones if you allow the algorithm to prune the tree.
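For readers unfamiliar with information gain, the following is a small, self-contained Python sketch of the textbook definition for nominal attributes (label entropy minus the weighted entropy after splitting on the attribute). It is meant only as an illustration; [tt]InfoGainWeighting[/tt] may treat numerical attributes and normalize the weights differently, and the toy data is invented.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy within each attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data: "contract" separates the label perfectly, "gender" not at all.
labels = ["yes", "yes", "no", "no"]
columns = {
    "contract": ["month", "month", "year", "year"],
    "gender": ["m", "f", "m", "f"],
}
gains = {name: information_gain(col, labels) for name, col in columns.items()}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(gain, 3))
```

Attributes with a gain near zero are natural candidates to deselect first when there are hundreds to choose from.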
Regards,
Tobias
The InteractiveAttributeWeighting operator is close to what I looked for - thanks for the hint. Does it keep the settings and apply them automatically each time I load and run the process? It probably always displays the dialogue?
The automatic feature selection sounds very exciting. However, I have not yet managed to get it right: it always complains about a missing PerformanceVector, and when I add one, it needs something else again... but I will try the InfoGainWeighting as well. Thanks for all those valuable hints.
And finally, my first question: is there a possibility to adjust the attribute settings in the DatabaseExampleSource like one can do when executing the wizard?
Thanks again and a lot for your help.
Cheers,
Jörg
Hope that helps,
Tobias