[SOLVED] Flexible Learner Replacement
Hi there,
I'm fairly new to RapidMiner, but had to get into it pretty quickly because of my new job.
Currently I'm working on predictive analysis for maintenance issues.
My data set contains some ten thousand examples with about a thousand attributes.
Due to speed issues, I've designed a selective preprocessing step in which I split the data into subsets of different attributes, run forward selection combined with cross validation on each subset, and keep the attributes of each subset that have the biggest impact on the result. I then join the results back together for a final analysis. This process currently uses about 20 Learner Operators, e.g. SVM, Naïve Bayes or Decision Tree (the process will be shown in the following post).
Switching from one Learner Method to another is a dull and tiring thing, since I have to replace all of the 20 Operators.
So I thought of some kind of macro-like two-component system for flexible Learner replacement.
These 2 Components could look like:
The first Component is a nested Operator that contains the Learner to be used. It might also need an ID/Name as a parameter, so that several of these container constructions can run in one Process.
The second Component is linked to the specified first Component (via the ID/Name). It simply retrieves the defined Learner Operator with all its Parameters.
This would also come in really handy when you want to optimize the whole process (which is the second point of my idea). Take a Learner with 3 parameters and 2 choices for each parameter. Without a container, each of the 20 Learners has its own set of parameters, which gives (2^3)^20 = 2^60 combinations. With just one Learner configuration used throughout the whole process, there are only 2^3 = 8. This would not only save computation time. It would also save time designing the process, because I would have to choose and set only 3 parameter ranges in the optimization Operator instead of 60 (not to mention the debugging).
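The arithmetic above can be checked with a short Python sketch. It assumes the numbers given in the post (20 learners, 3 parameters per learner, 2 candidate values per parameter); the variable names are purely illustrative.

```python
# Grid sizes for the two designs described above (numbers taken from the post).
n_learners = 20   # learner operators in the process
n_params = 3      # tunable parameters per learner
n_choices = 2     # candidate values per parameter

per_learner = n_choices ** n_params        # settings of a single learner
independent = per_learner ** n_learners    # every learner tuned separately
shared = per_learner                       # one shared learner configuration

print(f"independent learners: {independent:,} combinations")
print(f"shared learner:       {shared} combinations")
print(f"parameter ranges to set: {n_params * n_learners} vs {n_params}")
```

The exponent grows with the number of learners only when each learner is tuned separately, which is why a single shared configuration keeps the grid small.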
I believe that there is some kind of workaround for the problem of too many combinations using macros, but this would probably make the design phase even more complicated and tiring.
Or maybe there is a cool solution using the XML code and a replace function instead of the GUI.
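One way such an XML-based replacement could look is sketched below in Python. It is only a sketch under assumptions: the shortened sample XML, the `<operator class="...">` / `<parameter key="..." value="..."/>` layout, the class keys `k_nn` and `decision_tree`, and the parameter `maximal_depth` are assumed for illustration and should be checked against the XML of the actual process. The function `replace_learner` is a hypothetical helper, not part of RapidMiner.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened process XML. The element layout and the
# operator class keys are assumptions made for this example.
SAMPLE_PROCESS = """
<process version="5.3.000">
  <operator class="process" name="Process">
    <process expanded="true">
      <operator class="x_validation" name="Validation A">
        <process expanded="true">
          <operator class="k_nn" name="Learner A">
            <parameter key="k" value="5"/>
          </operator>
        </process>
      </operator>
      <operator class="x_validation" name="Validation B">
        <process expanded="true">
          <operator class="k_nn" name="Learner B">
            <parameter key="k" value="5"/>
          </operator>
        </process>
      </operator>
    </process>
  </operator>
</process>
"""


def replace_learner(xml_text, old_class, new_class, new_params):
    """Swap every operator of old_class for new_class with new_params."""
    root = ET.fromstring(xml_text.strip())
    replaced = 0
    # Materialise the list first so the tree is not modified while iterating.
    for op in list(root.iter("operator")):
        if op.get("class") != old_class:
            continue
        op.set("class", new_class)
        # Drop the old learner's parameters; they need not exist on the new one.
        for param in op.findall("parameter"):
            op.remove(param)
        for key, value in new_params.items():
            ET.SubElement(op, "parameter", {"key": key, "value": str(value)})
        replaced += 1
    return ET.tostring(root, encoding="unicode"), replaced


new_xml, count = replace_learner(
    SAMPLE_PROCESS, "k_nn", "decision_tree", {"maximal_depth": 10}
)
print(f"replaced {count} learner operators")
```

The sketch deliberately keeps each operator's `name` attribute unchanged, on the assumption that other parts of the process refer to operators by name. Parsing the XML instead of doing a plain text search-and-replace avoids touching unrelated occurrences of the same string.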
Thanks
Garlef
Answers
I hope this makes sense to you!
Best regards,
Marius
I managed to create a little example using the Execute Process operator.
Using different learners is nice and easy now. Thanks for that nice trick.
In the following I will use the terms outer process (the one that sets the macros, loads the data and calls the Execute Process operator) and inner process (the one that contains the learner operator). The learner in question is the k-NN operator.
I still got some questions on the variation of parameters in the inner process.
Varying numerical values seems easy using macros, but it only returns the error message
Optimize Parameters (Grid): Cannot evaluate performance for current parameter combination because of an error in one of the inner operators: A value for the parameter 'k' must be specified! Expected integer but found '30.0'.
The parameter 'k' comes from the k-NN operator. I tried using a second macro in the inner process that does nothing but get the value of the macro from the outer process and parse the number. But I can't reference the macro name in the Parse Number operator.
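The error message suggests that the value arrives as the text '30.0', which is not a valid integer literal. The Python sketch below only illustrates that mismatch and one way to normalise such a value before it is used as an integer. It is not RapidMiner code, and the helper name `macro_to_int` is hypothetical.

```python
def macro_to_int(raw: str) -> int:
    """Convert macro text such as '30.0' to an int, rejecting e.g. '30.5'."""
    number = float(raw)
    if not number.is_integer():
        raise ValueError(f"not a whole number: {raw!r}")
    return int(number)


# A strict integer parser rejects the text, similar to the error quoted above.
try:
    int("30.0")
    strict_ok = True
except ValueError:
    strict_ok = False

k = macro_to_int("30.0")
print(strict_ok, k)
```

Going through a floating-point value first and then checking that it is a whole number avoids silently truncating a value such as '30.5'.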
I also don't know how to modify, from the outer process, parameters of the inner process that are set via a checkbox or drop-down list, e.g. weighted vote or measure types in the k-NN operator. Passing numerical or nominal values through macros does not help here, and it runs into the problem described above. Calling them from an outer optimization operator is not possible at all, since it has no idea about the design of the inner process.
Also I can't use the Optimize Grid operator to modify macros that are created with the Macros operator. Looks like Optimize Grid can't look inside menus for creating multiple objects or subsets.
Garlef
You could use an Optimize Parameters operator in the outer process to simply call different inner processes, and add another Optimize Parameters operator in the inner process to vary the learner parameters.
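The structure of this suggestion, an outer search over learners with an inner search over each learner's parameters, can be sketched in plain Python. The learner names, the parameter grids and the `evaluate` function are hypothetical stand-ins; in RapidMiner the evaluation would be the performance delivered by the inner process. The scores are made up solely so that the example runs.

```python
from itertools import product

# Hypothetical parameter grids, one per learner (assumed names and values).
GRIDS = {
    "k_nn": {"k": [1, 5, 30], "weighted_vote": [True, False]},
    "decision_tree": {"maximal_depth": [5, 10], "minimal_leaf_size": [2, 4]},
}


def evaluate(learner, params):
    """Toy stand-in for a cross-validated performance; not real results."""
    return -sum(abs(float(v) - 5.0) for v in params.values())


def inner_search(learner, grid):
    """Inner optimization: try every parameter combination of one learner."""
    keys = list(grid)
    best = None
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(learner, params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def outer_search(grids):
    """Outer optimization: run the inner search once per learner."""
    results = {name: inner_search(name, grid) for name, grid in grids.items()}
    winner = max(results, key=lambda name: results[name][0])
    return winner, results[winner][1], results


winner, best_params, all_results = outer_search(GRIDS)
print(winner, best_params)
```

With this nesting the number of evaluations is the sum of the individual grid sizes (here 6 + 4), because each learner's parameters are only varied while that learner is active.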
Best regards,
Marius
I thought about that, but it might lead to different optimized parameter settings for each of my attribute subsets, and I'm not sure how applicable the results would be.
On the other hand, I could save each intermediate result and load it later in the main process. But that's probably not too good for the computation time.
Anyway, thanks a lot for the really fast and really good help
And I still believe that my idea is worth implementing, but that's mostly because it took me a couple of hours to think of it.
Best Regards
Garlef
Best regards,
Marius