Estimate Experiment Time Feature, Amazon EC2
I am interested in a feature that would allow the user to estimate the time required to complete an experiment before launching it.
Many of the experiments I conduct use the 'Optimize Selection (Evolutionary)' operator with a variable number of generations. Adding the above feature would allow me to reduce the maximum number of generations in order to conduct an initial test of an idea and only add additional generations if the test is successful.
I am also working on developing an Amazon EC2 instance that is configured with Ubuntu, RapidMiner, R, and the Amazon AWS tools, which I would provide free to the RapidMiner community. This would give those of us with data mining problems that are easy to spread across multiple instances a quick way to conduct larger-scale experiments. Having the functionality to estimate the length of time an experiment would take would allow the user to determine how many instances need to be launched in order to complete the processing in the desired amount of time.
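For illustration only, here is a back-of-the-envelope sketch of that last calculation, assuming the work is embarrassingly parallel and splits evenly across instances; the function name and all numbers are hypothetical.

```python
import math

def instances_needed(estimated_runtime_hours, desired_wall_clock_hours):
    """How many identical instances to launch so an embarrassingly
    parallel experiment finishes within the desired wall-clock time.
    Ignores startup, data transfer, and load-balancing overhead."""
    return math.ceil(estimated_runtime_hours / desired_wall_clock_hours)

# Invented numbers: a 48-hour single-machine estimate, wanted overnight (8 h)
print(instances_needed(48, 8))  # -> 6
```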
I would be happy to provide additional details and/or help test the functionality I described above.
Regards,
Eric
Answers
I like the idea ... my first thought was to build a database consisting of
predictors: number of attributes (total and according to type), average size of attributes (i.e. number of nominal values), number of examples (and many more...)
response: execution time for a certain pair of (operator, operator-parameter-settings)
Problem: since computer architectures vary, rm would need to build up such a database separately for each user (although one could try to build a prior from core count / CPU power + RAM ...).
So as a result, the longer a user works with rm, the better the predictions get. On the other hand, at that stage the user probably already knows what execution time to expect.
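A minimal sketch of this database idea, assuming a simple linear model over a small feature set; the class, the features, the model form, and all timings below are placeholder assumptions, not anything RapidMiner actually does.

```python
import numpy as np

class RuntimeDB:
    """Per-user database: operator -> observed runs, where each run
    stores the predictors (here just examples and attributes) and the
    measured execution time as the response."""

    def __init__(self):
        self.runs = {}  # operator name -> list of (features, seconds)

    def record(self, operator, n_examples, n_attributes, seconds):
        features = [1.0, n_examples, n_attributes, n_examples * n_attributes]
        self.runs.setdefault(operator, []).append((features, seconds))

    def estimate(self, operator, n_examples, n_attributes):
        """Least-squares fit over this user's past runs for the operator;
        returns None until enough observations exist, so predictions
        improve the longer the database is used."""
        data = self.runs.get(operator, [])
        if len(data) < 4:  # need at least as many runs as coefficients
            return None
        X = np.array([f for f, _ in data])
        y = np.array([s for _, s in data])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        query = [1.0, n_examples, n_attributes, n_examples * n_attributes]
        return float(np.dot(query, coef))

db = RuntimeDB()
# Invented timings standing in for earlier runs on one user's machine:
for n, m, t in [(1000, 10, 2.1), (2000, 10, 4.0),
                (1000, 20, 4.3), (4000, 20, 16.5)]:
    db.record("Optimize Selection (Evolutionary)", n, m, t)
print(db.estimate("Optimize Selection (Evolutionary)", 3000, 15))
```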
just my 2 cents,
steffen
Wrong board. This doesn't go to "Feature Requests", but rather to "Research Proposals". :-)
Seriously, that's one of the things we are working on within the e-LICO project: www.e-lico.eu. I think that's a very interesting topic. Some operators are already annotated, e.g., with their running time as a function of the number of examples, attributes, etc. But then it's a matter of finding the coefficients, and so on. We're open to your ideas here.
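A small sketch of what fitting such annotation coefficients could look like, assuming an operator annotated with the cost shape t ≈ a + b·n·m (n examples, m attributes): the annotation fixes the shape of the model, and benchmarking on the user's machine supplies the coefficients. The model form and all measurements are invented for illustration.

```python
import numpy as np

def fit_coefficients(observations):
    """observations: list of (n_examples, n_attributes, measured_seconds).
    Fits t ~ a + b * n * m by least squares."""
    X = np.array([[1.0, n * m] for n, m, _ in observations])
    y = np.array([t for _, _, t in observations])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, b

# Invented benchmark measurements:
obs = [(1000, 10, 1.2), (2000, 10, 2.1), (4000, 20, 8.3)]
a, b = fit_coefficients(obs)
print(f"predicted runtime for n=3000, m=15: {a + b * 3000 * 15:.1f} s")
```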
Best,
Simon