"RM Performance Optimization"
lubomir_karlik
Member Posts: 4 Contributor I
Hi!
I have integrated RapidMiner into my application for event predictions. RapidMiner is a clear performance bottleneck. Do you know whether it is possible to optimize it?
Situation:
I have 1,000 prediction models (neural networks), i.e. 1,000 RM scripts. Every script expects as input a training sample set that is built in iterations. Let's say 500 samples are required. Every sample is gathered by an RM script. The RM script connects to a DB, runs a select, and then applies some simple data transformations. The samples are merged afterwards. The Java profiler shows that 99.9 percent of the time is spent running the RM script. Getting 500 samples takes about 10-15 minutes.
RapidMiner is initialized through RapidMiner.init(false, true, true, true);
It is not possible to reduce the number of RM scripts, e.g. by running one script to get all the data. I would like to know whether the RM script creates a new DB connection on every run, or whether connection pooling is supported. Might there be another pitfall?
Thank you for your response in advance!
Lubomir Karlik
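For readers wondering what pooling would change in a setup like this, here is a minimal sketch in plain Java. It is not RapidMiner code: SimplePool and everything in it are hypothetical names, and a StringBuilder stands in for an expensive resource such as a DB connection. The point is only that many short runs can share a few long-lived resources instead of creating one per run.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical minimal resource pool; an illustration, not part of RapidMiner. */
class SimplePool<T> {
    private final BlockingQueue<T> idle;      // resources currently not in use
    private final Supplier<T> factory;        // creates a new resource when needed
    private final AtomicInteger created = new AtomicInteger();
    private final int maxSize;

    SimplePool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    /** Borrows a resource, runs the work, and always returns the resource to the pool. */
    <R> R withResource(Function<T, R> work) {
        T res = idle.poll();
        if (res == null) {
            if (created.incrementAndGet() <= maxSize) {
                res = factory.get();          // still below the limit: create one
            } else {
                created.decrementAndGet();    // limit reached: wait for a free one
                try {
                    res = idle.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
        try {
            return work.apply(res);
        } finally {
            idle.offer(res);
        }
    }

    /** Number of resources the factory has been asked to create so far. */
    int createdCount() {
        return created.get();
    }

    public static void main(String[] args) {
        // StringBuilder is a stand-in for a costly resource such as a DB connection.
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        for (int i = 0; i < 500; i++) {
            pool.withResource(sb -> sb.append('x').length());
        }
        System.out.println("resources created: " + pool.createdCount());
    }
}
```

With sequential use as in main, all 500 runs reuse the same resource. Whether RapidMiner behaves like this internally is exactly the open question above.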
Answers
Without knowing what you are doing, I cannot say whether there are other pitfalls, besides the fact that neural networks aren't very fast in general.
I'm not quite sure whether we already pool the database connections, but if not, we will put it on our agenda.
Greetings,
Sebastian
Meanwhile, I have used a profiler. I have found that the low performance is caused mainly by the Nominal2Date operator (ca. 30% of the script execution time). The DB connection seems to remain open. Executing the query takes ca. 13%, and the DB is huge, so this is reasonable.
This reminds me that I had to convert the date to nominal and back because the OLAP operators cannot group by non-nominal attributes such as date or integer. Am I wrong here?
Best regards,
Lubomir
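If the Nominal2Date cost comes mostly from parsing the same date strings over and over (an assumption; the profiler numbers above do not say where the time goes inside the operator), one general workaround outside RapidMiner is to parse each distinct string only once. The sketch below is plain Java; CachedDateParser and the date pattern are hypothetical and not part of any RapidMiner API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: parses each distinct nominal date string only once. */
class CachedDateParser {
    private final DateTimeFormatter format;
    private final Map<String, LocalDate> cache = new HashMap<>();
    private int parseCalls = 0;               // counts real parses, for illustration

    CachedDateParser(String pattern) {
        this.format = DateTimeFormatter.ofPattern(pattern);
    }

    LocalDate parse(String nominal) {
        LocalDate date = cache.get(nominal);
        if (date == null) {
            date = LocalDate.parse(nominal, format);   // the expensive step
            parseCalls++;
            cache.put(nominal, date);
        }
        return date;
    }

    int parseCalls() {
        return parseCalls;
    }

    public static void main(String[] args) {
        CachedDateParser parser = new CachedDateParser("uuuu-MM-dd");
        // A nominal column usually has far fewer distinct values than rows.
        List<String> column = List.of("2009-03-01", "2009-03-01", "2009-03-02", "2009-03-01");
        for (String value : column) {
            System.out.println(parser.parse(value));
        }
        System.out.println("real parses: " + parser.parseCalls());
    }
}
```

This only pays off when the column has many repeated values, which is typical for a date that was just used as a grouping key.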
You are correct: grouping is only possible by nominal values. Perhaps you could save some time if you didn't convert the attribute back but instead kept both versions?
Greetings,
Sebastian
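The idea of keeping both versions can be illustrated outside RapidMiner with a small plain-Java sketch. Each row carries the date twice: once as a real date and once as a nominal string used only as the grouping key. After grouping, the date is still available and nothing has to be parsed back. All class and field names here are hypothetical, and the sum aggregation is just an example.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of grouping by a nominal key while keeping the date. */
class DualDateGrouping {

    /** One row holding both versions of the same attribute. */
    static final class Row {
        final LocalDate date;        // original date attribute, kept untouched
        final String dateNominal;    // nominal copy, used only for grouping
        final double value;

        Row(LocalDate date, double value) {
            this.date = date;
            this.dateNominal = date.toString();   // one-way conversion, done once
            this.value = value;
        }
    }

    /** Aggregated result per group; the date is carried along, not re-parsed. */
    static final class Group {
        final LocalDate date;
        final double sum;

        Group(LocalDate date, double sum) {
            this.date = date;
            this.sum = sum;
        }
    }

    static Map<String, Group> sumByDay(List<Row> rows) {
        Map<String, Group> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.merge(r.dateNominal, new Group(r.date, r.value),
                    (a, b) -> new Group(a.date, a.sum + b.sum));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(LocalDate.of(2009, 3, 1), 1.0),
                new Row(LocalDate.of(2009, 3, 2), 5.0),
                new Row(LocalDate.of(2009, 3, 1), 2.0));
        for (Group g : sumByDay(rows).values()) {
            System.out.println(g.date + " -> " + g.sum);
        }
    }
}
```

The trade-off is one extra column per row in exchange for skipping the nominal-to-date conversion after the grouping step.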