How to use RapidMiner in production?
Hi there!
After training a classifier model, I want to set up an automated process that fetches production data once a day, runs the classifier, and stores the predictions. I would like to use CSV as the import/export format. But after getting the first batch of production data, I ran into the warning "The internal nominal mappings are not the same between training and application for attribute XXX", and the model cannot be applied.
I've found this topic, which somewhat explains what is happening: http://rapid-i.com/rapidforum/index.php?topic=77.0
But first, I don't want to fix the *.aml files manually, because the model application process should be automated, and second, I haven't found any *.aml files in RapidMiner Studio 6.0.
What is the intended way to use RapidMiner Studio in a Big Data production system? I cannot use the "Read Database" operator because my ETL logic cannot be expressed as a single SQL query. Does the Server edition have the same problem?
Regards,
Maxim
Answers
Using Python for production is another option, but re-implementing models learned in RapidMiner in Python seems like a bad idea, because not all algorithms are available and some may be implemented differently. So now I'm considering doing all of my machine learning work in Python, even though it doesn't have such a comfortable GUI...
What model types have you implemented?
- Is there any XML you can share with us? It's possible that you've made a small error and someone on the forum could help spot it.
How exactly are you deploying the system?
- Are you taking a CSV file into RapidMiner, applying a model and then exporting out scored results? That should be pretty straightforward. Using RapidAnalytics you can do this by setting up a WebService so your system can do the entire process automatically.
- You mention Big Data, so do you mean Hadoop, etc. rather than an SQL database? In that case you need the Big Data version of RapidMiner, so you're better off talking directly with them for advice, particularly as you get paid support with that product. (If you're paying for it, you may as well use it, right?)
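For comparison, the "CSV in, apply model, scored CSV out" flow described above can be sketched in a few lines of plain Python. This is only an illustration of the batch shape, not RapidMiner or RapidAnalytics code: `score_batch`, `placeholder_model`, the `amount` column, and the file names are all made up for the example, and the "model" is a trivial rule standing in for a real classifier.

```python
import csv
import tempfile
from pathlib import Path


def score_batch(input_path, output_path, predict):
    """Read one CSV batch, add a prediction per row, and write the result."""
    with open(input_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["prediction"]
        rows = [dict(row, prediction=predict(row)) for row in reader]
    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def placeholder_model(row):
    # Hypothetical stand-in for a trained classifier; replace with a real model.
    return "high" if float(row["amount"]) > 100 else "low"


# Demo with a throwaway directory and made-up data.
with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp) / "live_batch.csv"
    scored = Path(tmp) / "scored_batch.csv"
    batch.write_text("id,amount\n1,50\n2,150\n")
    count = score_batch(batch, scored, placeholder_model)
    result = scored.read_text()

print(count)
print(result)
```

A scheduler (cron or similar) could call something like `score_batch` once a day; how the real model is loaded and applied depends entirely on the tool that trained it.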
Any more information would be appreciated.
Cheers,
JEdward.
Thank you for your reply.
I have trained a Bayes kernel model using this process:
and am now trying to apply the model to the live data using this process:
Here is my training data: https://www.dropbox.com/s/1a5ylc09kxsz9te/training_data.csv
The models I'm storing in the repository are here: https://www.dropbox.com/s/sooclcwkfjqves0/Rapidminer%20models.zip
When applying the model to this live data https://www.dropbox.com/s/xu7dku0gmktt74t/live_data.csv, I'm getting the following warnings: and the model produces no prediction.
Thanks for reading
Best,
Maxim
I'm still using RapidMiner Studio to learn about other ML algorithms, but currently I don't see any more value in it beyond being a learning tool.
There's an operator called "Add" that is used to declare possible nominal values seen in the training data but which are not seen in the test data. There's also the Remap Binominals operator that can be used to change which values map to true and false. These will eliminate the errors you are seeing in the log file.
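To make the idea behind the "nominal mappings" warning concrete, here is a small Python sketch. It is only an analogy, on the assumption that nominal values are stored internally as integer indices assigned in order of first appearance; it is not RapidMiner's actual implementation, and `build_mappings`, `encode`, and the sample data are invented for the example.

```python
import csv
import io


def build_mappings(rows, nominal_columns):
    """Map each nominal value to an integer index, in order of first appearance."""
    mappings = {col: {} for col in nominal_columns}
    for row in rows:
        for col in nominal_columns:
            mapping = mappings[col]
            mapping.setdefault(row[col], len(mapping))
    return mappings


def encode(rows, mappings):
    """Encode rows with the given mappings; values not in a mapping become None."""
    return [
        {col: mapping.get(row[col]) for col, mapping in mappings.items()}
        for row in rows
    ]


# Made-up data: the live batch has a different value order and an unseen value.
training_csv = "color,label\nred,yes\nblue,no\ngreen,yes\n"
live_csv = "color\nblue\nred\npurple\n"

training_rows = list(csv.DictReader(io.StringIO(training_csv)))
live_rows = list(csv.DictReader(io.StringIO(live_csv)))

training_mappings = build_mappings(training_rows, ["color"])
naive_live_mappings = build_mappings(live_rows, ["color"])

print(training_mappings["color"])    # {'red': 0, 'blue': 1, 'green': 2}
print(naive_live_mappings["color"])  # {'blue': 0, 'red': 1, 'purple': 2}

# Reusing the training mapping keeps indices consistent and exposes unseen values.
print(encode(live_rows, training_mappings))
```

Building the mapping from each file separately gives `blue` a different index in training and in the live data, which is the kind of mismatch the warning points at. Reusing the training mapping keeps the indices aligned, and values the model never saw (`purple` here) show up as `None`, so they can be handled explicitly.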
Regards
Andrew