"PaREn Extension"
dragonedison
Member Posts: 17 Contributor II
Dear everyone,
I found the new update for RapidMiner includes the PaREn Extension, which claims that it can suggest a most suitable classification method for the dataset. I would like very much to know how to use this extension.
Regards,
Gary
Answers
Hi,
Try this
http://madm.dfki.de/rapidminer/wizard
However, perhaps some fixing may still be needed; I have tried to follow the guidelines in a simple test and was not successful in running it till the end.
Regards
Dan
I found the LandMarking operator doesn't work out of the box but by deselecting the "Linear Discriminant" check box I got a successful run.
Here's an example that predicts the KNN operator will do best on the Sonar data set and lo and behold it seems to - so that's quite cool. Andrew
Thank you! The link is exactly what I need.
Regards,
Gary
We are in contact with the people at DFKI who contributed this extension. They found that it runs fine under Linux but fails on Windows machines. We will publish a new version as soon as possible.
Greetings,
Sebastian
The fix is on the update server.
Best,
Simon
Thanks,
Tom
It is a great and very useful initiative to provide such an extension as PaREn. This kind of feature is included in other major DM software, so it was time. Many thanks to the PaREn team!
I have tested this feature again now that it works on Windows machines, and would like to make some constructive comments that, together with those from other users, will hopefully be useful feedback to the developers for future improvements.
Using a dataset of 1000 rows with a binominal label, a PaREn-optimised classifier based on decision trees reached an accuracy of 0.692, which is actually below the 0.726 of the elementary ZeroR model (which takes the mode as the prediction in all cases). Separately, I quickly built a decision tree that gave an accuracy of 0.737, tested via cross-validation; a very small improvement.
I am not sure whether the differences between these figures are statistically significant, but one would normally expect the PaREn-optimised classifier to outperform both the ad hoc decision tree and the trivial model that blindly predicts the most frequent class.
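As a rough check on the significance question, here is a minimal Python sketch (outside RapidMiner) that computes a ZeroR baseline and an approximate 95% interval for each accuracy. The label column is hypothetical: it assumes 726 of the 1000 rows belong to the majority class, which is consistent with the quoted ZeroR accuracy but is not stated in the post. The normal approximation also ignores that the models were evaluated on the same rows, so treat it as an indication only.

```python
import math
from collections import Counter

def zero_r_accuracy(labels):
    """Accuracy of always predicting the most frequent label (ZeroR)."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def accuracy_interval(accuracy, n, z=1.96):
    """Approximate 95% normal interval for an accuracy measured on n rows."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width

# Hypothetical label column (assumption): 726 of 1000 rows in the majority class.
labels = ["yes"] * 726 + ["no"] * 274
baseline = zero_r_accuracy(labels)

# The two tree accuracies are the figures quoted in the post.
for name, acc in [("ZeroR", baseline), ("PaREn tree", 0.692), ("ad hoc tree", 0.737)]:
    low, high = accuracy_interval(acc, len(labels))
    print(f"{name}: {acc:.3f} ({low:.3f} to {high:.3f})")
```

Under these assumptions the three intervals overlap, so the ordering of the figures may well not be significant at this sample size.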
Any other guys with comments on their results?
BTW, most probably the answer is yes - but could the PaREn team tell us whether they made use of the ROC analysis implemented in RM, among others, to optimise accuracy? Thanks.
Regards,
Dan
Cheers,
Faisal
Thanks so much for providing this plugin! It really helps me in my data discovery tasks.
Regards,
Tom
www.neuralmarkettrends.com
Hi Faisal, a similar (though not identical) and very effective feature is offered by IBM SPSS Modeler, for instance, as an automatic modeling operator: several models are produced automatically, and the best of them are proposed to the user. Moreover, the models may be combined into a kind of voting model, which may on some occasions perform better than the individual models. See a demo here.
http://www.spss.com/media/demos/modeler/demo-modeler-overview/index.htm
Since you asked for suggestions, perhaps you can offer an option expressing how much the models are to be optimised, so that results can be produced in shorter or longer times upon choice. For practical reasons one can offer 3 levels for instance: low, medium, high levels of optimisation (corresponding processing times will increase accordingly). This would offer a balance between processing time and model performance (one of my tests on a dataset of 1000 rows was quite long to run and sometimes we may want to reduce this time).
Also, you may wish to automatically select the best 2-3 models and offer their respective RM processes, or alternatively one may build a process in which these models are put to vote, etc. Potentially your add-in can bring a lot of help to data miners. Thanks again and good luck!
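To make the voting idea concrete, here is a minimal Python sketch of a majority vote over the predictions of several models. It is only an illustration and not how SPSS Modeler or PaREn combine models; the model names and predictions are made up, and the tie-breaking rule (prefer the model listed first) is an arbitrary choice.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model prediction lists into one list by majority vote.

    Ties are resolved in favour of the model listed first.
    """
    combined = []
    for row_predictions in zip(*predictions_per_model):
        counts = Counter(row_predictions)
        best = max(counts.values())
        # First prediction (in model order) that reaches the top count.
        combined.append(next(p for p in row_predictions if counts[p] == best))
    return combined

# Hypothetical predictions of three models on five rows.
tree = ["yes", "no", "yes", "yes", "no"]
knn = ["yes", "yes", "no", "yes", "no"]
bayes = ["no", "yes", "yes", "yes", "yes"]

voted = majority_vote([tree, knn, bayes])
print(voted)
```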
Best,
Dan
just as an aside: Trying different models on a data set is easily possible using a combination of parameter optimization/subprocess selector. Maybe we should have a sample or building block for that :-)
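The general pattern of trying several learners on the same data and keeping the best one can be sketched in a few lines of Python. This is not the RapidMiner process mentioned above (parameter optimization with a subprocess selector), just the same idea under stated assumptions: the two toy learners, the data, and all names are hypothetical.

```python
from collections import Counter

def zero_r(train):
    """Learner that always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority

def one_nn(train):
    """Learner that predicts the label of the closest training value."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def cross_val_accuracy(learner, data, k):
    """Accuracy pooled over k folds; fold i holds out rows i, i+k, i+2k, ..."""
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [row for i, row in enumerate(data) if i % k != fold]
        model = learner(train)
        correct += sum(model(x) == label for x, label in test)
    return correct / len(data)

# Hypothetical toy data: one numeric attribute, label depends on a cut at 10.
data = [(x, "low" if x < 10 else "high") for x in range(20)]

candidates = {"ZeroR": zero_r, "1-NN": one_nn}
scores = {name: cross_val_accuracy(learner, data, k=5)
          for name, learner in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```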
Best,
Simon
thanks for the feedback.
Concerning the run-time of the evaluation (which includes optimization):
We are actually working on the prediction of the run-time as well. For each of the listed classifiers in the wizard you can then not only see the predicted accuracy but also the expected run-time for training on the given data. This should help a lot when certain constraints have to be met, e.g. on embedded systems (where computational power is limited) and if you want to choose a classifier with reasonable performance but also low energy consumption. Maybe we should try to trademark "Green Data Mining" before releasing the next version of the PaREn Automatic System Construction Wizard
Hm, the discussion has not much to do with "Problems and Support" - and I am really happy about this!
Anyway, if you experience any issues, please let us know.
Cheers
Christian
Since I've no improvements to add: Best wishes
When I read this thread, I feel honored that all this discussion takes place in the Problems and Support forum moderated by me, but I wonder if it would be a good idea to add a new forum explicitly for the PaREn extension. What do you think?
@Christian
If you are going to estimate the runtime of an operator, it might be useful to contact us. We have been working on the same issue for a while and can probably provide you with some help on that. It might be a good idea for you to join our new Special Interest Group for Development of RapidMiner. I think you left RCOMM before we established them on the last day, is that possible?
Greetings,
Sebastian
yes, we left on Wednesday evening and did not attend the final training day. Special Interest Group sounds interesting, please send me more info or point me to it if already available.
Well, I don't think that a dedicated forum for the PaREn Wizard is needed. Maybe one for third-party contributions? Kind of a pre-roll for the envisioned marketplace.
Regarding the timing predictions we would be happy to join forces and exchange insight. Looks like you should do a Rapid-I group excursion to Kaiserslautern
Cheers
Christian
The mailing lists of the Special Interest Groups are listed on the SourceForge page of RapidMiner. We are still working on a page explaining the topics and aims of each group in more detail, but they are already online. You can join there directly or write me an email and I will put you on the list.
The idea with a third party forum is good. I will add one.
I've never been to Kaiserslautern. It seems to be a good idea to change that now. I will contact you by mail.
Greetings,
Sebastian
The cool thing about the PaREn extension is that it predicts which model is probably the best even without any testing. This is the first time I have actually seen this meta-learning approach really working, and this is probably the reason why we at Rapid-I and many others love it. Kudos to Christian and the team at DFKI for this great extension!
I also have a suggestion: it would be great if k-fold cross-validation or even a single split were selectable instead of the rather time-consuming LOO (leave-one-out) evaluation.
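To illustrate why LOO is the expensive option, here is a small Python sketch that builds the train/test index splits for k-fold cross-validation; leave-one-out is simply the case where k equals the number of rows. The function name and the row count of 1000 are assumptions for illustration, not part of the PaREn extension.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) for k folds over n_rows rows."""
    indices = list(range(n_rows))
    for fold in range(k):
        test = indices[fold::k]
        train = [i for i in indices if i % k != fold]
        yield train, test

n_rows = 1000  # hypothetical dataset size
ten_fold = list(k_fold_splits(n_rows, k=10))
leave_one_out = list(k_fold_splits(n_rows, k=n_rows))

# One model has to be trained per split.
print(len(ten_fold), "model fits for 10-fold")
print(len(leave_one_out), "model fits for leave-one-out")
```

Each split requires training one model, so on 1000 rows LOO needs 1000 fits where 10-fold needs 10.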
Cheers,
Ingo
10-27-10
At the time of my post I thought I had the latest update for PaREn. Once I installed the latest version I am not experiencing this problem any longer.
as Simon has pointed out: there should be a new version on our update server. Do you really use the latest version available? If yes: What's your OS?
Cheers,
Ingo
Factually speaking (and as a fan of both RM and SPSS Modeler), there are obviously similarities and differences between the features we are discussing, and I am afraid the differences show that, for now, SPSS Modeler is far ahead: in the time needed to build the best models, in the reliability and performance of the models (see my previous posting above regarding unexpectedly suboptimal optimised PaREn models), in the combination of the best models into an overall model, etc. In addition, the estimated accuracies in PaREn were quite far from the actual accuracies in most of my experiments, but the idea is interesting.
@ Christian et al.: I have an additional suggestion that I had in mind when posting questions earlier in this topic. ROC analysis can be added to the search for the model with the best accuracy when the output/label attribute is binominal. More precisely, after finding the best parameters for a learner on a given dataset, one can also take the optimised threshold from the ROC curve (as opposed to using the default threshold of 0.5), which gives the best accuracy on the data used to build the curve.
However, this suggestion may be more useful to consider once the ROC analysis implemented in RapidMiner has been revised, as it is still unreliable in this package (i.e. the AUC calculation needs corrections, as I have shown on the forum at http://rapid-i.com/rapidforum/index.php?PHPSESSID=18d6261d2d63b2ca946477f03c2552bc&topic=2237.0 , and the Find Threshold operator does not find the best threshold as expected but provides suboptimal solutions; I emailed a complete report to the RM development team, with relevant processes illustrating this).
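The threshold idea can be sketched in plain Python as follows. This is not RapidMiner's Find Threshold operator, only a minimal illustration of scanning candidate thresholds and keeping the one with the highest accuracy; the confidences and labels are made up. Note that the chosen threshold is only optimal on the data it was tuned on, so it should be validated on held-out rows.

```python
def best_accuracy_threshold(scores, labels):
    """Return (threshold, accuracy) maximising accuracy on the given data.

    A row is predicted positive when its score is >= threshold.
    """
    best = (0.5, -1.0)
    # Every distinct score is a candidate; infinity means "predict all negative".
    for threshold in sorted(set(scores)) + [float("inf")]:
        correct = sum((s >= threshold) == y for s, y in zip(scores, labels))
        accuracy = correct / len(labels)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best

# Hypothetical confidences for the positive class and the true labels.
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]
labels = [False, False, True, False, True, True, True, True]

default_correct = sum((s >= 0.5) == y for s, y in zip(scores, labels))
default_accuracy = default_correct / len(labels)
tuned_threshold, tuned_accuracy = best_accuracy_threshold(scores, labels)
print("default 0.5:", default_accuracy)
print("tuned:", tuned_threshold, tuned_accuracy)
```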
PaREn is an excellent initiative towards RM's enrichment. However, the extension needs to be more practical and more accurate. It requires a relatively long processing time, and the models are not as optimised as expected: see the postings in this thread, where it is explained that both an ad hoc model created with no particular settings and a trivial model that blindly picks the most frequent class as its prediction are more accurate than the optimised, time-consuming PaREn model. Improvement would be very beneficial, and indeed necessary, for the extension. Other users of the extension may wish to generate ad hoc models in addition to the PaREn models and compare their accuracies; this would be useful feedback to the development team.
I hope the feedback and suggestions in this thread help and would be useful to PaREn, as part of the community's contribution to improve the open source software. Good luck!
Regards,
Dan
The producers of C5.0 compare it against C4.5 here:
http://www.rulequest.com/see5-comparison.html
So a lot of the difference we already have, moreover C5.0 is closed source and not free. Really? Check out my recent post http://rapid-i.com/rapidforum/index.php/topic,2237.msg10540.html#msg10540
But what has any of this to do with the PaREn extension? Not much! As Ingo says... So it simply misses the point to state that "models are not as optimised as expected".
Toodle Pip!
http://amine-platform.sourceforge.net/