Using Poisson distribution in RapidMiner
Hey there,
I'm writing my master's thesis about predictive analytics and text mining, which is how I came across RapidMiner. Getting started with the tool and trying different things out was quite easy. Now I would like to use the Poisson distribution to calculate the probabilities of different events, but I couldn't find any operator that supports it. Is there one?
So I installed an extension for R, thinking I could do the job in an R script. Unfortunately, it seems R and Python are not supported in RapidMiner 7.1? I get an error message on every startup.
Any ideas or hints on how I could do the Poisson calculations?
My current workaround is to extract the values I need with RapidMiner, export them to an Excel file, and compute the Poisson probabilities manually with an Excel function. Then I load those results in another process. But I imagine there's a handier way to do this.
Thanks.
Kind regards from Germany
Answers
Hi Dave,
R and Python are supported in RapidMiner 7.1. I'm not sure what kind of error messages you are getting; please share them.
I believe @mschmitz wrote a Poisson distribution operator for RapidMiner. I would touch base with him.
Hey @Thomas_Ott,
thanks for your reply. I'll contact @mschmitz for further information.
As for my error:
After installing the R Scripting extension and restarting the application, I get an "incompatible extension" warning.
I'm using RapidMiner 7.1.001 on my:
If you need any other information, let me know.
Dave
PS: Other extensions work fine.
This is a silly question, but do you have Python and R installed and configured in the Preferences?
Oh, I should have added that info.
Yes, I did.
For example, I installed and configured R-3.2.5.
Hi Dave,
What I built quite a while ago is a Naive Bayes that uses a Poisson distribution instead of a Gaussian one. But that was mainly to learn how to write an operator.
What do you want to do with Poisson? It seems like something very easy to build as an operator if you know some Java.
~Martin
Dortmund, Germany
Hey,
What I'm trying to do:
I'm trying to predict the outcomes of soccer matches in the German Bundesliga using a lot of historical data (like shots on target, full-time goals, ...).
Once the offensive and defensive strength of each team has been calculated, it should be possible to predict results using the Poisson distribution.
The result should be the probability of each score: 0:0, 0:1, 1:0, 1:1 and so on.
For example, something like this, but using RapidMiner instead of Excel.
The reason I'm not just using Excel is that I would like to combine multiple strategies (Poisson, team ranking, text mining (RSS feeds), ...), and I think RapidMiner would be perfect for aggregating that data and giving me a final result.
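To make the goal a bit more concrete, here is a minimal sketch in plain Python of how such a score table could be computed. It assumes that the goals of both teams are independent Poisson variables, and it combines the strengths in one common way (attack strength times opponent's defence strength times league average), which is not necessarily what the linked article does. All names and numbers are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def score_grid(home_xg, away_xg, max_goals=5):
    """Probability of every score h:a up to max_goals, assuming independence."""
    return {
        (h, a): poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }

# Hypothetical inputs, not real Bundesliga data.
league_avg_home_goals = 1.6
league_avg_away_goals = 1.2
home_attack, home_defence = 1.2, 0.9   # strengths relative to league average
away_attack, away_defence = 0.8, 1.1

# Expected goals for each side (assumed formulation, see text above).
home_xg = home_attack * away_defence * league_avg_home_goals
away_xg = away_attack * home_defence * league_avg_away_goals

grid = score_grid(home_xg, away_xg)

# Print the five most likely scores.
for (h, a), p in sorted(grid.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{h}:{a}  {p:.3f}")
```

The grid stops at 5 goals per team, so its probabilities sum to slightly less than 1; raising `max_goals` closes that gap.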
Dave,
I think you can do all of that inside RM. You need to calculate avg(#League) and evaluate the Poisson distribution. While Poisson is not included in Generate Attributes, it should be easy to calculate by copying and pasting the right formula, P(k) = lambda^k * exp(-lambda) / k!. Since you don't expect values above 10, there shouldn't be any problem.
Otherwise you can easily use R/Python/JavaScript to build the new column. It might be nicer to use Poisson from SciPy or similar.
Please be a bit careful with this article. It makes a data science mistake: you take the average over the whole season to calculate the value. Technically, you should only use the values up to the date of the match, because otherwise you leak label information into your features.
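As a rough illustration of the formula, here it is written out in plain Python using only the standard library. The value of `lam` is a made-up average, not taken from any real data. The sketch also checks how much probability mass is left above 10 goals, which is why cutting off there should be harmless for goal counts.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * exp(-lam) / k! for a Poisson distribution."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 1.5  # hypothetical average goals per team and match

# Probabilities for 0..10 goals; 10! is small, so nothing overflows here.
probs = [poisson_pmf(k, lam) for k in range(11)]
tail = 1.0 - sum(probs)  # probability of more than 10 goals

for k, p in enumerate(probs[:5]):
    print(f"P({k} goals) = {p:.4f}")
print(f"mass left above 10 goals: {tail:.1e}")
```

The same expression should translate more or less directly into an expression in Generate Attributes, provided the exponential and factorial (or a precomputed constant for each k) are available there.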
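One way to avoid that leak, sketched in plain Python: for every match, compute the team's average from earlier matches only. The tuple layout `(date, team, goals)` and the sample rows are assumptions for illustration, not real data.

```python
from collections import defaultdict

def prior_averages(matches):
    """For each (date, team, goals) row, add the team's average goals
    over strictly earlier matches (None if there are none yet)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    rows = []
    # ISO dates sort correctly as strings.
    for date, team, goals in sorted(matches, key=lambda m: m[0]):
        avg_before = totals[team] / counts[team] if counts[team] else None
        rows.append((date, team, goals, avg_before))
        # Update the running totals only AFTER the row has been emitted,
        # so the current match never contributes to its own feature.
        totals[team] += goals
        counts[team] += 1
    return rows

# Hypothetical sample data.
matches = [
    ("2016-01-23", "A", 2),
    ("2016-01-30", "A", 0),
    ("2016-02-06", "A", 4),
    ("2016-01-23", "B", 1),
]

for row in prior_averages(matches):
    print(row)
```

The first match of each team has no history, so its average is `None`; in practice you would either drop those rows or fall back to something like last season's value.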
~Martin
Dortmund, Germany