Random Forest performance measure
Hi everybody!
A short question:
I've added the Performance operator to my Random Forest (RF) model. The accuracy is 57%; however, this is very black and white, as I'm predicting the final league positions of the English Premier League (soccer ;-) ).
Which parameter in the Performance operator should I use to see how far a prediction is from the correct answer?
For example, if Chelsea was predicted 1st but should have been predicted 2nd, how do I measure that the model was not completely wrong, only off by one place?
I thought of using the relative error, but the examples in my dataset are polynominal.
Any advice is welcome!
Thanx so much!
Frederique
Answers
I am not sure that there is a performance operator that does what you want automatically if you have framed it as a classification problem. But you should be able to do it manually fairly easily. Using "Generate Attributes" you can calculate the difference between predicted rank and actual rank and then look at that. Note that if you reformulate your prediction as regression of a numerical outcome (say league ranking) then you will be able to do this more easily with existing performance operators.
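A minimal Python sketch of the manual approach described above, done outside RapidMiner for illustration: subtract the actual position from the predicted one for each team, then average the absolute differences (the mean absolute rank error). The Chelsea entry mirrors the example from the question; the other team names and positions are hypothetical placeholders, and the function names are illustrative, not RapidMiner operators.

```python
# Sketch: score league-position predictions by how many places they miss.
# Team names/positions other than the Chelsea example are hypothetical.

def rank_errors(predicted, actual):
    """Return {team: predicted_rank - actual_rank} for teams present in both dicts.
    Negative means the team was predicted higher in the table than it finished."""
    return {team: predicted[team] - actual[team] for team in predicted if team in actual}

def mean_absolute_rank_error(predicted, actual):
    """Average number of places the prediction is off (0 = perfect)."""
    errors = rank_errors(predicted, actual)
    return sum(abs(e) for e in errors.values()) / len(errors)

predicted = {"Chelsea": 1, "Team B": 2, "Team C": 3, "Team D": 4}
actual = {"Chelsea": 2, "Team B": 1, "Team C": 3, "Team D": 4}

print(rank_errors(predicted, actual))               # {'Chelsea': -1, 'Team B': 1, 'Team C': 0, 'Team D': 0}
print(mean_absolute_rank_error(predicted, actual))  # 0.5
```

Treating the position as a number like this is essentially the regression framing mentioned above, which is why existing regression-style performance measures would then apply.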
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
@mschmitz did soccer league predictions once... Maybe he'll give you some hints ;-)
Regards,
Balázs
Hi,
I did it for the Bundesliga and the Premier League, but I worked on win-draw-loss (W-D-L) forecasts for individual games, so I can't help here from that experience.
Anyway, I would design my own performance measure and use the Extract Performance operator to convert it into a performance value.
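One possible custom measure, sketched in Python and assuming predicted and actual league positions are available per team: the share of teams placed within k positions of their true finish. With k=0 it reduces to ordinary exact-match accuracy; with k=1 it gives credit for the "only one place off" case from the question. The function name and positions are hypothetical; in RapidMiner the resulting number would still need to be turned into a performance value, for example with Extract Performance as suggested.

```python
# Sketch of a hypothetical "within k places" measure for league-position predictions.

def within_k_accuracy(predicted, actual, k=1):
    """Fraction of teams whose predicted position is at most k places from the actual one."""
    teams = [t for t in predicted if t in actual]
    hits = sum(1 for t in teams if abs(predicted[t] - actual[t]) <= k)
    return hits / len(teams)

# Hypothetical positions: every team is off by exactly one place.
predicted = {"Chelsea": 1, "Team B": 2, "Team C": 3, "Team D": 4}
actual = {"Chelsea": 2, "Team B": 1, "Team C": 4, "Team D": 3}

print(within_k_accuracy(predicted, actual, k=0))  # 0.0 (strict accuracy)
print(within_k_accuracy(predicted, actual, k=1))  # 1.0 (all within one place)
```

The contrast between the two outputs is the point: the same predictions score 0% under strict accuracy but 100% when one-place misses count as hits.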
Cheers,
Martin
Dortmund, Germany
Hi,
I am very interested. I am currently doing this for the Premier League, Bundesliga and Serie A (W/L/D). I have about 1,600 games' worth of statistics from the last couple of seasons. I have created a few stats of my own (such as quality shots: the percentage of shots taken in front of goal from inside the 16-yard box), and I compute them over season form at home, season form away, the last 4 home games and the last 4 away games, plus the opposition's home and away stats, so I have a lot of features/attributes. I have found Weka's Random Forest to be the best, getting it up to 56% +/- 3.03. The best I can get from the RapidMiner Random Forest is 54.77% +/- 2.88. The predicted percentages for wins and losses in Weka are more varied, whereas the RapidMiner one is quite static: the most it ever predicts for a win is 54%. Any suggestions on how to get these accuracy percentages up?
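A rough sketch of how the two kinds of derived stats mentioned could be computed, assuming shot counts and an ordered list of W/D/L results per team are already available. Form points use standard league scoring (3 for a win, 1 for a draw, 0 for a loss); the function names and numbers are hypothetical, not taken from the poster's actual setup.

```python
# Hypothetical feature helpers; inputs are assumed, not real match data.

POINTS = {"W": 3, "D": 1, "L": 0}  # standard league points per result

def quality_shot_pct(shots_inside_box, total_shots):
    """Percentage of shots taken from inside the 16-yard box (0.0 if no shots)."""
    return 100.0 * shots_inside_box / total_shots if total_shots else 0.0

def last_n_form(results, n=4):
    """Points from the most recent n results; results are ordered oldest to newest."""
    return sum(POINTS[r] for r in results[-n:])

print(quality_shot_pct(9, 15))                     # 60.0
print(last_n_form(["L", "W", "D", "W", "W", "L"]))  # 7 (D, W, W, L)
```

Applying `last_n_form` separately to a team's home results and away results would give the "last 4 home" and "last 4 away" variants.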
I currently rank attributes by weight using the chi-squared statistic and then select, say, the top 50-60.
(This is just a hobby; I am by no means a maths whizz, and most of this is self-taught.)
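The chi-squared weighting followed by top-k selection can be sketched in plain Python as below. This illustrates the general technique, not RapidMiner's exact implementation: it assumes every attribute has already been binned into a few discrete levels (chi-squared works on counts), and the attribute names and six example matches are hypothetical.

```python
from collections import Counter

def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic of one discrete attribute against the class label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))  # observed cell counts
    value_counts = Counter(feature_values)         # row totals
    label_counts = Counter(labels)                 # column totals
    stat = 0.0
    for value, v_count in value_counts.items():
        for label, l_count in label_counts.items():
            expected = v_count * l_count / n       # count expected under independence
            observed = joint.get((value, label), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def select_top_k(features, labels, k):
    """Weight every attribute by chi-squared, then keep the names of the k heaviest."""
    weights = {name: chi_squared(values, labels) for name, values in features.items()}
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k], weights

# Hypothetical, already-binned attributes for six matches (not real data).
labels = ["W", "W", "L", "L", "D", "D"]
features = {
    "home_form_bin": ["high", "high", "low", "low", "mid", "mid"],
    "quality_shots_bin": ["high", "low", "high", "low", "high", "low"],
}

top, weights = select_top_k(features, labels, k=1)
print(top)  # ['home_form_bin']
print(weights)
```

Here `home_form_bin` lines up perfectly with the result and gets a weight of about 12, while `quality_shots_bin` is independent of it and gets 0, so only the first survives the top-1 cut.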
Thanks
Hi,
From my experience, you are most likely not interested in accuracy. Predicting that Bayern München will beat Freiburg won't make you much money. I would recommend moving to a different measure.
I think I got up to 60% accuracy last year, though. The trick is the prep.
Best,
Martin
Dortmund, Germany
If you want to make money, you need an edge over the market that yields a modest return on investment. I generally bet on games where the model gives me a 10% edge on a price around the $2-$4 mark. Like the stock market, if you can get a nice ROI over a long period of time, you're doing OK.
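A small sketch of the betting rule described, under stated assumptions: "$2-$4" is read as decimal odds between 2.00 and 4.00, and a "10% edge" is read as an expected profit of at least 0.10 per unit staked, computed as model probability × odds - 1. Another reading would compare the model probability with the implied probability 1/odds instead. Function names and example numbers are hypothetical.

```python
# Hypothetical value-bet filter; the edge definition and odds range are assumptions.

def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked if the model probability is correct."""
    return model_prob * decimal_odds - 1.0

def should_bet(model_prob, decimal_odds, min_edge=0.10, min_odds=2.0, max_odds=4.0):
    """Bet only inside the odds range and only when the assumed edge clears min_edge."""
    in_range = min_odds <= decimal_odds <= max_odds
    return bool(in_range and expected_value(model_prob, decimal_odds) >= min_edge)

print(should_bet(0.40, 3.00))  # True: EV = 0.20
print(should_bet(0.30, 3.00))  # False: EV is negative
print(should_bet(0.80, 1.50))  # False: odds below the 2.00-4.00 range
```

For example, a 40% model probability at odds of 3.00 gives an expected value of 0.40 × 3.00 - 1 = 0.20, which clears the 0.10 threshold.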
I am just looking to improve my model, and I am starting to work out that prep is the way to improve it, as I have 120+ attributes. Can you suggest any tutorials or methods for preparing that data? It's all clean, I think; I'm mainly looking for how to weight and select attributes. I'm currently using weight by chi squared, then selecting the top k.
Also, if I am not going for accuracy, how do you recommend I judge whether my model is better after 10-fold cross-validation?
I understand these are pretty rookie questions.
I'm also willing to send you my 1,600 games of data if you're interested.
Thanks
Matt