RapidMiner Data Science Competition 3: Fantasy Football
Hello RapidMiners -
Yes here it is! Here's the setup:
“Fantasy Football” is an online game where users choose a “fantasy team” of nine American football players from among current NFL player rosters. This can be done once per season (“season play”) or every week (“daily fantasy” or “DFS”). Once a fantasy team has been chosen, the user gains/loses points depending on how their players perform in actual NFL season games. These players do not need to be on the same actual NFL team; in fact usually they are players from a wide variety of actual teams. This competition will focus on DFS, not season play.
There are two major website platforms for playing DFS online: DraftKings and FanDuel. In either platform, the goal is always the same: maximize “Fantasy Points” (FPTS) while keeping inside a given “Salary Budget” (Salary). FPTS are earned when the players on your fantasy team do good things while playing football for that week.
Hence the key idea here is to look for value of a player: DK Salary vs FPTS. We will define this value as FPTS per $1000 of Salary, or “FPTS_Ratio”.
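As a minimal sketch of the definition above, FPTS_Ratio can be computed as follows. The function name and the decision to reject non-positive salaries are assumptions made for illustration; the numbers in the usage line are made up.

```python
def fpts_ratio(fpts: float, salary: float) -> float:
    """Fantasy points earned per $1000 of DraftKings salary."""
    if salary <= 0:
        # Assumption: a zero or negative salary has no meaningful ratio.
        raise ValueError("salary must be positive")
    return fpts / (salary / 1000.0)


# Hypothetical player: 20 fantasy points at a $5000 salary.
print(fpts_ratio(20.0, 5000))  # 4.0
```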
The goal of this challenge is, given all historical player information and DK Salaries (up to and including games played on December 18, 2017), to predict the FPTS_Ratio with the lowest root mean-squared error (RMSE) for all players for Week 16 (December 23-25) of the 2017 NFL season.
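For reference, root mean-squared error is the square root of the average squared difference between predicted and actual values. A minimal sketch, assuming plain Python lists of equal length (the function name and the sample values are illustrative, not part of the competition template):

```python
import math


def rmse(actual, predicted):
    """Root mean-squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up FPTS_Ratio values for three players.
print(rmse([2.0, 3.5, 1.0], [2.5, 3.0, 2.0]))
```

Lower is better: a perfect set of predictions gives 0.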
Obtaining NFL football statistics is easy to do now due to the freely available nflscrapR-data dataset created by Ron Yurko, Sam Ventura, and Max Horowitz from Carnegie Mellon University and recently posted on Kaggle.com. It is well documented and easily downloaded from the GitHub page. An exact clone of this repository will be used at all times.
In addition, we are including a separate data set, "RotoGuru-DK.csv" which has DraftKings FPTS and Salary for each week from 2014 Week 1 thru 2017 Week 5 (source: rotoguru.com):
It is clear that much of the information that serious Fantasy Football participants use is unstructured data – website text in particular. Therefore it is permitted to use the following external sources in your models: http://www.espn.com/fantasy/football, https://football.fantasysports.yahoo.com, https://www.dailyfantasysports101.com.
All submissions in this competition need to be posted in this thread with the entire XML of the process using the supplied process template. No submission will be accepted if it is submitted in any other form.
The deadline for submissions is December 19, 2017 at 23:59:59 UTC.
The NFL will play Week 16 beginning on December 23 and continuing through December 25. The winners of the competition will be the models with the three lowest RMSEs when applied on the Week 16 data set. This is the ultimate test set – no one has access to it prior to submission. The winners will be announced sometime after December 25, 2017 in the competition’s thread.
RapidMiner will award the following prizes to the winners:
1st place: US$750 (as a VISA debit/gift card)
2nd place: US$250 (as a VISA debit/gift card)
3rd place: US$100 (as a VISA debit/gift card)
4th place: RapidMiner “lightning” t-shirt
5th place: RapidMiner “lightning” t-shirt
6th place: RapidMiner “lightning” t-shirt
NOTE: THIS IS ONLY A SUMMARY. A FULL DESCRIPTION OF THE COMPETITION AND RULES ARE ATTACHED TO THIS POST. PLEASE READ CAREFULLY BEFORE BEGINNING!!
That's about it. Good luck everyone and may the best modeler win!
Scott
Answers
hello all Fantasy Football RapidMiners -
Hope the challenge is going well for you. This thread has been VERY QUIET so please begin chatter if you like!
As you may have noticed, RapidMiner Studio 8.0 was released today. Wahoo!! So obviously you are permitted to submit your entries for this competition using RapidMiner Studio 8.0 as well as 7.6.1, 7.6.2, or 7.6.3.
Good luck everyone!
Scott
Hello all
Let me introduce myself - I am a university student from Europe who recently got very interested in data science and things related to it. I thought I would at least try to work on this challenge as it seems to be well prepared and I think it can help me learn a lot.
I know the challenge ends very soon but I want to try to build at least some simple, functional model.
I have a question though - is the RotoGuru-DK dataset complete? It's mentioned that it contains games for each week from 2014 Week 1 thru 2017 Week 5. But it only contains the first 9 weeks for each year, while weeks 10-17 are not included. Is that correct or a mistake?
Thank you for your answer and also for the great work on the challenge.
hello @maros_plsik - aha you're right. I think there may have been an error on my part with that data file. Use this one instead.
Scott
Hello Scott,
I have another question. Since we are not allowed to change the grey blocks, we are also not allowed to do any data imputation on the DK salary attribute, right? Because my model performs better with data imputation. The template process throws all players with salary 0 away.
Best regards
Florian
hello @florian_ziegler - no you can do that. Just put it in Step 5.
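For readers wondering what such an imputation could look like, here is a minimal sketch. It is not Florian's method, which the thread does not describe; the rule used (replace a zero salary with the median non-zero salary for the same position) and the field names `position` and `salary` are assumptions for illustration. In the competition itself this logic would live in Step 5 of the RapidMiner template rather than in Python.

```python
from statistics import median


def impute_zero_salaries(rows):
    """Return copies of rows with zero salaries filled by the position median.

    rows: list of dicts with hypothetical keys 'position' and 'salary'.
    Rows whose position has no known salary are left unchanged.
    """
    known = {}
    for row in rows:
        if row["salary"] > 0:
            known.setdefault(row["position"], []).append(row["salary"])

    result = []
    for row in rows:
        if row["salary"] > 0 or row["position"] not in known:
            result.append(dict(row))
        else:
            result.append({**row, "salary": median(known[row["position"]])})
    return result


# Made-up rows: the third quarterback has a missing (zero) salary.
players = [
    {"position": "QB", "salary": 5000},
    {"position": "QB", "salary": 7000},
    {"position": "QB", "salary": 0},
]
print(impute_zero_salaries(players)[2]["salary"])
```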
Scott
So, here is my submission. I only used the RotoGuru-DK dataset, but maybe, the model still makes some nice predictions.
I am not sure whether it fulfills all the requirements. If something has to be altered, please, let me know.
I am also submitting my solution. The issue is, I have created the whole process in my own empty window (not in the template you provided) and now I am having issues getting all the operators from my process to the template. I have uploaded my file and I am going to rewrite it to the template, hopefully it will be done in a few minutes. I hope it's ok. Thanks for understanding.
Finally! The process is in the template.
Two more things:
1) Scott, is there any way to copy operators from one process file to another, other than copying parts from one .xml to another? That is still quite a lot of work if the versions of the files are different.
2) It was very simple to work with the template but I found it quite limiting. My process was at first divided into two parts (for Players and for Defense - GID>7000). I was getting quite good results with the process, but unfortunately there was the limit of using only one model for the whole dataset, so I had to disable about half of my operators including one model. Nevertheless, the challenge was very well prepared and I hope there will be more in the future.
ok this competition is CLOSED. Thank you all for your submissions! Stay tuned to see who our winner will be!
Scott
hello all - first of all THANK YOU @maros_plsik @yzan @florian_ziegler for your submissions. I can confirm that they all work (yes even yours @yzan although it locked down my machine for several hours!) and hence it's only a matter of scoring. Week 16 ended yesterday so I will be working on this today and tomorrow to get the final RMSEs published.
Interesting side note: I took each of your models and had it give me predicted FPTSRatios for Week 16 BEFORE the games took place, and picked a fantasy football team for each model. My algorithm was somewhat fudged because there are other aspects to choosing a team besides FPTSRatio - in particular, the budget. You see, if you pick purely by FPTSRatio, the team can end up over budget. Here's what happened:
@yzan
According to model choosing best FPTSRatio for each slot, I get:
QB N. Peterman (pred FPTSRatio = 3.652, salary = $4500)
RB K. Hunt (pred FPTSRatio = 2.663, salary = $8400)
RB T. Gurley (pred FPTSRatio = 2.655, salary = $9100)
WR D. Bryant (pred FPTSRatio = 2.647, salary = $6000)
WR K. Wright (pred FPTSRatio = 2.410, salary = $3800)
WR S. Sterling (pred FPTSRatio = 2.351, salary = $6600)
TE G. Olsen (pred FPTSRatio = 2.507, salary = $5200)
FLEX M. Gordon (pred FPTSRatio = 2.654, salary = $7200)
DST Detroit Lions (pred FPTSRatio = 4.949, salary = $2900)
Total salary: $53,700
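The budget check for the lineup above can be sketched as follows. The salaries are copied from the list above and the $50,000 cap is the figure quoted in this thread; the variable and function names are illustrative.

```python
SALARY_CAP = 50_000  # DraftKings budget quoted in this thread

# (slot, player, salary in $) as listed above
lineup = [
    ("QB", "N. Peterman", 4500),
    ("RB", "K. Hunt", 8400),
    ("RB", "T. Gurley", 9100),
    ("WR", "D. Bryant", 6000),
    ("WR", "K. Wright", 3800),
    ("WR", "S. Sterling", 6600),
    ("TE", "G. Olsen", 5200),
    ("FLEX", "M. Gordon", 7200),
    ("DST", "Detroit Lions", 2900),
]


def total_salary(picks):
    """Sum the salaries of all picks."""
    return sum(salary for _slot, _player, salary in picks)


def overage(picks, cap=SALARY_CAP):
    """Amount by which the picks exceed the cap (0 if within budget)."""
    return max(0, total_salary(picks) - cap)


print(total_salary(lineup))  # 53700
print(overage(lineup))       # 3700
```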
So I'm over by $3700 PLUS (and this is where things get interesting), DraftKings listed three of these players as "Questionable" to play due to injuries: T. Gurley, S. Sterling, and M. Gordon. So I overruled the predictive model and chose the next highest ranked players to fill those slots AND make the salary fall under $50,000:
[Image: yzan's model applied to DraftKings for Week 16]
And here's what happened:
[Image: results for Week 16 with modified yzan model]
So basically we got killed. HOWEVER, what if I had not overruled the model for "questionable" players and had simply kept reducing salaries until I fell under $50,000, while also eliminating N. Peterman (if I knew anything about football, I would have known that he had zero chance of playing this week)? Then I get this:
QB M. Trubisky (pred FPTSRatio = 3.100, salary = $4700) -> 18.12 FPTS
RB M. Gordon (pred FPTSRatio = 2.654, salary = $7200) -> 21.8 FPTS
RB C. McCaffrey (pred FPTSRatio = 2.483, salary = $6400) -> 7.8 FPTS
WR D. Parker (pred FPTSRatio = 2.340, salary = $4200) -> 11.3 FPTS
WR K. Wright (pred FPTSRatio = 2.410, salary = $3800) -> 6.7 FPTS
WR S. Sterling (pred FPTSRatio = 2.351, salary = $6600) -> 9.5 FPTS
TE G. Olsen (pred FPTSRatio = 2.507, salary = $5200) -> 5.7 FPTS
FLEX T. Gurley (pred FPTSRatio = 2.655, salary = $9100) -> 55.6 FPTS
DST Detroit Lions (pred FPTSRatio = 4.949, salary = $2900) -> 12 FPTS
Total salary: $50,000
Total fantasy points: 148.52 - moving from 7553rd place to 1450th place
OK that's still terrible. But here's what is interesting... almost EVERY top player for Week 16 was predicted a very high FPTSRatio by at least ONE of the three models. It's too long to explain here but when I looked at the results, I recognized almost every name who did very well. And furthermore you must note that all three models had an RMSE somewhere in the 1.0-2.0 range, which makes a huge difference as you can see above. Food for thought.
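To make the gap between predicted and realized value concrete, the sketch below recomputes the realized FPTSRatio (actual FPTS per $1000 of salary) for the nine players in the second lineup above and compares it with the prediction. All numbers are copied from that list; the names are illustrative, and this nine-player comparison is not the competition score, which covers all players.

```python
# (player, predicted FPTSRatio, salary in $, actual FPTS) as listed above
lineup = [
    ("M. Trubisky", 3.100, 4700, 18.12),
    ("M. Gordon", 2.654, 7200, 21.8),
    ("C. McCaffrey", 2.483, 6400, 7.8),
    ("D. Parker", 2.340, 4200, 11.3),
    ("K. Wright", 2.410, 3800, 6.7),
    ("S. Sterling", 2.351, 6600, 9.5),
    ("G. Olsen", 2.507, 5200, 5.7),
    ("T. Gurley", 2.655, 9100, 55.6),
    ("Detroit Lions", 4.949, 2900, 12.0),
]


def realized_ratio(fpts, salary):
    """Actual fantasy points per $1000 of salary."""
    return fpts / (salary / 1000.0)


# Signed error per player: realized minus predicted FPTSRatio.
errors = [
    (name, realized_ratio(fpts, salary) - predicted)
    for name, predicted, salary, fpts in lineup
]

# Player whose realized ratio was furthest from the prediction.
worst = max(errors, key=lambda item: abs(item[1]))
total_points = round(sum(row[3] for row in lineup), 2)

for name, err in errors:
    print(f"{name:15s} {err:+.3f}")
print("largest miss:", worst[0])
print("total fantasy points:", total_points)
```

The largest miss here is T. Gurley, whose realized ratio was far above the prediction, which shows how a single player can swing a lineup's total.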
Anyway thanks all for a very interesting competition. Final results soon...
Scott
OK HERE ARE THE FINAL RESULTS FOR COMPETITION 3: FANTASY FOOTBALL:
1st place: @maros_plsik - RMSE = 1.394
2nd place: @yzan - RMSE = 1.417
3rd place: @florian_ziegler - RMSE = 1.539
Congratulations to all of you. VISA gift cards will be issued as soon as we can.
Stay tuned for Competition 4...
Scott
Hey all,
Thank you very much for the competition and for your work, @sgenzer. And congratulations on 2nd and 3rd places, @yzan and @florian_ziegler.
Looking forward to the next competition.
Best regards,
Maros
You're very welcome. And please help spread the word for us - we can't have the same people winning ALL the time!
Scott