Supervised learning data prep for sport prediction
Hi All,
I've got my sports data and I'm looking to build a model to predict the outcomes of sports games.
My data has a single row for each team, i.e.:
Team StatA StatB StatC
A 4 6 8
B 3 9 8
C 4 6 5
.....
In my data, Team A plays Team B, Team C plays Team D, etc.
There are two ways I could set this up. The first is matched pairs (can RapidMiner do this?), so my data would look like this:
ID Team StatA StatB StatC
G1 A 4 6 8
G1 B 3 9 8
G2 C 4 6 5
Then you tell the program that ID is the matched-pair field, so it knows the first and second rows form a matched pair and builds the model accordingly.
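A minimal sketch of building the matched-pair layout in plain Python, assuming a separate fixture list that records which two teams met in each game. The names `to_matched_pairs` and `fixtures` are hypothetical. The stats are the example values for teams A and B from the table above; team D's stats are not shown, so the sketch only builds game G1.

```python
from typing import Dict, List, Tuple

Stats = Dict[str, float]


def to_matched_pairs(team_stats: Dict[str, Stats],
                     fixtures: List[Tuple[str, str, str]]) -> List[dict]:
    """Return one row per team per game, tagged with the shared game ID."""
    rows = []
    for game_id, team_1, team_2 in fixtures:
        # Both teams in a game get the same ID, which marks them as a pair.
        for team in (team_1, team_2):
            row = {"ID": game_id, "Team": team}
            row.update(team_stats[team])
            rows.append(row)
    return rows


team_stats = {
    "A": {"StatA": 4, "StatB": 6, "StatC": 8},
    "B": {"StatA": 3, "StatB": 9, "StatC": 8},
    "C": {"StatA": 4, "StatB": 6, "StatC": 5},
}
fixtures = [("G1", "A", "B")]  # hypothetical fixture list: (game ID, team, opponent)

for row in to_matched_pairs(team_stats, fixtures):
    print(row)
```

Because each game contributes two rows sharing an ID, any train/test split on this layout should keep both rows of a game on the same side.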
The other way is to transform the data so that each game is a single row, like this:
AStatA AStatB AStatC HStatA HStatB HStatC
4 6 8 3 9 8
Now all the data for both teams in a game is on one row, and I build the model that way.
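The single-row layout can be sketched the same way. This assumes the first team in each fixture gets the `A` prefix and the second gets `H` (presumably away and home), matching the column names in the example. `to_single_row` is a hypothetical helper, and the `ID` column is kept only to trace rows back to games, not as a model input.

```python
from typing import Dict, List, Tuple

Stats = Dict[str, float]


def to_single_row(team_stats: Dict[str, Stats],
                  fixtures: List[Tuple[str, str, str]],
                  stat_names: List[str]) -> List[dict]:
    """Return one row per game: first team's stats prefixed 'A', second team's prefixed 'H'."""
    rows = []
    for game_id, away, home in fixtures:
        row = {"ID": game_id}  # traceability only, not a feature
        for prefix, team in (("A", away), ("H", home)):
            for stat in stat_names:
                row[prefix + stat] = team_stats[team][stat]
        rows.append(row)
    return rows


team_stats = {
    "A": {"StatA": 4, "StatB": 6, "StatC": 8},
    "B": {"StatA": 3, "StatB": 9, "StatC": 8},
}
fixtures = [("G1", "A", "B")]  # hypothetical: team A in the A columns, team B in the H columns

print(to_single_row(team_stats, fixtures, ["StatA", "StatB", "StatC"]))
# → [{'ID': 'G1', 'AStatA': 4, 'AStatB': 6, 'AStatC': 8, 'HStatA': 3, 'HStatB': 9, 'HStatC': 8}]
```

Since each game is one row here, an ordinary row-wise train/test split already keeps both teams of a game together.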
Can I get pros and cons for each? Will they yield the same result, and is the matched-pair approach even possible? (I know it is in SAS.)
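If a tool has no matched-pairs option, one common way to keep the pairing inside the single-row layout is to model the home-minus-away difference of each stat with no intercept term. This technique is swapped in here; it is not a RapidMiner feature or something the post specifies. For 1:1 matched pairs, a no-intercept logistic model on within-pair differences has the same likelihood that conditional logistic regression uses, which is one way the two layouts can be made to agree. The sketch below computes the differences for the example game and checks that swapping the teams turns p into 1 - p. `difference_features`, `home_win_probability`, and the weights are hypothetical placeholders, not fitted values.

```python
import math
from typing import Dict, List


def difference_features(wide_row: Dict[str, float], stat_names: List[str]) -> Dict[str, float]:
    """Home-minus-away difference for each stat (H columns minus A columns)."""
    return {"D" + s: wide_row["H" + s] - wide_row["A" + s] for s in stat_names}


def home_win_probability(diffs: Dict[str, float], weights: Dict[str, float]) -> float:
    """No-intercept logistic model on the differences: sigmoid(sum of weight * difference)."""
    score = sum(weights[name] * value for name, value in diffs.items())
    return 1.0 / (1.0 + math.exp(-score))


stats = ["StatA", "StatB", "StatC"]
wide_row = {"AStatA": 4, "AStatB": 6, "AStatC": 8, "HStatA": 3, "HStatB": 9, "HStatC": 8}
diffs = difference_features(wide_row, stats)
print(diffs)  # → {'DStatA': -1, 'DStatB': 3, 'DStatC': 0}

# Hypothetical weights for illustration only; in practice they would be learned.
weights = {"DStatA": 0.5, "DStatB": 0.2, "DStatC": 0.1}
p_home = home_win_probability(diffs, weights)

# Swapping the two teams negates every difference, so the two orderings' probabilities sum to 1.
swapped = {name: -value for name, value in diffs.items()}
p_swapped = home_win_probability(swapped, weights)
print(round(p_home + p_swapped, 12))  # → 1.0
```

Adding an intercept would let the model learn a home advantage, at the cost of this symmetry between the two orderings of a game.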