Remove duplicate rows including rearrangements across columns
I have generated several "optimized" Fantasy Football Challenge lineups.
I am trying to figure out how to remove duplicate rows where the generated lineup contains the same players, merely rearranged within same-position columns.
My goal is to remove any row that duplicates another, whether or not the players appear in exactly the same columns (the same set of players is the same lineup regardless of order).
QB | RB | RB | WR | WR | WR | TE | FLEX | DST |
Tom Brady (9864209) | Carlos Hyde (9864280) | Alex Collins (9864148) | Demaryius Thomas (9864449) | Cordarrelle Patterson (9864841) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Todd Gurley (9864760) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Carlos Hyde (9864280) | Alex Collins (9864148) | Michael Thomas (9864513) | Emmanuel Sanders (9864451) | Seth Roberts (9864845) | Austin Seferian-Jenkins (9864619) | Todd Gurley (9864760) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Carlos Hyde (9864280) | Kareem Hunt (9864565) | Seth Roberts (9864845) | Demaryius Thomas (9864449) | Tyrell Williams (9864740) | Austin Seferian-Jenkins (9864619) | Todd Gurley (9864760) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Carlos Hyde (9864280) | Kenyan Drake (9864473) | Emmanuel Sanders (9864451) | Cordarrelle Patterson (9864841) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Todd Gurley (9864760) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Carlos Hyde (9864280) | Kenyan Drake (9864473) | Jermaine Kearse (9864605) | Demaryius Thomas (9864449) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Kareem Hunt (9864565) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Kareem Hunt (9864565) | Carlos Hyde (9864280) | Michael Thomas (9864513) | Demaryius Thomas (9864449) | Tyrell Williams (9864740) | Austin Seferian-Jenkins (9864619) | Marshawn Lynch (9864829) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Kenyan Drake (9864473) | Carlos Hyde (9864280) | Tyrell Williams (9864740) | Demaryius Thomas (9864449) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Kareem Hunt (9864565) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Leonard Fournette (9864406) | Alex Collins (9864148) | Demaryius Thomas (9864449) | Seth Roberts (9864845) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Carlos Hyde (9864280) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Carlos Hyde (9864280) | Leonard Fournette (9864406) | Seth Roberts (9864845) | Emmanuel Sanders (9864451) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Kenyan Drake (9864473) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Leonard Fournette (9864406) | Carlos Hyde (9864280) | Tyrell Williams (9864740) | Michael Thomas (9864513) | Emmanuel Sanders (9864451) | Austin Seferian-Jenkins (9864619) | Marshawn Lynch (9864829) | Los Angeles Chargers (9864756) |
Tom Brady (9864209) | Alex Collins (9864148) | Carlos Hyde (9864280) | Demaryius Thomas (9864449) | Cordarrelle Patterson (9864841) | Michael Thomas (9864513) | Austin Seferian-Jenkins (9864619) | Todd Gurley (9864760) | Los Angeles Chargers (9864756) |
Afterward, I noticed that most of the generated lineups are the same players, simply rearranged across columns (see the Carlos Hyde and Kenyan Drake RB columns above).
To make things worse, any player can be put into the FLEX position column.
I would like to stick with simple per-cell comparisons so that I can apply a duplicate remover using a per-row rule such as this:
(($A2=$A3)or($A2=$H3))&
(($B2=$B3)or($B2=$C3)or($B2=$H3))&
(($C2=$B3)or($C2=$C3)or($C2=$H3))&
(($D2=$D3)or($D2=$E3)or($D2=$F3)or($D2=$H3))&
(($E2=$D3)or($E2=$E3)or($E2=$F3)or($E2=$H3))&
(($F2=$D3)or($F2=$E3)or($F2=$F3)or($F2=$H3))&
(($G2=$G3)or($G2=$H3))&
(($H2=$A3)or($H2=$B3)or($H2=$C3)or($H2=$D3)or($H2=$E3)or($H2=$F3)or($H2=$G3)or($H2=$H3))&
($I2=$I3)
The rule above is a brute-force, per-cell check for matches within the same player position category (QB, RB, WR, TE, FLEX, DST), but I am hoping someone knows of a better solution.
Any RapidMiner guidance would be appreciated.
Thanks in advance...
Answers
There may be more elegant ways of doing this, but certainly one way that would work would be as follows:
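As a rough illustration of the general idea, here is a minimal Python sketch (not a RapidMiner process, and not necessarily the approach originally intended here). It assumes each lineup is a list of nine cell strings and that two lineups count as duplicates whenever they contain exactly the same players, whatever the column. The function names `canonical_key` and `drop_rearranged_duplicates` are made up for this example. The trick is to sort the cells of each row into an order-independent key and keep only the first row seen for each key.

```python
def canonical_key(row):
    """Return an order-independent key: the sorted tuple of a row's cells."""
    return tuple(sorted(cell.strip() for cell in row))


def drop_rearranged_duplicates(rows):
    """Keep the first row for each distinct set of players, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = canonical_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Three rows from the sample table (columns: QB, RB, RB, WR, WR, WR, TE, FLEX, DST).
# The third row is the first row with the two RB columns swapped.
lineups = [
    ["Tom Brady (9864209)", "Carlos Hyde (9864280)", "Alex Collins (9864148)",
     "Demaryius Thomas (9864449)", "Cordarrelle Patterson (9864841)",
     "Michael Thomas (9864513)", "Austin Seferian-Jenkins (9864619)",
     "Todd Gurley (9864760)", "Los Angeles Chargers (9864756)"],
    ["Tom Brady (9864209)", "Carlos Hyde (9864280)", "Kenyan Drake (9864473)",
     "Jermaine Kearse (9864605)", "Demaryius Thomas (9864449)",
     "Michael Thomas (9864513)", "Austin Seferian-Jenkins (9864619)",
     "Kareem Hunt (9864565)", "Los Angeles Chargers (9864756)"],
    ["Tom Brady (9864209)", "Alex Collins (9864148)", "Carlos Hyde (9864280)",
     "Demaryius Thomas (9864449)", "Cordarrelle Patterson (9864841)",
     "Michael Thomas (9864513)", "Austin Seferian-Jenkins (9864619)",
     "Todd Gurley (9864760)", "Los Angeles Chargers (9864756)"],
]

unique_lineups = drop_rearranged_duplicates(lineups)
print(len(unique_lineups))  # the rearranged third row is dropped
```

Because every player carries a unique ID, comparing whole rows as sorted tuples avoids the position-by-position rule entirely, including the FLEX special case. The same idea should carry over to any tool that can build a sorted, concatenated key column per row and then remove duplicates on that column.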
I hope this is helpful.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts