Which operator?
MinerGeorge
Hi,
I have a dataset of 1M+ rows that I want to group based on the relationship between several columns.
Columns: customer name (nominal, label), date of contact (date), sales rep no. (nominal)
smith 1/1/11 001
smith 2/1/11 002
jones 3/2/11 001
brown 2/2/11 003
brown 3/2/11 001
brown 3/2/11 004
black 6/2/11 001
jones 4/2/11 005
black 5/2/11 002
Now for the tough bit:
We need to classify the customers based on the unique set of sales reps they have dealt with, i.e. smith and black are in group A because both have been contacted by reps 001 and 002, jones (001, 005) is in group B, brown (001, 003, 004) is in group C, and so on.
Is this possible in RM, and if so, which operator(s) would you suggest?
Thanks in advance.
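For reference, the exact grouping described above (one group per distinct set of reps) can be computed in a single pass by keying each customer on the set of reps they have dealt with. This is a minimal plain-Python sketch outside RapidMiner, assuming the rows are available as (customer, date, rep) tuples; `group_by_rep_set` and `excel_label` are hypothetical helper names, dates are ignored, and labels are handed out in order of each rep set's first appearance.

```python
from collections import defaultdict
from string import ascii_uppercase

# Sample rows from the question: (customer, date of contact, sales rep no.)
rows = [
    ("smith", "1/1/11", "001"),
    ("smith", "2/1/11", "002"),
    ("jones", "3/2/11", "001"),
    ("brown", "2/2/11", "003"),
    ("brown", "3/2/11", "001"),
    ("brown", "3/2/11", "004"),
    ("black", "6/2/11", "001"),
    ("jones", "4/2/11", "005"),
    ("black", "5/2/11", "002"),
]

def excel_label(n):
    """Spreadsheet-style label for a 0-based index: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    label = ""
    n += 1
    while n:
        n, rem = divmod(n - 1, 26)
        label = ascii_uppercase[rem] + label
    return label

def group_by_rep_set(rows):
    """Map each customer to a label shared by all customers with the same rep set."""
    reps_per_customer = defaultdict(set)  # dicts keep first-seen order (Python 3.7+)
    for customer, _date, rep in rows:
        reps_per_customer[customer].add(rep)

    label_for_set = {}  # frozenset of reps -> group label
    groups = {}
    for customer, reps in reps_per_customer.items():
        key = frozenset(reps)  # a plain set is unhashable; a frozenset can be a dict key
        if key not in label_for_set:
            label_for_set[key] = excel_label(len(label_for_set))
        groups[customer] = label_for_set[key]
    return groups

print(group_by_rep_set(rows))
# {'smith': 'A', 'jones': 'B', 'brown': 'C', 'black': 'A'}
```

This is linear in the number of rows; its drawback is that two customers whose rep sets differ by a single rep still land in different groups.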
Answers
The best solution would probably be to pivot the data and then apply a clustering algorithm. You probably don't want a separate group for each unique set of sales reps, but rather groups of similar sets of sales reps, so clustering should work well enough.
If you have one million rows, you may want to train the clustering model on a subset only, for performance reasons, and then apply it to the rest of the data.
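A minimal plain-Python sketch of "cluster on a sample, then apply to the rest", outside RapidMiner and under several assumptions: the pivot has already produced one 0/1 row per customer (hardcoded here for the four example customers), k-means stands in for "a clustering algorithm" (the operator used in the attached process is not known), and `nearest`, `kmeans`, the sample size and k=3 are illustrative choices.

```python
import random

# Assumed pivoted 0/1 table: one row per customer, one column per sales rep
# (001..005); 1 = the rep contacted the customer at least once.
vectors = {
    "smith": [1, 1, 0, 0, 0],
    "jones": [1, 0, 0, 0, 1],
    "brown": [1, 0, 1, 1, 0],
    "black": [1, 1, 0, 0, 0],
}

def nearest(v, centroids):
    """Index of the closest centroid by squared Euclidean distance (ties -> lowest index)."""
    dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iterations=20):
    """Plain k-means starting from the first k distinct points (fewer if not enough exist)."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Mean of the members; an empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids

# Train on a random subset of customers (seeded for repeatability) ...
rng = random.Random(42)
sample = rng.sample(sorted(vectors), k=3)
centroids = kmeans([vectors[name] for name in sample], k=3)

# ... then assign every customer, including those outside the sample.
clusters = {name: nearest(v, centroids) for name, v in vectors.items()}
print(clusters)
```

For two 0/1 rows, squared Euclidean distance is just the number of reps on which they differ, so customers with overlapping rep sets end up close together. Only the `kmeans` call grows with the sample size; assigning the remaining customers is one `nearest` lookup each.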
If the dates are not important, you could replace them in the pivoted data with 1 wherever a date is present and with 0 otherwise.
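A plain-Python sketch of that replacement, outside RapidMiner and assuming the raw rows are (customer, date, rep) tuples: the pivot keeps the date of contact in each customer x rep cell, and the indicator step turns every present date into 1 and every empty cell into 0. `pivot_dates` and `to_indicators` are hypothetical helper names.

```python
from collections import defaultdict

# Sample rows from the question: (customer, date of contact, sales rep no.)
rows = [
    ("smith", "1/1/11", "001"),
    ("smith", "2/1/11", "002"),
    ("jones", "3/2/11", "001"),
    ("brown", "2/2/11", "003"),
    ("brown", "3/2/11", "001"),
    ("brown", "3/2/11", "004"),
    ("black", "6/2/11", "001"),
    ("jones", "4/2/11", "005"),
    ("black", "5/2/11", "002"),
]

def pivot_dates(rows):
    """Pivot to customer -> {rep: date of contact}; reps never seen are simply absent."""
    table = defaultdict(dict)
    for customer, date, rep in rows:
        # If a rep contacted a customer more than once, the last date wins;
        # that stops mattering once dates become 1/0.
        table[customer][rep] = date
    return table

def to_indicators(table):
    """Replace each present date with 1 and each missing cell with 0."""
    reps = sorted({rep for cells in table.values() for rep in cells})
    matrix = {customer: [1 if rep in cells else 0 for rep in reps]
              for customer, cells in table.items()}
    return reps, matrix

reps, matrix = to_indicators(pivot_dates(rows))
print(reps)             # ['001', '002', '003', '004', '005']
print(matrix["brown"])  # [1, 0, 1, 1, 0]
```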
Please have a look at the attached process.
Best, Marius