modeling many-to-many matching
Hi, I'm looking for some data science advice. Say we wanted to create a process in RapidMiner that works like a dating website (let's consider only male-to-female heterosexual matching for the moment):
setup: I have two data sets. One is of men, with many attributes about each man and the women he has been interested in. The other is of women, with many attributes about each woman and the men she has been interested in. Most of these attributes on both sides are binominal / dummy-coded categoricals, but some are numerical (e.g. age).
goal: build a process where, when a new man logs in and fills out a survey to populate his attributes (minus dating history, since he's new), the output is a list of the women most likely to interest him, based on the training set above. Vice versa for women.
My initial thought is that this is a classic segmentation problem, e.g. k-means clustering or something similar. But I want the output to be predictive, with probabilities etc.
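Whatever model ends up doing the matching, each profile or survey first has to become a numeric vector. A minimal sketch, assuming binominal attributes arrive as 0/1 dummies and numeric ones (like age) are min-max scaled with ranges learned from the training set, so that age does not dominate distance computations. The attribute names `smoker`, `has_kids`, `age` and the helpers `fit_scaler`/`encode` are hypothetical; inside a RapidMiner process, normalization and nominal-to-numerical conversion operators would play this role.

```python
import numpy as np

# Hypothetical schema: binominal attributes are already 0/1 dummies,
# numeric attributes get min-max scaled using the training set's ranges.
BINOMINAL = ["smoker", "has_kids"]
NUMERIC = ["age"]

def fit_scaler(rows):
    """Learn (min, max) for each numeric attribute from the training profiles."""
    return {a: (min(r[a] for r in rows), max(r[a] for r in rows)) for a in NUMERIC}

def encode(row, scaler):
    """Turn one profile/survey dict into a feature vector."""
    binom = [float(row[a]) for a in BINOMINAL]
    nums = []
    for a in NUMERIC:
        lo, hi = scaler[a]
        span = hi - lo if hi > lo else 1.0  # avoid division by zero on constant columns
        nums.append((row[a] - lo) / span)   # new users outside the range fall outside [0, 1]
    return np.array(binom + nums)
```

For example, with training ages 25 and 45, a new 35-year-old non-parent smoker encodes to `[1.0, 0.0, 0.5]`.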
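If segmentation is the starting point, one way to get probabilities out of it (a swapped-in "cluster-then-score" technique, not something anyone in the thread prescribed) is to run k-means on the men's attribute vectors and then estimate P(interested in woman w | cluster) as the share of training men in that cluster who showed interest in w. A new man is assigned to the nearest centroid and inherits that cluster's rates. A minimal numpy sketch, assuming `X` holds encoded attributes and `interest` is a hypothetical 0/1 men-by-women matrix built from dating history:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns (centers, labels) for the rows of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k random rows
    for _ in range(iters):
        # Assign every row to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def cluster_interest_rates(labels, interest, k):
    """rates[c, w] = share of training men in cluster c who were interested in woman w."""
    interest = np.asarray(interest, dtype=float)
    rates = np.zeros((k, interest.shape[1]))
    for c in range(k):
        members = labels == c
        if members.any():
            rates[c] = interest[members].mean(axis=0)
    return rates

def recommend(x_new, centers, rates, top_n=3):
    """Assign a new man to his nearest centroid and return that cluster's top women."""
    c = ((centers - np.asarray(x_new, dtype=float)) ** 2).sum(axis=1).argmin()
    order = np.argsort(-rates[c], kind="stable")[:top_n]
    return order, rates[c][order]
```

The rates are plain empirical frequencies, so small clusters give noisy estimates; shrinking them toward the overall interest rate would be a reasonable refinement.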
[Note: this is actually not my use case - I'm not building a dating site! But the case I'm working on is very similar in structure.]
Thoughts?
Scott
Answers
Well, which women the men are interested in might not lead to a good match. I can search for women with specific criteria on a dating site and still not get them to respond. Perhaps the better approach is to identify which criteria in the men lead to a successful date with the women.
It's funny that you posted this. I just watched this Vice video about Tinder and other dating websites. It's potentially NSFW for some, but I found it interesting from a data science perspective: https://www.youtube.com/watch?v=J9V3fLUSQFM
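The point about one-sided interest is what reciprocal recommendation tries to address: rank candidates by a score that needs both sides to be likely interested. A minimal sketch, assuming you already have two hypothetical probability vectors (for instance from a men-side model and a women-side model); combining them with a product is one simple choice that treats the two as independent, not an established rule:

```python
import numpy as np

def reciprocal_rank(p_he_likes, p_she_likes, top_n=3):
    """Rank candidate women by a two-sided score.

    p_he_likes[w]  : estimated P(new man is interested in woman w)
    p_she_likes[w] : estimated P(woman w is interested in a man like him)
    The product assumes independence; a harmonic mean is another option.
    """
    a = np.asarray(p_he_likes, dtype=float)
    b = np.asarray(p_she_likes, dtype=float)
    scores = a * b
    order = np.argsort(-scores, kind="stable")[:top_n]
    return order, scores[order]
```

With `p_he_likes = [0.9, 0.6, 0.3]` and `p_she_likes = [0.1, 0.7, 0.9]`, the woman he likes most drops to last place because she is unlikely to respond.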
Maybe I am missing something obvious here, but why not just build two separate predictive recommender models, one for men and one for women? The Recommender extension is designed to do exactly what you are describing, using k-NN on either item or user attributes.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
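Since a new user has no history, the user-attribute variant is the one that applies. The idea can be sketched in plain numpy (this is an illustrative analogue, not the Recommender extension's implementation or API): find the k existing users whose attribute vectors are closest to the newcomer and score each candidate by the share of those neighbors who were interested in them. `X_train` and the 0/1 `interest` matrix are assumed inputs.

```python
import numpy as np

def knn_recommend(x_new, X_train, interest, k=3, top_n=5):
    """User-attribute k-NN for a brand-new user (no history).

    Finds the k training users closest to x_new (Euclidean distance) and
    scores each candidate by the share of those neighbors interested in them.
    """
    X_train = np.asarray(X_train, dtype=float)
    interest = np.asarray(interest, dtype=float)
    d = np.sqrt(((X_train - np.asarray(x_new, dtype=float)) ** 2).sum(axis=1))
    neighbors = np.argsort(d, kind="stable")[:k]
    scores = interest[neighbors].mean(axis=0)
    order = np.argsort(-scores, kind="stable")[:top_n]
    return order, scores[order]
```

Fitting this once on the men's data and once on the women's data gives the two separate models suggested above.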
@Thomas_Ott hmm. I do not think what I'm doing is going to help anyone's personal life.
@Telcontar120 yes, creating two separate models is exactly what I was planning to do. I have never fiddled with the Recommender extension before, but I think today is the day to do so. Any good sample processes I can look at to get a feel for it?
Scott
Sadly I do not have any samples to offer for recommendation models (they all stayed at a former employer) but the operators are not hard to use and I am sure you will figure it out quickly. Or @mschmitz might have something to offer?
Hi All,
Nothing I can really share. I think it boils down to Item Recommendation / Cross Distances.
What are your requirements on response time? One option is to build a whole lot of models up front (e.g. to predict the correct cluster). Recommender systems tend to run into response-time problems at this point, so precomputing models could still be worth considering.
Dortmund, Germany
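The cross-distance idea (distance from every request to every reference example) can be sketched in numpy. This is an illustration of the concept, not RapidMiner's operator; it uses the expansion |a|² + |b|² − 2a·b so a whole batch of new users is handled with one matrix product, which is usually much faster than looping:

```python
import numpy as np

def cross_distances(A, B):
    """Euclidean distance between every row of A (requests) and every row of B
    (reference set); result has shape (len(A), len(B))."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives caused by rounding

def nearest(A, B, k):
    """Indices and distances of the k closest reference rows for each request row."""
    D = cross_distances(A, B)
    idx = np.argsort(D, axis=1)[:, :k]
    return idx, np.take_along_axis(D, idx, axis=1)
```

For a request at the origin and references `[[3, 4], [1, 0], [0, 2]]`, the two nearest are rows 1 and 2 at distances 1 and 2.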
Yes, exactly, @mschmitz. I could just run NN on every request, but it is very slow. I am looking for a low-latency solution. And I am happy to hear that you came up with the same hack I did (store a ton of models and then choose among them on the fly). I'm trying to do as much preprocessing as possible, but at some point I need a way to create the "match" by applying some model, quickly.
Scott
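The "store a ton of models and choose on the fly" approach can be reduced to a lookup table. A minimal sketch, assuming segment centroids and per-segment interest rates were computed offline (the class name `PrecomputedRecommender` is hypothetical): all ranking happens once at build time, so a query only finds the nearest centroid, costing O(segments × features) instead of a scan over every user.

```python
import numpy as np

class PrecomputedRecommender:
    """Offline: store one ranked candidate list per segment.
    Online: assign the new user to the nearest centroid and return the cached list."""

    def __init__(self, centers, rates, top_n=10):
        self.centers = np.asarray(centers, dtype=float)
        rates = np.asarray(rates, dtype=float)
        # Rank candidates for every segment once, at build time.
        self.ranked = np.argsort(-rates, axis=1, kind="stable")[:, :top_n]
        self.probs = np.take_along_axis(rates, self.ranked, axis=1)

    def query(self, x_new):
        x = np.asarray(x_new, dtype=float)
        c = int(((self.centers - x) ** 2).sum(axis=1).argmin())
        return self.ranked[c], self.probs[c]
```

The trade-off is granularity: everyone in a segment gets the same list, so the number of segments controls how personalized the output can be.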