modeling many-to-many matching

sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
edited November 2018 in Help

Hi...looking for some data science advice.  Say we wanted to create a process in RapidMiner that would be similar to a dating website (let's only take male-to-female hetero for the moment):

 

setup: I have two data sets: one of men with a lot of attributes about them and the women they have been interested in, and another of women with a lot of attributes about them and the men that they have been interested in.  Most of these attributes on both sides are binominal / dummy coded categoricals but some are numerical (e.g. age).

 

goal: build a process where, if a new man logs in and fills out a survey to populate his attributes (minus dating history, since he's new), the output is a list of the women most likely to interest him, based on the training set above.  Vice versa for women.

 

My initial thought is that this is a classic segmentation problem, e.g. k-means clustering or something similar.  But I want the output to be predictive, with probabilities etc.

 

[Note: this is actually not my use case - I'm not building a dating site!  But the case I'm working on is very similar in structure.]

 

Thoughts?

 

Scott

 

 

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Well, which women the men are interested in might not lead to a good match. I can search for women who meet specific criteria on a dating site and still not get them to respond. Perhaps the better thing is to identify which criteria in the men lead to a successful date from the women's side.

     

    It's funny that you post this. I just watched this Vice video about Tinder and other dating-related websites. It's potentially NSFW for some but I found it interesting from a data science perspective: https://www.youtube.com/watch?v=J9V3fLUSQFM

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Maybe I am missing something obvious here, but why not just build two separate predictive recommender models, one for men and one for women?  The Recommender extension is designed to do exactly what you are describing, using k-NN on either item or user attributes.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
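
For anyone who wants to see the idea outside RapidMiner, here is a minimal Python sketch of user-attribute k-NN recommendation. It is not the Recommender extension's implementation; the data layout, the function name `knn_recommend`, and the sample values are all made up for illustration. It assumes the attributes are dummy-coded and that numeric ones such as age are already scaled to [0, 1].

```python
import math
from collections import Counter

def knn_recommend(new_attrs, men, k=3, top_n=5):
    """Rank women for a new man by the share of his k most similar men who liked them.

    men: list of (attribute_vector, set_of_liked_woman_ids), a hypothetical layout.
    """
    # Euclidean distance; assumes numeric attributes are already scaled to [0, 1]
    nearest = sorted(men, key=lambda m: math.dist(new_attrs, m[0]))[:k]
    votes = Counter()
    for _, liked in nearest:
        votes.update(liked)
    # Share of neighbours who liked each woman: a rough score in [0, 1],
    # not a calibrated probability
    scores = {w: n / len(nearest) for w, n in votes.items()}
    # Highest score first, ties broken by id so the order is deterministic
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Made-up training data: two dummy-coded attributes plus scaled age
men = [
    ([1, 0, 0.30], {"w1", "w2"}),
    ([1, 0, 0.35], {"w1"}),
    ([0, 1, 0.80], {"w3"}),
    ([1, 1, 0.40], {"w2", "w3"}),
]
recs = knn_recommend([1, 0, 0.32], men, k=3)
```

The same function would serve the women's side by swapping the roles of the two data sets, which matches the "two separate models" suggestion.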
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    @Thomas_Ott hmm.  I do not think what I'm doing is going to help anyone's personal life.  :)  

     

    @Telcontar120 yes creating two separate models is exactly what I was planning to do.  I have never fiddled with the recommender extension before but I think today is the day to do so.  Any nice sample processes I can look at to get a feel for it?

     

    Scott

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Sadly I do not have any samples to offer for recommendation models (they all stayed at a former employer) but the operators are not hard to use and I am sure you will figure it out quickly.  Or @mschmitz might have something to offer?

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi All,

     

    Not really anything to share. I think it boils down to Item Recommendation / Cross Distances.

     

    What are your requirements for response time? One has the option to build a shitload of models first (e.g. to predict the correct cluster). Recommender systems tend to hit a problem with response times here, so maybe this could still be an option.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
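
One way to read the cross-distances idea, sketched in plain Python rather than with the RapidMiner operator: describe each woman by the average attribute vector of the men she has been interested in, then compute the distance from the new man to every such profile and sort. The profile construction, the function names, and the data are assumptions made for illustration only; nothing in the thread specifies them.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length attribute vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cross_distances(requests, references):
    """Distance from every request vector to every reference vector (dict of dicts)."""
    return {
        rid: {fid: math.dist(rvec, fvec) for fid, fvec in references.items()}
        for rid, rvec in requests.items()
    }

# Made-up data: attribute vectors of the men each woman has been interested in
interested_in = {
    "w1": [[1, 0, 0.30], [1, 0, 0.40]],
    "w2": [[0, 1, 0.80], [0, 1, 0.70]],
}
# One "typical interesting man" profile per woman
profiles = {w: centroid(vs) for w, vs in interested_in.items()}

dists = cross_distances({"new_man": [1, 0, 0.32]}, profiles)
# Smallest distance first: the women whose past interests look most like the new man
ranking = sorted(dists["new_man"], key=dists["new_man"].get)
```

Every request is compared with every reference, so the cost grows with the product of the two set sizes. That is where the response-time concern comes from.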
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    yes exactly @mschmitz.  I could just run NN all the time but it is very slow.  I am looking for a low-latency solution.  And I am happy to hear that you came up with the same hack that I did (store a ton of models and then choose on the fly).  I'm trying to do as much preprocessing as possible but at some point I need a way to create the "match" via applying some model - quickly.

     

    Scott
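
A rough Python sketch of the "precompute, then choose on the fly" approach, assuming the clustering has already been done offline (for example with k-means) so that cluster assignments and centroids are available. The function names, the data layout, and the numbers are hypothetical. Here the stored "model" per cluster is just a precomputed ranking, which is the simplest thing that could stand in for a real model.

```python
import math
from collections import Counter

def precompute_rankings(men, assignments, n_clusters):
    """Offline step: per cluster, rank women by the share of its members who liked them."""
    rankings = []
    for c in range(n_clusters):
        members = [liked for (_, liked), a in zip(men, assignments) if a == c]
        votes = Counter(w for liked in members for w in liked)
        ranked = sorted(((w, n / len(members)) for w, n in votes.items()),
                        key=lambda kv: (-kv[1], kv[0]))
        rankings.append(ranked)
    return rankings

def lookup(new_attrs, centroids, rankings):
    """Online step: one distance per centroid, then a list lookup."""
    nearest = min(range(len(centroids)),
                  key=lambda c: math.dist(new_attrs, centroids[c]))
    return rankings[nearest]

# Made-up training data and an assumed offline clustering result
men = [
    ([1, 0, 0.30], {"w1", "w2"}),
    ([1, 0, 0.35], {"w1"}),
    ([0, 1, 0.80], {"w3"}),
    ([1, 1, 0.40], {"w2", "w3"}),
]
assignments = [0, 0, 1, 0]
centroids = [[1, 0.33, 0.35], [0, 1, 0.80]]

rankings = precompute_rankings(men, assignments, n_clusters=2)
result = lookup([1, 0, 0.32], centroids, rankings)
```

At request time the work is proportional to the number of clusters rather than the number of training rows, which is the latency win. The trade-off is that everyone in a cluster gets the same list.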
