Thursday, April 22, 2010

Match Algorithm and Inputs

One of the questions I commonly get is:

Tony, what inputs should we use as the basis for our matching algorithm?

Every time I hear that question, it worries me a bit.  eHarmony came with lots of research behind the dimensions of compatibility and how those related to marriage longevity and happiness.  The people asking the above question don’t have that research and they also don’t really know how the matches should occur.  For example, do you match people with similar personality traits or complementary or opposites?  Do opposites really attract?  Without the research, you are in the position of making educated guesses at what makes a good match.

So, there are really several parts to this question:

  • What inputs make sense?
  • How should the resulting algorithm use those inputs to form matches?
  • Is the resulting algorithm okay to use in your startup?

This post is another in my series around Matching that includes Social Media Matching, Matching Algorithms, and Match Performance Support.  If you’ve not read these, you might want to spend some time with them first.


There are likely a lot of possible inputs.  To me, the first step is to do some research on different aspects of the fit between people and projects and create a long list of possible inputs.  For example, you might come up with:

  • Industry / Specific Knowledge
  • Skills / Roles
  • Timeframe
  • Geography / Travel
  • Personality
  • Team Styles
  • etc.

Of course, many of these items will result in more of a filtering algorithm than a matching algorithm.  For example, you can have people specify experience in a particular industry, and the project can have requirements for experience in that industry.  This is classic filtering.  Even with scoring, it will still have little mystery (perceived value).

This gets much more interesting when you get to personality and team styles. 

Personality Profiles

So, this leads us to the question:

Should I use a personality profile in my matching algorithm?

You can use something like a DISC or MBTI personality profile.  These instruments exist and are fairly well documented.  But do they relate to what will make someone happy on a project, happy in a job, happy with a tutor, etc.?  Chances are that without a fair bit of research, you are not going to know the answer.  And particularly, you won’t know if it makes sense to match people based on similarities, complements or differences. 

In the case of matching people to projects, there’s quite a bit of research already out there on personality types and team effectiveness.  In fact, a cursory review of some of this information suggests that there are quite a few other kinds of inputs that will make a lot of sense such as the kinds of roles that the person naturally falls into.  And, in fact, there are a lot of tools that can assess a given team and tell you about likely issues around communication, leadership, etc.

Bottom line, you should do a fairly significant review of the research and the various tools to come up with models of how personality assessments, communication styles, personal preferences, availability, natural roles, etc. fit into a matching algorithm.

Input Matching

Given our list of inputs, most matching algorithms are based on a few types of rules:

  • Requirements – Yes/No – if these don’t match, you don’t have a match, period.
  • Scored – calculate a distance between matching factors, then multiply each distance by an importance factor (scoring coefficient).  For example, how far are you willing to travel, and how important is that factor?
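As a rough illustration of those two rule types, here is a minimal Python sketch.  Everything in it is an assumption made for the example: the `Worker` and `Project` fields, the `closeness` formula, and the values in `WEIGHTS` are hypothetical and not backed by any research.  A failed requirement returns no match at all, while the scored factors are each turned into a 0..1 value and combined using their importance factors.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    skills: set
    max_travel_km: float
    hours_per_week: float


@dataclass
class Project:
    name: str
    required_skills: set
    distance_km: float
    hours_per_week: float


# Hypothetical importance factors (scoring coefficients); educated guesses only.
WEIGHTS = {"travel": 0.4, "hours": 0.6}


def meets_requirements(worker, project):
    """Yes/No rule: the worker must have every skill the project requires."""
    return project.required_skills <= worker.skills


def closeness(actual, preferred):
    """Turn the distance between two values into a 0..1 score (1 = identical)."""
    if preferred <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(actual - preferred) / preferred)


def travel_score(distance_km, max_travel_km):
    """Full marks inside the worker's travel limit, decaying beyond it."""
    if distance_km <= max_travel_km:
        return 1.0
    return closeness(distance_km, max_travel_km)


def match_score(worker, project):
    """Return None when a requirement fails, otherwise a weighted 0..1 score."""
    if not meets_requirements(worker, project):
        return None
    factors = {
        "travel": travel_score(project.distance_km, worker.max_travel_km),
        "hours": closeness(project.hours_per_week, worker.hours_per_week),
    }
    weighted = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return weighted / sum(WEIGHTS.values())
```

Keeping the requirement check separate from the scoring makes it easy to see which inputs are really filters and which ones actually influence the ranking.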

Critical Mass Problem

One thing that many startups don’t recognize going into the design of their matching algorithm is the problem of critical mass.  When you first start, the number of items that you can match is often relatively small.  And the value of the matching algorithm goes up as you increase the numbers.

This is probably worth its own post.

In terms of the design of the algorithm, you probably need to design your algorithm to be flexible in how it surfaces matches in the case where there are relatively few possible matches.  You absolutely need to avoid returning “0 matches found” and asking the user to continue to change their criteria blindly hoping to find matches.  Goodbye user.
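One way to get that flexibility, sketched here under assumptions, is to relax the score threshold step by step until enough candidates show up.  The function name `surface_matches`, the threshold values, and `min_results` are all hypothetical choices for illustration.

```python
def surface_matches(scored, min_results=3, thresholds=(0.8, 0.6, 0.4, 0.0)):
    """Pick matches from (candidate, score) pairs, with scores in 0..1.

    Walks down the thresholds until enough candidates qualify and returns the
    threshold that was used, so the UI can label weaker results honestly
    instead of showing "0 matches found".
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    for threshold in thresholds:
        hits = [pair for pair in ranked if pair[1] >= threshold]
        if len(hits) >= min_results:
            return hits, threshold
    # Fewer than min_results candidates exist at all: return what we have.
    return ranked, thresholds[-1]
```

Returning the threshold alongside the results lets you present looser matches as "close to what you asked for" rather than pretending they are strong matches.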

Hypothesis Algorithms

So we’ve defined our inputs and algorithm for matching based on our best understanding of what makes a good match.  We should call this a Hypothesis Algorithm.  It’s our best educated guess.

Of course, over time you can capture results from matches by asking for input from workers and project managers all along the way (prior to start, during, after) to assess whether the match was indeed a good match.  This self-reported data can then be used to tune the algorithm over time to turn it from a Hypothesis Algorithm into an algorithm based on results.
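To make the tuning idea concrete, here is one possible sketch.  It assumes each completed match gives you the per-factor scores that were used plus a self-reported rating between 0 and 1, and it nudges the weights with a simple least-mean-squares update.  That update rule is just one illustrative choice, and the names `tune_weights`, `learning_rate`, and `epochs` are hypothetical.

```python
def tune_weights(weights, feedback, learning_rate=0.1, epochs=200):
    """Nudge importance factors toward the ratings people actually reported.

    weights:  dict of factor name -> current importance factor
    feedback: list of (factor_scores, rating) pairs, where factor_scores is a
              dict with a 0..1 score per factor and rating is in 0..1
    """
    tuned = dict(weights)  # leave the caller's hypothesis weights untouched
    for _ in range(epochs):
        for factor_scores, rating in feedback:
            predicted = sum(tuned[name] * factor_scores[name] for name in tuned)
            error = rating - predicted
            for name in tuned:
                # Move each weight in proportion to how much it contributed.
                tuned[name] += learning_rate * error * factor_scores[name]
    return tuned
```

With only a handful of ratings this will overfit badly, so in practice you would likely keep the hypothesis weights until you have enough feedback to trust the adjustment.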

Is a Hypothesis Algorithm okay?


The answer is that there are many startups in the market today that are based on Hypothesis Algorithms.  Likely they are overselling their algorithm, but the reality is that a decent Hypothesis Algorithm plus good Match Performance Support will likely yield better results than existing systems which are often quite random.  Consider the examples:

  • Workers to Projects
  • Recruits to Employers
  • Learners to Tutors

Today, each of these is horribly inefficient, based on incomplete, random information, and performed in ways that are far from expert performance.  So, the real question is whether you can outperform the existing systems, rather than whether you have more than a hypothesis algorithm.

Even a Hypothesis Algorithm is Hard

When it comes to a hypothesis matching algorithm, I’ve already suggested a few requirements:

  • Needs to have mystery – not a filter.  If it’s obvious where the results come from, people won’t ascribe much value.
  • Must handle the critical mass problem gracefully.
  • Needs to hold up to scrutiny.  Why did I get matched with this person?  Why didn’t I get matched with this person? 
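For the scrutiny requirement, it helps if the system can at least tell you internally why a match scored the way it did.  The sketch below assumes a weighted-sum score built from per-factor values between 0 and 1.  The function name `explain_match` and the factor names in the usage note are hypothetical.

```python
def explain_match(factor_scores, weights):
    """Break a weighted match score into per-factor contributions.

    Returns (total, lines), where lines list the factors from the largest
    contribution to the smallest.
    """
    total_weight = sum(weights.values())
    contributions = {
        name: weights[name] * factor_scores[name] / total_weight
        for name in weights
    }
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{name}: {value:.2f} of {total:.2f}" for name, value in ranked]
    return total, lines
```

Calling it with scores such as `{"travel": 1.0, "hours": 0.5}` and weights `{"travel": 0.4, "hours": 0.6}` shows which factor drove the result.  How much of that breakdown you expose to users is a separate decision, given the need to keep some mystery.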

Of course, needing to hold up to scrutiny while being an untested hypothesis algorithm is a bit challenging.  Once you have more experience, you can get to be like Gallup and its Q12 instrument that measures employee engagement.  They have a question in there that I’m sure many people would love to get rid of: “Do you have a best friend at work?”  When I read that question, I’m not quite sure what it’s asking me.  Almost no one is quite sure.  But when asked why Gallup includes the question despite the confusion, their answer is that it has been shown to be highly correlated with engagement.  Basically, they don’t quite know what it means either, and it likely means something different to different people.  But the answers to this question (as compared to 100s of other variants) correlate more strongly with engagement levels.  At first, I really didn’t appreciate that answer.  C'mon Gallup, just get rid of it to avoid the question.  But in a way, there’s a beauty to it.

The problem that most startups with Hypothesis Algorithms have is that they don’t have the research basis to back them up.  When matches are shown and challenged, how do you defend that this is a good match?  And believe me, you will get challenged.

Sample Data Sets and Testing

Of course, one of the ways to stand up to scrutiny better out of the gate is to have a good set of sample data that you can use during design and development to test the algorithm.  You make sure it works first via a spreadsheet.  Then in code.

Make sure this is fairly robust.  If you don’t do this, then some of your early matches will be really bad.  And it always seems that it’s the critical investor/blogger/reporter who tries your system and gets a bad match.  Partly that’s because they aren’t using the system as a real user.  But you do need to test those edge cases.
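Once the spreadsheet numbers look right, the same hand-checked rows can become a small regression check in code.  This is only a sketch: `toy_score` is a stand-in scoring function invented for the example, and the sample cases and expected values are made up, not real data.

```python
def toy_score(worker, project):
    """Stand-in scorer: a required skill, then a travel ratio capped at 1.0."""
    if project["skill"] not in worker["skills"]:
        return None  # requirement failed, no match
    if project["km"] <= 0:
        return 1.0  # edge case: no travel needed
    return min(1.0, worker["max_km"] / project["km"])


def check_samples(score_fn, cases, tolerance=0.01):
    """Compare score_fn against hand-checked expectations.

    cases: list of (label, worker, project, expected), where expected is a
    score or None for "no match".  Returns the list of failing cases.
    """
    failures = []
    for label, worker, project, expected in cases:
        actual = score_fn(worker, project)
        if expected is None or actual is None:
            ok = expected is None and actual is None
        else:
            ok = abs(actual - expected) <= tolerance
        if not ok:
            failures.append((label, expected, actual))
    return failures


# Hypothetical sample rows, including the edge cases a reviewer might hit.
ann = {"skills": {"design"}, "max_km": 20}
SAMPLE_CASES = [
    ("nearby project", ann, {"skill": "design", "km": 10}, 1.0),
    ("too far", ann, {"skill": "design", "km": 80}, 0.25),
    ("missing skill", ann, {"skill": "legal", "km": 5}, None),
    ("remote project", ann, {"skill": "design", "km": 0}, 1.0),
]
```

Running `check_samples(toy_score, SAMPLE_CASES)` should return an empty list.  Any entry that does appear names a case where the code and the spreadsheet disagree.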


Anonymous said...

I have been interested in creating a match algorithm for a while now. I've been getting all my data together over several months and I'm dying to start my first draft of my project. I just came across your blog and have found it extremely helpful, so thank you. I was wondering though if you ever teach how to create the script for the algorithm. I have no experience with PHP and so I've come as far as I can.

If you don't teach it, do you know any websites that do? I have been on an endless search.

Thank you for your time,

Tony Karrer said...

That's a good question and I've not seen it. I also would suggest that PHP might not be a good language depending on the scale you believe you will reach.

Eric Uldall said...

Cool article. Glad I came across it. Thanks for the info, Tony. I will be keeping this post in my bookmarks and referring back often to make sure I'm crossing my t's and dotting my i's, so to speak.