The Naive Approach to Hiring People
Every once in a while I read (or write!) something about hiring programmers. What to look for in a résumé. What to put in your résumé. Why _____ is my favourite interview question. Why _____ sucks as an interview question. Whether we need to filter the absolute dreck out.

I’ve even written one of those _____ sucks posts on my blog, and I’m here to tell you, I was wrong. And I’m going to tell you why I was wrong. But first, here is an interesting programming problem-style interview question. I’m not suggesting it is good or bad, for reasons that will become obvious.
An interesting interview question

You have a large collection of documents, each of which accurately describes a single person’s properties. One document, one person. To keep this light, perhaps you are looking for a compatible bridge partner. The documents are online player profiles, and you are interested in finding a suitable partner.

The properties for people are multi-valued: there is a large set of properties, and for each property in each document, there is either no value or a selection from a set of values. One value might be number of years of experience, another might be whether they overcall in third position with a weak hand (where “no value” means the other person did not answer that question and “no” means they do not overcall).
This is an iterative problem: you have to perform the separation on a regular basis, perhaps once each month. And each month, there is a new set of documents and persons to classify. Having performed the classification, you can check the game results at your local bridge club and see how everybody did, both the people you selected as potential partners and the people you rejected.
Describe a strategy for picking the best partners based on their profiles.
Before we discuss whether it is a useful problem, let me tell you who I’m interviewing with this hypothetical question: Technical Hiring Managers, specifically people who are technical themselves and are also responsible for hiring other technical people. Part of their job is looking through piles of résumés, picking out the good ones to phone screen.

What I’m looking for in a “correct” answer is a basic understanding of Document Classification. Given that we are talking about programming and programmers, a really good answer will discuss things like Naïve Bayes Classifiers. Like programs that can distinguish Ham from Spam.[1]

The point is that someone with at least a basic understanding of document classification knows how to apply what we know about document classification to the problem of selecting candidates to phone screen based on their résumés. Someone with an understanding of document classification knows how to apply what we know about document classification to the problem of selecting questions to ask in phone screens and in face-to-face interviews, not to mention what to do with the answers. (And the emotionally nice thing about this is that it’s an interview question for interviewers to solve.[2])
If Statements vs. Classification

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said.
I could make a reasonable argument that someone who doesn’t think of selecting candidates as a classification problem might miss the fact that the things to look for—years of experience with a specific technology, length of time at the most recent position—are merely document features with probabilities attached to them. I could make the argument that they are “thinking in if statements” about hiring programmers.
I could go on about how saying a particular job requires “Five years of JEE” is an if statement, and one that is far from universal. I could argue that someone who thinks like that is not a good interviewer, that they really ought to be thinking in terms of the probability that someone with five years of JEE will be Ham and not Spam.
Oh, the irony. I would be arguing that the interesting question is useful because it identifies people who pose questions like that as being bad interviewers!
There are really two approaches to take in selecting candidates. The first is the approach of the if statement: You form a model of what the candidate ought to do, work out what they ought to know in order to do that, and then you work out the questions to ask (or the features to look for) that demonstrate the candidate knows those things. If they know this and this and this and if they don’t have this bad thing or that bad thing, call them in for an interview (or, if you are interviewing them and they have demonstrated their strength, hire).
The second approach is the classifier approach. Each feature you look for, each question you ask, is associated with a probability. You put them all together and you classify them as interview/no interview or hire/no hire with a certain degree of confidence.
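To make the contrast concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the feature names, the five-year threshold, and the likelihood ratios are placeholders, not real hiring criteria.

```python
# Hypothetical features extracted from a résumé. The names and numbers
# are invented for illustration, not real hiring criteria.
candidate = {"years_jee": 6, "uses_junit": True, "fp_experience": False}

# Approach 1: thinking in if statements. A hard rule, pass or fail.
def screen_with_if_statements(c):
    return c["years_jee"] >= 5 and c["uses_junit"]

# Approach 2: thinking in probabilities. Every feature contributes some
# evidence, and we end up with a degree of confidence, not a verdict.
# In practice these likelihood ratios would come from training data.
likelihood_ratios = {"years_jee": 1.3, "uses_junit": 1.2, "fp_experience": 2.0}

def screen_with_probabilities(c, prior_odds=0.5):
    odds = prior_odds
    for feature, ratio in likelihood_ratios.items():
        if c[feature]:  # treat a truthy value as "feature present"
            odds *= ratio
    return odds / (1 + odds)  # convert odds back to a probability

print(screen_with_if_statements(candidate))            # True
print(round(screen_with_probabilities(candidate), 2))  # 0.44
```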
So is the classifier the same thing as the if statements, only with percentages instead of boolean logic? Perhaps we could simply make up a score card (10 points for each year of JEE, 15 points if they use JUnit, &c.)?
No.
The most important thing about most classifiers is that they can be remarkably naïve and still work. In fact, they often work better when they are naïve. Specifically, they do not attempt to draw a logical connection between the features that best classify candidates and the actual job requirements. Classifiers work by training themselves to recognize the differences that have the greatest statistical relevance to the correct classification.
That’s the naïveté at work: they have no idea that experience in functional programming is supposedly irrelevant to a job writing Javascript; they just notice that the people with FP experience tend to do well in Javascript jobs, so they start considering it relevant.
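Here is a toy illustration of that naïveté, assuming a tiny invented training set and nothing fancier than Laplace-smoothed counts. Nobody tells the code which features “should” matter for the job; it simply estimates how much more often each feature shows up among the good hires than among the bad ones.

```python
from collections import Counter

# Invented training data: each résumé is a set of features, labelled by
# how the hire actually worked out ("ham" = good hire, "spam" = bad hire).
training = [
    ({"java", "junit", "fp"}, "ham"),
    ({"java", "fp"}, "ham"),
    ({"java", "junit"}, "spam"),
    ({"java"}, "spam"),
    ({"java", "junit", "fp"}, "ham"),
]

labels = Counter(label for _, label in training)
feature_counts = {"ham": Counter(), "spam": Counter()}
for features, label in training:
    feature_counts[label].update(features)

def p_feature_given_label(feature, label, smoothing=1.0):
    # Laplace-smoothed estimate of P(feature | label).
    return (feature_counts[label][feature] + smoothing) / (labels[label] + 2 * smoothing)

# Nobody told the classifier that "fp" matters for this job; it just
# notices that "fp" shows up far more often among the good hires.
for feature in ("fp", "junit", "java"):
    ham = p_feature_given_label(feature, "ham")
    spam = p_feature_given_label(feature, "spam")
    print(f"{feature}: P(f|ham)={ham:.2f}  P(f|spam)={spam:.2f}")
```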
Training day

Document classification systems are trained, typically using supervised learning: “These are the résumés of the good people. These are the résumés of the ones we had to fire.”
Here’s a thought experiment: Pretend you are trying to write a mechanical document classifier. Let’s see if designing a machine to perform the process can identify some opportunities to improve the way humans perform the process. (As a bonus, we might actually identify ways machines could augment the process, but that is not our objective.)
If you were writing a document classifier for résumés, the first thing you would probably write would be a feature that updated the training corpus whenever a programmer completed their initial probation: If their first formal review was positive, their résumé would be added to the “Interview” bin. Otherwise, it would be added to the “No Interview” bin.
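As a sketch (with invented record fields and bin names, not anybody’s real pipeline), that feedback hook might be nothing more than this:

```python
# A sketch of the training hook described above. The fields and bins
# are invented for illustration.
training_corpus = {"Interview": [], "No Interview": []}

def record_probation_outcome(resume_text, first_review_positive):
    """Called when a programmer completes their initial probation:
    a positive first formal review files the résumé under "Interview",
    anything else files it under "No Interview"."""
    bin_name = "Interview" if first_review_positive else "No Interview"
    training_corpus[bin_name].append(resume_text)

record_probation_outcome("résumé of a hire whose first review was good", True)
record_probation_outcome("résumé of a hire whose first review was not", False)
print({bin_name: len(resumes) for bin_name, resumes in training_corpus.items()})
```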
Programming Collective Intelligence breaks out of “thinking in if statements” and provides practical examples for building systems that reason based on learning from data and behaviour, such as the Naïve Bayesian Filters discussed in this essay and collaborative filters such as recommendation engines.
This is a big, big difference between approaching hiring people as an exercise in if statements and as an exercise in classification. If you are working with if statements, you only change the if statements when something radical changes in the job or in the pool of people applying for the job.
But if you are approaching hiring people as an exercise in classification, you are constantly training your classifier. In fact, the quality of your results is driven by your process for training, for continuous improvement. It’s a process problem: how do we do a good job of training our classifier and keeping it trained?
Consider the training process I mentioned above: you build a document classifier, and you feed it the résumés of people you hire after they complete probation. If they quit or are fired, they are marked “No Interview.” If they get a lukewarm review, they are marked “No Interview.” But if they get a good review, they are marked “Interview.” What do you think?
Okay, thanks for using the comment link to tell me what you think. Here’s what I think: this is dangerously incomplete. Pretend we’re sorting emails into Ham and Spam. Training our résumé classifier based on who we thought was originally worth an interview is like training our email classifier based on which emails ended up in our inbox. It totally ignores the good emails that were classified as junk. To classify emails properly, you have to go into your junk mail folder every once in a while and find the one or two good emails that were misclassified as junk, then mark them “not junk.”
Our thought experiment has identified a critical component of classification systems: to train such a system, you have to identify your false negatives, just as junk mail filters let you sort through your junk mail and mark some items not junk.
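Continuing the thought experiment, the “not junk” correction might look something like this. The candidate IDs and bins are made up; the point is only that a rejected résumé which later proves to have been Ham gets moved into the positive training bin.

```python
# A sketch of the "not junk" correction: when a candidate we passed over
# turns out to have been worth hiring after all, move their résumé into
# the positive training bin so the classifier learns from its false
# negatives, not just from the people we accepted.
training_corpus = {"Interview": [], "No Interview": []}
rejected = {"candidate-42": "résumé we originally passed over"}

def mark_not_junk(candidate_id):
    resume = rejected.pop(candidate_id)
    training_corpus["Interview"].append(resume)  # correct the false negative
    # In a real system you would retrain the classifier here.

mark_not_junk("candidate-42")
print(training_corpus["Interview"])
```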
Where hiring people is concerned, what is the process for checking our junk mail filter? How do we find out whether any of the résumés we passed over belonged to people worth hiring? I don’t have an answer to this question, but thinking of résumé selection as an exercise in document classification identifies it as an obvious weakness in the way most companies handle interviewing: as an industry, we don’t do much to train our selection process.
A metric fuckload of process

A company really obsessed with hiring well would keep statistics. I know, I can feel your discomfort. More paperwork, more process, more forms to fill out. But honestly, every process is improved when you start to measure it. Maybe we measure too many things, or the wrong things. My ex-colleague Peter Holden is a terrific operational manager. His metric for metrics is to ask whether a particular measurement is a management report, meaning—in his operations lingo—is that piece of data used to make an active decision in the business?
For example, if we actually store résumés and also the outcomes—whether we hired them, how they did—and then use that data to constantly improve how we select résumés, then that is a management report and that is data worth collecting.
Likewise, we could ask questions in interviews and actually track who answered correctly and who answered incorrectly and whether the answer had any correlation with a candidate’s eventual job performance. Does that sound like too much work? Seriously? Are you drinking the same kool-aid I’m drinking about the importance of hiring good people and the critical need to avoid bad hires?
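For the record, the tracking I have in mind is not much heavier than this sketch. The records and scores are invented; the interesting part is simply comparing how the people who answered “correctly” actually fared against the people who did not.

```python
# A sketch of the bookkeeping: for each candidate we eventually hired,
# record whether they answered a given interview question "correctly"
# and how their first-year review went. The data is invented.
records = [
    {"answered_correctly": True,  "review_score": 4},
    {"answered_correctly": True,  "review_score": 2},
    {"answered_correctly": False, "review_score": 5},
    {"answered_correctly": True,  "review_score": 3},
    {"answered_correctly": False, "review_score": 4},
]

def mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")

correct = [r["review_score"] for r in records if r["answered_correctly"]]
incorrect = [r["review_score"] for r in records if not r["answered_correctly"]]

# If answering "correctly" doesn't separate the strong reviews from the
# weak ones, the question is telling us less than we think it is.
print("mean review when correct:  ", mean(correct))
print("mean review when incorrect:", mean(incorrect))
```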
As Joel Spolsky puts it: “The bottom line in my interviewing technique is that smart people can generally tell if they’re talking to other smart people by having a conversation with them on a difficult or highly technical subject, and the interview question is really just a pretext to have a conversation on a difficult subject so that the interviewer’s judgment can form an opinion on whether this is a smart person or not.”
Or let’s move up a level. Many people like the touchy-feely voodoo approach to interviewing. Joel Spolsky calls certain questions “a pretext to have a conversation on a difficult subject so that the interviewer’s judgment can form an opinion on whether this is a smart person or not.” So maybe the answer to the question can’t be tracked in a neat yes/no, right/wrong way.
But you know what you can track? How about tracking whether each interviewer is a reliable filter? Do you keep statistics on which interviewers let too much Spam through, and which interviewers are so conservative that they statistically must be turning good people (Hams) away?
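Something like this, perhaps. It is only a sketch with invented records, but it is the kind of tally I mean: for each interviewer, count the Spam they let through and the Ham they turned away.

```python
from collections import defaultdict

# Invented interview records: who interviewed the candidate, what the
# interviewer recommended, and how things actually turned out (including,
# where we happen to know it, how rejected candidates fared elsewhere).
interviews = [
    {"interviewer": "alice", "recommended": True,  "turned_out_ham": True},
    {"interviewer": "alice", "recommended": True,  "turned_out_ham": False},
    {"interviewer": "alice", "recommended": False, "turned_out_ham": True},
    {"interviewer": "bob",   "recommended": False, "turned_out_ham": True},
    {"interviewer": "bob",   "recommended": False, "turned_out_ham": True},
    {"interviewer": "bob",   "recommended": True,  "turned_out_ham": True},
]

stats = defaultdict(lambda: {"spam_let_through": 0, "ham_turned_away": 0, "total": 0})
for i in interviews:
    s = stats[i["interviewer"]]
    s["total"] += 1
    if i["recommended"] and not i["turned_out_ham"]:
        s["spam_let_through"] += 1   # false positive
    if not i["recommended"] and i["turned_out_ham"]:
        s["ham_turned_away"] += 1    # false negative

for interviewer, s in stats.items():
    print(interviewer, s)
```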
No? I must be honest with you. Until now, neither did I. Although I do not speak for Mobile Commons, I’ll bet we will be discussing it soon. We’re serious about growing, we’re serious about hiring really good people, and we don’t want to put on the blinders and demand “Five years of JEE.” Which means we want to talk to a lot of people who are “Smart and Get Things Done.” And which also means we need to get really, really good at bringing good people on board.
Which means we want to ask the questions that actually help us distinguish the best from the not-so-best. Which brings me back to my interesting question above, and why I won’t say whether it’s good or bad.
Because I haven’t trained my filter by asking it of a representative sample and then determining the correlation between a supposedly correct answer and actual fitness for the job.
And the only way to know if it is useful is to incorporate it into a classifier and see if it collects a high conditional probability.
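To show what “collects a high conditional probability” might mean in practice, here is a back-of-the-envelope Bayes’ rule calculation with invented counts: if the posterior probability of a good fit given a correct answer is not usefully different from the prior, the question isn’t earning its keep.

```python
# Invented counts from asking the question of a representative sample of
# candidates whose eventual fitness we later learned.
good_fit_and_correct = 18
good_fit_total = 30
poor_fit_and_correct = 12
poor_fit_total = 50

p_correct_given_good = good_fit_and_correct / good_fit_total  # 0.60
p_correct_given_poor = poor_fit_and_correct / poor_fit_total  # 0.24
p_good = good_fit_total / (good_fit_total + poor_fit_total)   # prior: 0.375

# Bayes' rule: P(good | correct) = P(correct | good) * P(good) / P(correct)
p_correct = p_correct_given_good * p_good + p_correct_given_poor * (1 - p_good)
p_good_given_correct = p_correct_given_good * p_good / p_correct

print(round(p_good, 3), "prior probability of a good fit")
print(round(p_good_given_correct, 3), "posterior, given a correct answer")
```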
Summary

I am not suggesting that naïve Bayesian filters can outperform human interviewers, or that fuzzy questions like “How would you design a Monopoly game?” have no place in hiring, or that an experienced programmer cannot tell if another person is an experienced programmer by talking to them.
I am especially not suggesting that people do not make false statements: many of the people I have interviewed in my career really believed that working on one Java application for two years made them experienced programmers with strong OO architecture skills.
But as stated clearly above, I am claiming that someone with at least a basic understanding of document classification knows how to apply what we know about document classification to the problem of selecting candidates to phone screen based on their résumés. I am claiming that what we know about training classification systems can be applied to improving the hiring process.
And mostly I am claiming that when we take a single question or feature, like “Years of experience,” or perhaps, “Ability to write FizzBuzz in an interview,” the correct way to reason about its applicability to the hiring process is to think of its statistical correlation with our objective, not to try to construct a chain of if statements.
If you find this interesting, Games People Play discusses what to do about the fact that candidates will say or do anything to get a job, including lie about their experience.
An Apology

Remember I told you I was wrong about thinking something sucked?
Did you ever take that test yourself? Deckard?
Once upon a time, I was asked an interview question, and I gave a very thorough answer, including all of the usual correct answers plus an unusual nuance, a corner case that most people probably would have skipped. It cost me the job: as it turned out, the interviewer told me I was mistaken.
I carried that on my back for years, even though the job probably wasn’t all that great a fit.
But now, I realize that worrying about answering the question correctly is thinking in if statements. If I get it correct, then I must be fit for the job. Not true at all. There could be a classifier question where there is a strong reverse correlation between getting the question correct and confidence in classifying you as “Ham.”[3]

The only thing that matters about that interviewer is whether, on the whole, he does a good job of separating Ham from Spam. Perhaps he does, in which case I was simply one of those statistical necessities, a false negative. Or, just as plausibly, the question itself may have been perfectly valid, as was his interpretation of the answer: the only thing that matters might have been that answering in the manner the interviewer expected was highly correlated with job success, and that answering in the manner I did was negatively correlated with job success.
Naïve classifiers are brutal in that way. They don’t work the way you expect them to work. Spam filters give relevance to all sorts of words you wouldn’t expect. Or to phrases you don’t expect (thanks to interesting work with Markov Models). It’s a precise, bloodless process.
It isn’t personal. And for that reason, we really ought to back away from thinking about hiring in if statements. It’s a path that leads right towards taking it personally. As interviewees, we take questions or puzzles that we find difficult very personally. We get angry if we are asked things we consider irrelevant to the job. Secretly, we want interviewers to validate our worth, not just by saying “Hire,” but by valuing the things we value about ourselves, which means we look for interviewers to have if statements that align with our notions of competence.
And as interviewers, it is difficult to take ourselves out of the equation. If we only hire people just like us, we have no opportunity to learn and improve on our hiring practices. Hiring people unlike ourselves is hard if we hire with if statements. It requires valuing our incompetence instead of our competence.
Approaching the problem as a problem in classification is our road out of that emotional swamp. It’s a process we can explain and understand without being personal, without judging ourselves as people or our candidates.
With this new understanding, I apologize to that interviewer for my criticism of the interview process. I will try to improve my approach to discussing interviewing and interview questions in the future.
1. There are a lot of classification algorithms, and this essay is not a claim that Naïve Bayes is ideal for any or all hiring purposes. But I use it as an example because most people understand spam filters and roughly how they work.
2. Although this isn’t the subject of the essay, please feel free to use this question in the following manner: if you find yourself in an interview where the interviewer bombards you with puzzle after puzzle in an effort to impress you with how smart he is, when he folds his arms and asks you if you have any questions for him, pull this one out. Let me know how it goes :-)
3. Reverse-engineering classifiers can be futile, but one can imagine a question that reveals the person answering it is highly overqualified for a basic clerking job. Or something.