school do. Meanwhile, basic credentials like graduate degrees and certifications are valued in applications, but they “have little or no power to explain variation in performance across teachers,” according to one recent study published in March by researchers at the University of Michigan, Columbia University, and the Harvard Graduate School of Education.
This study drew on unique access to data on applicants, hires, and teacher performance in Washington, D.C., Public Schools. Like many similar papers, it found that undergraduate performance and scores on a teacher screening test both strongly predicted teacher effectiveness. (“Effectiveness,” in this case, was measured using D.C.’s IMPACT system, which evaluates teachers based on their grade level and subject, classroom-observation scores, and reports from principals and assistant principals.)
The researchers found that D.C. public school principals consistently missed the best teaching prospects. For example, they hired more people who had gone to college in D.C. (which had nothing to do with better teaching) but ignored SAT scores and GPA (both of which were “significantly positively related to performance”). Some principals might assume that high-achieving workers will feel overqualified teaching in public school and quit after a year, but the study found no correlation between academic credentials and attrition. The researchers’ basic conclusion was a stinging indictment of teacher hiring: the attributes of the best teaching prospects were “only weakly, if at all, associated with the likelihood of being hired.”
***
Teaching puts one individual in charge of a classroom. That’s different from being a member of a large product team within a larger company. Hiring team members requires filtering for different hard and soft skills, so that new employees can slip into established patterns of company behavior. To do that filtering, many companies ask their employees to double as recruiters by leaning on referrals.
Researchers have long known that referrals surface better job candidates. Referred candidates are more likely to get callbacks, more likely to be hired, and more likely to stay at the company. Researchers also had a pretty good theory about why referrals work: most hiring is a blind date, and a referral is an introduction, giving both sides a little more certainty and information about fit. But academics couldn’t pin down why referred candidates were actually better. A May 2013 paper suggests a simple answer: company referrals don’t work because they yield smarter workers. They work because they yield better fits.
The study found that referred hires generate “substantially higher profits per worker,” are “less likely to quit,” are “more innovative,” and “have fewer accidents.” All of this held even after controlling for factors like college, SAT scores, and IQ. Team-based companies require openness, compatibility, and a willingness to cooperate. Referral programs work because great employees pass along workers who similarly match the company culture.
Although they account for only six percent of total applications, referrals now result in more than a quarter of all hires at large companies, according to a recent paper from the Federal Reserve Bank of New York and MIT. But while referrals are extremely useful, they can create their own problems. Many industries (tech and media, for starters) are infamous for disproportionately hiring white, upper-middle-class young men who went to elite colleges. Relying exclusively on referrals could deepen that workplace homogeneity.
What’s more, referrals help winnow the applicant pool, but that’s not nearly enough. As the New York Fed study showed, the majority of jobs are still filled without referrals, and the majority of referred candidates are still rejected. More important than a strong referral program is a strong interview process. How does a hiring manager distinguish between a merely acceptable candidate and a great one without devoting thousands of hours to learning the secret talents, hobbies, and motivations of every single applicant?
Google, which depends on referrals, once administered up to 25 interviews for each job candidate. Todd Carlisle, who holds a doctorate in organizational psychology and administered the company’s hiring surveys in 2006, suspected this was overkill, so he tested exactly how many interviews were necessary to be confident about a new hire. The right number of interviews per candidate, he discovered, was four. The new policy, which Google calls the Rule of Four, “shaved median time to hire to 47 days, compared to 90 to 180 days,” Laszlo Bock wrote in his book Work Rules!
But Carlisle’s research revealed something deeper about the hiring process, which has resonance for every industry: No one manager at Google was very good, alone, at predicting who would make a good worker.
Four meticulously orchestrated Google interviews could identify successful hires with 86 percent confidence, and nobody, no matter how long they had been at the company or how many candidates they had interviewed, could do any better than the aggregated wisdom of four interviewers. (Okay, technically, one employee could: a data center worker named Nelson Abramson, who interviewed exclusively for a “very distinctive skill set.”)
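To make the aggregation concrete, here is a minimal sketch of how a four-person panel’s scores might be combined into a single decision. The 1-to-4 rating scale, the 3.0 hiring bar, and the scores below are invented for illustration; Google has not published its actual model in this detail.

```python
# A minimal sketch of panel-score aggregation, not Google's actual system.
# The 1-to-4 rating scale, the 3.0 hiring bar, and the sample scores are
# all invented for illustration.
from statistics import mean

def panel_decision(scores, bar=3.0, panel_size=4):
    """Average independent interview scores and compare them to a bar.

    Averaging damps any single interviewer's bias or bad day, so the
    decision rests on the panel rather than on one opinion.
    """
    if len(scores) < panel_size:
        raise ValueError(f"need at least {panel_size} independent scores")
    avg = mean(scores)
    return avg, avg >= bar

# One skeptical interviewer (2.5) is outweighed by three favorable ones.
avg, hire = panel_decision([3.4, 3.1, 2.5, 3.6])
print(f"panel average {avg:.2f} -> {'hire' if hire else 'no hire'}")
```

The design point is simply that the decision rule consumes several independent judgments at once, which is why no single interviewer’s quirks can dominate the outcome.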
There are several reasons to aggregate interview scores. First, everybody is a little bit biased in one direction or another: toward older or younger candidates, toward extroverts or introverts. Combining scores mitigates that inevitable bias. Second, sometimes people just have bad interviews, and it’s unreasonable to base an entire hiring decision on one 30-minute performance.
Third, Google’s finding suggests there are no magical hirers in the world, no performance oracles who just know a good candidate when they see one.
This is perhaps the most interesting and important conclusion. In a November 2015 study, researchers looked at 15 firms that used a job test that placed applicants into three buckets: green (positive), yellow (tentative), or red (negative). The test accurately predicted worker performance and retention: the greens stayed longer than the yellows, and the yellows stayed longer than the reds.
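As a toy illustration of how such a test operates, here is a hypothetical sketch that maps a raw score to the study’s three-color rating. The thresholds and sample scores are invented, since the study does not publish the test’s internals.

```python
# A hypothetical sketch of the green/yellow/red job test described above.
# The score cutoffs and applicant data are invented; the study does not
# publish the test's internals.
def bucket(score: float) -> str:
    """Map a raw test score to the study's three-color rating."""
    if score >= 70:
        return "green"   # positive: predicted to perform best and stay longest
    if score >= 40:
        return "yellow"  # tentative
    return "red"         # negative: predicted shortest tenure

for applicant, score in {"A": 82.0, "B": 55.5, "C": 31.0}.items():
    print(applicant, bucket(score))
```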
But sometimes the human-resources managers ignored the data and went with their gut. Why? Perhaps the managers thought they knew better than a cold piece of data. But these “gut” picks were busts, according to Mitchell Hoffman at the University of Toronto, Lisa Kahn at the Yale School of Management, and Danielle Li at Harvard Business School. When managers thought they were smarter than the hiring systems they had set up, it was the systems that ended up looking smart.
It will always be difficult to predict fit and performance, because humans are complex, and humans interacting in human systems are even more complex. The right lesson is more subtle: hiring is hard, and nobody is very good at doing it alone, whether they’re a Google boss, a high-school principal, or a sports general manager. They all need help, sometimes in the form of standardized tests and sometimes in the form of aggregated interview reports. When it comes to identifying the best future talent, groups are better than individuals, data plus groups is better than groups alone, and nearly anything is better than brainteasers.