Design of hiring algorithms can double diversity in firms
Hiring algorithms typically use information about the workers a firm has previously hired to predict which job applicants it should now select. In many cases, relying on algorithms that predict future success based on past success will lead firms to favor applicants from groups that have traditionally been successful.
But this approach only works well if the world is static and we already have all the data we need. In practice, that is simply not the case. Women, for instance, have been entering STEM fields in record numbers, but if firms used their historical employment data to decide whom to hire, they would have very few examples of successful female scientists and engineers. At the same time, the qualities that predicted success in the past may not continue to apply today: just think of how remote work during the pandemic has changed the nature of teamwork, communication, and teaching.
So instead of designing algorithms that treat hiring as a static prediction problem, what if we designed algorithms that treat finding the best job applicants as a continual learning process? What if an algorithm actively sought out applicants it knows less about, in order to continually improve our understanding of which candidates will be a good fit?
While there is a growing body of work on the potential gains from following algorithmic recommendations, especially in hiring, no one has examined how algorithm design can shape the quality of firms’ hiring decisions and access to opportunity for job applicants.
In a recent study with my colleagues Lindsey Raymond and Peter Bergman, we sought to do just that. We developed and evaluated hiring algorithms designed to explicitly value exploration, in order to learn about people who might not previously have been considered for these jobs. The algorithm incorporates exploration bonuses that increase with its degree of uncertainty about an applicant’s quality. Those bonuses tend to be higher for candidates who are “underrepresented” in the firm’s existing data: applicants with unusual majors, degrees from less common colleges, different types of work histories, or demographic backgrounds that are underrepresented at the firm.
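To make the idea concrete, here is a minimal sketch of a UCB-style screening score. It is an illustration under invented assumptions, not the model in our paper: the data are random, the predicted quality is faked, and the uncertainty proxy (distance from the firm’s past hires in feature space) and the exploration weight are stand-ins I made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: rows are applicants, columns are encoded resume features
# (major, school, work history, ...). Purely illustrative.
past_hires = rng.normal(size=(300, 5))
applicants = rng.normal(size=(50, 5))

# Predicted quality would come from a fitted screening model; faked here.
predicted_quality = rng.uniform(size=len(applicants))

# Crude uncertainty proxy: applicants far from anything in the firm's past
# hiring data are the ones the model knows least about.
distances = np.linalg.norm(applicants[:, None, :] - past_hires[None, :, :], axis=2)
uncertainty = distances.min(axis=1)

# UCB-style score: quality estimate plus an exploration bonus that is
# larger for underrepresented (high-uncertainty) applicants.
exploration_weight = 1.0  # hypothetical tuning parameter
score = predicted_quality + exploration_weight * uncertainty

# Grant interviews to the top-scoring applicants.
to_interview = np.argsort(score)[-10:]
```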
We focused on the decision to grant first-round interviews for positions in consulting, financial analysis, and data science—sectors that offer well-paid jobs and have also been criticized for their lack of diversity. We analyzed records of applicants to these types of positions at a Fortune 500 firm. Like many of its peers, the company receives a large number of applications and rejects the majority of candidates on the basis of an initial automated résumé screen. Even among those who pass the screen and move on to an interview, hiring rates are low: only 10% ultimately receive an offer.
We built and tested three résumé screening algorithms to compare hiring outcomes. Our first model followed a typical static supervised learning approach (SL), relying on past data to predict who would be successful. The second model was similar, except that it continually updated its training data throughout the test period with the hiring outcomes of applicants selected for interviews; we called this the updating SL model. The third model was the one I mentioned above, which values exploration. We called it the UCB model, for its use of an “upper confidence bound.”
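The differences between the three policies are easiest to see in a small simulation. The sketch below is a hedged illustration, not our implementation: the applicant data are synthetic, scikit-learn’s LogisticRegression stands in for the screening model, and the spread across bootstrap refits stands in for a proper uncertainty estimate. The static SL model never learns from new outcomes, the updating SL model refits on the outcomes of the applicants it chose, and the UCB model does the same but ranks applicants by prediction plus an uncertainty bonus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_features = 6
true_coef = rng.normal(size=n_features)

def new_applicant_batch(n=100):
    """Synthetic applicants with a hidden 'would succeed if interviewed' label."""
    X = rng.normal(size=(n, n_features))
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_coef))))
    return X, y

def top_k(scores, k=10):
    return np.argsort(scores)[-k:]

def ucb_scores(X_train, y_train, X_new, n_boot=20, weight=1.0):
    """Mean prediction plus a bonus from the spread across bootstrap refits."""
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X_train), len(X_train))
        model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
        preds.append(model.predict_proba(X_new)[:, 1])
    preds = np.column_stack(preds)
    return preds.mean(axis=1) + weight * preds.std(axis=1)

# Historical data available to all three policies at the start.
X_hist, y_hist = new_applicant_batch(500)

# 1) Static SL: fit once on historical data, never updated.
static_model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)

# 2) Updating SL and 3) UCB both accumulate the outcomes they observe.
X_upd, y_upd = X_hist.copy(), y_hist.copy()
X_ucb, y_ucb = X_hist.copy(), y_hist.copy()

for period in range(5):
    X_new, y_new = new_applicant_batch()

    # Static SL ranks each batch by its original predictions.
    static_pick = top_k(static_model.predict_proba(X_new)[:, 1])

    # Updating SL refits, ranks, then learns from the outcomes it observes.
    upd_model = LogisticRegression(max_iter=1000).fit(X_upd, y_upd)
    upd_pick = top_k(upd_model.predict_proba(X_new)[:, 1])
    X_upd = np.vstack([X_upd, X_new[upd_pick]])
    y_upd = np.concatenate([y_upd, y_new[upd_pick]])

    # UCB ranks by prediction plus exploration bonus, then learns likewise.
    ucb_pick = top_k(ucb_scores(X_ucb, y_ucb, X_new))
    X_ucb = np.vstack([X_ucb, X_new[ucb_pick]])
    y_ucb = np.concatenate([y_ucb, y_new[ucb_pick]])
```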
We then evaluated the candidates that each algorithm selected relative to each other and to the actual interview decisions made by human recruiters in the firm. Although we evaluated the diversity of applicants selected by the algorithms, we did not incorporate any explicit diversity preferences in their design.
We found significant differences in the candidates selected by the exploratory versus static algorithms. The UCB model more than doubled the share of selected applicants who are Black or Hispanic, from 10% to 23%. In comparison, the SL and updating SL models decreased Black and Hispanic representation to approximately 2% and 5% respectively.
This increase in diversity from the UCB model persisted throughout the test sample. This is important because, if the additional minority applicants it selected had been weaker, the model would have updated and learned to select fewer such applicants over time. Instead, it continued to select more minority applicants than both the human recruiters and the SL models did. This suggests that the additional Black and Hispanic candidates the algorithm selected were just as good as other candidates—the firm had simply not given them as many opportunities in the past.
We found a difference in gender results, too. All of the algorithms increased the share of selected applicants who are women, from 35% under human recruiting to 41% with the SL model, 50% with the updating SL model, and 39% with the UCB model.
We think the share of women was lower under the UCB model than under the SL models because the men in our applicant pool tend to be more diverse along dimensions like geography, education, and race, leading them to receive higher exploration bonuses on average.
Our overall findings were clear: When you incorporate exploration into the algorithm, you improve the quality of talent and hire more diverse candidates. Firms that continue to use static approaches in their algorithms risk missing out on quality applicants from different backgrounds.
This could be a game changer for hiring. If firms want to identify and hire the best applicants, they should consider building algorithms that understand the value of exploration and learning.
Danielle Li is a professor at the MIT Sloan School of Management and a coauthor of the NBER working paper “Hiring as Exploration.”