A Blog by Jonathan Low

 

Dec 7, 2020

Why Research Finds Hiring Algorithms Designed To Be 'Fair' Contain Gender Biases

Biases inherent in algorithms and artificial intelligence, especially with regard to facial recognition and hiring, have become widely acknowledged by researchers, particularly in industrial and organizational psychology. 

In response, attempts are being made to de-bias models or create unbiased 'fair' ones. But new research has found that even these can be affected by job context, employer preferences, and interactions with other factors in the data, suggesting that human oversight in hiring remains essential for the foreseeable future. JL


Kyle Wiggers reports in VentureBeat:

Researchers examining the effect of “fair” ranking algorithms on gender found that ostensibly debiased ranking algorithms treat job candidates inconsistently. Types of ranking algorithms, job contexts, and the inherent biases of employers interact with each other. While fair ranking algorithms help boost the number of underrepresented candidates hired, their effectiveness is limited in job contexts where employers have a preference for particular genders. The researchers found fair ranking to be more effective when underrepresented candidates (women) are similar to those overrepresented (men), but ineffective at increasing representation when employers attempt “demographic parity.”
Ranking algorithms are widely used on hiring platforms like LinkedIn, TaskRabbit, and Fiverr. Because they’re prone to biases, many of these platforms have taken steps to ensure they’re fair, balanced, and predictable. But according to a study from researchers affiliated with Harvard and Technische Universität Berlin, which examined the effect of “fair” ranking algorithms on gender, even ostensibly debiased ranking algorithms treat certain job candidates inconsistently. 
The researchers specifically looked at the algorithms used on TaskRabbit, a marketplace that matches users with gigs like cleaning, moving, and delivery. As they note in a paper describing their work, TaskRabbit leverages ranking algorithms to sort through available workers and generate a ranked list of candidates suitable for a given task. Because the ranking directly impacts livelihoods, biased underlying algorithms could adversely affect underrepresented groups. The effects could be particularly acute in cities like San Francisco, where gig workers are more likely to be people of color and immigrants. 
The Harvard coauthors studied how biases — specifically gender biases — percolate in TaskRabbit and impact real-world hiring decisions. They analyzed various sources of biases to do so, including the types of ranking algorithms, job contexts, and inherent biases of employers, all of which interact with each other. 
The researchers conducted a survey of 1,079 people recruited through Amazon Mechanical Turk using real-world data from TaskRabbit. Each respondent served as a “proxy employer” required to select candidates to help them with three different tasks, namely shopping, event staffing, and moving assistance. To this end, recruits were shown a list of 10 ranked candidates for each task and asked to select the top 4 in each case. Then, they were given ranked lists generated by one of the three ranking algorithms — one that ranked candidates randomly (RandomRanking), one that ranked candidates based on their TaskRabbit scores (RabbitRanking), and a “fair” ranking algorithm (FairDet-Greedy) — or versions of the algorithms that swapped the genders of candidates from male to female and vice versa. 
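To make the idea of constraint-based “fair” ranking concrete, here is a minimal, hypothetical Python sketch of a greedy re-ranker that enforces a minimum share of positions for an underrepresented group. It is not the FairDet-Greedy algorithm studied in the paper; the candidate data, the min_share parameter, and the quota rule are assumptions made purely for illustration.

# Illustrative sketch only: a simple greedy re-ranking with a minimum-representation
# constraint. This is NOT the paper's FairDet-Greedy algorithm; it just shows the
# general idea of constraint-based fair ranking. All data below is made up.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    score: float   # e.g., a platform relevance/quality score
    group: str     # e.g., "F" or "M"

def greedy_fair_rerank(candidates, protected_group, min_share):
    """Rank by score, but at every prefix ensure the protected group holds
    at least `min_share` of the positions filled so far (when possible)."""
    by_score = sorted(candidates, key=lambda c: c.score, reverse=True)
    ranked, protected_count = [], 0
    while by_score:
        k = len(ranked) + 1
        # Does the next slot need a protected-group candidate to meet the quota?
        need_protected = protected_count < int(min_share * k)
        pick = next((c for c in by_score if c.group == protected_group), None) \
            if need_protected else None
        if pick is None:          # no constraint to satisfy (or none left): take the best
            pick = by_score[0]
        by_score.remove(pick)
        ranked.append(pick)
        if pick.group == protected_group:
            protected_count += 1
    return ranked

if __name__ == "__main__":
    pool = [Candidate("A", 0.92, "M"), Candidate("B", 0.90, "M"),
            Candidate("C", 0.88, "F"), Candidate("D", 0.85, "M"),
            Candidate("E", 0.80, "F"), Candidate("F", 0.75, "M")]
    for i, c in enumerate(greedy_fair_rerank(pool, "F", 0.4), start=1):
        print(i, c.name, c.group, c.score)

With these made-up inputs, the re-ranker occasionally promotes a lower-scored candidate from the underrepresented group to satisfy the quota, which is the basic trade-off such constraint-based rankers make.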
In their analysis, the researchers found that while fair ranking algorithms like FairDet-Greedy are helpful in boosting the number of underrepresented candidates hired, their effectiveness is limited by the job contexts in which employers have a preference for particular genders. The respondents were less likely to choose women for moving jobs compared with men, for example, and less likely to hire men for event staffing than women. 
The researchers also report that they found fair ranking to be more effective when underrepresented candidates (e.g., women) are similar to those who are overrepresented (e.g., men). But they also found fair ranking to be ineffective at increasing representation when employers attempt to represent “demographic parity” — i.e., when they actively try but sometimes fail to make a diverse choice. 
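In the algorithmic-fairness literature, “demographic parity” is usually taken to mean that candidates from different groups are selected at roughly equal rates. The short Python sketch below shows one way that could be checked on selection data; the example records and the 0.1 tolerance are invented for illustration and are not drawn from the study.

# Minimal sketch: checking demographic parity as roughly equal selection rates
# across groups. The selection records and the 0.1 tolerance are illustrative assumptions.
def selection_rate(selections, group):
    """Fraction of candidates in `group` who were selected."""
    in_group = [s for s in selections if s["group"] == group]
    return sum(s["selected"] for s in in_group) / len(in_group)

selections = [
    {"group": "F", "selected": 1}, {"group": "F", "selected": 0},
    {"group": "F", "selected": 1}, {"group": "M", "selected": 1},
    {"group": "M", "selected": 1}, {"group": "M", "selected": 0},
]

rate_f = selection_rate(selections, "F")
rate_m = selection_rate(selections, "M")
print(f"selection rate F: {rate_f:.2f}, M: {rate_m:.2f}")
print("approx. demographic parity:", abs(rate_f - rate_m) <= 0.1)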
“Our study reveals that fair ranking can successfully increase the opportunities available to underrepresented candidates. However, we find that the effectiveness of fair ranking is inconsistent across job contexts and candidate features, suggesting that it may not be sufficient to increase representation outcomes in all settings,” the researchers wrote. “We hope that this work represents a step toward better understanding how algorithmic tools can (or cannot) reduce gender bias in hiring settings.” 
Bias in hiring algorithms is nothing new — in a recent example, Amazon scrapped a recruiting engine that showed a clear bias against women. But it’s becoming more relevant in light of the fact that a growing list of companies, including Hilton and Goldman Sachs, are looking to automate portions of the hiring process. In fact, some 55% of U.S. human resources managers said AI would be a regular part of their work within the next five years, according to a 2017 survey by talent software firm CareerBuilder. 
A Brookings Institution report advocated several approaches to reduce bias in algorithms used in hiring, including identifying a range of model inputs that can be predictive across a whole population and developing diverse datasets containing examples of successful candidates from a variety of backgrounds. But the report also noted that these steps can’t necessarily be accomplished simply by debiasing an existing model. 
“Algorithmic hiring brings new promises, opportunities, and risks. Left unchecked, algorithms can perpetuate the same biases and discrimination present in existing hiring practices,” the Brookings report reads. “Existing legal protections against employment discrimination do apply when these algorithmic tools are used; however, algorithms raise a number of unaddressed policy questions that warrant further attention.”
