Amazon’s facial recognition technology may be misidentifying dark-skinned women, according to U of T Engineering Science undergraduate Inioluwa Deborah Raji and Massachusetts Institute of Technology Media Lab research assistant Joy Buolamwini. This finding helped Raji and Buolamwini win “best student paper” at the Artificial Intelligence, Ethics, and Society (AIES) Conference in Honolulu, Hawaii. Held in January, the prestigious conference was sponsored by companies including Google, Facebook, and Amazon.
Their paper, which caught the Toronto Star’s attention, was a follow-up to an earlier audit by Buolamwini of technology from Microsoft, IBM, and Face++, a facial recognition startup based in China.
Origins of the research
Buolamwini’s earlier study, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” investigated the accuracy of artificial intelligence (AI) systems used by the three technology firms for facial recognition. Then-Microsoft Research computer scientist Timnit Gebru co-authored the paper.
Raji wrote that after reading about Buolamwini’s experiences “as a black woman being regularly misgendered by these models,” she wondered if her personal experience would hold true for a larger dataset containing samples of other dark-skinned women. This proved to be the case in the final analysis.
According to Raji, “Gender Shades” uncovered “serious performance disparities” in software systems used by the three firms. The results showed that the software misidentified darker-skinned women far more frequently than lighter-skinned men.
In an email to The Varsity, Raji wrote that since the release of Buolamwini and Gebru’s study, all three audited firms have updated their software to address these concerns.
For the paper submitted to the AIES Conference, Raji and Buolamwini tested the updated software to examine the extent of the change. They also audited Amazon and Kairos, a small technology startup, to see how the companies’ adjusted performance “compared to the performance of companies not initially targeted by the initial study.”
Raji wrote that, at the time of their follow-up study in July, “the ACLU [American Civil Liberties Union] had recently reported that Amazon’s technology was being used by police departments in sensitive contexts.”
Amazon denied that bias was an issue, saying that it should not be a concern for their “partners, clients, or the public.”
Raji and Buolamwini’s study showed evidence to the contrary. “We found that they actually had quite a large performance disparity between darker females and lighter males, not working equally for all the different intersectional subgroups,” wrote Raji.
Amazon’s response to the study
In a statement sent by Amazon’s Press Center to The Varsity, a representative wrote that the results of Raji and Buolamwini’s study would not be applicable to technologies used by law enforcement.
Amazon wrote that the study’s results “are based on facial analysis and not facial recognition,” and clarified that “analysis can spot faces in videos or images and assign generic attributes such as wearing glasses,” while “recognition is a different technique by which an individual face is matched to faces in videos and images.”
“It’s not possible to draw a conclusion on the accuracy of facial recognition for any use case – including law enforcement – based on results obtained using facial analysis,” continued Amazon. “The results in the paper also do not use the latest version of Rekognition and do not represent how a customer would use the service today.”
In a self-study using an “up-to-date version of Amazon Rekognition with similar data downloaded from parliamentary websites and the Megaface dataset of 1M images,” explained Amazon, “we found exactly zero false positive matches with the recommended 99% confidence threshold.”
However, Amazon noted that it continues “to seek input and feedback to constantly improve this technology, and support the creation of third party evaluations, datasets, and benchmarks.” Furthermore, Amazon is “grateful to customers and academics who contribute to improving these technologies.”
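The distinction Amazon draws corresponds to separate calls in its Rekognition API. The Python sketch below is purely illustrative and is not the methodology used in either study: it assumes valid AWS credentials, the boto3 library, and placeholder image file names. It shows facial analysis assigning generic attributes such as predicted gender, and facial recognition matching one face against another at the 99 per cent confidence threshold Amazon cites above.

```python
# Illustrative sketch only; image file names are placeholders.
# Requires valid AWS credentials and the boto3 package.
import boto3

rekognition = boto3.client("rekognition")

def load_image(path):
    """Read an image file into bytes for the Rekognition API."""
    with open(path, "rb") as f:
        return f.read()

# Facial ANALYSIS: detect faces and assign generic attributes
# (e.g. predicted gender, glasses) -- the task audited in the studies.
analysis = rekognition.detect_faces(
    Image={"Bytes": load_image("portrait.jpg")},  # placeholder file
    Attributes=["ALL"],
)
for face in analysis["FaceDetails"]:
    print(face["Gender"], face["Eyeglasses"])

# Facial RECOGNITION: match an individual face against faces in another
# image, at the 99% similarity threshold Amazon recommends.
match = rekognition.compare_faces(
    SourceImage={"Bytes": load_image("query.jpg")},     # placeholder file
    TargetImage={"Bytes": load_image("database.jpg")},  # placeholder file
    SimilarityThreshold=99,
)
print(match["FaceMatches"])
```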
The pair’s research could inform policy
Raji wrote that while it’s tempting for the media to focus on the flaw in Amazon’s software, she thinks that the major contribution of her paper is in helping to uncover how researchers can effectively conduct and present an audit of an algorithmic software system to prompt corporate action.
“Gender Shades introduced the idea of a model-level audit target, a user-representative test set, [and] a method for releasing results to companies called Coordinated Bias Disclosure,” wrote Raji.
In other words, Raji and Buolamwini’s research showed an effective way for companies and policymakers to investigate and communicate a problem in software systems and take action.
Most importantly, wrote Raji, the studies highlight the need for researchers to evaluate similar software models “with an intersectional breakdown of the population being served.”
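To illustrate what an intersectional breakdown means in practice, the short Python sketch below groups hypothetical audit results by skin type and gender and reports an error rate for each subgroup rather than a single aggregate accuracy figure. The column names and data are assumptions for illustration, not the benchmarks used by Raji and Buolamwini.

```python
# Minimal sketch of an intersectional error breakdown.
# The columns and rows below are hypothetical; real audits such as
# Gender Shades label benchmark images by skin type and gender.
import pandas as pd

results = pd.DataFrame({
    "skin_type": ["darker", "darker", "lighter", "lighter"],
    "gender":    ["female", "male",   "female",  "male"],
    "true":      ["female", "male",   "female",  "male"],
    "predicted": ["male",   "male",   "female",  "male"],
})

# Error rate per intersectional subgroup (skin type x gender),
# instead of one overall accuracy number that can hide disparities.
results["error"] = results["true"] != results["predicted"]
by_subgroup = results.groupby(["skin_type", "gender"])["error"].mean()
print(by_subgroup)
```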