By: Brandon Plaster
We oftentimes look to algorithmic systems built on large datasets as a way to create more objective systems. In the case of search engines, we rely on the results to be unbiased, and in the case of recommendation systems, we rely on the suggestions to be in our best interest. The issue, however, is that “objectivity” is relative to the system to which it refers. A system may be locally objective while being globally subjective, and as such, an algorithm can only be as objective as its underlying data. So what happens when the data is biased? What happens, for example, when data originates from a society founded in racism? Do we, as technologists, have a responsibility to embed ethics into our algorithms?
In 2013, Harvard University professor Latanya Sweeney noticed that when she searched her name on Google, she was served advertisements for background checks. This, she perceived, hinted to employers that she may have been previously arrested. She hypothesized that “racially associated names”, those categorized as either primarily white-identifying or primarily black-identifying, would result in different ads being served. Using a sample of 2000 names, she found that 80% of white-identifying names were served background check advertisements, whereas 92% of black-identifying names were. She contrasted these numbers with 2011 FBI statistics stating that white individuals account for nearly 70% of all arrests while black individuals account for only roughly 30%.
When asked to respond, both Google and InstantCheckmate.com (the company that used Google to advertise background checks) claimed that neither used racial profiling to serve advertisements. While this may be true, the nature of algorithmic recommendations is that a machine can take in a set of features that appear unbiased on the surface (e.g. first name, last name, address) and, depending on the metrics used to optimize the system, still uncover biases. In the Google instance, the system could in theory have taken the search query, i.e. the name, associated it with a class of similar names or categories, and then associated that category (effectively, race) with criminal records. Alternatively, if the algorithm used total clicks as a metric, and more people clicked on those ads after searching for black-identifying names, then the system could have adjusted itself to serve more of those ads with those queries. While this is all theoretical, the underrepresentation of minorities in the tech industry (roughly 70% of the U.S. industry being white) has the potential to lead to a lack of understanding of, and sensitivity to, these issues. And by presenting biased correlations to the users of the system, the system perpetuates the bias in the minds of the users. In fact, this incident is not the only case in which this type of algorithmic racism has occurred.
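The click-metric mechanism described above can be sketched in a few lines. This is a hypothetical simulation, not Google's actual system: the name groups, ad types, and click rates are all invented. The ad server below never sees race as a feature; it only maximizes observed click-through rate per name category, yet it converges to showing the arrest-suggestive ad almost exclusively for one group.

```python
import random
from collections import defaultdict

random.seed(0)

ADS = ["neutral", "arrest"]

# Simulated (made-up) user behavior: users click the arrest-suggestive ad
# slightly more often after searching names the system clusters into group B.
CLICK_RATE = {
    ("A", "neutral"): 0.10, ("A", "arrest"): 0.08,
    ("B", "neutral"): 0.10, ("B", "arrest"): 0.14,
}

clicks = defaultdict(int)  # (group, ad) -> clicks observed
shows = defaultdict(int)   # (group, ad) -> times shown

def pick_ad(group):
    """Greedy click-through-rate maximization with 10% random exploration."""
    if random.random() < 0.1:
        return random.choice(ADS)
    return max(ADS, key=lambda ad: clicks[(group, ad)] / max(1, shows[(group, ad)]))

for _ in range(100_000):
    group = random.choice(["A", "B"])  # name category inferred from the query
    ad = pick_ad(group)
    shows[(group, ad)] += 1
    if random.random() < CLICK_RATE[(group, ad)]:
        clicks[(group, ad)] += 1

for g in ["A", "B"]:
    total = shows[(g, "neutral")] + shows[(g, "arrest")]
    print(g, "arrest-ad share:", round(shows[(g, "arrest")] / total, 2))
```

A small, plausibly innocent difference in click behavior is enough: the optimizer amplifies it into a near-total disparity in which ad each group sees, with no racial feature anywhere in the code.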
Similar to tech, the film industry is notorious for being disproportionately white, as illustrated by this and last year’s Oscar ceremonies. (Specifically, all 20 acting nominations this year went to white actors.) This skewed dataset has a significant impact on algorithmic intermediaries dealing with the film industry. In February of 2016, April Joyner, a writer for Marie Claire, wrote about this exact issue with Netflix. After she watched a set of movies featuring black actors, Netflix recommended her a category of movies whose only underlying connection was that they contained black actors, effectively lumping all “black” films together regardless of genre. Though it’s likely that the system associated movies based on which films were viewed together, rather than on the actors starring in them, does this justify this sort of blind classification? While Joyner goes on to say that there are some benefits to this, i.e. creating greater visibility for lesser-known films, the discovery was also troubling to her: many of these films may never be discovered at all if they are lumped into a single category, in this case a racial one.
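The co-viewing heuristic the paragraph attributes to Netflix is easy to illustrate. The sketch below is a minimal item-to-item recommender under that assumption; the viewing histories and titles are invented. The system never sees genre or cast, only which films were watched by the same person, yet films end up grouped by whoever happens to watch them together.

```python
from collections import Counter
from itertools import combinations

# Invented viewing histories: the same viewers watch across genres,
# so the recommender links a drama to a comedy it knows nothing about.
histories = [
    {"Drama X", "Comedy Y"},
    {"Drama X", "Comedy Y", "Thriller Z"},
    {"Drama X", "Comedy Y"},
    {"Thriller Z", "Thriller W"},
]

# Count how often each ordered pair of titles appears in one history.
co_watched = Counter()
for h in histories:
    for a, b in combinations(sorted(h), 2):
        co_watched[(a, b)] += 1
        co_watched[(b, a)] += 1

def recommend(seed, k=2):
    """Top-k titles most often co-watched with `seed`."""
    scores = Counter({b: n for (a, b), n in co_watched.items() if a == seed})
    return [title for title, _ in scores.most_common(k)]

print(recommend("Drama X"))  # the comedy ranks first: genre never mattered
```

If audiences are already sorted along racial lines, "watched together" quietly becomes a proxy for the audience itself, which is exactly the lumping Joyner observed.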
This sort of classification of content may be automated, but what is to be done when the classification of content leads to the segregation of content? For movies and search queries, the stakes may seem low, but as the digital world has an ever greater impact on the physical world, how do we make sure the objectivity of the system is not just a local maximum? Imagine going online to look for a new place to live. The website takes in your data, algorithmically decides the best places for you to live, and then suggests them. But what happens when the algorithm decides that the optimal place for you to live is with people just like you? Similar to how Facebook’s news feed presents us with a homophilic political bubble, what’s to stop algorithms from racially segregating communities? How do we clean data of these sorts of societally driven prejudices? And does it mean that we have to bias the system in order to enforce ethically unbiased results?
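The housing thought experiment can be made concrete with a toy nearest-neighbor sketch. Everything here is invented: the users, the features, and the neighborhood names. The recommender only asks "where do the users most similar to you live?", but if any feature correlates with race, "similar users" quietly becomes "same group", and the suggestions segregate without a race field ever existing.

```python
# Invented existing users: (income_k, commute_minutes) -> neighborhood.
users = [
    ((40, 55), "Northside"),
    ((42, 60), "Northside"),
    ((95, 20), "Hilltop"),
    ((90, 25), "Hilltop"),
]

def suggest(features, k=3):
    """Majority neighborhood among the k nearest existing users."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(users, key=lambda u: dist(u[0], features))[:k]
    hoods = [hood for _, hood in nearest]
    return max(set(hoods), key=hoods.count)

print(suggest((41, 58)))  # the user is steered toward people "just like" them
```

The optimization is locally sensible for each user, which is precisely the local maximum the paragraph warns about: every individual suggestion looks reasonable while the aggregate effect reproduces existing divisions.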