Babies use older people as their main point of reference to understand the world. Whether it’s a parent, a random adult, or a child that’s just a couple of years older: everybody is a reference. The more time a baby spends with a specific person, the more it will see him or her as its example. Babies will copy them, too: they mimic their references’ behavior, but also their vocabulary and ultimately their thoughts. A baby doesn’t have any references other than the people who are around him or her the most. These references shape the way people absorb information, how it gets saved into their memory, and how they use that information, now biased by their own references, to make decisions.
Being biased is alright — sometimes
We don’t really refer to people as being biased; we all just have our references and preferences. It’s actually what makes us human. It just means that the information we hold isn’t objective, but shaped by opinions and influences from other people. We look at information through glasses colored by our own views. If two people hear the same story, they will react to it in different ways. They might still share the same emotion, for example when hearing about a tragedy, but they will each associate different feelings and memories with that event. Your own experience, upbringing, and many other factors determine what that information means to you and how you deal with it. Someone might find the news devastating, while someone else who’s been through a lot might not be as shocked. World events simply don’t mean the same thing to everyone.
This also means that being biased is not a disaster. Actually, it’s what makes us human and even sets us apart on an individual level. I love what you hate and you hate what I love. That’s great for human interaction and for being your own self. However, it’s not so great for making a decision that should be unbiased, for the simple fact that we aren’t unbiased. For example: when you hire someone, your job is to find the best fit for the role at hand and for the company itself. In theory, you’d look at the resumé and determine whether the person qualifies and whether he or she is friendly. But that’s not how we work. You also take into account how you personally feel about that person and his or her story. Someone might have the best papers for the job, but if he or she was running a little late for the interview and you personally hate it when people are late, it could hurt their chances. If you’re not that driven by punctuality, however, they might get away with it and remain the best candidate.
We load AI with clean data — or do we?
So how does our opinion affect data on the technical side? We feed applications historical data related to the topic we want them to learn about. Next, we tell the system what type of results we would call “a success”. In other words, we tell it what to look for and what to learn. The algorithm scans the data and searches for patterns. Once it has run its tasks, it spits out the results and adds those results to its “historical data”. This is how an algorithm learns: it gets a starting point and then stacks information from its own results to broaden its knowledge. But that starting point is determined by the data that was available before the system existed, which means it’s data that humans created.
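To make that loop concrete, here’s a minimal sketch in Python, using scikit-learn. The texts, the labels, and the definition of “a success” are all invented for illustration; the point is step 4, where the model’s own output becomes tomorrow’s historical data.

```python
# A minimal sketch of the learning loop described above. The texts,
# the labels, and the model choice are all invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# 1. Historical data, plus the labels we chose: 1 = "a success".
historical_texts = ["drove the platform migration", "attended some meetings"]
historical_labels = [1, 0]

# 2. The algorithm scans the data for patterns that predict "success".
vectorizer = CountVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(historical_texts), historical_labels)

# 3. New data is scored against those learned patterns...
new_texts = ["drove the backend rewrite"]
predicted = model.predict(vectorizer.transform(new_texts))

# 4. ...and the results are stacked onto the historical data, so the
#    next training round inherits whatever the model just decided.
historical_texts += new_texts
historical_labels += list(predicted)
```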
Take, for example, a system for hiring people. You feed it old resumés of people who successfully work or worked at your company, because you want to hire people whose profiles look like theirs. This is what we call the “historical data”. You strip it of any obvious characteristics that would prevent the algorithm from making an unbiased decision. Now you have a pool of comparable data. Then you start putting in fresh resumés, what we call new data. We strip that data too: we remove names, genders, places of birth, photos, everything. All we leave is a resumé filled with experiences and job roles. So all the algorithm has left to work with is plain text without any noticeable characteristics. Or does it?
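A sketch of that stripping step could look like this; the field names are my own assumptions for illustration, not any real system’s schema:

```python
# A minimal sketch of the "stripping" step. The field names here are
# assumptions for illustration, not a real system's schema.
def strip_resume(resume: dict) -> dict:
    """Return a copy of the resumé without obvious identifying fields."""
    identifying = {"name", "gender", "place_of_birth", "photo"}
    return {field: value for field, value in resume.items()
            if field not in identifying}

candidate = {
    "name": "Alex Example",
    "gender": "female",
    "place_of_birth": "Utrecht",
    "photo": "alex.jpg",
    "experience": "Five years maintaining billing systems.",
    "roles": ["backend developer", "team lead"],
}

# Only 'experience' and 'roles' survive: plain text, no obvious traits.
print(strip_resume(candidate))
```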
What does AI use to determine success?
It all starts with the historical data: basically everything we have gathered up to this point. That could be employee profiles, customer emails, the number of houses sold per year in a certain area… everything we gather data on. We then take that entire set — the historical data — and teach the algorithm what type of results we are looking for. That’s what we call “a success”. In other words: if you can find us an outcome with X and Y, we’ll be happy.
So the algorithm looks at the old data and compares it to everything we put in from now on. It looks for patterns and flags new data that matches the result we are looking for. Then it tells itself it has reached success, because it has spotted new data that matches the historical data. It saves that new data into its existing dataset, making it part of the historical data. That makes sense, because the contents of the new data match the contents of the historical data. All the boxes are checked, and it’s now part of the ever-growing dataset.
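Here’s a toy simulation of that loop, with made-up numbers: every round, items matching the historical pattern are accepted and appended, so the dataset keeps drifting toward whatever pattern it started with.

```python
# A toy simulation of the feedback loop, with made-up numbers.
# "pattern_a" stands for whatever the historical data happens to favor.
import random

random.seed(0)
historical = ["pattern_a"] * 8 + ["pattern_b"] * 2  # starts at 80% / 20%

for round_number in range(1, 6):
    incoming = [random.choice(["pattern_a", "pattern_b"]) for _ in range(20)]
    share_a = historical.count("pattern_a") / len(historical)
    # "Success" means matching the historical majority, so pattern_b
    # items only get accepted as often as the minority share allows.
    accepted = [item for item in incoming
                if item == "pattern_a" or random.random() < (1 - share_a)]
    historical += accepted  # the matches become historical data themselves
    share_a = historical.count("pattern_a") / len(historical)
    print(f"after round {round_number}: {share_a:.0%} pattern_a")
```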
But where does the original data come from? Well, actually, we are the ones who create historical data. To start, you need a starting point. In the example of the resumés, the old and current employees are used. Great! But now look at where those old resumés come from. The people who did the hiring before AI took part offered candidates a job based on their own preferences. So in that sense, the pre-AI data is biased by the likes and dislikes of the human recruiter. Chances are that his or her personal view on the job role influenced who he or she preferred to hire for that specific job. Meaning: he or she probably didn’t hire the best fit according to the company alone, but also the best fit according to his or her own bias.
Can AI un-bias us?
So let’s say a former recruiter had a personal preference for hiring men for technical jobs. That could be a harmless preference or maybe even a gender bias. No worries! We could assume that AI would just look for the best candidates based on their resumés. But that’s not entirely true. Even if we strip the resumés of characteristics like names, gender, and photos, the AI could still end up picking out male resumés. Not because we told it to; we only put in raw data about experience and previous jobs. The reason AI can still separate men from women without knowing who’s who is that Artificial Intelligence looks for patterns.
As it turns out, men and women tend to use language differently. For example, men tend to use slightly more directive wording, while women use softer words and have a more descriptive way of communicating. Without us knowing it, the algorithm figures out that the historical data for this job consists mostly of men. Actually, it doesn’t KNOW they’re men, but it has spotted a pattern that shows up almost exclusively in their resumés. So what does it do? It selects mostly male resumés from the new data, because they show the same way of communicating as the ones in the historical data. This means the data is already polluted with human biases before we ever put it into a “neutral” system.
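Here’s a toy demonstration of that proxy effect, with invented phrases rather than real resumé data. Names and genders are gone, yet word choice alone lets a simple text model reproduce the old pattern:

```python
# A toy demonstration of the proxy effect: no names, no genders, yet
# wording style alone reproduces the old hiring pattern. Invented data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Anonymized historical resumés; the "successful" ones happen to share
# a directive wording style, the rejected ones a descriptive style.
texts = [
    "executed the migration and drove results",         # hired
    "led the rollout and captured the top ranking",     # hired
    "supported colleagues and helped improve quality",  # rejected
    "collaborated with the team and assisted users",    # rejected
]
hired = [1, 1, 0, 0]  # the recruiter's past (biased) decisions

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), hired)

# A new, fully anonymized resumé written in the descriptive style is
# scored down anyway: the style itself has become the pattern.
new_resume = ["assisted the support team and helped customers"]
print(model.predict(vectorizer.transform(new_resume)))  # likely [0]
```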
As I said before, the historical data of uploaded resumés is based on the preferences of the human recruiter. This is a good example of why we always need people checking, and understanding, why certain results are being produced by algorithms. We can hardly prevent data from being biased, because most current data is produced by humans with human opinions. But if we learn to understand why algorithms are feeding us certain results, we can try to filter out as much bias as we can. And maybe it’s not even a bad thing to have humans make the final call, even if it’s based on human opinions and experience.
What do you think?
Will we reach a point of super clean data at which we can let AI run autonomously, or will we always be dealing with biased data?