Three ways to avoid bias in machine learning

Vince Lynch, Contributor

Vince Lynch is CEO of IV.AI, an artificial intelligence company that teaches machines how to understand human language so companies can better engage, understand and serve their customers.

At this moment in history it's impossible not to see the problems that arise from human bias. Now magnify that by compute and you start to get a sense for just how dangerous human bias via machine learning can be.
The damage can be twofold:

Influence: "If the AI said so, it must be true…" People trust the outputs of AI, so if human bias is missed in training, it could compound the problem by infecting more people.
Automation: Sometimes AI models are plugged into a programmatic function, which could lead to the automation of bias.

But there is potentially a silver machine-learned lining.
Because AI can help expose truth inside messy data sets, it's possible for algorithms to help us better understand bias we haven't already isolated, and spot ethically questionable ripples in human data so we can check ourselves. Exposing human data to algorithms exposes bias, and if we are considering the outputs rationally, we can use machine learning's aptitude for spotting anomalies.

But the machines can't do it on their own. Even unsupervised learning is semi-supervised, as it requires data scientists to choose the training data that goes into the models. If a human is the chooser, bias can be present.

How the heck do we tackle such a bias beast? We will attempt to pick it apart.

The landscape of ethical concerns with AI

Bad examples abound.
Consider the finding from Carnegie Mellon that showed that women were shown significantly fewer online ads for high-paying jobs than men were. Or recall the sad case of Tay, Microsoft's teen slang Twitter bot that had to be taken down after producing racist posts.

In the near future, such mistakes could result in hefty fines or compliance investigations, a conversation that's already occurring in the U.K. parliament.
All mathematicians and machine learning engineers should consider bias to some degree, but that degree varies from instance to instance. A small company with limited resources will often be forgiven for accidental bias as long as the algorithmic vulnerability is fixed quickly; a Fortune 500 company, which presumably has the resources to ensure an unbiased algorithm, will be held to a tighter standard.

Of course, an algorithm that recommends novelty T-shirts does not need nearly as much oversight as an algorithm that decides what dose of radiation to give to a cancer patient.
It's these high-stakes decisions that will become the most pronounced when legal liability enters the discussion.

It's important for builders and business leaders to establish a process for monitoring the ethical behavior of their AI systems.

Three keys to managing bias when building AI

There are signs of existing self-correction in the AI industry: Researchers are looking at ways to reduce bias and strengthen ethics in rule-based artificial systems by taking human biases into account, for example. These are good practices to follow; it's important to be thinking proactively about ethics regardless of the regulatory environment.
Let's take a look at several points to keep in mind as you work on your AI.

1. Choose the right learning model for the problem.

There's a reason all AI models are unique: Each problem requires a different solution and provides varying data resources. There's no single model to follow that will avoid bias, but there are parameters that can inform your team as it's building.

For example, supervised and unsupervised learning models have their respective pros and cons. Unsupervised models that cluster or do dimensionality reduction can learn bias from their data set. If belonging to group A highly correlates to behavior B, the model can mix up the two.
And while supervised models allow for more control over bias in data selection, that control can introduce human bias into the process.

Non-bias through ignorance (excluding sensitive information from the model) may seem like a workable solution, but it still has vulnerabilities.
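One commonly discussed vulnerability of this kind (an assumption here; the article does not name a specific one) is that a feature left in the model can stand in for the attribute that was removed. The sketch below is a rough check under that assumption: it asks how well a single remaining feature lets you guess the excluded attribute, compared with simply guessing the most common value. The function names, field names and toy data are all hypothetical, and the score is measured on the same rows it is fitted to, so treat it as a warning light rather than a measurement.

```python
from collections import Counter, defaultdict


def majority_accuracy(rows, sensitive):
    """Accuracy of always guessing the most common sensitive value."""
    counts = Counter(row[sensitive] for row in rows)
    return counts.most_common(1)[0][1] / len(rows)


def proxy_accuracy(rows, feature, sensitive):
    """Accuracy of guessing the sensitive value from one remaining feature.

    For each value of `feature`, guess the sensitive value seen most often
    alongside it. A score well above majority_accuracy suggests the feature
    may act as a proxy for the excluded attribute.
    """
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[sensitive]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)


# Hypothetical toy data: "group" is the attribute dropped from the model,
# "zip" is a feature that stays in.
rows = (
    [{"zip": "11111", "group": "A"}] * 4
    + [{"zip": "11111", "group": "B"}] * 1
    + [{"zip": "22222", "group": "B"}] * 4
    + [{"zip": "22222", "group": "A"}] * 1
)

print(majority_accuracy(rows, "group"))      # 0.5
print(proxy_accuracy(rows, "zip", "group"))  # 0.8
```

In this toy data, knowing the ZIP code lifts the guess from 50% to 80% correct, so dropping the "group" column would not stop a model from picking up on it indirectly.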
In college admissions, sorting applicants by ACT scores is standard, but taking their ZIP code into account might seem discriminatory. But because test scores might be affected by the preparatory resources in a given area, including the ZIP code in the model could actually decrease bias.

You have to require your data scientists to identify the best model for a given situation. Sit down and talk them through the different strategies they can take when building a model. Troubleshoot ideas before committing to them. It's better to find and fix vulnerabilities now, even if it means taking longer, than to have regulators find them later on.

2. Choose a representative training data set.

Your data scientists may do much of the leg work, but it's up to everyone participating in an AI project to actively guard against bias in data selection.
There's a fine line you have to walk. Making sure the training data is diverse and includes different groups is essential, but segmentation in the model can be problematic unless the real data is similarly segmented.

It's inadvisable, both computationally and in terms of public relations, to have different models for different groups. When there is insufficient data for one group, you could possibly use weighting to increase its importance in training, but this should be done with extreme caution. It can lead to unexpected new biases.

For example, if you have only 40 people from Cincinnati in a data set and you try to force the model to consider their trends, you might need to use a large weight multiplier.
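To get a feel for the numbers, here is a minimal sketch using Kish's effective sample size, a standard approximation of how many equally weighted rows a weighted data set is "worth". The total of 10,000 rows and the goal of giving the 40 Cincinnati rows the same combined weight as everyone else are assumptions made purely for illustration; only the figure of 40 comes from the example above.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Hypothetical data set: 10,000 rows, 40 of them from Cincinnati.
n_total = 10_000
n_small = 40
n_rest = n_total - n_small

# Assumed goal: the small group carries the same total weight as everyone else.
multiplier = n_rest / n_small

unweighted = [1.0] * n_total
weighted = [1.0] * n_rest + [multiplier] * n_small

print(multiplier)                                 # 249.0
print(effective_sample_size(unweighted))          # 10000.0
print(round(effective_sample_size(weighted), 2))  # 159.36
```

Under these assumptions each Cincinnati row counts 249 times, and the weighted data set behaves roughly like one with about 159 rows instead of 10,000. That leaves far less evidence to separate real trends from coincidence.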
Your model would then have a higher risk of picking up on random noise as trends; you could end up with results like "people named Brian have criminal histories." This is why you need to be careful with weights, especially large ones.

3. Monitor performance using real data.

No company is knowingly creating biased AI, of course; all these discriminatory models probably worked as expected in controlled environments. Unfortunately, regulators (and the public) don't typically take best intentions into account when assigning liability for ethical violations. That's why you should be simulating real-world applications as much as possible when building algorithms.

It's unwise, for example, to use test groups on algorithms already in production. Instead, run your statistical methods against real data whenever possible.
Ask the data team to check simple test questions like "Do tall people default on AI-approved loans more than short people?" If they do, determine why.

When you're examining data, you could be looking for two types of equality: equality of outcome and equality of opportunity. If you're working on AI for approving loans, result equality would mean that people from all cities get loans at the same rates; opportunity equality would mean that people who would have returned the loan if given the chance are given the same rates regardless of city. Without the latter, the former could still hide bias if one city has a culture that makes defaulting on loans common.

Result equality is easier to prove, but it also means you'll knowingly accept potentially skewed data. While it's harder to prove opportunity equality, it is at least valid morally.
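The two notions can be compared side by side with a small sketch. Everything here is hypothetical: the record fields, the cities "X" and "Y", and above all the `would_repay` label, which in practice is unknown for rejected applicants and would have to be estimated somehow. The point is only to show that equal approval rates can coexist with unequal treatment of people who would have repaid.

```python
from collections import defaultdict


def equality_report(applicants):
    """Per-city approval rate, overall and among applicants who would repay."""
    total = defaultdict(int)
    approved = defaultdict(int)
    repayers = defaultdict(int)
    approved_repayers = defaultdict(int)
    for a in applicants:
        city = a["city"]
        total[city] += 1
        approved[city] += int(a["approved"])
        if a["would_repay"]:
            repayers[city] += 1
            approved_repayers[city] += int(a["approved"])
    return {
        city: {
            # Equality of outcome compares this number across cities.
            "approval_rate": approved[city] / total[city],
            # Equality of opportunity compares this one.
            "approval_rate_among_repayers": (
                approved_repayers[city] / repayers[city] if repayers[city] else None
            ),
        }
        for city in total
    }


def make(city, would_repay, approved, n):
    """Build n identical hypothetical applicant records."""
    return [{"city": city, "would_repay": would_repay, "approved": approved}] * n


applicants = (
    make("X", True, True, 5) + make("X", True, False, 3) + make("X", False, False, 2)
    + make("Y", True, True, 5) + make("Y", False, False, 5)
)

report = equality_report(applicants)
for city, stats in sorted(report.items()):
    print(city, stats)
```

In this toy data both cities have a 50% approval rate, so result equality holds. But every would-be repayer in city Y is approved, against only five of eight in city X, so opportunity equality does not.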
It's often practically impossible to ensure both types of equality, but oversight and real-world testing of your models should give you the best shot.

Eventually, these ethical AI principles will be enforced by legal penalties. If New York City's early attempts at regulating algorithms are any indication, those laws will likely involve government access to the development process, as well as stringent monitoring of the real-world consequences of AI.

The good news is that by using proper modeling principles, bias can be greatly reduced or eliminated, and those working on AI can help expose accepted biases, create a more ethical understanding of tricky problems and stay on the right side of the law, whatever it ends up being.