The latest research from OpenAI put its machine learning agents in a simple game of hide-and-seek, where they pursued an arms race of ingenuity, using objects in unexpected ways to achieve their goal of seeing or being seen. This type of self-taught AI could prove useful in the real world as well.
The study set out to test, and successfully demonstrated, the possibility of machine learning agents learning sophisticated, real-world-relevant techniques without any interference or suggestions from the researchers.
Tasks like identifying objects in photos or inventing plausible human faces are difficult and useful, but they don't really reflect actions one might take in the real world. They're highly intellectual, you might say, and as a consequence can be brought to a high level of effectiveness without ever leaving the computer.
Attempting to train an AI to use a robotic arm to grip a cup and put it on a saucer, on the other hand, is far more difficult than one might imagine (and has only been accomplished under very specific circumstances); the complexity of the real, physical world makes purely intellectual, computer-bound learning of such tasks pretty much impossible.
At the same time, there are in-between tasks that do not necessarily reflect the real world completely, but can still be relevant to it. A simple one might be deciding which way a robot should face when presented with multiple relevant objects or people. You don't need a thousand physical trials to know it should rotate itself or its camera so it can see both, or switch between them, or whatever.
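That kind of facing problem is pure geometry. A minimal sketch (hypothetical code, not anything from the study) might simply aim the robot at the midpoint between two targets:

```python
import math

def heading_to_cover(robot, a, b):
    """Return the heading (radians) that points the robot at the
    midpoint between targets a and b, keeping both in view.
    Positions are (x, y) tuples; purely illustrative."""
    mx = (a[0] + b[0]) / 2
    my = (a[1] + b[1]) / 2
    return math.atan2(my - robot[1], mx - robot[0])

# Robot at the origin, one target due north, one due east:
angle = heading_to_cover((0, 0), (0, 2), (2, 0))
# Faces the midpoint (1, 1), i.e. 45 degrees
```

Rules like this can be written down by hand; the interesting question is whether agents can discover comparable behaviors without being told.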
OpenAI's hide-and-seek challenge to its baby ML agents was along these lines: a game environment with simple rules (called Polyworld) that nevertheless uses real-world-adjacent physics and inputs. If the AIs can teach themselves to navigate this simplified reality, perhaps they can transfer those skills, with some modification, to full-blown reality.
Such is the thinking behind the experiment, anyway, but it's entertaining enough on its own.
The game pits two teams against one another in a small 3D arena populated with a few randomly generated walls and objects. Several agents are spawned in it, and the "hiders" are given a few seconds to familiarize themselves with the environment and hide. They can also interact with the objects, pushing them around and locking them in place. Once the "seekers" look up, they have a certain amount of time to spot the hiders.
The machine learning agents were given only basic senses — each has a view of the world and a sort of "lidar" to tell it the positions of nearby objects — and the knowledge that they can move objects around. Beyond that, they were simply given a point when they succeeded at their job, either seeking or hiding — that's their motivation.
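That sparse, team-versus-team scoring can be sketched in a few lines (a hypothetical simplification for illustration; the paper describes the actual reward in detail):

```python
def round_reward(any_hider_spotted: bool) -> tuple:
    """Return (hider_reward, seeker_reward) for one step of the
    seeking phase. Zero-sum: hiders score while unseen, seekers
    score once any hider is in view. Illustrative only."""
    if any_hider_spotted:
        return (-1, +1)
    return (+1, -1)
```

Everything else — the running away, the fort-building, the ramp-surfing described below — has to emerge from chasing that single number.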
From these basic beginnings came wonderfully interesting results.
At first the teams essentially moved randomly. But over millions of games, the hiders found that by moving in a certain way — crudely speaking, running away — they could increase their points. And thus the arms race began.
Since the hiders were running away, the seekers began to develop more structured pursuit behaviors.
But then came the tools.
Hiders found that they could push and pull objects on the field, and began constructing forts out of them.
The
seekers responded by using ramps found on the field to ascend over the objects and crash the forts.
Hiders began looking for ramps while the seekers were still counting, and either stealing them or locking them in place (the lock can't be undone by an agent of the opposite team). Interestingly, this involved a certain amount of division of labor, because there often wasn't enough time for both hiders to go out in search of ramps; one had to prepare the fort while the other went on ramp patrol.
The OpenAI team thought maybe that was the end of it, but the seekers learned to exploit an unintended feature of the game world. They discovered that they could "push" a box while riding on top of it, meaning they could put it near a ramp, climb onto it, then "surf" it all over the field and find those naughty hiders.
Of course, the hiders responded by locking every object they weren't using to construct their fort — and that seems to be the end of the line as far as strategy in this game goes.
So what's the point? As the authors of the paper explain, this is kind of the way we came about.
The vast amount of complexity and diversity on Earth evolved due to co-evolution and competition between organisms, directed by natural selection. When a new successful strategy or mutation emerges, it changes the implicit task distribution neighboring agents need to solve and creates a new pressure for adaptation.
These evolutionary arms races create implicit autocurricula whereby competing agents continually create new tasks for each other.
Inducing
autocurricula in physically grounded and open-ended environments could eventually enable agents to acquire an unbounded number of
human-relevant skills.
In other words, having AI models compete in an unsupervised manner may be a far better way to develop useful and
robust skills than letting them toddle around on their own, racking up an abstract number like percentage of environment explored or the
like.
Increasingly, it is difficult or even impossible for humans to direct every aspect of an AI's abilities by parameterizing it and controlling the interactions it has with the environment. For complex tasks like a robot navigating a crowded environment, there are so many factors that having humans design behaviors may never produce the kind of sophistication that's necessary for these agents to take their place in everyday life.
But they can teach each other, as we've seen here and in GANs, where a pair of dueling AIs each works to defeat the other in the creation or detection of realistic media.
The OpenAI researchers posit that "multi-agent autocurricula," or self-teaching agents, are the way forward in many circumstances where other methods are too slow or structured.
They conclude:
These results inspire confidence that in a more open-ended and diverse environment, multi-agent dynamics could lead to
extremely complex and human-relevant behavior.
Some parts of the research have been released as open source.
You can read the full paper describing the experiment here.