Recent research from Stanford and Google has made the worst nightmare of some concerned with artificial intelligence (AI) all the more real. A machine learning agent was caught cheating by hiding information in “a nearly imperceptible, high-frequency signal.”
Clever, but also creepy.
The agent was instructed to turn aerial images into street maps and back again as part of research to improve Google’s process of turning satellite images into the widely used and relied upon Google Maps. The process involves CycleGAN, “a neural network that learns to transform images of type X and Y into one another, as efficiently yet accurately as possible.” Though the agent was performing this task quite well, it quickly became apparent that it was performing the task too well.
While the agent was supposed to be turning a satellite image into a street map and then back into a satellite image again using only the street map, details missing only on the street map version showed the research team that this wasn’t actually occurring. As seen in the example below, details such as skylights can be seen on both aerial versions of the map despite being absent on the street map (center).
What this ultimately means is that the agent wasn’t acting on its own accord or for malicious purposes, it was simply doing exactly what it was told to do – a “problem” with computers that has existed since their inception.
In order to figure out exactly what the machine learning agent was doing, the research team delved into the data generated by the AI. It turns out, because the agent was being graded on how close the resulting aerial map matched the original as well as the clarity of the generated street map, it didn’t learn how to make one using the other. What it did do was learn how to encode features present on the original satellite map into the noise of the street map via tiny color changes undetectable by the human eye. The agent would then read the secret messages it left for itself when creating the third image in the series.
The agent excelled to such an extent that it was actually able to encode any satellite map into any street map without paying attention to the street map. This means, the data to create the resulting aerial image was encoded in features already present on a street map, even on a completely different street map. An example of this can be seen below.
Encoding data into images, a process called steganography, certainly isn’t anything new. The first recorded use of steganography dates all the way back to 1499 when Johannes Trithemius wrote a book on cryptography and steganography, called Steganographia. Today the process is frequently used when adding watermarks images or metadata to image files. What is new is that a machine learning agent was able to create its own steganographic method to avoid actually learning to do a task.
So are computers getting smarter? Or are they just really good at cheating in a way that humans can’t easily detect? Chances are it’s the latter and we don’t have to worry about our worst AI nightmares coming true anytime soon. What we can deduce, however, is that for AI to do exactly what we want it to, we have to be really specific with our instructions. Especially when we’re instructing it to solve problems quickly and efficiently.