The Math of Cats and Dogs: AI in Image Recognition


Say you wanted to write a computer program to recognize pictures of cats. A reasonable starting point might be to ask yourself, “What does a cat look like?” A cat is a furry, four-legged animal with a snout and a tail. So break the cat recognition task down into these more basic components, program your computer to perform each of them independently, and then assemble them into a sort of checklist. Turn your computer on and point it at a group of pictures. It grinds away for a minute, then returns its results. Here you go, a picture of a cat. Except the picture it proudly returns is of a dog.
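
Spelled out as code, that first attempt might look something like the sketch below. It is purely hypothetical: every one of the detector functions is a placeholder of my own invention, and writing them is exactly where the trouble starts.

    # A hypothetical sketch of the checklist strategy described above. None of
    # these detectors exist; figuring out how to write them is the hard part.

    def looks_furry(picture):
        raise NotImplementedError("Which arrangements of pixels count as fur?")

    def has_four_legs(picture):
        raise NotImplementedError("Legs can be tucked, crossed, or out of frame.")

    def has_snout(picture):
        raise NotImplementedError("Dogs have snouts too.")

    def has_tail(picture):
        raise NotImplementedError("So do dogs.")

    def is_cat(picture):
        # The picture counts as a cat only if every item on the checklist passes.
        checklist = (looks_furry, has_four_legs, has_snout, has_tail)
        return all(check(picture) for check in checklist)

In principle, you would then point is_cat at a folder of photos and collect whatever passes.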

Well, nobody said computer programming was easy. Still, the fact that our first attempt incorrectly picked out a dog isn’t necessarily fatal for the overall strategy. Maybe our checklist of cat-like characteristics wasn’t long enough, or the individual criteria weren’t specific enough. If you have a team of programmers working for you, assign some of them the task of encoding what makes for a cat-like shape of the face. Assign others the task of recognizing the distinctive qualities of cat bodies. Have weekly meetings at which all team members are encouraged to brainstorm ideas for new cat-like features. This seems like a viable way forward. If we work hard enough, we will eventually get there, right?

Wrong. The consensus opinion in today’s computing world is that you’ll never get there. You could assemble the world’s best team of computer programmers, veterinarians, and photographers to plumb the very essence of what it means to be a picture of a cat, and 50 years later you’d still be looking at lots of pictures of things that aren’t cats.

 

Why can’t traditional programming handle cats?

 

This is disappointing, sure, but it is also surprising. Computers can do amazing things. They can manage the payrolls of giant corporations, play unbeatable games of chess, and pilot spacecraft to Mars. The average person can’t do any of these things. The average person can’t even do arithmetic as well as an electronic calculator that clips onto a keychain. But neither could the programmers who built these machines. What they could do was break those tasks down to their most basic components, understand the minute details of arithmetic, chess moves, and spaceflight mechanics, and then recompose that understanding into useful programs.

Divide and conquer. It’s a strategy with a long and varied history of success, from chemists with their molecules and physicists with their quarks to programmers with their algorithms. But for some reason it doesn’t work for cats.

Not just cats, of course; nor cats in particular. At this point in history, identifying commonplace objects in pictures—cats, dogs, people, automobiles, houses—lies at the cutting edge of computer science research. So does understanding simple spoken commands along the lines of “Alexa, what time is it?” So does having a computer drive a car down the street without hitting anything. This is the exciting stuff. This is hard.

These challenges are quite different, but they have a few things in common beyond just being difficult and exciting. They are, for starters, stubbornly resistant to the rational, atomizing, devil’s-in-the-details methods that have proved so successful for chess, spaceflight, and the like. They are also—not always, but often—things that human beings find easy. So easy that we have a hard time describing how we perform the tasks of looking, listening, and moving. We just kind of do it. Finally, in the programming world these tasks currently fall under the general heading of artificial intelligence.

This is in part just a function of the malleability of the term, and the tendency of programmers to apply it to whatever seems difficult at the moment. (For years, playing chess was considered a benchmark of artificial intelligence, until computers got good at playing chess, at which point everyone shrugged and quietly decided that chess had been an inherently computery thing all along.) But there is nevertheless the sense that an aptitude for these things is a distinctly human characteristic. So how do we do it?

 

How humans (and AI) learn about cats

 

How do I do it? How do I recognize a picture of a cat? Well, for starters, I don’t think about it. I just look and I know. I wouldn’t teach a child the concept of cat by breaking it down into its component parts. I would just point at a cat and say “cat”. If the child did the same I would respond positively. If the child misidentified a dog as a cat I’d say “No, that’s a dog,” but I wouldn’t be too much of a stickler. I would point out neighborhood cats. A picture of a cat would count as a cat, as would Hello Kitty stickers and ocelots. I wouldn’t sweat the details or worry too much about contradictory stimuli. The trick would not be to impart to the child a clearly articulated understanding of catness, but rather just to ensure they managed to encounter a whole lot of cats.

The insight that has driven much of the last quarter century of artificial intelligence research is that computers are less like scientists and more like children. The key to learning—or at least learning in this peculiarly human domain—is not understanding but experience. And it turns out that when it comes to simulating experience for a computer we have a notion of what to do.

Give me a million pictures of cats. Give me a million pictures of not-cats. To you and me, these are images, but to a computer they are pixels, literally just lists of numbers. Digital camera resolution these days is measured in megapixels, which means that each picture is a list of millions of numbers, but that’s okay. Dealing with millions of numbers at a time is what computers are good at.
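
As a concrete illustration, here is roughly what a computer actually has in hand when you give it a photograph. This is a minimal sketch of my own: it assumes a local file named cat.jpg and uses the Pillow and NumPy libraries, neither of which is mentioned above.

    # To a computer, a photograph is just a grid of numbers.
    # Assumes a file named cat.jpg plus the Pillow and NumPy libraries.
    import numpy as np
    from PIL import Image

    picture = Image.open("cat.jpg").convert("RGB")
    pixels = np.asarray(picture)    # shape: (height, width, 3)

    print(pixels.shape)             # e.g. (3000, 4000, 3) for a 12-megapixel photo
    print(pixels.size)              # tens of millions of individual numbers
    print(pixels[0, 0])             # the top-left pixel: three values from 0 to 255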

It turns out that the numbers in a million cat pictures will on average be statistically different from the numbers in a million non-cat pictures. A perceptual difference reduces to a mathematical one.

That mathematical difference isn’t obvious. We may not be able to characterize the length of a dog’s snout versus a cat’s, but we do know our statistics. We have some tricks up our sleeves. These tricks often boil down to adding and multiplying the numbers together in elaborate combinations that would be way too tedious and complicated for a human being to manage, but hey, look, we’ve got these computers lying around, and adding and multiplying numbers happens to be the one thing they are really, really good at. Maybe we could meet in the middle.
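
To make the “adding and multiplying” concrete, here is a toy sketch. Everything in it is an assumption for illustration’s sake: the “pictures” are tiny synthetic stand-ins rather than real photos, and the model is a single-layer perceptron, far simpler than anything a real vision system would use. But the shape of the computation is the point: weighted sums of pixel values, nudged a little every time the machine guesses wrong.

    # A toy sketch of "adding and multiplying the numbers in elaborate combinations."
    # The "images" here are synthetic 16x16 stand-ins so the example runs on its own;
    # real systems use real photographs and millions of weights.
    import numpy as np

    rng = np.random.default_rng(0)
    num_pixels = 16 * 16

    # Stand-in data: pretend cat pictures are, on average, slightly brighter.
    cats = rng.normal(loc=0.6, scale=0.2, size=(500, num_pixels))
    not_cats = rng.normal(loc=0.4, scale=0.2, size=(500, num_pixels))
    images = np.vstack([cats, not_cats])
    labels = np.array([1] * 500 + [0] * 500)   # 1 = cat, 0 = not a cat

    # One weight per pixel. The "elaborate combination" is just a weighted sum.
    weights = np.zeros(num_pixels)
    bias = 0.0

    def guess(image):
        return 1 if image @ weights + bias > 0 else 0

    # Teaching by example: show a picture, nudge the weights when the guess is wrong.
    for _ in range(20):                          # a few passes over the examples
        for image, label in zip(images, labels):
            error = label - guess(image)         # 0 if right, +1 or -1 if wrong
            weights += 0.01 * error * image      # the perceptron update rule
            bias += 0.01 * error

    accuracy = np.mean([guess(image) == label for image, label in zip(images, labels)])
    print(f"Training accuracy: {accuracy:.0%}")

Nothing in this loop encodes what a cat is. The program is only shown examples and corrected when it errs, which is the whole idea described in the next paragraph.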

That is where the discipline stands right now. Artificial intelligence is currently considered largely synonymous with the narrower term “machine learning.” This term signifies the particular technique of teaching computers by example as if they were children, albeit children who are also arithmetic savants. To embrace this technique takes a certain humility. You must accept that your job is merely to identify cats, not to comprehend on a deep level what a cat is. But in exchange, the technique works. It is our most viable way forward.

SparkCognition engineer W.P. McNeill delves further into these subjects on his personal blog, Corner Cases.

