First published in AMS business magazine. Author: Douglas Heingartner

In June of 2015, the University of Amsterdam (UvA) and the American tech giant Qualcomm unveiled their new public-private partnership: a joint research lab known as QUVA. Based in the Amsterdam Science Park, the new lab is dedicated to machine-learning techniques and is a further extension of academic research in the field. The mission of the QUVA lab is to merge computer vision with machine learning, making it even easier to automatically interpret images and videos. Qualcomm, which ships more than one billion processors per year, will use this new knowledge to improve the visual capabilities of mobile devices such as smartphones.

Computer visionary

Arnold Smeulders, a professor of computer vision at the UvA and part of the new research lab’s management team, has been working in the field since its earliest days in the 1970s. He says that, for the first 30 years, he had quite a bit of trouble getting people to understand why computer vision was important: “They would say, ‘But humans can already see what it is. Why do you need a computer to tell you what it is?’”

But today, the applications of computer vision are plain to see, in everything from cameras and robots to driverless vehicles and quality assessments. “Now it’s more like the other way around: Give me a test for which vision does not play a part,” says Smeulders. And at the QUVA lab, researchers will carry out research into generating automatic video summaries and recognising an object in an image from just a single example.

Driven by big data

The recent ascent of the computer-vision field has less to do with computing power than with the availability of more and more data. When he started in the 1970s, Smeulders says, “They didn’t have all this data.” But around the year 2000, sensors started becoming digital, images were being stored and many were exchanged online. “From that point on, you see a huge accumulation of digital data. Think of a thousand pictures from a billion people who each have a smartphone; that’s already a trillion images,” he says. As a result, the quality of image recognition “has taken off rapidly, exponentially, so that it now has real practical implications everywhere.”

Smeulders and his ‘fellow travellers’ in the field of vision research have been taking part in competitions for years to judge which algorithms are the best at automatically labelling images. In the early days, their algorithms performed poorly, only slightly better than chance, which at the time was often good enough to win. But about three years ago, computer vision had progressed to the point “where there are maybe 30,000 categories of images that can be identified with reasonable success, about 20,000 of these as well as humans can,” Smeulders explains. In some fine-grained categories, computers can even do better than typical humans, for example in distinguishing between a Kentucky warbler and an ordinary warbler. In this sense, image recognition is now “something that they can check off the list”.

How computers see the world

Computer vision works differently from human vision: what might be important to humans in recognising or describing an image, such as someone’s eyes, can form a relatively small and unimportant part of the image as the computer sees it. The computer also uses different strategies: when looking for a table knife, for example, it’s easier to search for a table fork (a much less common shape), as there is a good chance that a knife will be next to it.
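The fork-and-knife strategy can be made concrete with a toy example. The sketch below (Python, with entirely made-up detections and a hypothetical CO_OCCURRENCE table; not the lab's actual method) rescores a weak knife detection when a confidently detected fork sits nearby.

```python
# A minimal sketch (not QUVA's actual method) of the context idea Smeulders
# describes: boost the score of a hard-to-spot object (a knife) when an
# easier, more distinctive one (a fork) is detected nearby. All detections,
# scores and the co-occurrence weight below are invented for illustration.

from dataclasses import dataclass
from math import dist

@dataclass
class Detection:
    label: str
    center: tuple[float, float]  # (x, y) in pixels
    score: float                 # detector confidence in [0, 1]

# Hypothetical co-occurrence prior: how strongly one object's presence
# suggests another's; in practice this would be learned from labelled data.
CO_OCCURRENCE = {("fork", "knife"): 0.4}

def rescore(detections, radius=100.0):
    """Raise each weak detection's score if a contextual partner is nearby."""
    rescored = []
    for d in detections:
        boost = 0.0
        for other in detections:
            w = CO_OCCURRENCE.get((other.label, d.label), 0.0)
            if w and dist(d.center, other.center) <= radius:
                boost = max(boost, w * other.score)
        # Blend the detector's own evidence with the contextual evidence.
        rescored.append(Detection(d.label, d.center, min(1.0, d.score + boost)))
    return rescored

candidates = [
    Detection("fork", (120, 200), 0.90),   # distinctive shape, confident
    Detection("knife", (160, 205), 0.35),  # thin shape, weak on its own
]
for d in rescore(candidates):
    print(f"{d.label}: {d.score:.2f}")  # knife rises from 0.35 to 0.71
```

In a real system the co-occurrence weights would be learned rather than hand-set, but the principle is the same: a distinctive object makes its usual neighbours easier to find.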

Smeulders is currently researching how computers are getting better at recognising even abstract concepts, something he had initially believed was unlikely. In the early days, he thought that, at most, 15% of our words could have a visual equivalent that a computer could recognise. “But I don’t think that’s true anymore,” he says. “Even words such as ‘democracy’ have a visual symbol in people’s heads. That’s a very abstract word, but it is still recognisable if you have enough examples. A word like ‘yellow’ is based on reality, whereas a word like ‘happy’ is more abstract,” continues Smeulders. “But if you gave a happy-looking image to a thousand people, the great majority of them would probably use the word ‘happy’ to describe it.” The trick is to identify what is visually present in all of the images in a certain category. “If the human and the computer both reach the same name after looking at a certain group of images, then somehow people must be seeing what is invariably common to all of them.”
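One way to picture “what is invariably common to all of them” is as a statistic over image features. The toy sketch below (Python; the feature names and values are entirely invented, and this is not Smeulders' actual method) keeps only the features that are consistently strong across a set of ‘happy’-labelled images.

```python
# A toy illustration of "invariably common": given feature values for images
# labelled 'happy', keep the features that are strong on average and stable
# across all examples. All numbers here are made up for illustration.

from statistics import mean, pstdev

# Hypothetical features extracted from four 'happy'-labelled images;
# columns might stand for smile-like curves, bright colours, faces, etc.
FEATURES = ["smile_curve", "bright_colour", "face_present", "water", "text"]
happy_images = [
    [0.9, 0.7, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.9, 0.0, 0.2],
    [0.9, 0.6, 0.7, 0.3, 0.1],
    [0.7, 0.8, 0.9, 0.2, 0.0],
]

# A feature counts as "invariably common" if its average strength is high
# and its spread across the examples is low.
for name, values in zip(FEATURES, zip(*happy_images)):
    if mean(values) > 0.5 and pstdev(values) < 0.2:
        print(f"{name}: mean={mean(values):.2f}")  # prints the shared features
```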

Academic foundations

QUVA marks the second such joint lab initiative, the first being the Advanced Research Center for Nanolithography (ARCNL), which was formed in 2013 by the chip-maker ASML and several Amsterdam universities and institutes. This new collaboration between the UvA and Qualcomm speaks to “the long-standing tradition of research into the capabilities of computer vision,” says Smeulders. The QUVA lab, which will form part of the UvA’s Informatics Institute, came about in September 2014 after Qualcomm acquired Euvision, a UvA spin-off company that Smeulders had co-founded. The ensuing talks between the UvA and Qualcomm about a possible research collaboration ultimately resulted in QUVA, which will employ between 15 and 20 researchers.

“The fact that Qualcomm came here to start a laboratory after the acquisition of Euvision tells you that it is in their interests,” says Smeulders, “otherwise they wouldn’t do it.” And it was the UvA’s broader role as a university that provided the foundations for a company like Euvision to develop in the first place. Independent research of that kind, says Smeulders, is “where the unexpected and the unforeseen and the unplanned directions are more likely to come from.”