Computer vision is a branch of artificial intelligence (AI) whose objective is to derive meaningful information, particularly for recognition and identification, from visual input. Common applications include image classification, object detection, object tracking, and content-based image retrieval.
Distinguishing between variations of a single category of object is a problem shared by bakeries and hospitals: telling types of pastry apart and detecting cancer cells among normal cells can, surprisingly, be handled by the same system.
Deep learning and the Japanese pastry problem
In 2012, Alex Krizhevsky, a computer science graduate student, and his team from the University of Toronto created a neural network model called AlexNet, which employed a technique called deep learning to classify images into different object categories. The layers of networks like AlexNet loosely resemble the human visual cortex: early layers respond to simple features such as edges, while deeper layers respond to increasingly abstract patterns.
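To make that layered structure concrete, here is a minimal sketch of a small convolutional classifier in PyTorch. It is not AlexNet itself: the layer sizes, the 32×32 input, and the ten-class output are illustrative assumptions.

    # A tiny convolutional classifier, sketching the layered structure
    # described above (not AlexNet; all sizes are illustrative).
    import torch
    import torch.nn as nn

    class TinyConvNet(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            # Early layers: small filters that respond to edges and textures.
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),  # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),  # 16x16 -> 8x8
            )
            # Final layer: combines pooled features into class scores.
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).flatten(1))

    model = TinyConvNet()
    scores = model(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image
    print(scores.shape)  # torch.Size([1, 10])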
One disadvantage of deep learning is that it requires a great deal of sample data. We can only feed these computers so many images, and those images usually show an archetypal, controlled view of the object in question. Deviate from that staged perfection and the object can become unrecognizable, to the point that the system treats it as a new, separate object. A network needs to be shown a thousand doughnuts before it acquires a sense of which features define a doughnut.
This is a particular problem for Japanese bakeries that want to employ automated checkout systems. Japanese bakeries prioritize range: bread has always been an import in Japan, and the country’s rich history of trade has left consumers with a taste for variety. New varieties of pastry, hundreds of distinct types in all, are invented all the time.
Analysts at a new bakery venture conducted market research and found two things: bakeries offering a larger variety sell more, and unwrapped pastries outsell individually wrapped ones because they look fresher. To satisfy both conditions, selling a vast variety of unwrapped pastries while keeping checkout sanitary and quick, the venture wanted to incorporate automated checkout. But what automated checkout system operates without barcodes?
One reason it is so difficult for computers to achieve the sensitivity of human vision when distinguishing objects or parsing a scene is that they lack experiential and spatial context. Humans have a lifetime of context training about how things look in different lighting conditions, how shadows are cast, what different textures look like, et cetera.
This repertoire of context makes us quick at deducing the category of an object even if we have never seen that exact variant before. You can recognize a doughnut even if it’s a type of doughnut you have never seen; you know glass is reflective because you have seen it in different contexts, at different times of the day. This circumstantial context changes how objects appear under different conditions and in different variations, and computers have only a limited understanding of it.
Hisashi Kambe and the pastry AI
Hisashi Kambe founded BRAIN Co., Ltd. in 1982 after years working at Matsushita Electric Works, which later became Panasonic.
In 2007, the aforementioned bakery venture approached BRAIN. The 2008 financial crisis cut off BRAIN’s other projects, and Kambe bet his company on the bakery’s pastry project. The company developed ten algorithm prototypes in two years; by combining and rewriting algorithms from different prototypes, they achieved a system with 98 per cent accuracy across 50 varieties of bread.
The problem was that this system had been tested only under perfectly controlled conditions, whereas in a bakery BRAIN’s system had to work at different times of the day, under different lighting conditions, and with items placed at random on the device.
A backlight was employed to eliminate shadows cast by the pastry, including the shadow of the doughnut across the doughnut hole, which would cause the scanner to read the item as a pastry without a hole. Kambe’s team even developed a mathematical model that related baking time to the colour of the pastry.
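BRAIN has not published the form of that model, but as a purely hypothetical sketch, browning can be approximated as lightness decaying exponentially toward a dark asymptote as baking time increases, with the parameters fitted from measured samples:

    # Hypothetical baking-time-to-colour model (the actual model BRAIN
    # built is not public). Lightness L* decays toward a dark asymptote.
    import numpy as np
    from scipy.optimize import curve_fit

    def lightness(t, L_inf, L_0, k):
        """Predicted lightness after t minutes of baking."""
        return L_inf + (L_0 - L_inf) * np.exp(-k * t)

    # Made-up calibration data: baking time (minutes) vs. measured L*.
    t_obs = np.array([0, 5, 10, 15, 20, 25])
    L_obs = np.array([85, 74, 66, 60, 56, 54])

    params, _ = curve_fit(lightness, t_obs, L_obs, p0=(50, 85, 0.1))
    print("predicted L* at 12 minutes:", lightness(12, *params))

Inverting such a curve would let a system infer from colour alone roughly how long a pastry was baked, or flag colours that no baking time could produce.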
By 2013, after five years immersed in bread, Kambe’s team had built a device that photographed pastries on a backlight, analyzed their features and contours, and distinguished one variety from another. They called it BakeryScan. It is a hand-tuned system rather than a deep-learning one. When BRAIN tried replacing BakeryScan with a deep neural network, the network recognized pastries just as effectively, but it required thousands of training examples, a problem for a Japanese bakery that might introduce new pastry types on a weekly basis. BakeryScan, by contrast, can recognize a new pastry with 90 per cent accuracy after just five samples, and after 20 samples it’s nearly perfect.
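BakeryScan’s internals are proprietary, so the following is only a sketch of the general approach the description suggests: hand-engineered shape and colour features, averaged over a handful of samples into a per-class prototype, with new items matched to the nearest prototype. Every feature and detail here is an assumption.

    # Hand-tuned few-shot recognition sketch (features and matching are
    # assumptions; BRAIN's actual algorithms are not public).
    import cv2
    import numpy as np

    def pastry_features(image_bgr: np.ndarray) -> np.ndarray:
        """Describe a pastry by simple shape and colour statistics."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        # On a backlight the pastry is dark on bright, so invert the threshold.
        _, mask = cv2.threshold(gray, 0, 255,
                                cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        c = max(contours, key=cv2.contourArea)  # largest outline = the pastry
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, True)
        roundness = 4 * np.pi * area / (perimeter ** 2 + 1e-9)
        mean_colour = cv2.mean(image_bgr, mask=mask)[:3]
        return np.array([area, roundness, *mean_colour])

    class FewShotPastryClassifier:
        """Average each class's features into a prototype; match by distance."""

        def __init__(self):
            self.prototypes: dict[str, np.ndarray] = {}

        def learn(self, name: str, samples: list[np.ndarray]) -> None:
            feats = np.stack([pastry_features(s) for s in samples])
            self.prototypes[name] = feats.mean(axis=0)

        def predict(self, image_bgr: np.ndarray) -> str:
            f = pastry_features(image_bgr)
            return min(self.prototypes,
                       key=lambda n: np.linalg.norm(f - self.prototypes[n]))

A real system would normalize these features (raw pixel area would otherwise swamp the colour terms) and use far richer descriptors, but the shape of the idea, a prototype per class learned from roughly five examples, is the same.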
One of BRAIN’s biggest customers, Andersen Bakery, has deployed the system in hundreds of bakeries. The BRAIN team built a feedback mechanism into the physical BakeryScan unit: when the system isn’t confident in its identification of a pastry, it lights up yellow or red instead of green and asks the operator to choose from a short list of its best guesses. In this way, BakeryScan learns and progressively achieves a higher level of accuracy.
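A hypothetical sketch of that loop, building on the classifier sketch above (the thresholds, the three-guess shortlist, and the running-average update are all assumptions):

    # Confidence-gated prediction with operator feedback (hypothetical;
    # reuses pastry_features and FewShotPastryClassifier from the sketch
    # above). ask_operator is any callback that displays the guesses and
    # returns the operator's choice.
    import numpy as np

    def identify_with_feedback(clf, image, ask_operator,
                               sure=0.9, unsure=0.6):
        feats = pastry_features(image)
        dists = {name: np.linalg.norm(feats - proto)
                 for name, proto in clf.prototypes.items()}
        ranked = sorted(dists, key=dists.get)
        best = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else best
        # Confidence: how much closer the best match is than the runner-up.
        confidence = 1.0 - dists[best] / (dists[runner_up] + 1e-9)
        if confidence >= sure:
            return best  # "green light": accept silently
        light = "yellow" if confidence >= unsure else "red"
        label = ask_operator(light, ranked[:3])  # short list of best guesses
        # Fold the correction back in so accuracy improves over time.
        if label in clf.prototypes:
            clf.prototypes[label] = 0.9 * clf.prototypes[label] + 0.1 * feats
        else:
            clf.prototypes[label] = feats
        return label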
From bakeries to hospitals
In 2017, a doctor at Kyoto’s Louis Pasteur Center for Medical Research noticed that cancer cells under a microscope looked like the bread BakeryScan identified. The doctor reached out to BRAIN, and eventually the company began researching and developing a pathology-centred prototype of BakeryScan.
In 2018, at a conference in Sapporo about the AI identification of cancer cells, Kambe argued that deep learning was still impractical for certain tasks because of how much data it requires.
In 2021, a prototype of BakeryScan repurposed to detect cancer cells, now called Cyto-AiSCAN, was being tested in two major hospitals in Kobe and Kyoto. It could examine an entire microscope slide and identify potential cancer cells.
In November of 2021, James Somers, a New York-based writer and programmer, interviewed Kambe, who demonstrated how Cyto-AiSCAN flags cancer-cell candidates on a sample from a stage-four cancer patient. Cyto-AiSCAN was detecting cancer cells with 99 per cent accuracy. Somers asked Kambe how the model was so successful and whether he had used deep learning, to which Kambe smiled and said, “Original way, same as bread.”