Why machine vision is the next frontier for AI
The buzz around artificial intelligence, or AI, has grown steadily over the past year, and we’ve never been closer to unlocking the benefits of this technology. 2016 will see new kinds of AI-powered devices as we make progress on one of the most difficult challenges in AI: getting our devices to understand what they are seeing.
Why would a machine need to see? Vision is a primary sense and one of the main ways we experience the world. For machines to relate to humans and provide the support we need, they must be able to perceive and act in the visual realm. That might take the form of a small camera that helps a blind person “see” and make sense of the world around them, or a home surveillance system that can correctly tell the difference between a stray cat, tree branches moving outside, and a burglar.
As devices become an increasingly integral part of our daily lives, we have seen a growing number of applications fail for lack of adequate visual capabilities, from midair drone collisions to robot vacuums that “eat” things they shouldn’t.
Machine vision, a rapidly growing branch of AI that aims to give machines sight comparable to our own, has made massive strides over the past few years as researchers have applied specialized neural networks to help machines identify and understand real-world images. Since that approach took off in 2012, computers have become capable of everything from identifying cats on the Internet to recognizing specific faces in a sea of photos, but there is still a long way to go. Today, machine vision is leaving the data center and being applied to everything from autonomous drones to food sorting.
A common analogy for comparing machine vision with our own is the flight of birds versus that of airplanes. Both ultimately rely on the same physics (e.g., Bernoulli’s principle) to lift them into the air, but that doesn’t mean a plane flaps its wings to fly. Likewise, even if people and machines see the same things, and even if the way they interpret those images has some commonalities, the final results can still be vastly different.
Basic image classification has become far easier, but when it comes to extracting meaning or information from abstract scenes, machines face a whole new set of problems. Optical illusions are a great example of how far machine vision still has to go.
Most people are familiar with the classic illusion of two silhouettes facing one another, often called the Rubin vase. When a person looks at this image, they don’t just see abstract shapes. Their brain supplies additional context, letting them pick out different parts of the image and see either two faces or a vase in the same picture.
When we run this same image through a classifier (several free ones are available online), we quickly realize how hard it is for a machine to interpret. A basic classifier doesn’t see two faces or a vase; instead, it sees things like a hatchet, a hook, a bulletproof vest, and even an acoustic guitar. The system is admittedly uncertain that any of those things are actually in the image, but the results show just how challenging this task is.
The problem becomes even harder with a more complicated image, such as a painting by Beverly Doolittle in which faces are hidden throughout the scene. Not everyone who sees this painting will spot every face on the canvas, but almost everyone will instantly see that there is more to the picture than meets the eye.
When we run this image through the same classifier, the results range from labels like a valley or a stone wall to completely off-base ones like Grifola frondosa (maitake, a type of mushroom) or an African chameleon. The classifier grasps the general character of the image, but it fails to see the faces hidden within it.
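To make the classifier behavior concrete, consider how a typical image classifier reports its answers. It usually ends in a softmax step that turns raw scores into a probability distribution over a fixed list of labels, and it reports the top few. The Python sketch below is a minimal illustration under stated assumptions, not the output of any real model. The label list reuses names from the first experiment, the scores are made up, and the helper names (`softmax`, `top_k`, `describe`) and the 0.5 confidence threshold are hypothetical. It shows how a classifier can rank labels while being unconfident in all of them, and why it can only answer from the labels it knows.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, logits, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def describe(labels, logits, k=3, threshold=0.5):
    """Return the top-k guesses plus a flag that is False when even the best guess is below the (hypothetical) threshold."""
    ranked = top_k(labels, logits, k)
    best_prob = ranked[0][1]
    confident = best_prob >= threshold
    return ranked, confident

# Hypothetical closed label set and made-up scores, for illustration only.
# The classifier can only answer with one of these labels; "two faces" is
# simply not an option it can choose.
labels = ["hatchet", "hook", "bulletproof vest", "acoustic guitar"]
logits = [1.2, 1.0, 0.9, 0.8]

ranked, confident = describe(labels, logits)
for label, p in ranked:
    print(f"{label}: {p:.2f}")
print("confident" if confident else "uncertain: no label clearly wins")
```

Real systems score a much larger fixed vocabulary (ImageNet-trained models commonly use 1,000 classes), but the same logic applies. The model always ranks the labels it has, and a low top probability is one signal that none of them fits the image well.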
To understand why this is such a challenge, consider why vision is so complex. Like these images, the world is a messy place. Navigating it isn’t as simple as building an algorithm to parse data; it requires experience and an understanding of real situations that lets us act accordingly.
Robots and drones constantly run into unusual obstacles like these, and figuring out how to overcome them is a priority for anyone looking to capitalize on the AI revolution.
With the continued adoption of technologies like neural networks and specialized machine vision hardware, we are rapidly closing the gap between human and machine vision. One day soon, we may even see robots with visual capabilities that surpass our own, enabling them to carry out complex tasks and operate fully autonomously in our society.
Remi El-Ouazzane is CEO of Movidius, a startup combining algorithms with custom hardware to provide visual intelligence to connected devices.