
Latest Developments in Computer Vision: A View From the Summit


Machine vision is perhaps one of the few remaining areas in technology that can still lead you to say "I didn't know computers could do that". The recent pace of development has been relentless. On the one hand you have the industry giants competing to out-do each other on the ImageNet challenge, surpassing human vision recognition capabilities along the way; on the other, there is rapid progress in bringing this technology to smart, mobile devices. May 11 and 12 saw the annual gathering of industry experts and leaders in Santa Clara, California, to discuss the latest developments at the Embedded Vision Alliance Summit. Additionally, this year ARM was proud to host a special seminar linked to the main event to discuss developments in computer vision on ARM processor technologies. In this blog I'm going to provide my perspective on some of the highlights from both events.

 


The Santa Clara Convention Centre, California.  Host to both the ARM workshop and the EVA Summit

 

Computer Vision on ARM Seminar, 11 May, Santa Clara, CA

 

It was my great pleasure to host this event, and for those of you who were there I hope you enjoyed the afternoon's presentations and panel discussion. The proceedings from the seven partner presentations can all be downloaded from here. This event – the first of its kind ARM has held on computer vision – was intended to bring together leaders and experts in computer vision from across the ARM ecosystem. The brief was to explore the subjects of processor selection, optimisation, balancing workloads across processors, debugging and more, all in the context of developing computer vision applications. This covered both CPU and NEON™ optimisations, as well as working with Mali™ GPUs.

 

With a certain degree of cross-over, the seminar program was divided into three broad themes:

Optimising Computer Vision for NEON

  • Dr. Masaki Satoh, a research engineer from Morpho, talked about the benefits and main technical aspects of NEON acceleration, including a focus on specific algorithmic optimisations using NEON SIMD (Single Instruction, Multiple Data) instructions (a minimal sketch of this style of optimisation follows the list below).
  • Wikitude is a leader in Augmented Reality applications on mobile devices, and in his talk CTO Martin Lechner highlighted the use of NEON to accelerate this particular computer vision use case.
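
To give a flavour of the kind of NEON optimisation both talks touched on, here is a minimal, hypothetical sketch (not code from either presentation) that converts interleaved RGB pixels to grayscale eight at a time using NEON intrinsics. The function name and the fixed-point weights are illustrative assumptions.

```cpp
#include <arm_neon.h>
#include <cstddef>
#include <cstdint>

// Convert 8 interleaved RGB pixels at a time to grayscale using
// fixed-point weights (77, 151, 28 ~= 0.299, 0.587, 0.114 scaled by 256).
void rgb_to_gray_neon(const uint8_t* rgb, uint8_t* gray, size_t n_pixels) {
    const uint8x8_t w_r = vdup_n_u8(77);
    const uint8x8_t w_g = vdup_n_u8(151);
    const uint8x8_t w_b = vdup_n_u8(28);

    size_t i = 0;
    for (; i + 8 <= n_pixels; i += 8) {
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);        // de-interleave R, G, B
        uint16x8_t acc = vmull_u8(px.val[0], w_r);    // widening multiply
        acc = vmlal_u8(acc, px.val[1], w_g);          // multiply-accumulate
        acc = vmlal_u8(acc, px.val[2], w_b);
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));       // /256 and narrow to u8
    }
    for (; i < n_pixels; ++i) {                       // scalar tail
        const uint8_t* p = rgb + 3 * i;
        gray[i] = (uint8_t)((77 * p[0] + 151 * p[1] + 28 * p[2]) >> 8);
    }
}
```

The de-interleaving load (vld3_u8) and the widening multiply-accumulate are typical of the instruction-level wins SIMD brings to per-pixel loops.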

Real Computer Vision Use Cases on ARM

  • Ken Lee, founder and CEO of Van Gogh Imaging, showcased their work developing real-time 3D object recognition applications using 3D stereoscopic sensors, including optimisation via NEON and their early exploration of further acceleration via Mali GPUs.
  • Gian Marco Iodice, Compute Engineer at ARM, discussed his work on accelerating a real-time dense passive stereo vision algorithm using OpenCL™ on ARM Mali GPUs (a much-simplified sketch of this kind of stereo kernel follows this list).
  • Real-time image stabilization running entirely in software was the subject of the presentation by Dr. Piotr Stec, Project Manager at FotoNation.  His analysis covered the complete processing pipeline for this challenging use case and discussed where optimisations were most effective.
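
The details of the presented algorithm are in the downloadable proceedings; as a much-simplified illustration of the general idea behind dense stereo, here is a toy OpenCL C block-matching kernel that picks, for each pixel, the disparity with the lowest sum of absolute differences over a small window. The kernel and parameter names are hypothetical, not taken from the talk.

```cpp
// A toy OpenCL C kernel, embedded as a C++ string literal: brute-force
// block matching that tests every candidate disparity and keeps the one
// with the lowest sum of absolute differences (SAD) over a 5x1 window.
// Assumes the NDRange excludes a 2-pixel horizontal border so the window
// stays in bounds.
const char* kSadDisparityKernel = R"CLC(
__kernel void sad_disparity(__global const uchar* left,
                            __global const uchar* right,
                            __global uchar* disp,
                            const int width,
                            const int max_d)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);

    int  best_d    = 0;
    uint best_cost = UINT_MAX;

    for (int d = 0; d <= max_d && x - d >= 2; ++d) {
        uint cost = 0;
        for (int k = -2; k <= 2; ++k) {
            cost += abs_diff(left [y * width + x + k],
                             right[y * width + x - d + k]);
        }
        if (cost < best_cost) { best_cost = cost; best_d = d; }
    }
    disp[y * width + x] = (uchar)best_d;
}
)CLC";
```

Real implementations use 2D windows, cost aggregation and post-filtering, but the per-pixel independence shown here is what makes the workload such a good fit for a GPU.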

Processor selection, Benchmarking and Optimising

  • Jeff Bier, president of BDTI and founder of the Embedded Vision Alliance, discussed the important area of processor selection and making intelligent choices when selecting benchmarking metrics for computer vision applications.
  • Tim Hartley (that's me!) discussed the importance of whole-system measurements when developing computer vision applications and demonstrated profiling techniques that can be applied across heterogeneous CPU and GPU processor combinations (a minimal flavour of such measurement is sketched after this list).
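
As one concrete example of measuring both sides of a heterogeneous pipeline, the sketch below (illustrative, not taken from the talk) pairs OpenCL event profiling for the GPU kernel with a monotonic CPU wall-clock. It assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE.

```cpp
#include <CL/cl.h>
#include <cstdio>
#include <ctime>

// Wall-clock helper for the CPU side of the measurement.
static double now_ms() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

// Times one kernel launch on the GPU (device timestamps via OpenCL event
// profiling) alongside the CPU wall-clock cost of issuing and waiting.
// Assumes 'queue' was created with CL_QUEUE_PROFILING_ENABLE; error
// handling is omitted for brevity.
void profile_launch(cl_command_queue queue, cl_kernel kernel, size_t global_size) {
    double cpu_start = now_ms();

    cl_event ev;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr,
                           &global_size, nullptr, 0, nullptr, &ev);
    clWaitForEvents(1, &ev);

    cl_ulong t_start = 0, t_end = 0;   // device timestamps, in nanoseconds
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                            sizeof(t_start), &t_start, nullptr);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                            sizeof(t_end), &t_end, nullptr);
    clReleaseEvent(ev);

    printf("GPU kernel time: %.3f ms, CPU wall time: %.3f ms\n",
           (t_end - t_start) / 1e6, now_ms() - cpu_start);
}
```

The gap between the two numbers is often the interesting part: it exposes dispatch, synchronisation and data-movement overheads that kernel-only timing hides, which is exactly why whole-system measurement matters.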

 


Jeff Bier from BDTI gets things going with his presentation about processor selection and benchmark criteria

Panel Discussion

In addition to the above presentations, Roberto Mijat of ARM hosted a panel discussion looking at current and future trends in computer vision on mobile and embedded platforms. The panel included the following industry experts:

 

  • Laszlo Kishonti, CEO of Kishonti and of new venture AdasWorks, a company creating software for the heavily computer-vision-dependent Advanced Driver Assistance Systems (ADAS) market. Working with ARM, AdasWorks has explored accelerating some of their ADAS-related computer vision algorithms using a combination of ARM CPUs and a Mali GPU. In this video, recorded at a previous event, Tim Hartley of ARM talks about some of this earlier optimisation work with AdasWorks using the DS-5 Streamline profiler.
  • Michael Tusch, CEO of Apical, a developer of computer vision and image processing IP, covering future algorithm development for imaging along with display control systems and video analytics. Apical are a long-time collaborator on computational photography and have much experience using the GPU as well as the CPU for image processing acceleration. In the previously recorded video here, Michael talks about Apical's work and their experience using GPU Compute to enable hardware-based graphics acceleration.
  • Tim Droz, GM of SoftKinetic, a developer of 3D sensor and camera modules as well as 3D middleware, covering issues around 3D recognition, time-of-flight systems, camera reference designs for gesture sensing and shared software stacks. This video, recorded at GDC 2013, shows an example of SoftKinetic's work with GPGPU on Mali for their gesture-based systems.

 

It was a fascinating and wide-ranging discussion with some great audience questions. Roberto asked the panelists what had stood out for them in computer vision developments to date. Laszlo talked about the increasing importance of intelligence embedded in small chips within the cameras themselves. Michael Tusch echoed this, highlighting the problem of high-quality video from IP cameras saturating networks. Having analysis embedded within the cameras, and then uploading only selected portions or even just metadata describing the scene, would mitigate this significantly. Tim Droz stressed the importance of the industry moving away from the pixel-count race and concentrating instead on sensor quality.

 

Roberto then asked for the panelists' views on the most compelling future trends in the industry. Michael Tusch discussed how important it will be, in the smart homes and businesses of the future, to distinguish and identify multiple people within a scene, in different poses and at different scales, and to determine the trajectories of objects. This will need flexible vision processing abstractions, with the aim of understanding the target you are trying to identify: you cannot assume one size or algorithm will fit all cases. Michael foresees, just as GPUs have done for graphics, the advent of engines capable of enabling this flexible level of abstraction for computer vision applications.

 

Laszlo Kishonti talked about future health care automation, including sensor integration in hospitals and the home, how artificial intelligence in computer vision for security is going to become more important, and how vision is going to enable the future of autonomous driving. Laszlo also described the need for what he sees as a third generation of computer vision algorithms. These will require levels of sophistication that can, for example, differentiate between a small child walking safely hand-in-hand with an adult and one at risk of running out into the road. This kind of complex mix of recognition and semantic scene analysis was, said Laszlo, vital before fully autonomous vehicles can be realized. It brought home to me both the importance of ongoing research in this area and how much further computer vision still has to develop.

 

Tim Droz talked about the development of new vector processors flexible enough for a variety of inputs, about HDR (high dynamic range, combining multiple images taken at different exposures) becoming ubiquitous, and about low-level OpenCL implementations in RTL. He also talked about plenoptic light-field cameras, which allow re-focusing after an image is taken, becoming much smaller and more efficient in the future.
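
For the curious, here is a minimal sketch of what combining multiple exposures can mean in its very simplest form: a per-pixel blend of two exposures, weighted to favour well-exposed values. The function is purely illustrative; real HDR pipelines add image alignment, radiometric calibration and tone mapping.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Naive two-exposure fusion for 8-bit grayscale images: each output pixel
// is a blend of the two inputs, weighted by how close each value is to
// mid-grey (i.e. how well exposed it is). Purely illustrative.
void fuse_exposures(const uint8_t* under, const uint8_t* over,
                    uint8_t* out, size_t n_pixels) {
    for (size_t i = 0; i < n_pixels; ++i) {
        float wu = 1.0f - std::fabs(under[i] - 128.0f) / 128.0f;
        float wo = 1.0f - std::fabs(over[i]  - 128.0f) / 128.0f;
        float sum = wu + wo;
        if (sum < 1e-3f) { wu = wo = 0.5f; sum = 1.0f; }  // both badly exposed
        out[i] = (uint8_t)((wu * under[i] + wo * over[i]) / sum);
    }
}
```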

 

The panel ended with a lively set of questions from the audience, wrapping up a fascinating discussion.

 


Gian Marco Iodice talks about accelerating a real-time dense passive stereo vision algorithm


Overall it was a real pleasure to see so many attendees so engaged with the afternoon and we are grateful to all of you who joined us on the day.  Thanks also to all our partners and panellists whose efforts led to a fascinating set of presentations and discussions.


The presentations from the seminar can be downloaded here: EVA Summit 2015 and ARM’s Computer Vision Seminar - Mali Developer Center

 

Embedded Vision Summit, 12 May, Santa Clara, CA


The annual Embedded Vision Summit is the industry event hosted by the Embedded Vision Alliance, a collection of around 50 companies working in the computer vision field. Compared to the 2014 event, this year saw the Summit grow by over 70%, a real reflection of the growing momentum and importance of embedded vision across all industries. Over 700 attendees had access to 26 presentations on a wide range of computer vision subjects arranged into 6 conference tracks. The exhibition area showcased the latest work from 34 companies.

 

See below for links to more information about the proceedings and for downloading the presentations.

 

Dr. Ren Wu, Distinguished Scientist from Baidu, delivered the first of two keynotes, exploring what is probably the hottest topic of the hour: visual intelligence through deep learning. Dr. Wu has pioneered work in this area, from training on supercomputers through to deployment on mobile and Internet of Things devices. And for robot vacuum cleaner fans – and that's all of you, surely – the afternoon keynote was from Dr. Mike Aldred of Dyson, who talked about the development of their 360° vision (and ARM!) enabled device, which had earlier entertained everyone as it trundled around the exhibition area, clearing crumbs thrown at it by grown men and women during lunch.

 


ARM showcased two new partner demos at the Summit exhibition: SLAMBench acceleration on Mali GPU by the PAMELA consortium and video image stabilization in software with Mali acceleration by FotoNation


The six conference tracks covered a wide range of subject areas. Following on from Ren Wu's keynote, deep learning and CNNs (Convolutional Neural Networks) made a notable mark with their own track this year. There were also tracks covering vision libraries, vision algorithm development, 3D vision, business and markets, and processor selection. In this final track, Roberto Mijat followed on from ARM's seminar the previous day with an examination of the role of GPUs in accelerating vision applications.

 


Roberto Mijat discusses the role of the integrated GPU in mobile computer vision applications


A list of all the speakers at this year's Summit can be found here: 2015 Embedded Vision Summit Speakers

All the papers from the event can be downloaded here (registration required): 2015 Embedded Vision Summit Replay



