Plenary lectures

Plenary lecture 1: Monday, May 7, 9.15-10.15

Advances in Large Scale Visual Search

Miroslaw Bober, University of Surrey, UK

Abstract:

The volumes of image and video content generated, stored and consumed have been increasing exponentially. There are 36 billion images uploaded to Facebook every year and 2 billion videos watched per day on YouTube.
While instantaneous access to these vast volumes of multimedia content is supported by conventional text-based search engines on a coarse level, there is a strong demand for more sophisticated tools, which do not rely on keywords but extract compact and robust descriptions directly from the visual content. Such tools must be capable of searching hundreds of billions of items in real time and reliably detecting matching items at a very low false-alarm rate.
More specifically, tools are required to efficiently search for near-duplicates of images or videos, including edited or modified versions, either on the web or in users' own personal databases. Furthermore, tools supporting search for visual objects present in the query image or video are also necessary, for example to identify products or find information about buildings, print media or art. Recent work in the research and standards community has resulted in significant advances and first deployments, such as Google Goggles, Amazon Snaptell or Nokia City Lens.
In my talk, I will introduce the technology behind image and video signature tools, recently standardized by MPEG. The latest advances in the current standardization work on Compact Descriptors for Visual Search (CDVS) will also be presented. Comparisons against the state of the art will be provided and potential new developments discussed.

Miroslaw Bober is Professor of Video Processing in the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey, UK.
From 1997 to 2011, he was the General Manager of Mitsubishi Electric R&D Center Europe (MERCE, UK) and the Head of Research for its Visual and Sensing Division. Previously, he was with the University of Surrey, as a lecturer and the leader of the Image Communication and Multimedia Systems Group.
He received the M.Sc. degree in Electrical Engineering (with distinction) from the AGH University of Science and Technology, Krakow, Poland in 1990. Subsequently, he received the M.Sc. with distinction in Signal Processing and Artificial Intelligence in 1991 and the Ph.D. in 1995, both from the University of Surrey. Miroslaw has been actively involved in the development of MPEG-7, chairing the work of the MPEG-7 visual group and the Compact Descriptors for Visual Search. He developed shape description and image and video signature technologies which are now a part of the ISO standards. Miroslaw is an inventor of over 70 US patents and several of his inventions are deployed in consumer and professional products. His publication record includes over 60 refereed publications and three books and book chapters. His research interests include image and video processing, computer vision and machine learning.

Plenary lecture 2: Tuesday, May 8, 8.30-9.30

Recent advances in perceptual coding of spatial audio signals - what could it mean for visual?

Jürgen Herre, International Audio Laboratories Erlangen & Fraunhofer Institute for Integrated Circuits, Erlangen, Germany

Abstract:

Over the last decade, perceptual low bitrate coding of audio signals has made a number of significant advancements that provided high coding efficiency for monophonic, stereophonic and multi-channel audio signals.
The most significant enhancements result from extending traditional paradigms for coding high-quality audio in novel ways through semi-parametric approaches. This talk will highlight these advances from a bird's-eye perspective and illustrate the associated performance gains.
Special attention will be given to the representation of spatial sound and the activities of the MPEG standardization group, including the emerging MPEG work item on 3D audio coding. An attempt is made to build a bridge between the audio and the visual world in a speculative and thought-stimulating way.

Jürgen Herre joined the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany, in 1989. Since then he has been involved in the development of perceptual coding algorithms for high quality audio, including the well-known ISO/MPEG-Audio Layer III coder (aka "MP3"). In 1995, Dr. Herre joined Bell Laboratories for a PostDoc term working on the development of MPEG-2 Advanced Audio Coding (AAC). By the end of '96 he went back to Fraunhofer to work on the development of more advanced multimedia technology including MPEG-4, MPEG-7, and MPEG-D, currently as the Chief Scientist for the Audio/Multimedia activities at Fraunhofer IIS, Erlangen. In September 2011, Dr. Herre was appointed professor at the University of Erlangen and the International Audio Laboratories Erlangen.
Dr. Herre is a fellow of the Audio Engineering Society, co-chair of the AES Technical Committee on Coding of Audio Signals and vice chair of the AES Technical Council. He is a member of the IEEE Technical Committee on Audio and Acoustic Signal Processing, served as an associate editor of the IEEE Transactions on Speech and Audio Processing and is an active member of the MPEG audio subgroup.

Plenary lecture 3: Wednesday, May 9, 8.30-9.30

Standardization of High Efficiency Video Coding (HEVC)

Jens-Rainer Ohm, RWTH Aachen University, Germany

Abstract:

HEVC is the latest in the series of video compression standards developed jointly by ITU-T VCEG and ISO/IEC MPEG. The HEVC project was formally launched in January 2010 following studies by both MPEG and VCEG to assess the readiness and availability of technology simultaneously with an analysis of industry needs for a new standard. Since April 2010, the standard has been developed by the Joint Collaborative Team on Video Coding (JCT-VC), a joint team between MPEG and VCEG. The first version of HEVC is expected to be finalized in January 2013 for approval in both ISO/IEC and ITU-T. The major goal of the project is to develop the next generation video coding standard that could achieve the same level of video quality with substantial savings (e.g. a reduction by half) relative to the bit rate required by AVC. Initial measurements of the capability of HEVC at this stage indicate that its performance is already meeting or exceeding the targets set by this goal.
The presentation will summarize the status of the development and give an overview of the technology currently included in the standard. In comparison to AVC, larger variable-size block structures (called coding units) are used with associated prediction and transform structures. More advanced tools are used both for motion-compensated and intra prediction. Several elements such as entropy coding are similar or even simplified as compared to AVC, such that the overall complexity is not excessive.
Applications of the standard are foreseen in a multitude of areas where increasing demands on resolution and quality call for improved compression. Indeed, the current draft of the first version defines only one "profile", such that identical devices would be usable for various services. The talk will also provide an outlook towards further developments in the areas of scalable and stereo/multi-view coding, which are expected to emerge in subsequent versions.

Jens-Rainer Ohm received the Dipl.-Ing. degree in 1985, the Dr.-Ing. degree in 1990, and the habil. degree in 1997, all from Technical University of Berlin (TUB), Germany. From 1985 to 1995, he was a research and teaching assistant with the Institute of Telecommunications at TUB. Between 1992 and 2000, he also served as lecturer on topics of digital image processing, coding and transmission at TUB. From 1996 to 2000, he was project manager/coordinator at the Image Processing Department of Heinrich Hertz Institute (HHI) in Berlin. In 2000, he was appointed full professor and has since held the chair of the Institute of Communication Engineering at RWTH Aachen University, Germany. His research and teaching activities cover the areas of motion-compensated, stereoscopic and 3-D image processing, multimedia signal coding and content description, transmission of video signals over mobile networks, as well as general topics of signal processing and digital communication systems.
Since 1998, he has participated in the work of the Moving Picture Experts Group (MPEG), where he has been contributing to the development of MPEG-4 (Video and AVC) and MPEG-7 standards. He has been chair of the ISO/IEC WG 11 (MPEG) Video Subgroup since May 2002. From January 2005 until November 2009, he also co-chaired the Joint Video Team (JVT) of MPEG and ITU-T SG 16 VCEG. Currently, he is co-chairing the Joint Collaborative Team on Video Coding (JCT-VC) of ISO and ITU-T, with the intended mandate of developing the next generation of high-efficiency video coding technology.
Prof. Ohm has authored textbooks on multimedia signal processing, analysis and coding, on communications engineering and signal transmission, as well as numerous papers in the various fields mentioned above. He is a member of various professional organizations including IEEE, VDE/ITG, EURASIP and AES.