The Vernissage Data Set

HUMAVIPS Project (FP7-ICT-2009-247525)

For the scenario, we modeled the interaction with a robotic art guide in a small private vernissage setting. The robot welcomes the visitors and guides them through the vernissage, explaining several paintings and initiating a small quiz.
Because naive participants were invited to interact with the robot, the data set not only serves as a benchmark but also allows analyzing human behavior towards the robot in a realistic interaction situation.

All recordings took place in a single prepared room.
[Figure: room layout. Orange: NAO; cameras: blue (Vicon), red (HD); green lines: paintings; red: Vicon coordinate system; P1 and P2 mark the calibration positions for the participants' head orientations.]


In detail, the scenario unfolded as follows:
  1. The robot welcomes the visitors.
  2. It guides them through the vernissage, explaining several paintings.
  3. It initiates a small quiz.

Prototype Scenes

Besides the recordings of the vernissage scenario, several additional scenes with variations of the scenario were recorded. While the vernissage recordings provide a basis for an empirical evaluation in a controlled environment and scenario, these prototypical scenes serve two general purposes:
  1. Providing data, including ground truth, for examining the impact of a less structured scenario and environment. This examination helps, on the one hand, to find perspectives for the algorithmic realization and, on the other hand, to test how the data set scenario can be improved for future recording sessions.
  2. Providing data specifically intended to examine the performance of certain algorithms with special requirements on the recorded data that could not be fulfilled in the general scenario, e.g. for mapping and localization.

Stereo Prototype Scenes

Three additional prototypical scenarios were recorded once roughly synchronized access to the images from both of the robot's cameras became available.
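
Since the two image streams are only roughly synchronized, a user of these recordings will typically have to pair left and right frames by timestamp. The following Python sketch shows one way to do this; the function name, the example timestamps, and the tolerance value are illustrative assumptions, not part of the data set's tooling.

    from bisect import bisect_left

    def pair_stereo_frames(left_ts, right_ts, max_offset=0.02):
        """Pair roughly synchronized frames of two camera streams.

        left_ts, right_ts: sorted lists of frame timestamps in seconds.
        max_offset: largest accepted time difference for a pair
                    (illustrative value; tune to the actual frame rate).
        Returns a list of (left_index, right_index) pairs.
        """
        pairs = []
        for i, t in enumerate(left_ts):
            # Find the right-camera frame closest in time to this left frame.
            j = bisect_left(right_ts, t)
            candidates = [k for k in (j - 1, j) if 0 <= k < len(right_ts)]
            if not candidates:
                continue
            best = min(candidates, key=lambda k: abs(right_ts[k] - t))
            if abs(right_ts[best] - t) <= max_offset:
                pairs.append((i, best))
        return pairs

    # Example with made-up timestamps from two streams of about 15 fps:
    left = [0.000, 0.066, 0.133, 0.200]
    right = [0.010, 0.080, 0.140, 0.210]
    print(pair_stereo_frames(left, right))  # [(0, 0), (1, 1), (2, 2), (3, 3)]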

The Vernissage


The original vernissage scenario was recorded again once the stereo recording solution was available, but without naive participants because of the short time frame in which the recordings could take place. Two recordings used the original setting, with two people in front of the robot; a third recording included an additional person visible in the scene.

Mapping and Localization


Several recordings were made in which the robot walked on the table in front of the paintings. Four trials were recorded in total; they will serve to evaluate and benchmark the mapping, localization, and eventually navigation algorithms developed in the project.
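
As an illustration of how such trials might be used for benchmarking, the following Python sketch computes the absolute trajectory error of an estimated robot path against corresponding Vicon positions after an optimal rigid alignment (Kabsch's method). This is a generic evaluation sketch, not the project's actual benchmarking code; it assumes the two trajectories are given as per-frame corresponding 3D positions.

    import numpy as np

    def absolute_trajectory_error(estimated, ground_truth):
        """Root-mean-square error between an estimated trajectory and
        ground-truth positions after optimal rigid alignment (Kabsch).

        estimated, ground_truth: (N, 3) arrays of corresponding positions,
        e.g. an estimated robot path and matching Vicon measurements.
        """
        est = np.asarray(estimated, dtype=float)
        gt = np.asarray(ground_truth, dtype=float)

        # Center both trajectories on their centroids.
        est_c = est - est.mean(axis=0)
        gt_c = gt - gt.mean(axis=0)

        # Optimal rotation from the SVD of the cross-covariance matrix,
        # with a determinant check to avoid reflections.
        U, _, Vt = np.linalg.svd(est_c.T @ gt_c)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

        aligned = est_c @ R.T + gt.mean(axis=0)
        return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

    # Sanity check: a rotated and shifted copy of a path aligns exactly.
    path = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    print(absolute_trajectory_error(path @ rot.T + [2.0, 3.0, 0.0], path))  # ~0.0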

Speaker Tracking


To evaluate algorithms for the audio-visual detection and tracking of speaking persons, a special scene with three persons was recorded. At the beginning of the recording, each person says their name in turn. The three persons then exchange positions. One person was always seated while the other two stood upright, so that the persons vary both in elevation and in azimuth as seen from the robot's perspective. This process of speaking and exchanging places is repeated several times.
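
To make the geometry concrete, the following Python sketch computes the azimuth and elevation of a person as seen from the robot, given 3D positions in a common world frame such as the Vicon frame. The assumption that the robot's heading coincides with the world x-axis, like all numbers in the example, is purely illustrative; a real evaluation would first rotate the offset vector into the robot's head frame.

    import math

    def azimuth_elevation(robot_pos, person_pos):
        """Azimuth and elevation (in degrees) of a person as seen from
        the robot, given (x, y, z) positions in a common world frame.

        Simplifying assumption: the robot's heading coincides with the
        world x-axis.
        """
        dx = person_pos[0] - robot_pos[0]
        dy = person_pos[1] - robot_pos[1]
        dz = person_pos[2] - robot_pos[2]
        azimuth = math.degrees(math.atan2(dy, dx))
        elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
        return azimuth, elevation

    # A standing and a seated person at the same floor position differ
    # mainly in elevation (all numbers purely illustrative):
    robot = (0.0, 0.0, 0.5)                           # approximate NAO head height
    print(azimuth_elevation(robot, (2.0, 1.0, 1.6)))  # standing person's head
    print(azimuth_elevation(robot, (2.0, 1.0, 1.1)))  # seated person's head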

Monocular Prototype Scenes

Apart from the extra stereo recordings, some variations of the scenario were also recorded with monocular vision only, directly after the initial vernissage recordings.

Increased Number of People


The main scenario was additionally performed with more people appearing in the scene. This makes it possible to inspect how well the tested algorithms perform and how they scale with the number of people.

Moving Robot


One idea for future data sets is to also include variants of the normal vernissage scenario in which the robot walks from painting to painting. To provide a first testbed in this direction, we also recorded a scene in the vernissage room in which the robot walks while several people are present. This scene allows inspecting the effects of egomotion on algorithmic processing, or analyzing the performance of auditory algorithms with respect to the noise produced by the robot while walking.