The Vernissage Data Set

HUMAVIPS Project (FP7-ICT-2009-247525)

The data set, recorded in a "Wizard of Oz" setting, is available to the research community on request. Please use the contact page to request access to the data set.

Sample data

The following videos show an extract from the Vernissage data set.
The first video was recorded by Nao's internal camera and shows the two participants the robot is interacting with. The second video shows the same scene from behind, recorded with a conventional HD video camera.

The following screenshot gives an overview of the data recorded within one session:


A detailed specification of all streams recorded with RSBag is given in the following table. It lists the RSB scope on which the events of each stream were published, together with details of their contents. The data types mentioned are taken from RST release 1.0.
Recorded data set
Besides the aforementioned data recorded through the RSB middleware, the three external HD cameras recorded video at 1920 × 1080 pixels and 25 fps, with 48000 Hz, 16-bit signed, 5.1-channel audio.
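For a sense of scale, the raw data rate implied by these settings can be computed directly. The figures below are a back-of-the-envelope sketch only: they assume uncompressed 8-bit RGB video, whereas the actual recordings are compressed.

```python
# Back-of-the-envelope data rates for one external HD camera,
# assuming uncompressed 8-bit RGB video (the real recordings are compressed).

video_bps = 1920 * 1080 * 3 * 8 * 25   # width * height * RGB * bits * fps
audio_bps = 48000 * 16 * 6             # sample rate * bit depth * 6 channels (5.1)

total_mb_per_min = (video_bps + audio_bps) * 60 / 8 / 1e6
print(f"video: {video_bps / 1e6:.1f} Mbit/s")
print(f"audio: {audio_bps / 1e6:.1f} Mbit/s")
print(f"total: {total_mb_per_min:.0f} MB per minute (uncompressed)")
```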

Data formats

Besides the data being available in the TIDE/RSB format, we offer exports in widely used formats.
As an example, the Vicon data has been exported as described in the following:
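The concrete export description is not reproduced in this excerpt. As a purely hypothetical illustration of what consuming such an export could look like, the sketch below parses a CSV of per-frame marker positions; the column names and values are invented for the example and are not the actual export format.

```python
import csv
from io import StringIO

# Hypothetical CSV layout: frame number followed by x/y/z of one marker (mm).
# Column names and values are invented for this illustration.
sample = StringIO(
    "frame,head_x,head_y,head_z\n"
    "0,102.5,-13.0,1510.2\n"
    "1,102.7,-12.8,1510.5\n"
)

positions = []
for row in csv.DictReader(sample):
    positions.append((int(row["frame"]),
                      float(row["head_x"]),
                      float(row["head_y"]),
                      float(row["head_z"])))

print(positions[0])  # first frame's marker position
```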


The primary aims of the post-processing phase were to
  • synchronize the HD videos with the remaining data
  • validate the synchronicity of all recorded RSB data
  • export the data to the annotation tool
  • re-import the generated annotation into the RSBag files.
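The first of these aims, synchronizing cameras that lack absolute timestamps, is commonly approached by cross-correlating an external camera's audio track with a timestamped reference track. The sketch below illustrates that idea on synthetic signals; it uses only NumPy and is not part of the actual tool chain.

```python
import numpy as np

def audio_offset(reference, external, rate):
    """Estimate the lag (in seconds) of `external` relative to `reference`
    by locating the peak of their cross-correlation."""
    corr = np.correlate(external, reference, mode="full")
    lag_samples = np.argmax(corr) - (len(reference) - 1)
    return lag_samples / rate

# Synthetic demonstration: the "external" signal is the reference delayed by 0.5 s.
rate = 1000
t = np.arange(0, 2.0, 1 / rate)
reference = np.sin(2 * np.pi * 7 * t) * np.exp(-t)
external = np.zeros_like(reference)
external[500:] = reference[:-500]   # delay by 500 samples = 0.5 s

print(audio_offset(reference, external, rate))  # ≈ 0.5
```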
Fundamentally, the recorded RSBag files form the foundation of the data set, containing the essential information from Nao's sensors. Supplemental data, such as the HD camera recordings, is therefore adjusted to be usable with the RSBag files, and not the other way around. Moreover, data generated during the post-processing and annotation steps was, wherever applicable, also made available inside the RSBag files, providing a common interface to the whole data set. Combined with the capabilities of the RSB middleware, this allows system components to be evaluated without modifying them to interface with different data sources.
The post-processing procedure follows three basic steps. First, channels without explicit absolute timing information are synchronized to the rest of the timestamped data set. Second, views can be generated on selected parts of the data; a view may span only a sub-episode of a larger recording and may include just the subset of recorded streams needed for a specific task. Finally, annotations and similar secondary data are reintegrated into the original files of the data set, so that they can be used, e.g., for evaluation or as input to parts of the overall system.
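The view-generation step amounts to filtering timestamped events by scope and time window. The following is a minimal sketch of that idea; the event fields, scope names, and sample data are invented for the illustration and do not reflect the actual RSBag API.

```python
from dataclasses import dataclass

@dataclass
class Event:
    scope: str        # RSB scope the event was published on (invented names below)
    timestamp: float  # seconds since the start of the recording
    payload: object

def extract_view(events, scopes, start, end):
    """Return the sub-episode [start, end] restricted to the given scopes."""
    return [e for e in events
            if e.scope in scopes and start <= e.timestamp <= end]

# Invented sample data for illustration only.
events = [
    Event("/nao/camera", 0.5, "frame0"),
    Event("/nao/audio", 1.2, "chunk0"),
    Event("/nao/camera", 3.7, "frame1"),
]
view = extract_view(events, {"/nao/camera"}, 0.0, 2.0)
print([e.payload for e in view])  # ['frame0']
```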


Apart from the ground-truth data automatically generated by the recording process and the devices used in it, we have also carried out manual annotations. The Vernissage Corpus includes the following:
  • 2D head location annotation
  • Visual Focus of Attention annotation (VFOA)
  • Nodding annotation
  • Utterance annotation
  • Addressee annotation
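Annotations of these kinds are naturally represented as labelled time intervals per participant. The sketch below shows one possible in-memory form; the field names and the example values are our own, not the corpus format.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    tier: str         # e.g. "VFOA", "nodding", "utterance", "addressee"
    participant: str  # annotated person, e.g. "left" or "right" (invented labels)
    start: float      # interval start in seconds
    end: float        # interval end in seconds
    label: str        # e.g. the VFOA target or the addressee

# Invented example: the left participant looks at Nao from 12.0 s to 14.5 s.
ann = Annotation("VFOA", "left", 12.0, 14.5, "Nao")
print(f"{ann.participant} -> {ann.label} for {ann.end - ann.start:.1f} s")
```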