The Vernissage Data Set

HUMAVIPS Project (FP7-ICT-2009-247525)

The data set, recorded in a "Wizard of Oz" setting, is available to the research community on request. Please use the contact page to request access to the data set.

Sample data

The following videos show an extract from the Vernissage data set.
The first video was recorded by Nao's internal camera and shows the two participants the robot is interacting with. The second video shows the same scene from behind, recorded with a conventional HD video camera.

The following screenshot gives an overview of the data recorded within one session:


A detailed specification of all streams recorded with RSBag is given in the following table. It lists the RSB scope on which the events of each stream were published, together with details of their contents. The data types mentioned are taken from RST release 1.0.
Recorded data set
Besides the aforementioned data recorded through the RSB middleware, the three external HD cameras recorded video at 1920 × 1080 pixels and 25 fps, with 48000 Hz, 16-bit signed, 5.1-channel audio.
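For a sense of scale, the raw data rate implied by these settings can be computed directly. The figures below are a back-of-the-envelope sketch only: they assume uncompressed 8-bit RGB video, whereas the actual recordings are compressed.

```python
# Back-of-the-envelope data rates for one external HD camera,
# assuming uncompressed 8-bit RGB video (the real recordings are compressed).

video_bps = 1920 * 1080 * 3 * 8 * 25   # width * height * RGB * bits * fps
audio_bps = 48000 * 16 * 6             # sample rate * bit depth * 6 channels (5.1)

total_mb_per_min = (video_bps + audio_bps) * 60 / 8 / 1e6
print(f"video: {video_bps / 1e6:.1f} Mbit/s")
print(f"audio: {audio_bps / 1e6:.1f} Mbit/s")
print(f"total: {total_mb_per_min:.0f} MB per minute (uncompressed)")
```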

Data formats

Besides the data being available in the TIDE/RSB format, we offer exports in widely used formats.
As an example, the Vicon data has been exported as described in the following:
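The concrete export description is not reproduced in this excerpt. As a purely hypothetical illustration of what consuming such an export could look like, the sketch below parses a CSV of per-frame marker positions; the column names and values are invented for the example and are not the actual export format.

```python
import csv
from io import StringIO

# Hypothetical CSV layout: frame number followed by x/y/z of one marker (mm).
# Column names and values are invented for this illustration.
sample = StringIO(
    "frame,head_x,head_y,head_z\n"
    "0,102.5,-13.0,1510.2\n"
    "1,102.7,-12.8,1510.5\n"
)

positions = []
for row in csv.DictReader(sample):
    positions.append((int(row["frame"]),
                      float(row["head_x"]),
                      float(row["head_y"]),
                      float(row["head_z"])))

print(positions[0])  # first frame's marker position
```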


The primary aims of the post-processing phase were to
  • synchronize the HD videos with the remaining data
  • validate the synchronicity of all recorded RSB data
  • export the data to the annotation tool
  • re-import the generated annotation into the RSBag files.
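The first of these aims, synchronizing cameras that lack absolute timestamps, is commonly approached by cross-correlating an external camera's audio track with a timestamped reference track. The sketch below illustrates that idea on synthetic signals; it uses only NumPy and is not part of the actual tool chain.

```python
import numpy as np

def audio_offset(reference, external, rate):
    """Estimate the lag (in seconds) of `external` relative to `reference`
    by locating the peak of their cross-correlation."""
    corr = np.correlate(external, reference, mode="full")
    lag_samples = np.argmax(corr) - (len(reference) - 1)
    return lag_samples / rate

# Synthetic demonstration: the "external" signal is the reference delayed by 0.5 s.
rate = 1000
t = np.arange(0, 2.0, 1 / rate)
reference = np.sin(2 * np.pi * 7 * t) * np.exp(-t)
external = np.zeros_like(reference)
external[500:] = reference[:-500]   # delay by 500 samples = 0.5 s

print(audio_offset(reference, external, rate))  # ≈ 0.5
```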
Fundamentally, the recorded RSBag files form the foundation of the data set, containing the essential information from Nao's sensors. Supplemental data, such as the HD camera recordings, is therefore adjusted to be usable with the RSBag files, and not the other way around. Moreover, data generated during the post-processing and annotation steps was, wherever applicable, also made available inside the RSBag files, providing a common interface to the whole data set. Combined with the capabilities of the RSB middleware, this allows system components to be evaluated without modifying them to interface with different data sources.
The post-processing procedure follows three basic steps. First, channels without explicit absolute timing information are synchronized to the rest of the timestamped data set. Second, views can be generated on selected parts of the data; a view may span only a sub-episode of a larger recording and may include just the subset of recorded streams needed for a specific task. Finally, annotations and similar secondary data are reintegrated into the original files of the data set, so that they can be used, e.g., for evaluation or as input to parts of the overall system.
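The view-generation step amounts to filtering timestamped events by scope and time window. The following is a minimal sketch of that idea; the event fields, scope names, and sample data are invented for the illustration and do not reflect the actual RSBag API.

```python
from dataclasses import dataclass

@dataclass
class Event:
    scope: str        # RSB scope the event was published on (invented names below)
    timestamp: float  # seconds since the start of the recording
    payload: object

def extract_view(events, scopes, start, end):
    """Return the sub-episode [start, end] restricted to the given scopes."""
    return [e for e in events
            if e.scope in scopes and start <= e.timestamp <= end]

# Invented sample data for illustration only.
events = [
    Event("/nao/camera", 0.5, "frame0"),
    Event("/nao/audio", 1.2, "chunk0"),
    Event("/nao/camera", 3.7, "frame1"),
]
view = extract_view(events, {"/nao/camera"}, 0.0, 2.0)
print([e.payload for e in view])  # ['frame0']
```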


Apart from the ground-truth data automatically generated by the recording process and the devices used in it, we have also carried out manual annotations. The Vernissage Corpus includes the following:
  • 2D head location annotation
  • Visual Focus of Attention annotation (VFOA)
  • Nodding annotation
  • Utterance annotation
  • Addressee annotation
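Annotations of these kinds are naturally represented as labelled time intervals per participant. The sketch below shows one possible in-memory form; the field names and the example values are our own, not the corpus format.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    tier: str         # e.g. "VFOA", "nodding", "utterance", "addressee"
    participant: str  # annotated person, e.g. "left" or "right" (invented labels)
    start: float      # interval start in seconds
    end: float        # interval end in seconds
    label: str        # e.g. the VFOA target or the addressee

# Invented example: the left participant looks at Nao from 12.0 s to 14.5 s.
ann = Annotation("VFOA", "left", 12.0, 14.5, "Nao")
print(f"{ann.participant} -> {ann.label} for {ann.end - ann.start:.1f} s")
```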