Nao is the main sensor because software developed in HUMAVIPS should be usable with the data provided by this robot. All other devices provide additional help or ground-truth information. From Nao's sensors and internal state, the following aspects have been recorded: monocular video (stereo for some trials, as explained in Scenario), audio from four microphones, joint angles, odometry, and status information such as CPU usage and temperatures.
Three cameras recording HD video were installed in the recording room. They mainly serve as ground truth for the manual annotation of the data by providing additional views of the scene. These cameras allow visual inspection of the participants' head orientations and visual focus targets, and provide a detailed view of facial expressions and discussions. All three cameras also captured low-quality audio of the scene with their internal microphones. The cameras were not attached to the network and recorded to internal SD cards.
In addition, a network camera, whose stream was not recorded, was installed at an elevated position close to one wall of the room. This camera provided a live overview for the operators and was therefore placed such that the view of the robot was not blocked by the interacting participants.
Vicon Motion Capture System
In order to obtain ground-truth data for the participants' locations as well as their head orientations, needed for the specific research topics addressed in HUMAVIPS, a Vicon motion capture system was installed in the room, covering the whole area where the interaction took place.
Wireless Microphones
To obtain reliable recordings of the spoken utterances of each participant, e.g. for the annotation process, participants were equipped with wireless microphones.
Software
The overall software setup was based on a number of components operating in a distributed system connected through the middleware RSB. RSB is an event-based, message-oriented middleware built on communication in a hierarchical bus structure. The hierarchy is established by Scopes, e.g. of the form /a/test/scope. Once a client sends information on this scope (i.e. the client informs), the information is visible to all other clients in the distributed system which listen on this scope or any of its parent scopes, i.e. /a/test/scope, /a/test, /a, and /. This system facilitates introspection and logging of whole system runs, e.g. by logging all events sent on the root scope /.

To facilitate robotics research with repeatable trials and data sets, we have implemented a set of generic RSB tools called RSBag. These tools allow recording RSB events on selected or all scopes of the system and replaying them with different strategies, including replay at original speed, interactive stepping, and replay as fast as possible. The generic handling of all events in these tools is possible because RSB encodes timing aspects in a generic structure common to all events, independent of the concrete payload. While RSB does not enforce particular data types for event payloads, we decided to consistently use Google Protocol Buffers as our representation format in HUMAVIPS. This decision further improved the applicability of generic tools.

In the context of the data set recording, all sensory hardware attached to the network (Nao and external devices like microphones, excluding the HD cameras) was made available using RSB-enabled streaming software. For instance, several components were deployed on Nao which streamed audio and video as well as internal states like joint angles or the processor load as RSB events. RSBag ran on the recording server and captured all streamed events.
Hence, the majority of the data recorded for this data set is captured in RSBag-compliant files.
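The scope-based event visibility described above can be illustrated with a short sketch. This is plain Python, not the actual RSB API; the function names are ours:

```python
def super_scopes(scope):
    """Return a scope together with all of its parent scopes, e.g.
    '/a/test/scope' -> ['/a/test/scope', '/a/test', '/a', '/']."""
    parts = [p for p in scope.split("/") if p]
    scopes = ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    scopes.append("/")
    return scopes

def is_visible(informer_scope, listener_scope):
    """An event sent on informer_scope reaches every listener whose
    scope equals that scope or one of its parents."""
    return listener_scope in super_scopes(informer_scope)
```

For example, an event informed on /a/test/scope is visible to listeners on /a/test, /a, and /, but not on /b. A listener on the root scope / sees every event in the system, which is exactly what allows RSBag to log complete system runs.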
To ensure the consistency of all data recorded over RSB, the different computers (including Nao) were synchronized via NTP. Hence, the timing information present in all recorded RSB events refers to the same clock.
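Because all hosts share an NTP-synchronized clock, events recorded on different machines can be merged into a single consistent timeline by ordering on their timestamps alone. A minimal sketch, assuming a simplified event record (the field names are ours, not the actual RSB event structure):

```python
import heapq
from typing import NamedTuple

class Event(NamedTuple):
    timestamp: float  # seconds since epoch, same NTP-synced clock on all hosts
    scope: str
    payload: bytes

def merge_streams(*streams):
    """Merge per-host event streams (each already ordered by time)
    into one globally time-ordered stream."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))
```

For instance, a stream recorded on Nao and a stream recorded on the recording server can be interleaved correctly without any per-host clock correction, precisely because their timestamps refer to the same clock.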