Pointing'04 ICPR Workshop
Cambridge, United Kingdom - 22 August 2004

Pointing Gestures: video sequence database

data-hand/plan-monica2.png

Top-down representation of the capture environment.

data-hand/vue-monica.jpg

Example quadriscopic view of the environment.

Introduction

The database consists of 8 video sequences of people successively pointing at different positions on a whiteboard with a finger.

Each person is recorded twice, once with a known ground truth of pointed positions, once with a hidden ground truth.

The author (Julien Letessier) can be contacted for details not mentioned here.

Capture setup

The sequences were captured in the Prima "Smart Office" environment at INRIA Rhône-Alpes, Grenoble, France. They were taken simultaneously from four ceiling cameras oriented towards the user.

Each capture goes as follows: the user

  1. enters the office and sits;
  2. clicks to display a pattern on the desk (for synchronization purposes);
  3. successively points at 8 different positions on the whiteboard;
  4. stands up and exits the office.

Lighting conditions

Scene illumination roughly consists of 60% natural diffuse light and 40% neon light.

Capture

Capture was performed using video4linux and ffmpeg from a 25 Hz, non-interlaced PAL stream at CIF size (352x288 pixels).

The four views are synchronized a posteriori; the maximum delay between any two views is one frame (40 ms).
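
The timing figures above follow directly from the 25 Hz frame rate. A minimal sketch of the arithmetic, assuming the frame counter starts at 0 (the numbering origin is not stated here) and using hypothetical function names:

```python
FRAME_RATE_HZ = 25  # capture rate stated in this document


def frame_to_seconds(frame_index, frame_rate_hz=FRAME_RATE_HZ):
    """Time offset of a frame, assuming frame 0 is at t = 0 (an assumption)."""
    return frame_index / frame_rate_hz


def frame_period_ms(frame_rate_hz=FRAME_RATE_HZ):
    """Duration of one frame, which is also the stated maximum inter-view delay."""
    return 1000.0 / frame_rate_hz


if __name__ == "__main__":
    print(frame_period_ms())       # one frame period in milliseconds
    print(frame_to_seconds(250))   # offset of frame 250 in seconds
```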

Provided data

All of the data lives in this directory; see below for details.

Videos

Quadriscopic videos of each capture are provided, in a downsampled and compressed format, for reference purposes.

These videos are labeled with a frame counter in the upper left corner, and the sequence name in the lower left corner.

The videos are named _montaged-<a><b>.avi, where <a> is the capture ID of the filmed person and <b> is 1 for the sequence where the ground truth is known, and 2 when the ground truth is hidden.

These videos were assembled with ImageMagick-5.5.7 and compressed in mpeg4 format with ffmpeg-0.4.8.

Image sequences

The sequences are provided as a set of tarballs containing zlib-compressed PNG format images.

The tarballs are named sequence-<a><b><c>.tar, where <c> is the view ID and <a> and <b> have the same meaning as above. They contain a directory named sequence-<a><b><c>, which itself contains a set of frame-<n>.png image files (<n> is a four-digit integer).

The sequences are also provided as heavily-compressed mpeg4 videos, named sequence-<a><b><c>.avi (same conventions).
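
The naming conventions above can be sketched as a few small helpers. This is only an illustration: the function names are hypothetical, the IDs are treated as opaque strings because their exact form is not specified here, and the frame number is assumed to be zero-padded to four digits.

```python
def montage_name(person_id, truth_id):
    """Quadriscopic video name: _montaged-<a><b>.avi"""
    return f"_montaged-{person_id}{truth_id}.avi"


def sequence_stem(person_id, truth_id, view_id):
    """Common stem shared by the tarball, its directory and the per-view video."""
    return f"sequence-{person_id}{truth_id}{view_id}"


def tarball_name(person_id, truth_id, view_id):
    return sequence_stem(person_id, truth_id, view_id) + ".tar"


def view_video_name(person_id, truth_id, view_id):
    return sequence_stem(person_id, truth_id, view_id) + ".avi"


def frame_path(person_id, truth_id, view_id, frame_number):
    """Path of one frame inside an extracted tarball.

    Assumes <n> is zero-padded to four digits (an assumption).
    """
    stem = sequence_stem(person_id, truth_id, view_id)
    return f"{stem}/frame-{frame_number:04d}.png"


if __name__ == "__main__":
    # The ID values below are placeholders, not actual IDs from the database.
    print(tarball_name("1", "2", "3"))
    print(frame_path("1", "2", "3", 7))
```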

The PNG files were generated and compressed from raw PPM files using ImageMagick.

Geometric information

Correspondences between points in world coordinates and points in view coordinates are provided in the geometry.txt file (tab-separated format). Column 1 gives the point names, columns 2-5 give the coordinates in each of the views (point 0,0 is in the lower left of a view), and column 6 gives the world coordinates.
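
A minimal sketch of reading this file with the Python standard library follows. The column layout is taken from the description above; how coordinates are encoded inside each cell is not specified here, so the cells are kept as raw strings. The function names and the sample row are hypothetical. Since most image libraries place the origin at the top left, a small helper for flipping the vertical axis is included, assuming integer pixel row indices and the 288-pixel CIF height.

```python
import csv
import io

CIF_HEIGHT = 288  # image height stated in this document


def parse_geometry(text):
    """Map each point name to its four per-view cells and its world cell.

    Rows with fewer than six tab-separated columns are skipped (assumption:
    such rows are blank lines or headers).
    """
    points = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 6:
            continue
        name, *views, world = row[:6]
        points[name] = {"views": views, "world": world}
    return points


def to_top_left_y(y, height=CIF_HEIGHT):
    """Convert a lower-left-origin row index to a top-left-origin row index.

    Assumes y is an integer pixel index in [0, height - 1].
    """
    return height - 1 - y


if __name__ == "__main__":
    # Hypothetical sample row, not actual data from geometry.txt.
    sample = "P1\t10,20\t30,40\t50,60\t70,80\t0,0,0\n"
    print(parse_geometry(sample))
```

To read the actual file, its contents can be passed in as text, for example with `parse_geometry(open("geometry.txt").read())`.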

The file office-map.pdf is a top-down map of the office with axes and coordinates (in centimeters).

Ground truth

The ground truth is provided for sequences where <b> is 1. The file ground-truth.txt lists the frame indices at which the user points to a known position.

This ground truth can be used, for example, for system calibration. The points are the corners of the room's whiteboard and the midpoints of the whiteboard's borders.
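
The eight target positions described above can be enumerated as follows. This is a sketch only: it assumes a rectangular whiteboard described by two opposite corners in its own plane, the function name is hypothetical, and the order in which the points are returned is arbitrary and not necessarily the order used in ground-truth.txt.

```python
def whiteboard_targets(x_min, y_min, x_max, y_max):
    """Return the four corners and four border midpoints of a rectangle."""
    x_mid = (x_min + x_max) / 2
    y_mid = (y_min + y_max) / 2
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    midpoints = [(x_mid, y_min), (x_max, y_mid), (x_mid, y_max), (x_min, y_mid)]
    return corners + midpoints


if __name__ == "__main__":
    # Hypothetical dimensions in centimeters, not the real whiteboard size.
    for point in whiteboard_targets(0, 0, 200, 100):
        print(point)
```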