First, we present how we build our 3D models. Then we describe a method for embedding virtual cameras in the 3D model that represent the cameras in the real world. Finally, we show how a human subject detected in the image of a real-world camera can be projected onto a point in our 3D model.
Modeling 3D geometry:
We model the 3D geometry of the environment (floors, walls, hallways, etc.) using Google SketchUp, a 3D modeling tool. Figure 1 depicts the 3D model of a building, constructed using measurements and dimensions taken from existing floor plans. We then export the 3D model in COLLADA, a common digital asset exchange format [1], which we later use for rendering and understanding the 3D environment. The COLLADA Document Object Model (DOM) library is used to load and save this 3D model in an application, and we use OpenGL to interact with the 3D data in the application. Figure 2 shows one of the floors rendered using OpenGL.
Figure 1.
Figure 2.
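COLLADA is an XML format, so the exported geometry can be inspected without a dedicated loader. The following is a minimal sketch, not the COLLADA DOM workflow described above: it substitutes Python's standard `xml.etree.ElementTree` for that library and assumes a COLLADA 1.4 document whose vertex positions are stored as stride-3 `float_array` sources. The helper name `read_vertex_sources` and the tiny `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the exported file uses the COLLADA 1.4 schema namespace.
NS = {"c": "http://www.collada.org/2005/11/COLLADASchema"}

# Hypothetical stand-in for an exported model: one triangle of a floor.
SAMPLE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_geometries>
    <geometry id="floor">
      <mesh>
        <source id="floor-positions">
          <float_array id="floor-positions-array" count="9">0 0 0 10 0 0 0 10 0</float_array>
          <technique_common>
            <accessor source="#floor-positions-array" count="3" stride="3"/>
          </technique_common>
        </source>
      </mesh>
    </geometry>
  </library_geometries>
</COLLADA>"""


def read_vertex_sources(collada_text):
    """Return {source_id: [(x, y, z), ...]} for every stride-3 mesh source."""
    root = ET.fromstring(collada_text)
    result = {}
    for source in root.iterfind(".//c:mesh/c:source", NS):
        array = source.find("c:float_array", NS)
        accessor = source.find("c:technique_common/c:accessor", NS)
        if array is None or not array.text or accessor is None:
            continue
        # Skip sources that are not 3-component (e.g. texture coordinates).
        if int(accessor.get("stride", "1")) != 3:
            continue
        values = [float(v) for v in array.text.split()]
        result[source.get("id")] = [
            tuple(values[i:i + 3]) for i in range(0, len(values) - 2, 3)
        ]
    return result
```

Calling `read_vertex_sources(SAMPLE)` yields the three vertices of the sample triangle keyed by the source id. A real export would also contain normals, materials and scene transforms, which this sketch ignores.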
Embedding virtual cameras and calibration:
As an initial step, we create virtual cameras in our 3D model that represent the cameras in the real world. To do this, we first determine the internal parameters of each existing real-world camera using a general checkerboard-based calibration approach. Once the camera's internal parameters are obtained, we use OpenGL to create virtual cameras in our model that render perspective projections of the 3D model conceptually equivalent to those of the real-world cameras. To determine the location and orientation of a camera in our 3D model, we take an image from the real-world camera and manually register it with the corresponding virtual camera's perspective projection in our graphics model by adjusting the parameters of the transformation matrix in OpenGL. When the images register, as shown in Figure 3, we extract the camera's transformation matrix, which gives us the approximate location and orientation of the camera in the 3D model [2].
Figure 3.
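The two steps above, building a virtual camera from internal parameters and reading a pose back out of the registered transformation matrix, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used here: the internal parameters are taken to be pinhole focal lengths `fx`, `fy` and a principal point `cx`, `cy` in pixels with zero skew, the pixel origin is assumed to be the top-left corner, the near and far planes are arbitrary choices, and the view matrix is assumed to be a rigid transform [R | t]. All function names are hypothetical. Matrices are written row-major; OpenGL stores them column-major, so a transpose may be needed when passing them to or from the API.

```python
def projection_from_intrinsics(fx, fy, cx, cy, width, height, near, far):
    """Row-major 4x4 OpenGL-style projection matrix for a pinhole camera.

    Assumes zero skew, pixel origin at the top-left corner, and a camera
    looking down the negative z axis in eye coordinates.
    """
    return [
        [2.0 * fx / width, 0.0, 1.0 - 2.0 * cx / width, 0.0],
        [0.0, 2.0 * fy / height, 2.0 * cy / height - 1.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def project_to_pixel(proj, point_eye, width, height):
    """Project a point in eye coordinates to (u, v) pixel coordinates."""
    x, y, z = point_eye
    clip = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in proj]
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]
    # Normalized device y points up, pixel v points down.
    return (ndc_x + 1.0) * width / 2.0, (1.0 - ndc_y) * height / 2.0


def camera_pose_from_view(view):
    """Return (position, forward) in world coordinates.

    `view` is a row-major 4x4 world-to-eye matrix [R | t] with R a rotation.
    The camera centre is -R^T t and the viewing direction is R^T (0, 0, -1).
    """
    rot = [row[:3] for row in view[:3]]
    t = [row[3] for row in view[:3]]
    position = tuple(-sum(rot[i][k] * t[i] for i in range(3)) for k in range(3))
    forward = tuple(-rot[2][k] for k in range(3))
    return position, forward
```

A quick consistency check is to project a point with `project_to_pixel` and compare it against the plain pinhole equations `u = fx * x / d + cx`, `v = cy - fy * y / d`, where `d` is the depth in front of the camera; the two should agree if the sign conventions match those of the calibration tool.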
Delaunay triangulation of the floor mesh:
We choose to represent the floor using a triangular mesh, though other representations are possible. For our purpose, we would like a rich description of the triangular mesh representing the floor on which human subjects walk. Triangles in the mesh should have a height and base suited to normal human motion characteristics. Assuming that humans walk at an average pace of 3 ft/sec and that the camera in use has a frame rate of 30 fps, if we plan to take a sample every 10 frames, i.e. for every foot the human moves, it is convenient for the triangles to have a height and base of at least 1 ft. We use Delaunay triangulation to obtain a uniformly spaced mesh, as shown in Figure 4. An implementation of Delaunay triangulation is available in the Computational Geometry Algorithms Library (CGAL) [3].
Figure 4.
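The sizing argument can be checked with a short calculation. The sketch below reads the sampling interval as 10 frames, an assumption chosen because 10 frames at 30 fps and 3 ft/sec gives the 1 ft of movement per sample mentioned above. It then builds a mesh with that spacing over a hypothetical rectangular floor and verifies the defining Delaunay property, namely that no vertex lies strictly inside the circumcircle of any triangle. It does not use CGAL: a regular grid split along cell diagonals is a simple special case that happens to satisfy the property, whereas a real floor plan with an irregular boundary would need a general triangulator. All function names are hypothetical.

```python
def distance_per_sample(speed_ft_per_s, fps, frames_per_sample):
    """Distance walked between two consecutive samples, in feet."""
    return speed_ft_per_s * frames_per_sample / fps


def grid_mesh(width_ft, depth_ft, spacing_ft):
    """Vertices and counter-clockwise triangles of a regular grid mesh."""
    nx = int(round(width_ft / spacing_ft)) + 1
    ny = int(round(depth_ft / spacing_ft)) + 1
    points = [(i * spacing_ft, j * spacing_ft) for j in range(ny) for i in range(nx)]
    triangles = []
    for j in range(ny - 1):
        for i in range(nx - 1):
            a = j * nx + i                      # lower-left corner of the cell
            b, c, d = a + 1, a + nx, a + nx + 1
            triangles += [(a, b, d), (a, d, c)]  # split along one diagonal
    return points, triangles


def strictly_inside_circumcircle(p, a, b, c, eps=1e-9):
    """In-circle test for a counter-clockwise triangle (a, b, c)."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > eps


def is_delaunay(points, triangles):
    """True if no vertex lies strictly inside any triangle's circumcircle."""
    return not any(
        strictly_inside_circumcircle(p, points[a], points[b], points[c])
        for a, b, c in triangles
        for p in points
    )


spacing = distance_per_sample(3.0, 30.0, 10)  # 1.0 ft per sample
points, triangles = grid_mesh(3.0, 3.0, spacing)
```

With these inputs each triangle has a base and height equal to `spacing`, so a subject sampled once per 10 frames moves roughly one triangle per sample.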
References:
[1] R. Arnaud and M. C. Barnes. COLLADA: Sailing the Gulf of 3D Digital Content Creation. A K Peters, 2006.
[2] P. Shirley and M. Ashikhmin. Fundamentals of Computer Graphics, Second Edition. A K Peters, 2005.
[3] CGAL, Computational Geometry Algorithms Library. http://www.cgal.org.