Wednesday, December 10, 2014

Step 3.0: Implementation

In this section, I'd like to describe the construction of the 3D environment and the setup for collecting data. This was part of my previous research, and this is just a recap to make the work self-contained.

First, we present how we build our 3D models. Then we describe a method to embed virtual cameras in the 3D model that represent the real-world cameras. Finally, we will see how a human subject detected in the image of a real-world camera can be projected onto a point in our 3D model.

Modeling 3D geometry:

We model the 3D geometry of the environment, such as floors, walls, and hallways, using Google Sketchup, a 3D modeling tool. Figure 1 depicts the 3D model of a building constructed using existing floor plans to obtain the measurements and dimensions. We then export the 3D model to COLLADA, a common digital asset exchange format [1], which we later use for rendering and understanding the 3D environment. The COLLADA Document Object Model (DOM) library is used to load and save this 3D model in an application, and OpenGL is used to interact with the 3D data. The rendered model of one of the floors using OpenGL is shown in Figure 2.

Figure 1.

Figure 2.
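
As a rough illustration of the loading step described above, the following is a minimal sketch against the COLLADA DOM 2.x API; the file name and the geometry-counting printout are placeholders, not the exact code used in this project.

    // Minimal sketch: load an exported COLLADA model with the COLLADA DOM.
    #include <dae.h>
    #include <dom/domCOLLADA.h>
    #include <iostream>

    int main() {
        DAE dae;
        // "building.dae" is a placeholder path for the exported Sketchup model.
        domCOLLADA* root = dae.open("building.dae");
        if (!root) {
            std::cerr << "Failed to load the COLLADA document" << std::endl;
            return 1;
        }
        // Walk the document's geometry libraries and count the meshes.
        const domLibrary_geometries_Array& libs = root->getLibrary_geometries_array();
        size_t meshes = 0;
        for (size_t i = 0; i < libs.getCount(); ++i)
            meshes += libs[i]->getGeometry_array().getCount();
        std::cout << "Loaded " << meshes << " geometries" << std::endl;
        return 0;
    }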


Embedding virtual cameras and calibration:

An initial step is to create virtual cameras in our 3D model that represent the real-world cameras. To do this, we first determine the internal parameters of each existing real-world camera using a standard checkerboard calibration approach. Once the camera's internal parameters are obtained, we can use OpenGL to create virtual cameras in our model that render perspective projections of the 3D model conceptually equivalent to the real-world camera views. To determine the location and orientation of the camera in our 3D model, we take an image from the real-world camera and manually register it with the corresponding camera's perspective projection in our graphics model, by adjusting the parameters in the transformation matrix using OpenGL. When the images register, as shown in Figure 3, we extract the transformation matrix of the camera, which gives us the approximate location and orientation of the camera in the 3D model [2].

Figure 3.
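
To make this concrete, here is a hedged sketch of how the calibrated intrinsics (fx, fy, cx, cy from the checkerboard step) can be mapped to a legacy OpenGL projection; the function name and the near/far planes are illustrative assumptions.

    // Sketch: build an OpenGL projection frustum from pinhole intrinsics.
    // fx, fy, cx, cy come from checkerboard calibration; zNear and zFar are
    // illustrative clipping planes.
    #include <GL/gl.h>

    void setProjectionFromIntrinsics(double fx, double fy, double cx, double cy,
                                     int width, int height,
                                     double zNear, double zFar) {
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        // Project the image borders through the principal point onto the
        // near plane to obtain the frustum extents.
        double left   = -cx * zNear / fx;
        double right  = (width - cx) * zNear / fx;
        double bottom = -(height - cy) * zNear / fy;
        double top    = cy * zNear / fy;
        glFrustum(left, right, bottom, top, zNear, zFar);
        glMatrixMode(GL_MODELVIEW);
    }

Once the virtual and real images register, the camera pose can be read back from the modelview matrix, e.g. with glGetDoublev(GL_MODELVIEW_MATRIX, m).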

Delaunay triangulation of the floor mesh:

We choose to represent the floor using a triangular mesh, though other representations are possible. For our purpose, we would like a rich description of the triangular mesh representing the floor where human subjects walk. Triangles in the mesh should have adequate height and base with respect to normal human motion characteristics. Assuming that humans walk at an average pace of 3 ft/sec and the camera in use has a frame rate of 30 fps, a sample taken every 10 frames corresponds to roughly one foot of movement, so it is convenient if the triangles have a height and base of at least 1 ft. We use Delaunay triangulation to obtain a mesh that is uniformly spaced, as shown in Figure 4. An implementation of Delaunay triangulation is available in the Computational Geometry Algorithms Library (CGAL) [3].
Figure 4.
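
A minimal sketch of building such a mesh with CGAL's 2D Delaunay triangulation is shown below; the regularly spaced sample points are illustrative stand-ins for the actual floor vertices.

    // Sketch: 2D Delaunay triangulation of floor sample points with CGAL.
    #include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
    #include <CGAL/Delaunay_triangulation_2.h>
    #include <iostream>
    #include <vector>

    typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
    typedef CGAL::Delaunay_triangulation_2<K> Delaunay;
    typedef K::Point_2 Point;

    int main() {
        // Hypothetical floor samples spaced roughly 1 ft apart.
        std::vector<Point> pts;
        for (int x = 0; x < 10; ++x)
            for (int y = 0; y < 10; ++y)
                pts.push_back(Point(x, y));

        Delaunay dt;
        dt.insert(pts.begin(), pts.end());
        std::cout << "Vertices: " << dt.number_of_vertices()
                  << ", Faces: " << dt.number_of_faces() << std::endl;
        return 0;
    }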


References:

[1] R. Arnaud and M. C. Barnes. COLLADA: Sailing the Gulf of 3D Digital Content Creation. A K Peters, 2006.
[2] P. Shirley and M. Ashikhmin. Fundamentals of Computer Graphics, Second Edition. A K Peters, 2005.
[3] CGAL, Computational Geometry Algorithms Library. http://www.cgal.org.

Monday, December 1, 2014

Step 3.1: Implementation - Identifying Regions with High Human Activity Cont...

The steps involved in identifying regions with high human activity are:

  1. Identify nodes and assign probabilities
  2. Sample start and end nodes based on the probabilities
  3. Simulate trajectories from the starting node to the end node
  4. Calculate occupancy map from the trajectories
  5. Cluster regions based on their occupancy
To summarize, given the geometry of an infrastructure, the nodes are assigned probabilities as described in the earlier steps. The start and end nodes are then sampled based on those probabilities, and, given the start and end, human motion trajectories are generated between them as described in the previous post. These tools provide a way to simulate an entire scenario in the infrastructure.
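
As a small illustration of step 2, start and end nodes can be drawn from the assigned probabilities with std::discrete_distribution; the node weights below are made up for the example.

    // Sketch: sample distinct start and end nodes from node probabilities.
    #include <iostream>
    #include <random>
    #include <vector>

    int main() {
        // Hypothetical per-node probabilities (e.g. entrances weighted higher).
        std::vector<double> nodeProb = {0.4, 0.1, 0.1, 0.3, 0.1};
        std::mt19937 rng(std::random_device{}());
        std::discrete_distribution<int> pick(nodeProb.begin(), nodeProb.end());

        int start = pick(rng);
        int end = pick(rng);
        while (end == start) end = pick(rng); // require distinct endpoints
        std::cout << "start node: " << start << ", end node: " << end << std::endl;
        return 0;
    }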


Calculate occupancy map from the trajectories:

Simulating the previous steps multiple times and observing the resulting trajectories provides a way to identify regions with high human activity.
Figure 1.

Figure 1 shows the occupancy map obtained by simulating 500 trajectories as described in the previous steps.
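
One simple way to realize this, sketched below under the assumption of a uniform 2D grid rather than the triangular floor mesh, is to accumulate trajectory samples per cell and normalize; the grid resolution is an illustrative choice.

    // Sketch: accumulate an occupancy map over a 2D grid from trajectory
    // samples; the cell size (in ft) is an illustrative assumption.
    #include <vector>

    struct OccupancyGrid {
        int nx, ny;
        double cell;                  // cell size in ft
        std::vector<double> counts;

        OccupancyGrid(int nx_, int ny_, double cell_)
            : nx(nx_), ny(ny_), cell(cell_), counts(nx_ * ny_, 0.0) {}

        // Bin one trajectory sample (x, y in ft) into its grid cell.
        void addSample(double x, double y) {
            int i = static_cast<int>(x / cell);
            int j = static_cast<int>(y / cell);
            if (i >= 0 && i < nx && j >= 0 && j < ny)
                counts[j * nx + i] += 1.0;
        }

        // Normalize counts to per-cell occupancy probabilities.
        void normalize() {
            double total = 0.0;
            for (size_t k = 0; k < counts.size(); ++k) total += counts[k];
            if (total > 0.0)
                for (size_t k = 0; k < counts.size(); ++k) counts[k] /= total;
        }
    };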


Cluster regions based on their occupancy:

The next step is to cluster regions with high human activity. In this step, the regions that belong to the same cluster should have a high occupancy value and should also be located in the same spatial location. The feature set representing any point consists of its spatial coordinates and its occupancy, i.e. (x, y, z, o), where x, y, z are the 3D coordinates of the point and o is its occupancy. Expectation Maximization was used for clustering, and the results are shown below.
Figure 2.
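
For reference, the sketch below shows one EM iteration for a diagonal-covariance Gaussian mixture over such (x, y, z, o) features; the diagonal-covariance assumption, the initialization, and the number of components are illustrative, and a library implementation (e.g. OpenCV's EM) could be used instead.

    // Sketch: one EM iteration for a diagonal-covariance Gaussian mixture
    // over (x, y, z, o) features. A robust version would use log-sum-exp
    // in the E-step and guard against empty components.
    #include <array>
    #include <cmath>
    #include <vector>

    const int D = 4; // feature dimension: x, y, z, occupancy
    typedef std::array<double, D> Feat;

    struct Gaussian { double w; Feat mu, var; };

    // Unnormalized log responsibility of component g for feature f.
    double logResp(const Gaussian& g, const Feat& f) {
        double lp = std::log(g.w);
        for (int d = 0; d < D; ++d) {
            double diff = f[d] - g.mu[d];
            lp -= 0.5 * (std::log(2.0 * M_PI * g.var[d]) + diff * diff / g.var[d]);
        }
        return lp;
    }

    void emIteration(const std::vector<Feat>& X, std::vector<Gaussian>& mix) {
        const size_t N = X.size(), K = mix.size();
        std::vector<std::vector<double> > r(N, std::vector<double>(K));
        // E-step: posterior responsibility of each component for each point.
        for (size_t n = 0; n < N; ++n) {
            double norm = 0.0;
            for (size_t k = 0; k < K; ++k) {
                r[n][k] = std::exp(logResp(mix[k], X[n]));
                norm += r[n][k];
            }
            for (size_t k = 0; k < K; ++k) r[n][k] /= norm;
        }
        // M-step: re-estimate weights, means, and diagonal variances.
        for (size_t k = 0; k < K; ++k) {
            double Nk = 0.0;
            Feat mu = {}, var = {};
            for (size_t n = 0; n < N; ++n) Nk += r[n][k];
            for (size_t n = 0; n < N; ++n)
                for (int d = 0; d < D; ++d) mu[d] += r[n][k] * X[n][d];
            for (int d = 0; d < D; ++d) mu[d] /= Nk;
            for (size_t n = 0; n < N; ++n)
                for (int d = 0; d < D; ++d) {
                    double diff = X[n][d] - mu[d];
                    var[d] += r[n][k] * diff * diff;
                }
            for (int d = 0; d < D; ++d) var[d] = var[d] / Nk + 1e-9;
            Gaussian g = {Nk / static_cast<double>(N), mu, var};
            mix[k] = g;
        }
    }

In practice, such iterations would be run to convergence from a sensible initialization before reading off the cluster parameters.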

The mean and variance of each obtained cluster are shown below.

Cluster       X (Mean / Var)          Z (Mean / Var)          O (Mean / Var)
Red           1784.91 / 502.98        426.34 / 48.08          0.0029 / 0.0029
Blue          227.28 / 25.95          2627.19 / 252.04        0.002 / 0.0016
Pink          725.58 / 280.34         219.67 / 25.62          0.0011 / 0.0009
Green         2617.26 / 258.58        633.89 / 31.24          0.0011 / 0.0008
Aqua          2617.2649 / 258.5819    633.8928 / 31.2469      0.001 / 0.0008
Light Pink    1577.61 / 1017.78       420.17 / 163.06         0 / 0

In this scenario, the red cluster is identified as having the highest human activity, followed by the blue and pink clusters.

The next step is to find camera calibration parameters that maximize the view of these clusters and also provide a maximal frontal view of the humans.