Wednesday, February 25, 2015

Step 3.3: Implementation - Optimization

In the last post, we defined the Camera Coverage Quality Metric (CCQM), which quantifies the quality of a camera network based on the cameras' views. The next step is an optimization search for the camera view that maximizes this metric.

In summary, the first step was to simulate the possible trajectories followed by humans in a scenario and then identify the high-occupancy regions. The Expectation-Maximization algorithm was used to identify these regions, giving us clusters with high occupancy. To begin with, we assume that each cluster is viewed by a single camera, and the idea is to perform an optimization that maximizes the CCQM for the cluster. Depending on the size of a cluster, multiple cameras may be needed for a single cluster; this will be addressed later.

Setup: We start by optimizing the camera network for the floor depicted in Figure 4 of the previous post. This triangle mesh comprises 4562 triangles. We use the same mesh for the ceiling, but at a height of 10 ft. To begin with, we pursue the optimization in a discrete fashion: the only possible camera locations are the centroids of the triangles in the ceiling mesh, and the only possible orientations are those pointing towards the centroids of the triangles in the floor mesh. A brute-force search over all location-orientation pairs has complexity $O(n^2)$; with $n = 4562$ that is already about $2 \times 10^7$ configurations. Furthermore, for every configuration we need to locate the points in the camera's view (points not obstructed by geometry) to accumulate their area and human occupancy, which raises the complexity to $O(n^3)$, roughly $10^{11}$ visibility checks. Given the number of triangles, this would require a prohibitive amount of computation, which calls for a heuristic optimization algorithm.

The optimization is applied to one cluster at a time. Given a cluster, the points on the ceiling that have a view of the cluster's centroid are identified first, and these points are the possible camera locations. The only possible orientations for each camera point towards the points in the cluster. This simplifies the problem to finding two points: one on the ceiling to position the camera and one on the floor to point the camera towards. We use a variation of hill climbing called Random-Restart Hill Climbing, a relatively simple optimization search approach that provides near-optimal performance [1, 2]. The idea is to score a certain number of randomly sampled points and choose the best one as the start location for hill climbing. Here this is done at two levels, since two variables need to be optimized: given a point on the ceiling, we perform a random-restart hill climb over points on the floor to find its optimal pair. This process is repeated for a certain percentage of the ceiling points to find a suitable starting pair, which is then used as the starting point for the final hill climb. The search continues through the neighbors until no more progress is made.
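The search described above can be sketched as follows. This is a minimal illustration in Python, not the actual implementation: `score` stands in for the CCQM evaluation and `neighbors` for a hypothetical helper returning the adjacent candidate points.

```python
import random

def hill_climb(start, neighbors, score):
    """Greedy ascent: move to the best-scoring neighbor until no neighbor improves."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best

def random_restart_hill_climb(candidates, neighbors, score, n_restarts=10):
    """Score a random sample of candidates, then hill-climb from the best start."""
    starts = random.sample(candidates, min(n_restarts, len(candidates)))
    best_start = max(starts, key=score)
    return hill_climb(best_start, neighbors, score)
```

In the two-level version, `score` for a ceiling point is itself the result of a random-restart hill climb over the floor points paired with it.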


The obtained results are shown below. To recap, the cluster color coding is:







Cluster 0: Blue
Cluster 1: Red
Cluster 2: Green
Cluster 3: Aqua
Cluster 4: Light Pink
Cluster 5: Pink
Cluster parameters:

Cluster   MeanX       VarX        MeanZ      VarZ       MeanO    VarO
0         2627.1913   252.0491    227.2815   25.954     0.002    0.0016
1         1784.9107   502.9819    426.3474   48.0879    0.0029   0.0029
2         725.5888    280.347     219.6706   25.6276    0.0011   0.0009
3         2617.2649   258.5819    633.8928   31.2469    0.001    0.0008
4         1577.6143   1017.7892   420.1754   163.0683   0        0
5         700.9391    292.696     643.3453   22.5434    0.0011   0.0008
Considering the 5 valid clusters (cluster 4 has zero occupancy and is excluded), the optimized parameters are listed below, where Eye is the camera position on the ceiling and Center is the floor point the camera looks towards:

Cluster   EyeX      EyeY   EyeZ      CenterX   CenterY   CenterZ
0         3001.88   119    189.971   2282.35   68        460.1
1         2281.82   119    380.491   1169.24   68        650.761
2         1206.07   119    209.531   216.731   68        216.935
3         2245.89   119    671.584   2177.13   68        219.217
4         -         -      -         -         -         -
5         1205.58   119    637.524   185.957   68        637.978

Going ahead, we will perform experiments to quantify these results and compare against the state of the art.

References

[1] Y. Zhang, T. Lei, R. Barzilay, and T. Jaakkola, "Greed is Good if Randomized: New Inference for Dependency Parsing," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[2] C. F. F. C. Filho, M. G. F. Costa, J. E. C. Filho, and A. L. M. de Oliveira, "Using a random restart hill-climbing algorithm to reduce component assembly time in printed circuit boards," in 2010 IEEE International Conference on Industrial Technology (ICIT), pp. 1706-1711, March 2010.

Wednesday, January 28, 2015

Step 3.2: Implementation - Camera Coverage Quality Metric (CCQM)

To this point, we have devised a method to
  1. Simulate the possible trajectories in the scenario.
  2. Identify regions on the floor with high human occupancy.
This marks the end of the first part. 

In the second part, we use the information generated in the first part to optimize the camera network. To recap, the optimization involves maximizing the
  • area of view coverage ($A$),
  • human activity volume in view ($H$),
  • frontal view of humans ($F$), and
  • resolution of the obtained image ($R$).
The idea is to define a metric that quantifies the quality of a camera's location and orientation based on the above four parameters. Let the location and orientation of the camera be defined by the parameters $\Omega$, which we will define later. The Camera Coverage Quality Metric (CCQM) is a function of $\{A, H, F, R\}$. Hence,
\begin{equation}
CCQM(\Omega) = g(A, H, F, R) = A(\Omega) \cdot H(\Omega) \cdot F(\Omega) \cdot R(\Omega)
\end{equation}
where $A$ is a function of $\Omega$ quantifying the area in view, $H$ quantifies the human activity in view, $F$ quantifies the possible frontal images that can be obtained from the view, and $R$ quantifies the possible resolution that can be obtained from the view. Given this metric, the problem is to find the camera parameters $\Omega$ that maximize it. If $\Omega^*$ is the optimal parameter set, then
\begin{equation}
\Omega^* = \arg\max_{\Omega} \; CCQM(\Omega)
\end{equation}
The clusters alone provide adequate information to maximize the parameters $\{A, H\}$; the record of simulated trajectories is used for maximizing $\{F, R\}$. Given the parameters $\Omega$, let us define the functions $\{A, H, F, R\}$. Assume the floor is represented by a triangular mesh containing triangles $\{t_1, t_2, ..., t_n\}$ with centroids $\{c_1, c_2, ..., c_n\}$, and, given the configuration $\Omega$, let $\{t^{\Omega}_1, t^{\Omega}_2, ..., t^{\Omega}_m\}$ be the triangles in view of the camera.

Area of View (A):

The area function $A(\Omega)$ is defined as
\begin{equation}
A(\Omega) = \frac {area\_in\_view}{total\_area\_of\_floor}
 = \frac {\sum\limits_{i=1}^m area(t^{\Omega}_i)}{\sum\limits_{i=1}^n area(t_i)}
\end{equation}
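As a sketch, $A(\Omega)$ reduces to summing triangle areas over the visible subset. The snippet below is illustrative only and assumes each triangle is given as a triple of 3D vertices; determining which triangles are visible is a separate step.

```python
import numpy as np

def triangle_area(p0, p1, p2):
    # Area of a triangle from its vertices, via the cross-product formula.
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    return 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))

def area_of_view(visible_tris, floor_tris):
    # A(Omega): area of the triangles in view over the total floor area.
    return (sum(triangle_area(*t) for t in visible_tris)
            / sum(triangle_area(*t) for t in floor_tris))
```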

Human Activity Volume (H):

The human occupancy map is calculated from the simulated trajectories, and an occupancy value is assigned to every triangle in the floor mesh as described in the previous post. Let $O(t)$ be the occupancy of the triangle $t$. Then the function $H(\Omega)$ is defined as
\begin{equation}
H(\Omega)  = \frac{1}{C}{\sum\limits_{i=1}^m O(t^{\Omega}_i)}
\end{equation}
where $C$ is a normalizing constant.
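In code this is a one-liner; here `occupancy` is a hypothetical mapping from each triangle index to its $O(t)$ value.

```python
def human_activity(visible_idx, occupancy, C):
    # H(Omega): normalized occupancy accumulated over the triangles in view.
    return sum(occupancy[i] for i in visible_idx) / C
```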

Frontal View of Humans (F):

To quantify the probable amount of frontal view of humans for the configuration $\Omega$, we make use of the simulated trajectories. For every triangle $t_i$ in the floor mesh, direction discretization is performed and eight direction vectors $\{v^i_1, v^i_2, ..., v^i_8\}$ are defined, as described by Zhou et al. in [1] (Figure 1).
Figure 1
Figure 2

In the following step, a vector transition histogram/matrix (Figure 2) is constructed from the simulated trajectories. For every simulated trajectory, consecutive points are used to create direction vectors. Let $T$ be a simulated trajectory of length $l$, $T = \{p_1, p_2, ..., p_l\}$. For every pair of consecutive points $\{p_{i-1}, p_i\}$ in $T$, the trajectory's local direction vector is $(p_i - p_{i-1})$, and the histogram bin corresponding to the triangle $t$ in which the point $p_{i-1}$ is located and the discretized direction vector closest to $(p_i - p_{i-1})$ is incremented by $1$. Let $\Psi(t, v)$ be the histogram function; the function $F(\Omega)$ is then defined as

\begin{equation}
F(\Omega) = \frac{1}{m} \sum_{i} \Big( ((c-p_{i-1})\cdot v_{k-1})\,\Psi(t_j, v_{k-1}) + ((c-p_{i-1})\cdot v_{k})\,\Psi(t_j, v_{k}) + ((c-p_{i-1})\cdot v_{k+1})\,\Psi(t_j, v_{k+1}) \Big)
\end{equation}
where the sum runs over the consecutive point pairs $\{p_{i-1}, p_i\}$ whose triangle lies in the camera's view, $t_j$ is the floor-mesh triangle in which the point $p_{i-1}$ lies, $c$ is the camera center, and $v_k$ is the direction vector closest to $(p_i - p_{i-1})$:
\begin{equation}
k = \arg \max_{j} \; v_j \cdot (p_i - p_{i-1})
\end{equation}
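A sketch of the histogram construction with eight 45°-spaced bins, as in [1]; `triangle_of` is a hypothetical helper mapping a 2D point to its floor-triangle index, not part of the actual implementation.

```python
import numpy as np

# Eight discretized unit direction vectors, 45 degrees apart.
DIRS = np.array([(np.cos(a), np.sin(a)) for a in np.arange(8) * np.pi / 4])

def nearest_direction(step):
    # Index k of the discretized vector closest to the local direction p_i - p_{i-1}.
    return int(np.argmax(DIRS @ (np.asarray(step, float) / np.linalg.norm(step))))

def build_histogram(trajectories, triangle_of, n_triangles):
    # Psi(t, v): per-triangle counts of how often each direction bin is taken.
    psi = np.zeros((n_triangles, 8))
    for T in trajectories:
        for p_prev, p in zip(T, T[1:]):
            step = np.subtract(p, p_prev)
            if np.linalg.norm(step) > 0:
                psi[triangle_of(p_prev), nearest_direction(step)] += 1
    return psi
```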

Resolution of the Image (R):

This component of $CCQM$ quantifies the resolution of the images. If an object in view is far from the camera, the obtained resolution is very low and the image might not add any value to the system. This component is application dependent: it can be customized to require a sufficient resolution of any object, which could be just the face or the entire body of a human. We follow the methodology described by Janoos et al. in [2]. The function $R(\Omega)$ is defined as
\begin{equation}
R(\Omega) = \frac{1}{m} \sum \limits_{i=1}^{m} \frac{\rho^\Omega (t_i)}{\rho_{min}}
\end{equation}

Let $C$ be the center of the camera, then
\begin{equation}
\rho^\Omega = \frac{\sigma_{k-1}(c-p_{i-1})\cdot v_{k-1} + \sigma_{k}(c-p_{i-1})\cdot v_{k} + \sigma_{k+1}(c-p_{i-1})\cdot v_{k+1}}{2\pi \, d(C, p_{i-1})^2 (1-\cos(\gamma/2))}
\end{equation}

where
$\gamma$ is the vertical (Y) field of view of the camera,
$d(p_1, p_2)$ is the Euclidean distance between the points $p_1$ and $p_2$,
$\sigma$ is the number of pixels that the object occupies in the image, and
$\rho_{min}$ is a user-defined value specifying the minimum required resolution of an object in $pixels/inch$.

To calculate the number of pixels $\sigma$, the bounding box of the object is considered, and a perspective transformation is applied to its corners to find their locations in the image. The area of the resulting quadrilateral is used as the $\sigma$ value. If $(a, b, c, d)$ are the locations of the corners in the image, then
\begin{equation}
\sigma = \frac{1}{2}\,\|\vec{ac} \times \vec{bd}\|
\end{equation}
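As an illustrative check of this formula: for a planar quadrilateral with corners in order $(a, b, c, d)$, the area is half the magnitude of the scalar cross product of its diagonals $\vec{ac}$ and $\vec{bd}$.

```python
def sigma_pixels(a, b, c, d):
    # sigma: area of the projected quadrilateral with image corners a, b, c, d
    # (in order), computed as half |ac x bd| using the 2D scalar cross product.
    acx, acy = c[0] - a[0], c[1] - a[1]
    bdx, bdy = d[0] - b[0], d[1] - b[1]
    return 0.5 * abs(acx * bdy - acy * bdx)
```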

Now that we have a way to quantify a camera configuration using the $CCQM$, in the next step we consider a heuristic optimization algorithm that maximizes this quantity to obtain the optimum configuration.

[1] W. Zhou, H. Xiong, Y. Ge, J. Yu, H. Ozdemir, and K. C. Lee, "Direction clustering for characterizing movement patterns," in 2010 IEEE International Conference on Information Reuse and Integration (IRI), pp. 165-170, Aug. 2010.
[2] F. Janoos, R. Machiraju, R. Parent, J. W. Davis, and A. Murray, "Sensor configuration for coverage optimization for surveillance applications," 2007.