Wednesday, February 25, 2015

Step 3.3: Implementation - Optimization

In the last post, we defined the Camera Coverage Quality Metric (CCQM), which quantifies the quality of a camera network based on the cameras' views. The next step is an optimization search for the camera view that maximizes this metric.

In summary, the first step was to simulate the possible trajectories followed by humans in a scenario and then identify the high-occupancy regions. The Expectation-Maximization algorithm was used to identify these regions, giving us clusters with high occupancy. To begin with, we assume that each cluster is viewed by a single camera, and the idea is to perform an optimization that maximizes the CCQM for the cluster. Depending on the size of a cluster, multiple cameras may be needed for a single cluster; this will be addressed later.

Setup: We start by optimizing the camera network for the floor depicted in Figure 4 of the previous post. This triangle mesh comprises 4562 triangles. We use the same mesh for the ceiling, but at a height of 10 ft. To begin with, we pursue the optimization in a discrete fashion: the only possible camera locations are the centroids of the triangles in the ceiling mesh, and the only possible orientations are those pointing towards the centroids of the triangles in the floor mesh. A brute-force search over all location-orientation pairs has complexity $O(n^2)$; with $n = 4562$ that is already about $2 \times 10^7$ configurations. Furthermore, for every configuration we need to locate the points in the camera's view (points not obstructed by geometry) to accumulate their area and human occupancy, which raises the complexity to $O(n^3)$, roughly $10^{11}$ visibility checks. Given the number of triangles, this would require a prohibitive amount of computation, which calls for a heuristic optimization algorithm.

The optimization is applied to one cluster at a time. Given a cluster, the points on the ceiling that have a view of the cluster's centroid are identified first, and these points are the possible camera locations. The only possible orientations for each camera point towards the points in the cluster. This simplifies the problem to finding two points: one on the ceiling to position the camera and one on the floor to point the camera towards. We use a variation of hill climbing called Random-Restart Hill Climbing, a relatively simple optimization search approach that provides near-optimal performance [1, 2]. The idea is to score a certain number of randomly sampled points and choose the best one as the start location for hill climbing. Here this is done at two levels, since two variables need to be optimized: given a point on the ceiling, we perform a random-restart hill climb over points on the floor to find its optimal pair. This process is repeated for a certain percentage of the ceiling points to find a suitable starting pair, which is then used as the starting point for the final hill climb. The search continues through the neighbors until no more progress is made.
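The search described above can be sketched as follows. This is a minimal illustration in Python, not the actual implementation: `score` stands in for the CCQM evaluation and `neighbors` for a hypothetical helper returning the adjacent candidate points.

```python
import random

def hill_climb(start, neighbors, score):
    """Greedy ascent: move to the best-scoring neighbor until no neighbor improves."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best

def random_restart_hill_climb(candidates, neighbors, score, n_restarts=10):
    """Score a random sample of candidates, then hill-climb from the best start."""
    starts = random.sample(candidates, min(n_restarts, len(candidates)))
    best_start = max(starts, key=score)
    return hill_climb(best_start, neighbors, score)
```

In the two-level version, `score` for a ceiling point is itself the result of a random-restart hill climb over the floor points paired with it.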


The obtained results are shown below. To recap, the cluster color coding is:







Cluster 0: Blue
Cluster 1: Red
Cluster 2: Green
Cluster 3: Aqua
Cluster 4: Light Pink
Cluster 5: Pink
Cluster parameters:

Cluster   MeanX       VarX        MeanZ      VarZ       MeanO    VarO
0         2627.1913   252.0491    227.2815   25.954     0.002    0.0016
1         1784.9107   502.9819    426.3474   48.0879    0.0029   0.0029
2         725.5888    280.347     219.6706   25.6276    0.0011   0.0009
3         2617.2649   258.5819    633.8928   31.2469    0.001    0.0008
4         1577.6143   1017.7892   420.1754   163.0683   0        0
5         700.9391    292.696     643.3453   22.5434    0.0011   0.0008
Considering the 5 valid clusters (cluster 4 has zero occupancy and is excluded), the optimized parameters are listed below, where Eye is the camera position on the ceiling and Center is the floor point the camera looks towards:

Cluster   EyeX      EyeY   EyeZ      CenterX   CenterY   CenterZ
0         3001.88   119    189.971   2282.35   68        460.1
1         2281.82   119    380.491   1169.24   68        650.761
2         1206.07   119    209.531   216.731   68        216.935
3         2245.89   119    671.584   2177.13   68        219.217
4         -         -      -         -         -         -
5         1205.58   119    637.524   185.957   68        637.978

Going ahead, we will perform experiments to quantify these results and compare against the state of the art.

References

[1] Y. Zhang, T. Lei, R. Barzilay, and T. Jaakkola, "Greed is Good if Randomized: New Inference for Dependency Parsing," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[2] C. F. F. C. Filho, M. G. F. Costa, J. E. C. Filho, and A. L. M. de Oliveira, "Using a random restart hill-climbing algorithm to reduce component assembly time in printed circuit boards," in 2010 IEEE International Conference on Industrial Technology (ICIT), pp. 1706-1711, March 2010.

Wednesday, January 28, 2015

Step 3.2: Implementation - Camera Coverage Quality Metric (CCQM)

To this point, we have devised a method to
  1. Simulate the possible trajectories in the scenario.
  2. Identify regions on the floor with high human occupancy.
This marks the end of the first part. 

In the second part, we use the information generated in the first part to optimize the camera network. To recap, the optimization involves maximizing the
  • area of view coverage ($A$),
  • human activity volume in view ($H$),
  • frontal view of humans ($F$), and
  • resolution of the obtained image ($R$).
The idea is to define a metric that quantifies the quality of a camera's location and orientation based on the above four parameters. Let the location and orientation of the camera be defined by the parameters $\Omega$, which we will define later. The Camera Coverage Quality Metric (CCQM) is a function of $\{A, H, F, R\}$. Hence,
\begin{equation}
CCQM(\Omega) = g(A, H, F, R) = A(\Omega) \cdot H(\Omega) \cdot F(\Omega) \cdot R(\Omega)
\end{equation}
where $A$ is a function of $\Omega$ quantifying the area in view, $H$ quantifies the human activity in view, $F$ quantifies the possible frontal images that can be obtained from the view, and $R$ quantifies the possible resolution that can be obtained from the view. Given this metric, the problem is to find the camera parameters $\Omega$ that maximize it. If $\Omega^*$ is the optimal parameter set, then
\begin{equation}
\Omega^* = \arg\max_{\Omega} \; CCQM(\Omega)
\end{equation}
The clusters alone provide adequate information to maximize the parameters $\{A, H\}$; the record of simulated trajectories is used for maximizing $\{F, R\}$. Given the parameters $\Omega$, let us define the functions $\{A, H, F, R\}$. Assume the floor is represented by a triangular mesh containing triangles $\{t_1, t_2, ..., t_n\}$ with centroids $\{c_1, c_2, ..., c_n\}$, and, given the configuration $\Omega$, let $\{t^{\Omega}_1, t^{\Omega}_2, ..., t^{\Omega}_m\}$ be the triangles in view of the camera.

Area of View (A):

The area function $A(\Omega)$ is defined as
\begin{equation}
A(\Omega) = \frac {area\_in\_view}{total\_area\_of\_floor}
 = \frac {\sum\limits_{i=1}^m area(t^{\Omega}_i)}{\sum\limits_{i=1}^n area(t_i)}
\end{equation}
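As a sketch, $A(\Omega)$ reduces to summing triangle areas over the visible subset. The snippet below is illustrative only and assumes each triangle is given as a triple of 3D vertices; determining which triangles are visible is a separate step.

```python
import numpy as np

def triangle_area(p0, p1, p2):
    # Area of a triangle from its vertices, via the cross-product formula.
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    return 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))

def area_of_view(visible_tris, floor_tris):
    # A(Omega): area of the triangles in view over the total floor area.
    return (sum(triangle_area(*t) for t in visible_tris)
            / sum(triangle_area(*t) for t in floor_tris))
```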

Human Activity Volume (H):

The human occupancy map is calculated from the simulated trajectories, and an occupancy value is assigned to every triangle in the floor mesh as described in the previous post. Let $O(t)$ be the occupancy of the triangle $t$. Then the function $H(\Omega)$ is defined as
\begin{equation}
H(\Omega)  = \frac{1}{C}{\sum\limits_{i=1}^m O(t^{\Omega}_i)}
\end{equation}
where $C$ is a normalizing constant.
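In code this is a one-liner; here `occupancy` is a hypothetical mapping from each triangle index to its $O(t)$ value.

```python
def human_activity(visible_idx, occupancy, C):
    # H(Omega): normalized occupancy accumulated over the triangles in view.
    return sum(occupancy[i] for i in visible_idx) / C
```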

Frontal View of Humans (F):

To quantify the probable amount of frontal view of humans for the configuration $\Omega$, we make use of the simulated trajectories. For every triangle $t_i$ in the floor mesh, direction discretization is performed and eight direction vectors $\{v^i_1, v^i_2, ..., v^i_8\}$ are defined, as described by Zhou et al. in [1] (Figure 1).
Figure 1
Figure 2

In the following step, a vector transition histogram/matrix (Figure 2) is constructed from the simulated trajectories. For every simulated trajectory, consecutive points are used to create direction vectors. Let $T$ be a simulated trajectory of length $l$, $T = \{p_1, p_2, ..., p_l\}$. For every pair of consecutive points $\{p_{i-1}, p_i\}$ in $T$, the trajectory's local direction vector is $(p_i - p_{i-1})$, and the histogram bin corresponding to the triangle $t$ in which the point $p_{i-1}$ is located and the discretized direction vector closest to $(p_i - p_{i-1})$ is incremented by $1$. Let $\Psi(t, v)$ be the histogram function; the function $F(\Omega)$ is then defined as

\begin{equation}
F(\Omega) = \frac{1}{m} \sum_{i} \Big( ((c-p_{i-1})\cdot v_{k-1})\,\Psi(t_j, v_{k-1}) + ((c-p_{i-1})\cdot v_{k})\,\Psi(t_j, v_{k}) + ((c-p_{i-1})\cdot v_{k+1})\,\Psi(t_j, v_{k+1}) \Big)
\end{equation}
where the sum runs over the consecutive point pairs $\{p_{i-1}, p_i\}$ whose triangle lies in the camera's view, $t_j$ is the floor-mesh triangle in which the point $p_{i-1}$ lies, $c$ is the camera center, and $v_k$ is the direction vector closest to $(p_i - p_{i-1})$:
\begin{equation}
k = \arg \max_{j} \; v_j \cdot (p_i - p_{i-1})
\end{equation}
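A sketch of the histogram construction with eight 45°-spaced bins, as in [1]; `triangle_of` is a hypothetical helper mapping a 2D point to its floor-triangle index, not part of the actual implementation.

```python
import numpy as np

# Eight discretized unit direction vectors, 45 degrees apart.
DIRS = np.array([(np.cos(a), np.sin(a)) for a in np.arange(8) * np.pi / 4])

def nearest_direction(step):
    # Index k of the discretized vector closest to the local direction p_i - p_{i-1}.
    return int(np.argmax(DIRS @ (np.asarray(step, float) / np.linalg.norm(step))))

def build_histogram(trajectories, triangle_of, n_triangles):
    # Psi(t, v): per-triangle counts of how often each direction bin is taken.
    psi = np.zeros((n_triangles, 8))
    for T in trajectories:
        for p_prev, p in zip(T, T[1:]):
            step = np.subtract(p, p_prev)
            if np.linalg.norm(step) > 0:
                psi[triangle_of(p_prev), nearest_direction(step)] += 1
    return psi
```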

Resolution of the Image (R):

This component of $CCQM$ quantifies the resolution of the images. If an object in view is far from the camera, the obtained resolution is very low and the image might not add any value to the system. This component is application dependent: it can be customized to require a sufficient resolution of any object, which could be just the face or the entire body of a human. We follow the methodology described by Janoos et al. in [2]. The function $R(\Omega)$ is defined as
\begin{equation}
R(\Omega) = \frac{1}{m} \sum \limits_{i=1}^{m} \frac{\rho^\Omega (t_i)}{\rho_{min}}
\end{equation}

Let $C$ be the center of the camera, then
\begin{equation}
\rho^\Omega = \frac{\sigma_{k-1}(c-p_{i-1})\cdot v_{k-1} + \sigma_{k}(c-p_{i-1})\cdot v_{k} + \sigma_{k+1}(c-p_{i-1})\cdot v_{k+1}}{2\pi \, d(C, p_{i-1})^2 (1-\cos(\gamma/2))}
\end{equation}

where
$\gamma$ is the vertical (Y) field of view of the camera,
$d(p_1, p_2)$ is the Euclidean distance between the points $p_1$ and $p_2$,
$\sigma$ is the number of pixels that the object occupies in the image, and
$\rho_{min}$ is a user-defined value specifying the minimum required resolution of an object in $pixels/inch$.

To calculate the number of pixels $\sigma$, the bounding box of the object is considered, and a perspective transformation is applied to its corners to find their locations in the image. The area of the resulting quadrilateral is used as the $\sigma$ value. If $(a, b, c, d)$ are the locations of the corners in the image, then
\begin{equation}
\sigma = \frac{1}{2}\,\|\vec{ac} \times \vec{bd}\|
\end{equation}
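As an illustrative check of this formula: for a planar quadrilateral with corners in order $(a, b, c, d)$, the area is half the magnitude of the scalar cross product of its diagonals $\vec{ac}$ and $\vec{bd}$.

```python
def sigma_pixels(a, b, c, d):
    # sigma: area of the projected quadrilateral with image corners a, b, c, d
    # (in order), computed as half |ac x bd| using the 2D scalar cross product.
    acx, acy = c[0] - a[0], c[1] - a[1]
    bdx, bdy = d[0] - b[0], d[1] - b[1]
    return 0.5 * abs(acx * bdy - acy * bdx)
```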

Now that we have a way to quantify a camera configuration using the $CCQM$, in the next step we consider a heuristic optimization algorithm that maximizes this quantity to obtain the optimum configuration.

[1] W. Zhou, H. Xiong, Y. Ge, J. Yu, H. Ozdemir, and K. C. Lee, "Direction clustering for characterizing movement patterns," in 2010 IEEE International Conference on Information Reuse and Integration (IRI), pp. 165-170, Aug. 2010.
[2] F. Janoos, R. Machiraju, R. Parent, J. W. Davis, and A. Murray, "Sensor configuration for coverage optimization for surveillance applications," 2007.