- Simulate the possible trajectories in the scenario.
- Identify regions on the floor with high human occupancy.
This marks the end of the first part.
In the second part, we use the information generated in the first part to optimize the camera network. To recap, the optimization maximizes the following:
- area of view coverage ($A$),
- view locations with human activity volume ($H$),
- frontal view of humans ($F$), and
- resolution of the obtained image ($R$).
The idea is to define a metric that quantifies the quality of a camera's location and orientation based on the above four parameters. Let the location and orientation of the camera be defined by the parameters $\Omega$, which we will define later. The Camera Coverage Quality Metric (CCQM) is a function of $\{A, H, F, R\}$. Hence,
\begin{equation}
CCQM(\Omega) = g(A, H, F, R) = A(\Omega) \cdot H(\Omega) \cdot F(\Omega) \cdot R(\Omega)
\end{equation}
where $A$ is a function of $\Omega$ quantifying the area in view, $H$ is a function of $\Omega$ quantifying the human activity in view, $F$ is a function of $\Omega$ quantifying the frontal images that can be obtained from the view, and $R$ is a function of $\Omega$ quantifying the resolution that can be obtained from the view. Given this metric, the problem is to find the camera parameters $\Omega$ that maximize it. If $\Omega^*$ denotes the optimal parameters, then
\begin{equation}
\Omega^* = \arg\max_{\Omega} \, CCQM(\Omega)
\end{equation}
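As a minimal sketch of the two equations above: assume the four component functions are already available as Python callables and that the candidate configurations come from a finite list. Both are assumptions made for illustration, and the names `ccqm` and `best_configuration` are hypothetical. The exhaustive search here only stands in for whatever optimizer is actually used.

```python
from typing import Callable, Iterable, Sequence


def ccqm(omega, components: Sequence[Callable]) -> float:
    """Product of the component scores (A, H, F, R) evaluated at omega."""
    score = 1.0
    for component in components:
        score *= component(omega)
    return score


def best_configuration(candidates: Iterable, components: Sequence[Callable]):
    """Exhaustive arg max of CCQM over a finite candidate set (illustrative only)."""
    return max(candidates, key=lambda omega: ccqm(omega, components))


# Toy usage: omega is a single number and the components are made up.
toy_components = [
    lambda w: w,        # stand-in for A
    lambda w: 1.0 - w,  # stand-in for H
    lambda w: 1.0,      # stand-in for F
    lambda w: 1.0,      # stand-in for R
]
best = best_configuration([0.1, 0.5, 0.9], toy_components)
```

Because the metric is a product, a configuration that scores zero on any single component scores zero overall.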
The clusters alone provide adequate information to maximize the parameters $\{A, H\}$, while the record of simulated trajectories can be used to maximize the parameters $\{F, R\}$. Given the parameters $\Omega$, we now define the functions $\{A, H, F, R\}$. Assume that the floor is represented by a triangular mesh containing triangles $\{t_1, t_2, \ldots, t_n\}$ with centroids $\{c_1, c_2, \ldots, c_n\}$. Given the configuration $\Omega$, let $\{t^{\Omega}_1, t^{\Omega}_2, \ldots, t^{\Omega}_m\}$ be all the triangles in view of the camera.
Area of View (A):
The area function $A(\Omega)$ is then defined as
\begin{equation}
A(\Omega) = \frac {area\_in\_view}{total\_area\_of\_floor}
= \frac {\sum\limits_{i=1}^m area(t^{\Omega}_i)}{\sum\limits_{i=1}^n area(t_i)}
\end{equation}
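The area ratio can be sketched in a few lines, assuming a planar floor whose triangles are given as 2D vertex triples and assuming the visibility of each triangle under $\Omega$ has already been determined elsewhere. The function names and the `in_view` flag list are hypothetical.

```python
def triangle_area(p, q, r):
    """Area of a 2D triangle from its three (x, y) vertices."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def area_score(triangles, in_view):
    """A(Omega): visible floor area divided by total floor area.

    triangles: list of (p, q, r) vertex triples for t_1..t_n
    in_view:   list of booleans, True where the triangle is seen by the camera
    """
    areas = [triangle_area(*t) for t in triangles]
    visible = sum(a for a, seen in zip(areas, in_view) if seen)
    return visible / sum(areas)


# Unit square split into two triangles, only the first one visible.
floor = [((0, 0), (1, 0), (1, 1)), ((0, 0), (1, 1), (0, 1))]
score = area_score(floor, [True, False])
```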
Human Activity Volume (H):
The human occupancy map is calculated from the simulated trajectories, and an occupancy value is assigned to every triangle in the floor mesh, as described in the previous post. Let $O(t)$ be the occupancy of the triangle $t$. Then the function $H(\Omega)$ is defined as
\begin{equation}
H(\Omega) = \frac{1}{C}{\sum\limits_{i=1}^m O(t^{\Omega}_i)}
\end{equation}
where $C$ is a normalizing constant.
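A minimal sketch of this sum follows. The text does not specify $C$; the sketch assumes, purely for illustration, that $C$ is the total occupancy over the whole floor mesh, which keeps $H$ in $[0, 1]$. The names `activity_score` and `in_view` are hypothetical.

```python
def activity_score(occupancy, in_view, normalizer=None):
    """H(Omega): occupancy summed over visible triangles, divided by C.

    occupancy:  list of O(t_i) values for all n triangles
    in_view:    list of booleans, True where the triangle is seen by the camera
    normalizer: the constant C; defaults to the total occupancy (an assumption)
    """
    if normalizer is None:
        normalizer = sum(occupancy)
    if normalizer == 0:
        return 0.0  # no recorded activity anywhere on the floor
    visible = sum(o for o, seen in zip(occupancy, in_view) if seen)
    return visible / normalizer


h = activity_score([2, 6, 2], [True, True, False])
```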
Frontal View of Humans (F):
To quantify the probable amount of frontal view of humans for the configuration $\Omega$, we make use of the simulated trajectories. For every triangle $t_i$ in the floor mesh, direction discretization is performed and eight direction vectors $\{v^i_1, v^i_2, \ldots, v^i_8\}$ are defined, as described by Zhou et al. [1] (Figure 1).
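[Figure 1: Direction discretization into eight direction vectors per triangle.]
[Figure 2: Vector transition histogram/matrix built from the simulated trajectories.]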
In the next step, a vector transition histogram/matrix (Figure 2) is constructed from the simulated trajectories. For every simulated trajectory, consecutive points are used to create direction vectors. Let $T = \{p_1, p_2, \ldots, p_l\}$ be a simulated trajectory of length $l$. For each pair of consecutive points $\{p_{i-1}, p_i\}$ in $T$, the trajectory's local direction vector is defined as $(p_i - p_{i-1})$, and the bin corresponding to the triangle $t$ in which the point $p_{i-1}$ is located and the discretized direction vector closest to the direction of $(p_i - p_{i-1})$ is incremented by $1$. Let $\Psi(t,v)$ be the histogram function. Then the function $F(\Omega)$ is defined as
\begin{equation}
F(\Omega) = \frac{1}{m}(((c-p_{i-1})\cdot v_{k-1})\Psi(t_j, v_{k-1}) + ((c-p_{i-1})\cdot v_{k})\Psi(t_j, v_{k}) +((c-p_{i-1})\cdot v_{k+1})\Psi(t_j, v_{k+1}) )
\end{equation}
where $t_j$ is the triangle in the floor mesh in which the point $p_{i-1}$ lies and $v_k$ is the direction vector closest to $(p_i - p_{i-1})$, that is,
\begin{equation}
k = \arg \max_k v_k \cdot (p_i - p_{i-1})
\end{equation}
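The histogram construction and the choice of $k$ described above can be sketched as follows. The sketch assumes 2D trajectory points and eight unit direction vectors spaced at 45-degree steps. It also takes a caller-supplied `locate_triangle` function that maps a point to the index of the triangle containing it. That function, the direction layout, and all names here are assumptions for illustration, not the method of [1].

```python
import math
from collections import defaultdict

# Eight unit direction vectors at 45-degree steps (assumed layout).
DIRECTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]


def closest_direction(step):
    """Index k that maximizes the dot product v_k . step."""
    return max(range(8), key=lambda k: DIRECTIONS[k][0] * step[0] + DIRECTIONS[k][1] * step[1])


def transition_histogram(trajectories, locate_triangle):
    """Build Psi as a dict keyed by (triangle index, direction index)."""
    psi = defaultdict(int)
    for trajectory in trajectories:
        for prev, curr in zip(trajectory, trajectory[1:]):
            step = (curr[0] - prev[0], curr[1] - prev[1])
            if step == (0, 0):
                continue  # a stationary step has no direction
            psi[(locate_triangle(prev), closest_direction(step))] += 1
    return psi


# Toy usage: every point is treated as lying in triangle 0.
psi = transition_histogram([[(0, 0), (1, 0), (2, 0), (2, 1)]], lambda p: 0)
```

In the toy trajectory, two steps point along the first direction vector and one step points along the third, so two bins of triangle 0 are filled.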
Resolution of the Image (R):
This component of $CCQM$ quantifies the resolution of the images. If the imaged object is far from the camera, the obtained resolution is very low and the image might not add any value to the system. This component is application dependent: it can be customized to obtain sufficient resolution of any object, which could be just the face or the entire body of a human. We follow the methodology described by Janoos et al. [2]. The function $R(\Omega)$ is defined as
\begin{equation}
R(\Omega) = \frac{1}{m} \sum \limits_{i=1}^{m} \frac{\rho^\Omega (t_i)}{\rho_{min}}
\end{equation}
Let $C$ be the center of the camera, then
\begin{equation}
\rho^\Omega = \frac{\sigma_{k-1}(c-p_{i-1})\cdot v_{k-1}+\sigma_{k}(c-p_{i-1})\cdot v_{k}+\sigma_{k+1}(c-p_{i-1})\cdot v_{k+1}}{2\pi \, d(C,p_{i-1})^2 \, (1-\cos(\gamma/2))}
\end{equation}
where
- $\gamma$ is the Y-field of view defined for the camera,
- $d(p_1,p_2)$ is the Euclidean distance between the points $p_1$ and $p_2$,
- $\sigma$ is the number of pixels that the object occupies in the image, and
- $\rho_{min}$ is the user-defined minimum required resolution of an object, in pixels/inch.
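The denominator of $\rho^\Omega$ has the form of a spherical cap area: a cone with apex angle $\gamma$ subtends a solid angle of $2\pi(1-\cos(\gamma/2))$, and multiplying by $d^2$ gives the area of the cap at distance $d$. A small sketch of that term alone follows. The function name is hypothetical, and reading the term as a cap area is an interpretation, not something stated in [2].

```python
import math


def view_cap_area(distance, fov):
    """Area of the spherical cap cut by a cone of apex angle `fov` (radians)
    on a sphere of radius `distance` centered at the camera."""
    return 2.0 * math.pi * distance ** 2 * (1.0 - math.cos(fov / 2.0))


# Doubling the distance quadruples the area, so pixel density falls by a factor of four.
near = view_cap_area(1.0, math.pi / 2)
far = view_cap_area(2.0, math.pi / 2)
```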
To calculate the number of pixels $\sigma$, the bounding box of the object is considered, and a perspective transformation is applied to its corners to find their locations in the image. The area of the resulting quadrilateral is used as the $\sigma$ value. If $(a, b, c, d)$ are the locations of the corners in the image, then
\begin{equation}
\sigma = \frac{1}{2} \| \overrightarrow{ac} \times \overrightarrow{bd} \|
\end{equation}
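The diagonal cross-product formula can be checked with a short sketch. It assumes the four projected corners are 2D image coordinates listed in order around the quadrilateral. The function name is hypothetical.

```python
def quad_pixel_area(a, b, c, d):
    """Area of quadrilateral abcd (corners in order) from its diagonals ac and bd."""
    ac = (c[0] - a[0], c[1] - a[1])
    bd = (d[0] - b[0], d[1] - b[1])
    # In 2D the cross product reduces to a scalar; its magnitude is |ac x bd|.
    return 0.5 * abs(ac[0] * bd[1] - ac[1] * bd[0])


# A 4 x 3 pixel rectangle covers 12 pixels.
sigma = quad_pixel_area((0, 0), (4, 0), (4, 3), (0, 3))
```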
Now that $CCQM$ gives us a way to score a camera configuration, the next step is to use a heuristic optimization algorithm to maximize it and obtain the optimal configuration.
[1] W. Zhou, H. Xiong, Y. Ge, J. Yu, H. Ozdemir, and K. C. Lee, "Direction clustering for characterizing movement patterns," 2010 IEEE International Conference on Information Reuse and Integration (IRI), pp. 165-170, 4-6 Aug. 2010.
[2] F. Janoos, R. Machiraju, R. Parent, J. W. Davis, and A. Murray, "Sensor configuration for coverage optimization for surveillance applications," 2007.