Wednesday, December 10, 2014

Step 3.0: Implementation

In this section, I'd like to describe the construction of the 3D environment and the setup for collecting data. This was part of my previous research; it is recapped here to make the work complete.

First, we present how we build our 3D models. Then we describe a method to embed virtual cameras in the 3D model that represent the cameras in the real world. Finally, we show how a human subject detected in the image of a real-world camera can be projected onto a point in our 3D model.

Modeling 3D geometry:

We model the 3D geometry of the environment (floors, walls, hallways, etc.) using Google SketchUp, a 3D modeling tool. Figure 1 depicts the 3D model of a building, constructed using existing floor plans to obtain the measurements and dimensions. We then export the 3D model to COLLADA, a common digital asset exchange format [1], which we later use for rendering and understanding the 3D environment. The COLLADA Document Object Model (DOM) library is used to load and save this 3D model in an application, and OpenGL is then used to interact with the 3D data. The rendered model of one of the floors, using OpenGL, is shown in Figure 2.

Figure 1.

Figure 2.
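To make the pipeline concrete, here is a minimal sketch of loading the exported COLLADA file in Python with the pycollada package and collecting the triangle vertices. This is an illustration only: the actual implementation used the C++ COLLADA DOM with OpenGL, and the file name here is hypothetical.

    import numpy as np
    from collada import Collada

    # Load the .dae file exported from SketchUp (hypothetical file name).
    model = Collada("floor_model.dae")

    vertices = []
    for geometry in model.geometries:
        for primitive in geometry.primitives:
            # Triangle sets expose their vertex arrays directly.
            vertices.append(np.asarray(primitive.vertex))

    all_vertices = np.vstack(vertices)
    print("loaded %d vertices" % len(all_vertices))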


Embedding virtual cameras and calibration:

An initial step is to create virtual cameras in our 3D model that represent the cameras in the real world. To do this, we first determine the internal parameters of each existing real-world camera using a standard checkerboard calibration procedure. Once the camera's internal parameters are obtained, we can use OpenGL to create virtual cameras in our model that render perspective projections of the 3D model conceptually equivalent to the real-world camera views. To determine the location and orientation of a camera in our 3D model, we take an image from the real-world camera and manually register it with the corresponding camera's perspective projection in our graphics model by adjusting the parameters of the transformation matrix in OpenGL. When the images register, as shown in Figure 3, we extract the camera's transformation matrix, which gives us its approximate location and orientation in the 3D model [2].

Figure 3.
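As a rough sketch of the intrinsic calibration step, OpenCV's checkerboard routines can be used as follows. The 9x6 board size and the image directory are assumptions, not details from the original setup.

    import glob
    import cv2
    import numpy as np

    board = (9, 6)  # inner-corner count of the checkerboard (assumed)
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

    obj_points, img_points = [], []
    for path in glob.glob("calib_images/*.jpg"):  # hypothetical path
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # K is the 3x3 intrinsic matrix; dist holds distortion coefficients.
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("intrinsics:\n", K)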

Delaunay triangulation of the floor mesh:

We choose to represent the floor using a triangular mesh, though other representations are possible. For our purposes, we would like a rich description of the triangular mesh representing the floor where human subjects walk: triangles in the mesh should have adequate height and base with respect to normal human motion characteristics. Assuming that humans walk at an average pace of 3 ft/sec and the camera in use has a frame rate of 30 fps, taking a sample every 10 frames corresponds to roughly one sample for every foot a human moves, so it is convenient if the triangles have a height and base of at least 1 ft. We use Delaunay triangulation to obtain a mesh that is uniformly spaced, as shown in Figure 4. An implementation of Delaunay triangulation is available in the Computational Geometry Algorithms Library (CGAL) [3].
Figure 4.
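For illustration, the sketch below builds a uniformly spaced floor mesh with scipy's Delaunay triangulation; the actual work used CGAL's implementation, and the 20 ft x 40 ft floor sampled every foot is an assumption.

    import numpy as np
    from scipy.spatial import Delaunay

    # Hypothetical rectangular floor sampled on a 1 ft grid.
    xs, ys = np.meshgrid(np.arange(0.0, 20.0), np.arange(0.0, 40.0))
    points = np.column_stack([xs.ravel(), ys.ravel()])

    tri = Delaunay(points)  # triangles with ~1 ft edges, as required
    print("triangles in floor mesh:", len(tri.simplices))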


References:

[1] R. Arnaud and M. C. Barnes. Collada: Sailing the Gulf of 3d Digital Content Creation. AK Peters Ltd, 2006.
[2] P. Shirley and M. Ashikhmin. Fundamentals of Computer Graphics, Second Edition. A K Peters, 2005.
[3] CGAL, Computational Geometry Algorithms Library. http://www.cgal.org.

Monday, December 1, 2014

Step 3.1: Implementation - Identifying Regions with High Human Activity Cont...

The steps involved in identifying regions with high human activity are:

  1. Identify nodes and assign probabilities
  2. Sample start and end nodes based on the probabilities
  3. Simulate trajectories from the starting node to the end node
  4. Calculate occupancy map from the trajectories
  5. Cluster regions based on their occupancy
To summarize, given the geometry of an infrastructure, the nodes are assigned probabilities as described in the earlier steps. The start and end nodes are then sampled according to those probabilities, and, given the start and end, human motion trajectories are generated as described in the previous post. Together, these tools provide a way to simulate an entire scenario in the infrastructure.


Calculate occupancy map from the trajectories:

Simulating the previous steps multiple times and observing the resulting trajectories provides a way to identify regions that have high human activity.
Figure 1.

Figure 1 shows the occupancy map obtained by simulating 500 trajectories as described in the previous steps.
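A minimal sketch of this accumulation step is shown below, assuming trajectories are sequences of floor points in grid units. The random-walk simulate_trajectory is only a stand-in for the CTF sampler described in the previous post.

    import numpy as np

    GRID = (100, 100)  # hypothetical floor discretization

    def simulate_trajectory(rng=np.random):
        """Stand-in for the CTF sampler: a short random walk."""
        p = rng.randint(0, GRID[0], size=2)
        path = []
        for _ in range(200):
            path.append(tuple(p))
            p = np.clip(p + rng.randint(-1, 2, size=2), 0, GRID[0] - 1)
        return path

    occupancy = np.zeros(GRID)
    for _ in range(500):               # 500 trajectories, as in Figure 1
        for x, y in simulate_trajectory():
            occupancy[x, y] += 1
    occupancy /= occupancy.sum()       # normalize to a frequency map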


Cluster regions based on their occupancy:

The next step is to cluster regions with high human activity. In this step, the regions that belong to the same cluster should have a high occupancy value and also be located in the same spatial region. The feature set representing a point is its spatial coordinates together with its occupancy, i.e. (x, y, z, o), where x, y, z are the 3D coordinates of the point and o its occupancy. Expectation Maximization was used for clustering, and the results are shown below.
Figure 2.
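As a rough illustration, this clustering step can be reproduced with scikit-learn's GaussianMixture, which is fit by Expectation Maximization; the synthetic feature matrix X is a stand-in for the real (x, y, z, o) rows, and six components match the six clusters reported below.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # X: one row per floor point -> [x, y, z, occupancy] (synthetic here).
    X = np.random.rand(1000, 4)

    gmm = GaussianMixture(n_components=6, covariance_type="diag",
                          random_state=0).fit(X)
    labels = gmm.predict(X)   # cluster assignment per point
    print(gmm.means_)         # per-cluster means of (x, y, z, o)
    print(gmm.covariances_)   # per-cluster variances (diagonal model)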

The mean and the variance of the obtained clusters are:

Cluster      X (Mean / Var)          Z (Mean / Var)          O (Mean / Var)
Red          1784.91 / 502.98        426.34 / 48.08          0.0029 / 0.0029
Blue         227.28 / 25.95          2627.19 / 252.04        0.002 / 0.0016
Pink         725.58 / 280.34         219.67 / 25.62          0.0011 / 0.0009
Green        2617.26 / 258.58        633.89 / 31.24          0.0011 / 0.0008
Aqua         2617.2649 / 258.5819    633.8928 / 31.2469      0.001 / 0.0008
Light Pink   1577.61 / 1017.78       420.17 / 163.06         0 / 0

In this scenario, the red cluster is identified as having the highest human activity, followed by the blue and pink clusters.

The next step is to find camera placement parameters (location and orientation) that maximize the view of these clusters while also providing a maximal frontal view of the humans.

Tuesday, November 25, 2014

Step 3.1: Implementation - Identifying Regions with High Human Activity Cont...

Simulate trajectories from the starting node to the end node:

Contextual Trajectory Forecasting (CTF) is used to simulate trajectories from the start node to the end node. Given the 3D geometry of the environment and the starting point and destination of a human, CTF is built on two assumptions: first, that the human will follow a path that requires the shortest time to reach the destination, and second, that the human will adhere to certain behavioral norms that are observed when walking. CTF assigns probabilities to points on the floor such that consecutive points can be sampled from start to destination, forming a trajectory that represents the shortest path while conforming to observed behavioral norms.

Distance Map:  

The CTF algorithm takes as input a distance map, which it uses to find points that are closer to the destination. The map gives the distance to the destination from every other point on the floor. The Euclidean distance between two points is not affected by the presence of inaccessible areas in the path, so using Euclidean distance can be erroneous. Instead, we use the geodesic distances defined by Martinez et al. in [1]. Geodesic distance is measured around inaccessible areas along the ground plane and gives a more accurate sense of distance for human navigation. A rendering of the distance map for geometry A with a given destination is shown in Figure 1.

Figure 1.
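A minimal sketch of such a distance map, assuming the floor is discretized into a grid: Dijkstra's algorithm is run from the destination cell and expands only through accessible cells, so distances wrap around walls. The cited work computes geodesics on the triangle mesh itself; the grid merely keeps the sketch short.

    import heapq
    import numpy as np

    def distance_map(accessible, dest):
        """accessible: 2D bool array; dest: (row, col) destination."""
        dist = np.full(accessible.shape, np.inf)
        dist[dest] = 0.0
        heap = [(0.0, dest)]
        while heap:
            d, (r, c) = heapq.heappop(heap)
            if d > dist[r, c]:
                continue  # stale queue entry
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1]
                        and accessible[nr, nc] and d + 1.0 < dist[nr, nc]):
                    dist[nr, nc] = d + 1.0
                    heapq.heappush(heap, (d + 1.0, (nr, nc)))
        return dist

    free = np.ones((50, 50), dtype=bool)
    free[20:30, :40] = False          # a wall with a gap on the right
    D = distance_map(free, (45, 5))   # geodesic distances to (45, 5)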

Accessibility Map: 

CTF also takes a human accessibility map as a second input. Hypothetically, if a large number of trajectories followed by human subjects from a particular start to a destination were observed, certain points on the ground plane would likely be accessed more often than other points. This implies the existence of a distribution, or accessibility map, over the points on the ground plane. Estimating this map helps CTF choose points that are accessed often, thereby complying with behavioral norms. Human motion is influenced by a multitude of factors, many of which are driven by perception. CTF specifically focuses on modeling human motion by taking into account the constraints imposed by the 3D geometry and the physical world. When traversing the ground plane, the immediate movement decision is influenced by the objects in the path and the surrounding geometry such as walls; consider, for example, how humans navigate around tables and chairs when moving from one corner of a classroom to the opposite corner. Since human behavior is assumed to be influenced by the 3D geometry, the aim is to model the relationship between the two. This model provides a means to estimate the accessibility map for any novel location based on its 3D geometry.

This relationship was modeled based on empirical data. The model first represents a point on the ground plane using a set of geometric features that capture the 3D geometry of the environment surrounding that point, and then establishes a linear relationship between the geometric features of the point and its observed occupancy. Initially, the occupancy map of a known geometry was observed.

Note that an occupancy map is different from an accessibility map. The accessibility map depends only on the geometry of the environment and defines what areas are accessible by humans; the occupancy map, on the other hand, is generated by observing the trajectories taken by human subjects in the environment and represents the frequency with which points are occupied. The occupancy map therefore depends on the geometry and also on the frequency with which the nodes in the geometry are accessed. Given a geometry, the accessibility map is fixed, but the occupancy map can change if the location of the nodes or their purpose in the environment changes. For example, in a movie theater, if the location of the ticket counter changes, the occupancy map changes but the accessibility map remains the same. In this case, we are observing the actual trajectories of humans, and hence we are observing the occupancy map and not the accessibility map. We use this occupancy map to estimate the accessibility map of the environment.

Consider a dataset of humans traversing a ground plane whose surrounding 3D geometry is known. The occupancy of a point on the floor is proportional to the number of times the humans in the dataset have occupied that point. The occupancy map of a hallway observed over a period of 5 days is shown in Figure 2.
Figure 2.

Since CTF assumes that the occupancy of a point on the ground plane is influenced by the 3D geometry surrounding it, the geometric features of a point on the ground plane in the 3D model are represented as a set of numbers: its distances from the surrounding walls and objects. To obtain the geometric features, distances are measured to walls or objects in the hallway along vectors pointing at a certain inclination from the ground plane, at regular intervals spanning an entire circle, with their tails fixed at the point, as shown in Figure 3.

Figure 3.

The distances are measured consistently in either the clockwise or anti-clockwise direction, always starting from the closest object or wall. In order to confine the effect to objects within the close vicinity of the point, the distances are thresholded by a hemisphere, as shown in Figure 3. The radius of this hemisphere is inferred from the theory of proxemics [2], an observational theory that describes how human beings unintentionally make use of the physical space around them. Proxemics classifies the space close to a human subject into four broad regions: intimate, personal, social, and public distance. It is assumed that interactions between human subjects in closed hallways take place within the social distance.
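A hedged sketch of the feature computation: rays are cast from a floor point at regular angles, distances to the first obstacle are clipped by the proxemics radius, and the vector is rotated to start at the nearest obstacle, giving the consistent ordering described above. The ray count and radius are assumptions, and hit_distance stands in for a ray/mesh intersection routine.

    import numpy as np

    def geometric_features(point, hit_distance, n_rays=36, radius=12.0):
        """hit_distance(point, angle) -> distance to the first wall or
        object; radius approximates the social distance (assumed)."""
        angles = np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
        d = np.array([min(hit_distance(point, a), radius) for a in angles])
        start = int(np.argmin(d))   # begin at the closest obstacle
        return np.roll(d, -start)   # consistent ordering -> invariance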

We assume a linear relationship between the geometric features of a point and its observed occupancy. Linear regression is performed to estimate the parameter values by minimizing the sum of squared errors. To determine the accessibility of any point on the ground plane in a new geometry, the geometric features of that point are computed first, and then the parameter values estimated by the regression are used to predict its accessibility. Figure 4 depicts the estimated accessibility map.

Figure 4.
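A minimal sketch of the regression step, assuming a feature matrix F (one row per point) and an observed occupancy vector o from the steps above; synthetic stand-ins are used so the snippet runs on its own.

    import numpy as np

    # Stand-ins for the real geometric features and observed occupancy.
    F = np.random.rand(500, 36)
    o = np.random.rand(500)

    F1 = np.hstack([F, np.ones((len(F), 1))])    # append intercept column
    w, *_ = np.linalg.lstsq(F1, o, rcond=None)   # minimize squared error

    def estimate_accessibility(F_new):
        """Predict accessibility for feature rows from a new geometry."""
        return np.hstack([F_new, np.ones((len(F_new), 1))]) @ w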

It can be observed that the occupancy of points in the center of the hallway is higher than that of points along the edges. The rotational invariance of the features allows the occupancy to be estimated as expected even along curved hallways.

Trajectory Forecasting:

CTF combines these two maps and assigns an energy value to every point on the ground plane. Let O be the estimated occupancy map function and D the distance map function. Then the energy of a point p is defined by the function E as:

E(p)= -D(p)/O(p)

Figure 5

The energy function for geometry A is shown in Figure 5. The energy is higher in the center of the hallway than along the edges, and the energy increases as the points get closer to the destination. To forecast the trajectory from the starting node to the destination node, points are sampled consecutively with a probability defined by the energy map: a point closer to the destination is sampled with a probability proportional to the difference in energies,

P(p_i) ∝ E(p_i) - E(p_{i-1})
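A hedged sketch of this sampling rule on a grid energy map E: from the current cell, a neighboring cell is chosen with probability proportional to its (positive) energy gain, which drives the walk toward the destination. The 4-neighborhood and step cap are assumptions.

    import numpy as np

    def sample_trajectory(E, start, dest, max_steps=10000, rng=np.random):
        p, path = start, [start]
        for _ in range(max_steps):
            if p == dest:
                break
            nbrs = [(p[0] + dr, p[1] + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= p[0] + dr < E.shape[0]
                    and 0 <= p[1] + dc < E.shape[1]]
            gains = np.array([max(E[n] - E[p], 0.0) for n in nbrs])
            if gains.sum() == 0:       # no higher-energy neighbor; stop
                break
            p = nbrs[rng.choice(len(nbrs), p=gains / gains.sum())]
            path.append(p)
        return path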

Figure 6 (a) shows the distribution obtained by simulating the trajectory prediction algorithm 5,000 times without the accessibility map, using only distance minimization; Figure 6 (b) shows the simulation with the accessibility map. We can see how the estimated occupancy map complements the geodesic distance minimization and produces more desirable trajectories that conform to expected human behavior.
Figure 6. (a) Distance minimization only; (b) with the accessibility map.


References

[1] D. Martinez, L. Velho, and P. C. Carvalho, “Computing geodesics on triangular meshes,” Comp. Graph., 2005.
[2] E. T. Hall, The Hidden Dimension. Anchor Books. ISBN 0-385-08476-5, 1966.

Step 3.1: Implementation - Identifying Regions with High Human Activity.

Given the 3D geometry of the environment, the goal is to find the location and orientation of cameras to construct an effective surveillance scenario as described in Step 1. A two stage approach is considered to solve this problem.

  1. Identify regions in the geometry that might have high human activity.
  2. Optimize camera placement to maximize the view of these regions.

Identifying regions of High Human Activity: 

Given an infrastructure, we assume that humans follow trajectories with the goal of reaching a destination such as an entrance, exit, or doorway. We call these destinations nodes; based on the purpose a node serves in the infrastructure, there is a certain probability associated with accessing it. For example, at an airport, the ticket counter might have a higher probability of access than a coffee shop or a restroom. Knowledge of these probabilities can help us estimate the likely human motion activity in the infrastructure. Given the geometry of the environment along with the nodes and their assigned probabilities, the likely human motion in the scenario is simulated to identify regions of high human activity. The steps involved in this process are:
  1. Identify nodes and assign probabilities
  2. Sample start and end nodes based on the probabilities
  3. Simulate trajectories from the starting node to the end node
  4. Calculate occupancy map from the trajectories
  5. Cluster regions based on their occupancy
Let us consider the following floor plan. 
Figure 1

Identify nodes and assign probabilities: 

In Figure 1, we would like to install a camera network that provides effective surveillance in the hallway. The identified nodes are numbered as shown. Given this geometry and the nodes, the first task is to assign the probabilities. The probability of accessing a node is defined by its purpose in the scenario; in this case, we assume that it is proportional to the seating capacity of the room, i.e., the higher the capacity of a room, the higher the probability of accessing it. The following probabilities were assigned:
Node   Capacity   Probability
0      60         0.46
1      10         0.07
2      10         0.07
3      10         0.07
4      -          entry/exit
5      10         0.07
6      20         0.15
7      -          entry/exit
8      10         0.07
Assuming that all humans entering an infrastructure will exit at some point, the nodes marked as entry/exit points are not assigned a probability.

Sample start and end nodes based on the probabilities:

We assume that any human entering the hallway will exit at some point in time. First, an entry node (4 or 7) is chosen with equal probability; then a non-exit node is chosen according to the probabilities shown in the table. Once the human has transitioned to that node, the human either transitions to another node or exits, with equal probability. If the human chooses to exit, the closest exit is taken; otherwise, the human goes to another node based on a recalculated probability. The probability of choosing the second node changes because the node the human currently occupies is eliminated when computing the probabilities for the other nodes. Sample output from simulating this scenario is shown in Figure 2, and a small sampling sketch follows it.
Figure 2
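A small sketch of this sampling procedure is given below. The node numbers, capacity-based probabilities, and the equal-probability exit decision follow the text; closest_exit stands in for a geodesic-distance lookup.

    import random

    EXITS = [4, 7]
    NODE_PROBS = {0: 0.46, 1: 0.07, 2: 0.07, 3: 0.07,
                  5: 0.07, 6: 0.15, 8: 0.07}

    def sample_visit_sequence(closest_exit):
        seq = [random.choice(EXITS)]               # entry, equal odds
        nodes, probs = zip(*NODE_PROBS.items())
        seq.append(random.choices(nodes, weights=probs)[0])
        while random.random() < 0.5:               # 50%: visit another node
            others = {n: p for n, p in NODE_PROBS.items() if n != seq[-1]}
            nodes, probs = zip(*others.items())    # current node excluded
            seq.append(random.choices(nodes, weights=probs)[0])
        seq.append(closest_exit(seq[-1]))          # leave via nearest exit
        return seq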




Friday, November 14, 2014

Step 2.2: Literature Review

Camera network optimization is an important problem in computer vision and has been explored by many researchers. Most of the early work considered a single camera focused on a static object, and the problem was to find the camera position that maximizes the quality of features on the object [1, 2]. Later, Chen and Davis [3] proposed a metric that evaluates the quality of camera network configurations based on their resolution and occlusion characteristics; the configuration is optimized under this metric so that occlusion is minimized while a certain resolution is ensured. Mittal and Davis [4] suggested a probabilistic approach to visibility analysis: the probability that an object is visible from at least one camera is calculated, a cost function is defined that maps the sensor parameters to this probability, and simulated annealing is performed to minimize the cost function.

Erdem and Sclaroff [5] suggested a binary optimization approach to the camera placement problem: the polygon representing the space is divided into an occupancy grid, and the algorithm tries to minimize the camera set while maintaining a specified spatial resolution. Horster and Lienhart [6, 7, 8] proposed a linear programming approach that determines the calibration for each camera in the network that maximizes the coverage of the space while assuring a certain resolution. Ram et al. [9] proposed a performance metric that evaluates the probability of accomplishing a task as a function of a set of cameras and their placement. The metric allows the camera system to be evaluated in a "direction-aware" sense, i.e., it recognizes that only images obtained from a certain direction (frontal images of a person) are useful; the metric is maximized to estimate the camera configuration. Bodor et al. [10] proposed a method whose goal is to maximize the aggregate observability across multiple cameras: an objective function is defined that measures the resolution of the image and the object motions (trajectories) in the scene, and a variant of hill climbing is used to maximize it.

Murray et al. [12] applied coverage optimization combined with visibility analysis to this problem: for each camera location, coverage was calculated using visibility analysis, and the maximal covering location problem (MCLP) and backup coverage location problem (BCLP) were used to model the optimum camera combinations and locations. Malik and Bajcsy [13] suggested a method for optimizing the placement of multiple stereo cameras for 3D reconstruction; an optimization framework was defined using an error-based objective function that quantifies the stereo localization error along with some constraints, and genetic algorithms were used to generate a preliminary solution that was later refined using gradient descent. Kim and Murray [14] also employed BCLP to solve the camera coverage problem, suggesting an enhanced representation of the coverage area as a continuous variable, as opposed to using a discrete variable to represent the whole area. The work in [15, 16] also employed a combination of MCLP and BCLP for the optimum camera coverage problem: the former takes into consideration the 3D geometry of the environment and supplements the MCLP/BCLP problem with a minimal localization error variable for both monoscopic and stereoscopic cameras, solving the optimization with simulated annealing, while the latter supplements the MCLP/BCLP problem with visibility analysis. Huang et al. [17] proposed a 2-approximation algorithm: the first part solves the minimum watchmen tour problem and places cameras along the estimated tour, and the second finds a solution to the art gallery problem and adds extra cameras to connect the guards.

Considering the 3D geometry of the environment is of significant value for the camera coverage optimization problem. This work deals with indoor scenarios, and a complete 3D model of the environment where the camera network would be deployed is designed. To the best of our knowledge, this is the first work that does not need any observations of the human activity in the scenario when designing an optimal camera network; the only input to the model is the 3D geometry of the environment. In [10, 18], observed human activity (trajectories) was used to find an optimal camera position; in the proposed work, by contrast, the human trajectories are simulated in order to identify areas with a high volume of human activity. Furthermore, in [9] the camera position is optimized to maximize the frontal view of the humans, which again required observation; the proposed work requires no training to maximize the frontal view, since the directional information of the simulated trajectories is used. Finally, human behavior in a given scenario is influenced by the 3D geometry of that environment. To the best of our knowledge, this is the first work that incorporates this information to optimize camera network locations for video surveillance.


Update: not the first to use a 3D model; see "Automated camera placement for large scale surveillance networks".

References

[1] K. Tarabanis, P. Allen, and R. Tsai, "A survey of sensor planning in computer vision," Robotics and Automation, IEEE Transactions on, vol. 11, pp. 86-104, Feb 1995.
[2] S. Fleishman, D. Cohen-Or, and D. Lischinski, "Automatic camera placement for image-based modeling," in Computer Graphics and Applications, 1999. Proceedings. Seventh Pacific Conference on, pp. 12-20, 315, 1999.
[3] X. Chen and J. Davis, "Camera placement considering occlusion for robust motion capture," tech. rep., 2000.
[4] A. Mittal and L. Davis, "Visibility analysis and sensor planning in dynamic environments," in Computer Vision - ECCV 2004 (T. Pajdla and J. Matas, eds.), vol. 3021 of Lecture Notes in Computer Science, pp. 175-189, Springer Berlin Heidelberg, 2004
[5] U. M. Erdem and S. Sclaroff, "Optimal placement of cameras in floorplans to satisfy task requirements and cost constraints," in Proc. of OMNIVIS Workshop, 2004.
[6] E. Horster and R. Lienhart, "Calibrating and optimizing poses of visual sensors in distributed platforms," Multimedia Systems, vol. 12, no. 3, pp. 195-210, 2006.
[7] E. Horster and R. Lienhart, "Approximating optimal visual sensor placement," in Multimedia and Expo, 2006 IEEE International Conference on, pp. 1257-1260, July 2006.
[8] E. Horster and R. Lienhart, "On the optimal placement of multiple visual sensors," in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, VSSN '06, (New York, NY, USA), pp. 111-120, ACM, 2006.
[9] S. Ram, K. R. Ramakrishnan, P. K. Atrey, V. K. Singh, and M. S. Kankanhalli, "A design methodology for selection and placement of sensors in multimedia surveillance systems," in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, VSSN '06, (New York, NY, USA), pp. 121-130, ACM, 2006.
[10] R. Bodor, A. Drenner, P. Schrater, and N. Papanikolopoulos, "Optimal camera placement for automated surveillance tasks," Journal of Intelligent and Robotic Systems, vol. 50, no. 3, pp. 257-295, 2007.
[11] F. Janoos, R. Machiraju, R. Parent, J. W. Davis, and A. Murray, "Sensor configuration for coverage optimization for surveillance applications," 2007.
[12] A. T. Murray, K. Kim, J. W. Davis, R. Machiraju, and R. Parent, "Coverage optimization to support security monitoring," Computers, Environment and Urban Systems, vol. 31, no. 2, pp. 133-147, 2007.
[13] R. Malik and P. Bajcsy, "Automated Placement of Multiple Stereo Cameras," in The 8th Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras - OMNIVIS, (Marseille, France), Rahul Swaminathan and Vincenzo Caglioti and Antonis Argyros, 2008.
[14] K. Kim and A. T. Murray, "Enhancing spatial representation in primary and secondary coverage location modeling," Journal of Regional Science, vol. 48, no. 4, pp. 745-768, 2008.
[15] K. Yabuta and H. Kitazawa, "Optimum camera placement considering camera specification for security monitoring," in Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, pp. 2114-2117,May 2008.
[16] B. Debaque, R. Jedidi, and D. Prevost, "Optimal video camera network deployment to support security monitoring," in Information Fusion, 2009. FUSION '09. 12th International Conference on, pp. 1730-1736, July 2009.
[17] H. Huang, C.-C. Ni, X. Ban, J. Gao, A. Schneider, and S. Lin, "Connected wireless camera network deployment with visibility coverage," in INFOCOM, 2014 Proceedings IEEE, pp. 1204-1212, April 2014.
[18] F. Janoos, R. Machiraju, R. Parent, J. W. Davis, and A. Murray, "Sensor configuration for coverage optimization for surveillance applications," 2007.

Tuesday, November 11, 2014

Step 2.1: Literature Search

Some very interesting research has been done in this area. Here is a list of papers.

2014

  • Connected wireless camera network deployment with visibility coverage. [1] 

2009

  • Optimal video camera network deployment to support security monitoring. [2]

2008

  • Optimum camera placement considering camera specification for security monitoring. [5]
  • Enhancing Spatial Representation In Primary And Secondary Coverage Location Modeling. [4]
  • Automated Placement of Multiple Stereo Cameras. [3]

2007

  • Coverage optimization to support security monitoring. [7]
  • Sensor configuration for coverage optimization for surveillance applications. [8]
  • Optimal Camera Placement for Automated Surveillance Tasks. [6]

2006

  • A Design Methodology for Selection and Placement of Sensors in Multimedia Surveillance Systems. [11]
  • On the Optimal Placement of Multiple Visual Sensors. [12]
  • Approximating Optimal Visual Sensor Placement. [9]
  • Calibrating and optimizing poses of visual sensors in distributed platforms. [10]

2004

  • Visibility Analysis and Sensor Planning in Dynamic Environments. [13]
  • Optimal placement of cameras in floorplans to satisfy task requirements and cost constraints. [14]

2000

  • Camera Placement Considering Occlusion for Robust Motion Capture. [15]

1999

  • Automatic camera placement for image-based modeling. [16]

1995

  • A survey of sensor planning in computer vision. [17]
References


[1] H. Huang, C.-C. Ni, X. Ban, J. Gao, A. Schneider, and S. Lin, "Connected wireless camera network deployment with visibility coverage," in INFOCOM, 2014 Proceedings IEEE, pp. 1204-1212, April 2014.

[2] B. Debaque, R. Jedidi, and D. Prevost, "Optimal video camera network deployment to support security monitoring," in Information Fusion, 2009. FUSION '09. 12th International Conference on, pp. 1730-1736, July 2009.

[3] R. Malik and P. Bajcsy, "Automated Placement of Multiple Stereo Cameras," in The 8th Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras - OMNIVIS, (Marseille, France), Rahul Swaminathan and Vincenzo Caglioti and Antonis Argyros, 2008.

[4] K. Kim and A. T. Murray, "Enhancing spatial representation in primary and secondary coverage location modeling," Journal of Regional Science, vol. 48, no. 4, pp. 745-768, 2008.

[5] K. Yabuta and H. Kitazawa, "Optimum camera placement considering camera specification for security monitoring," in Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, pp. 2114-2117, May 2008.

[6] R. Bodor, A. Drenner, P. Schrater, and N. Papanikolopoulos, "Optimal camera placement for automated surveillance tasks," Journal of Intelligent and Robotic Systems, vol. 50, no. 3, pp. 257-295, 2007.

[7] A. T. Murray, K. Kim, J. W. Davis, R. Machiraju, and R. Parent, "Coverage optimization to support security monitoring," Computers, Environment and Urban Systems, vol. 31, no. 2, pp. 133-147, 2007.

[8] F. Janoos, R. Machiraju, R. Parent, J. W. Davis, and A. Murray, "Sensor configuration for coverage optimization for surveillance applications," 2007.

[9] E. Horster and R. Lienhart, "Approximating optimal visual sensor placement," in Multimedia and Expo, 2006 IEEE International Conference on, pp. 1257-1260, July 2006.

[10] E. Horster and R. Lienhart, "Calibrating and optimizing poses of visual sensors in distributed platforms," Multimedia Systems, vol. 12, no. 3, pp. 195-210, 2006.

[11] S. Ram, K. R. Ramakrishnan, P. K. Atrey, V. K. Singh, and M. S. Kankanhalli, "A design methodology for selection and placement of sensors in multimedia surveillance systems," in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, VSSN '06, (New York, NY, USA), pp. 121-130, ACM, 2006.

[12] E. Horster and R. Lienhart, "On the optimal placement of multiple visual sensors," in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, VSSN '06, (New York, NY, USA), pp. 111-120, ACM, 2006.

[13] A. Mittal and L. Davis, "Visibility analysis and sensor planning in dynamic environments," in Computer Vision - ECCV 2004 (T. Pajdla and J. Matas, eds.), vol. 3021 of Lecture Notes in Computer Science, pp. 175-189, Springer Berlin Heidelberg, 2004.

[14] U. M. Erdem and S. Sclaroff, "Optimal placement of cameras in floorplans to satisfy task requirements and cost constraints," in Proc. of OMNIVIS Workshop, 2004.

[15] X. Chen and J. Davis, "Camera placement considering occlusion for robust motion capture," tech. rep., 2000.

[16] S. Fleishman, D. Cohen-Or, and D. Lischinski, "Automatic camera placement for image-based modeling," in Computer Graphics and Applications, 1999. Proceedings. Seventh Pacific Conference on, pp. 12-20, 315, 1999.

[17] K. Tarabanis, P. Allen, and R. Tsai, "A survey of sensor planning in computer vision," Robotics and Automation, IEEE Transactions on, vol. 11, pp. 86-104, Feb 1995.


Step 1: Motivation and Idea

Motivation
Surveillance is an integral part of many public infrastructures like airports, banks, and train stations, and networked camera systems are the most common mode of surveillance. Given the infrastructure or the geometry of the environment, the positioning and orientation of the cameras can play a significant role in constructing an effective surveillance scenario. The definition of effective surveillance is subjective and is primarily determined by the specific scenario. For example, the deployment of cameras in a public area like a movie theater is more relaxed than in an airport: in a movie theater it might be sufficient to deploy cameras at locations that exhibit large human activity, but in an airport it is imperative to deploy cameras such that maximum visibility coverage is obtained irrespective of the amount of human activity in the area. Nevertheless, some common factors have to be taken into consideration before deploying the cameras.

Visibility Coverage: In high security scenarios, the camera configuration should be optimized such that a maximal coverage of the infrastructure can be obtained. In low security scenarios, the camera configuration should at least guarantee the coverage of all the areas that have large human volume. The configuration should also guarantee the coverage of the most frequently used entry and exit points in the infrastructure. Furthermore, a camera configuration that captures the frontal image of the humans as opposed to their posterior images is more effective. 

Deployment Cost: On the other hand, it is important to take into consideration the cost effectiveness of a camera deployment configuration. The configuration should guarantee maximal coverage while deploying the fewest cameras. Furthermore, a minimal number of cameras has a significant impact on the required storage space: eliminating a single redundant camera can save up to 720 hours of video storage every month, and HD cameras, which are becoming more prevalent, require even more storage.

Given the geometry of the infrastructure, manually designing a camera deployment configuration that takes the above factors into consideration can be extremely tedious and error prone. Automated camera network deployment optimization techniques are therefore essential for a cost effective and safe environment. Furthermore, it is unusual to consider the configuration of the camera network during the design and construction of an infrastructure; factoring it in during design planning can be very cost effective. It can help in making design decisions that result in a safer environment by avoiding areas where the deployment of cameras may be hard. Most camera networks require cabling, and knowing the locations of the cameras ahead of time can decrease the excess cost associated with it.

Idea
The problem is: given the 3D geometry of the infrastructure, what configuration of camera deployment will provide effective surveillance? As discussed before, the notion of effective surveillance is subjective. Though a configuration with maximal visibility coverage can be considered effective, such problems have been explored before, most prominently as the Art Gallery Problem [1], which is NP-hard. In this work, a network configuration is considered to provide effective surveillance if:
  • it provides coverage of the most common entries and exits in the geometry;
  • it provides coverage of all the areas in the geometry that have a high volume of human activity;
  • it provides as much frontal view of a person as possible, instead of their posterior.
Identifying the prominent entrances/exits: All public infrastructures have entrances, exits, and points of interest. Any doorway can be considered an entrance or an exit; for simplicity, these are referred to as nodes. In an infrastructure, different nodes are accessed with different frequencies: a node representing a common entrance or exit has a high frequency of access as opposed to, say, an employee's personal office. So the first step is to identify the nodes in an infrastructure along with their probabilities of access.

Identifying areas with a high volume of human activity: Such areas can be identified by simulating the trajectories humans would follow in an infrastructure and locating the areas with a high volume of access. In general, humans traverse hallways with the objective of reaching a destination or goal (unless loitering). Assuming that the trajectories within the infrastructure are generated by humans moving between nodes, and since the node probabilities have already been estimated, the start and end nodes can be sampled according to those probabilities and a trajectory simulated from the beginning to the end. By simulating trajectories many times, areas of high access can be identified.

Identifying orientations to maximize the frontal view: The simulated trajectories from the previous step also carry information about the direction of motion. The idea is to choose a camera location and orientation such that the camera points in the direction opposite to the direction of motion of most simulated trajectories.

To summarize, the idea is to identify camera locations and orientations in the infrastructure that best satisfy the above constraints.


References