The paper was published at the International Conference on Contemporary Computing (IC3) in 2019. For more details click here.
- A Faster RCNN based object detection model is used to identify human heads.
- A congestion control early warning system (CCEWS) is proposed to generate alert signals in case of crowd congestion.
- The model is trained and evaluated on a private dataset that simulates crowd behavior on foot over bridges.
- The proposed approach outperformed the single shot detector and F-RCN models.
- The source code and setup details are available on my GitHub page here.
Faster RCNN: Overview
The overall architecture design is shown in Fig. 1.
- Given an image, we first employ a pre-trained CNN, such as VGG16, to extract feature maps.
- A Region Proposal Network (RPN) then uses anchor boxes to detect the regions of the feature map that might contain objects.
- Faster RCNN uses 9 anchor boxes at each position of the image, as shown in Fig. 2.
- The RPN outputs bounding boxes/features of different sizes, each with a confidence value of being an object. Regions of different sizes yield CNN feature maps of different sizes. More details are available here.
- Region of Interest (RoI) pooling is applied to filter the bounding boxes/features and resize them to the same size. More details are available here.
- RoI pooling splits the input feature map into a fixed number of roughly equal sections and then applies max-pooling to each section.
- Finally, objects are detected and classified, each with a classification score for its label.
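The anchor-box and RoI pooling steps above can be illustrated with a small standalone sketch. It is not the code from the paper or from any detection library: the function names `make_anchors` and `roi_max_pool` are hypothetical, the scales (128, 256, 512) and aspect ratios (1:2, 1:1, 2:1) are the defaults reported in the original Faster R-CNN paper and are assumed here, and the feature map is a plain list of rows rather than a tensor.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 3 scales x 3 ratios = 9 anchor boxes (x1, y1, x2, y2) centred on (cx, cy).

    Assumption: ratio means height / width, and every anchor of a given
    scale s keeps the same area s * s.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x1, y1, x2, y2) into a fixed out_h x out_w grid.

    feature_map is a 2-D list of rows; x2 and y2 are exclusive integer bounds.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Integer bin edges: bins are equal when h divides evenly, near-equal otherwise.
        r0 = y1 + (i * h) // out_h
        r1 = max(y1 + ((i + 1) * h) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * w) // out_w
            c1 = max(x1 + ((j + 1) * w) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Whatever the size of the input region, `roi_max_pool` always returns an `out_h` x `out_w` grid, which is what lets regions of different sizes feed the same downstream classifier.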
Consider two flags: good_flag, representing a normal situation, and bad_flag, representing a congestion situation.
- Train the Faster RCNN model.
- Do object detection and get the coordinates of bounding boxes.
- Find the area and centroid for each detected bounding box.
- Keep the area and centroid from the previous frame so that boxes can be compared across consecutive frames.
- Track the objects by comparing centroids in two frames:
- Points that are close to each other in the two frames belong to the same object.
- Closeness is measured by the Euclidean distance between a centroid in the current frame and each centroid in the previous frame; the closest previous centroid is taken to be the same object.
- Compare the area of the bounding boxes in two frames belonging to the same object to get the direction of motion of the crowd.
- If the motion is in one direction then increment the good_flag by 2.
- If equal numbers of people are moving in opposite directions and the total number of people is greater than the number of people allowed in the frame, then increment the bad_flag by 2.
- If unequal numbers of people are moving in opposite directions and the total number of people is greater than the number of people allowed in the frame, then increment the bad_flag by 1.
- These updates also depend on a sensitivity parameter that ranges between 0 and 1, with 1 being the most sensitive. A value of 1 means that when people are moving in one direction, no one may move in the opposite direction; if anyone does, increment bad_flag by 1.
- Repeat the three flag-update steps above (one direction, equal opposing numbers, unequal opposing numbers) for 20 frames, then compare good_flag and bad_flag.
- If bad_flag > good_flag, raise a stampede alert and save the frame in the output for analysis.
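The tracking and flag-voting steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released implementation: all function names are hypothetical, boxes are `(x1, y1, x2, y2)` tuples, a growing box area is assumed to mean motion toward the camera and a shrinking one motion away from it, frames with no opposing motion count as "one direction", and the sensitivity parameter is left out because the text does not say how it scales the rules.

```python
import math


def centroid_and_area(box):
    """Return ((cx, cy), area) for a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2), (x2 - x1) * (y2 - y1)


def match_to_previous(curr_boxes, prev_boxes):
    """For each current box, return the index of the previous box whose
    centroid is nearest in Euclidean distance (None if there is none)."""
    prev_centroids = [centroid_and_area(b)[0] for b in prev_boxes]
    matches = []
    for box in curr_boxes:
        (cx, cy), _ = centroid_and_area(box)
        if not prev_centroids:
            matches.append(None)
            continue
        matches.append(min(
            range(len(prev_centroids)),
            key=lambda k: math.hypot(cx - prev_centroids[k][0],
                                     cy - prev_centroids[k][1])))
    return matches


def direction(prev_area, curr_area):
    """+1 if the box grew, -1 if it shrank, 0 if unchanged (assumed convention)."""
    return (curr_area > prev_area) - (curr_area < prev_area)


def update_flags(directions, total, max_people, good_flag, bad_flag):
    """Apply the three flag-update rules to one pair of consecutive frames."""
    toward = directions.count(1)
    away = directions.count(-1)
    if toward == 0 or away == 0:
        good_flag += 2                      # motion in one direction only
    elif total > max_people:
        bad_flag += 2 if toward == away else 1
    return good_flag, bad_flag


def congestion_alert(frames, max_people, window=20):
    """frames is a list of per-frame box lists; returns True when
    bad_flag > good_flag after voting over up to `window` frame pairs."""
    good_flag = bad_flag = 0
    for prev_boxes, curr_boxes in zip(frames, frames[1:window + 1]):
        directions = []
        for box, k in zip(curr_boxes, match_to_previous(curr_boxes, prev_boxes)):
            if k is not None:
                prev_area = centroid_and_area(prev_boxes[k])[1]
                directions.append(direction(prev_area, centroid_and_area(box)[1]))
        good_flag, bad_flag = update_flags(
            directions, len(curr_boxes), max_people, good_flag, bad_flag)
    return bad_flag > good_flag
```

In practice each entry of `frames` would hold the head boxes returned by the detector for one video frame, and `max_people` would be the number of people allowed in the frame.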
The CCEWS is introduced to analyze crowd behavior and generate warning/alert signals so that contingency plans can be executed to control congestion and prevent mishaps. The system performs three tasks: object detection, object tracking, and object motion direction estimation. These tasks are achieved, respectively, by a modified Faster R-CNN architecture (because only one class needs to be classified, only the first module of Faster R-CNN, the RPN, is required), a centroid-based algorithm, and an analysis of the output of these two tasks to detect abnormalities in the crowd motion.
For more details please refer to my paper here.