Bartosz Ptak’s PhD Defence on Computer Vision!
Yesterday, our team member Bartosz Ptak successfully defended his PhD thesis titled “Point-oriented object localization and tracking in low-altitude aerial imagery”!
Abstract
Drone-based crowd monitoring is a key technology for applications in surveillance, public safety, and event management, primarily due to its dynamic, aerial perspective that surpasses the limitations of traditional ground-based systems. Recently, a new trend has emerged in tiny object localization and tracking, characterized by the use of point-oriented object sensing, which enables accurate monitoring of densely packed individuals in low-altitude aerial imagery. In this dissertation, advancements in this area are presented, including novel approaches for point-oriented object localization and a new solution for point-oriented object tracking. For localization and counting tasks, a series of enhancement mechanisms is introduced. These include the integration of motion-based features, the use of task-oriented synthetic data, and addressing the influence of varying image input resolutions in neural networks. A direct incorporation of drone altitude into the neural network architecture is also investigated, a new module that processes all pixels of high-resolution images without downscaling is proposed, and a novel loss function tailored to point-oriented localization is introduced. For object tracking and trajectory counting, an algorithm is proposed that enhances trajectory continuity and unique counting reliability in drone-based crowd monitoring, enabling the accurate tracking of individuals across video sequences. The approach extends the Simple Online and Real-time Tracking (SORT) framework by replacing the bounding-box assignment with a point-distance metric. It is further enhanced with three cost-effective techniques: camera motion compensation, altitude-aware assignment, and classification-based trajectory validation. Additionally, Deep Discriminative Correlation Filters (DDCF) are integrated, which reuse spatial feature maps from localization algorithms to improve computational efficiency and handle missed detections. 
To support this research, two new datasets, UP-COUNT and UP-COUNT-TRACK, are introduced, addressing challenges in modern drone imagery, including simultaneous camera and object motion, as well as changing flight altitudes. All proposed methods are quantitatively evaluated on both the publicly available DroneCrowd dataset and the new datasets, demonstrating significant improvements in localization and tracking performance and achieving state-of-the-art results in drone-based people and trajectory counting. This dissertation makes substantial contributions to computer vision in aerial robotics, offering practical tools for rapid crowd size and movement estimation. These tools have been demonstrated to be applicable in real-world scenarios.
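To illustrate the point-distance assignment mentioned in the abstract, the sketch below pairs predicted track positions with detected points by Euclidean distance rather than bounding-box overlap. It is not the thesis implementation: the function name `match_points`, the `max_dist` gate, and the greedy nearest-first matching (a simple stand-in for the Hungarian assignment step that SORT uses) are assumptions made for this example.

```python
import math


def match_points(tracks, detections, max_dist=20.0):
    """Greedily pair track positions with detected points.

    tracks, detections: lists of (x, y) tuples in pixels.
    max_dist: hypothetical gating threshold in pixels; pairs farther
              apart than this are never matched.
    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches is a list of (track_index, detection_index) pairs.
    """
    # Collect every candidate pair inside the gate, nearest first.
    pairs = sorted(
        (math.dist(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
        if math.dist(t, d) <= max_dist
    )

    used_tracks, used_dets, matches = set(), set(), []
    for _, ti, di in pairs:
        # Each track and each detection may be used at most once.
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))

    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Example: two tracked people, three detected head points.
tracks = [(10, 10), (50, 50)]
detections = [(52, 49), (11, 12), (200, 200)]
matches, lost, new = match_points(tracks, detections)
```

In this example the far-away detection at (200, 200) stays unmatched and would typically start a new trajectory, while unmatched tracks would be candidates for the missed-detection handling the abstract describes.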