COCO Traffic Dataset: Extension & Refinement

The Challenge

The COCO dataset is a widely used benchmark for object detection, but its single traffic_light class is too broad for applications that require precise state recognition (e.g., autonomous driving systems). Such models need to distinguish between red lights, green lights, and lights whose state is not applicable (N/A). Furthermore, existing datasets often contain inconsistencies or lack sufficient diversity for specific real-world scenarios. The challenge was to refine and extend COCO to enable more accurate, context-aware traffic light detection.

My Solution

I initiated and led the effort to create COCO Traffic, a significant extension and refinement of the official COCO 2017 dataset. This involved:

Precision Relabeling: Reviewing and relabeling more than 10,000 existing traffic_light annotations into three distinct states: traffic_light_red, traffic_light_green, and traffic_light_na (not applicable), directly addressing the need for state-specific detection.

Dataset Extension: Curating and annotating additional images from the LISA Traffic Light dataset, extending COCO Traffic to include more diverse traffic light scenarios, particularly from a driver's perspective.

Custom Tooling: Developing bespoke tools such as dataLabeller (to efficiently modify category IDs) and make_yolo_labels.py (for format conversion), and pre-labeling images with DETR models to streamline the extensive annotation process.

Quality Assurance: Identifying and addressing inconsistencies and mislabeled instances within the original COCO dataset, ensuring higher data integrity for the refined subsets (COCO Refined, COCO Traffic, COCO Traffic Extended).
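
The relabeling and format-conversion steps above can be illustrated with a minimal sketch. It is not the actual dataLabeller or make_yolo_labels.py code: the function names, the new category IDs (92 to 94), and the fallback to the "na" state are assumptions made for illustration. Only the COCO 2017 ID for the original traffic light category (10), the COCO bounding-box layout [x_min, y_min, width, height], and the normalized center-based YOLO layout are taken from the public formats.

```python
import copy

# COCO 2017 assigns category ID 10 to the original traffic light class.
TRAFFIC_LIGHT_ID = 10

# Hypothetical IDs for the three new states; the real COCO Traffic IDs may differ.
STATE_TO_CATEGORY = {
    "red": {"id": 92, "name": "traffic_light_red"},
    "green": {"id": 93, "name": "traffic_light_green"},
    "na": {"id": 94, "name": "traffic_light_na"},
}


def relabel_traffic_lights(coco, state_by_annotation_id):
    """Return a copy of a COCO-style dict with traffic lights split by state.

    state_by_annotation_id maps annotation IDs to "red", "green", or "na".
    Annotations without a reviewed state fall back to "na" (an assumption).
    """
    out = copy.deepcopy(coco)
    for ann in out["annotations"]:
        if ann["category_id"] == TRAFFIC_LIGHT_ID:
            state = state_by_annotation_id.get(ann["id"], "na")
            ann["category_id"] = STATE_TO_CATEGORY[state]["id"]
    # Replace the single broad category with the three state-specific ones.
    out["categories"] = [
        c for c in out["categories"] if c["id"] != TRAFFIC_LIGHT_ID
    ] + [dict(c) for c in STATE_TO_CATEGORY.values()]
    return out


def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box in pixels to a normalized YOLO box.

    COCO: [x_min, y_min, width, height]
    YOLO: (x_center, y_center, width, height), each divided by the image size.
    """
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )
```

Returning a deep copy rather than editing in place keeps the original COCO annotations available for comparison during quality checks.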

The result was a highly curated and robust dataset specifically tailored for traffic-related object detection tasks.

The Outcome

Official Recognition & Impact: My work is featured on the official COCO dataset website, a significant acknowledgment of its quality and contribution to the computer vision community.

Enhanced Model Utility: Created a specialized dataset that enables the training of object detection models capable of discerning traffic light states, crucial for applications like driver assistance systems (as demonstrated by my iOS Driver Assistant App).

Improved Data Quality: Identified and corrected thousands of inconsistencies in a widely used benchmark dataset, contributing to more reliable model training for future research.

Demonstrated Data Engineering Expertise: Showcased end-to-end capabilities in custom dataset creation, large-scale annotation management, tool development for data curation, and quality assurance in machine learning.
