COCO Traffic Dataset: Extension & Refinement

Created and curated COCO Traffic, an extension of the COCO dataset, by relabeling over 10,000 traffic light annotations into specific states (red, green, N/A) and extending it with images from the LISA Traffic Light dataset, making it suitable for state-aware traffic light detection. Featured on the official COCO dataset website.
The Challenge
The widely used COCO dataset is a standard benchmark for object detection, but its traffic_light class is too broad for applications that need to recognize a light's state, such as autonomous driving systems. Models for these applications must distinguish among red, green, and N/A traffic lights. Furthermore, existing datasets often contain inconsistencies or lack sufficient diversity for specific real-world scenarios. The challenge was to refine and extend COCO to enable more accurate, context-aware traffic light detection.
My Solution
I initiated and led the effort to create COCO Traffic, a significant extension and refinement of the official COCO 2017 dataset. This involved:
Precision Relabeling: Meticulously reviewing and relabeling over 10,000 existing traffic_light annotations into three distinct states: traffic_light_red, traffic_light_green, and traffic_light_na (not applicable), directly addressing the need for state-specific detection.
Dataset Extension: Curating and annotating additional images from the LISA Traffic Light dataset, extending COCO Traffic to include more diverse traffic light scenarios, particularly from a driver's perspective.
Custom Tooling: Developing bespoke tools such as dataLabeller (to efficiently modify category IDs) and make_yolo_labels.py (for format conversion), and pre-labeling images with DETR models to streamline the extensive annotation process.
Quality Assurance: Identifying and addressing inconsistencies and mislabeled instances within the original COCO dataset, ensuring higher data integrity for the refined subsets (COCO Refined, COCO Traffic, COCO Traffic Extended).
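The relabeling step described above can be sketched as a category-ID rewrite on a COCO-style annotation dictionary. This is a minimal illustration, not the actual dataLabeller code: the function name relabel, the decisions mapping, and the numeric IDs for the three new classes are assumptions made for the example. Only the ID 10 for the original traffic light class comes from COCO 2017.

```python
import copy

# Category ID of "traffic light" in COCO 2017.
TRAFFIC_LIGHT_ID = 10

# Hypothetical IDs for the state classes (assumption for this sketch;
# the published dataset may use different values).
STATE_CATEGORIES = {
    "traffic_light_red": 92,
    "traffic_light_green": 93,
    "traffic_light_na": 94,
}


def relabel(coco, decisions):
    """Return a relabeled copy of a COCO-style dict.

    `decisions` maps annotation id -> state name chosen by a reviewer.
    Traffic light annotations without a decision keep their original ID.
    """
    out = copy.deepcopy(coco)
    for ann in out["annotations"]:
        if ann["category_id"] != TRAFFIC_LIGHT_ID:
            continue  # leave every other class untouched
        state = decisions.get(ann["id"])
        if state is not None:
            ann["category_id"] = STATE_CATEGORIES[state]

    # Register the new categories once so the file stays self-describing.
    existing = {cat["id"] for cat in out["categories"]}
    for name, cat_id in STATE_CATEGORIES.items():
        if cat_id not in existing:
            out["categories"].append(
                {"id": cat_id, "name": name, "supercategory": "outdoor"}
            )
    return out
```

Working on a deep copy keeps the original annotations intact, which makes it easy to diff the refined file against the source during quality checks.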
The result was a highly curated and robust dataset specifically tailored for traffic-related object detection tasks.
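The format conversion mentioned above comes down to simple box arithmetic. COCO stores a box as absolute [x_min, y_min, width, height], while YOLO label files expect a zero-based class index followed by the box center and size normalized to the image dimensions. The sketch below shows that conversion under stated assumptions; it is not the contents of make_yolo_labels.py, and the function names and the class_index_by_category mapping are hypothetical.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert COCO [x_min, y_min, w, h] (pixels) to YOLO (cx, cy, w, h) in 0..1."""
    x_min, y_min, w, h = bbox
    cx = (x_min + w / 2) / img_w
    cy = (y_min + h / 2) / img_h
    return cx, cy, w / img_w, h / img_h


def yolo_line(class_index, bbox, img_w, img_h):
    """Format one annotation as a line of a YOLO label file."""
    cx, cy, w, h = coco_bbox_to_yolo(bbox, img_w, img_h)
    return f"{class_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def yolo_lines(annotations, img_w, img_h, class_index_by_category):
    """Build all label lines for one image.

    `class_index_by_category` is a hypothetical mapping from COCO
    category IDs to contiguous zero-based YOLO class indices.
    Annotations whose category is not in the mapping are skipped.
    """
    return [
        yolo_line(class_index_by_category[a["category_id"]], a["bbox"], img_w, img_h)
        for a in annotations
        if a["category_id"] in class_index_by_category
    ]
```

For example, a 200x100 box with its top-left corner at (100, 50) in a 400x200 image is centered in the image and covers half of each dimension, so every normalized value is 0.5.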
The Outcome
Official Recognition & Impact: My work is featured on the official COCO dataset website, a significant acknowledgment of its quality and contribution to the computer vision community.
Enhanced Model Utility: Created a specialized dataset that enables the training of object detection models capable of discerning traffic light states, crucial for applications like driver assistance systems (as demonstrated by my iOS Driver Assistant App).
Improved Data Quality: Identified and corrected thousands of inconsistencies in a widely-used benchmark dataset, contributing to more reliable model training for future research.
Demonstrated Data Engineering Expertise: Showcased end-to-end capabilities in custom dataset creation, large-scale annotation management, tool development for data curation, and quality assurance in machine learning.