This is a high-level API design for an Object Detection service that also counts objects per class (e.g., cars, persons, or animals in an image or video frame). The design follows a pattern similar to that used for anomaly detection, sensor fault detection, and similar tasks, but focuses on computer vision: training models, running detection, and retrieving detections and counts for images or video frames.


1. Overview

  1. Purpose
  2. Data Flow
    1. Data Ingestion: The system receives labeled or unlabeled images/videos.
    2. Model Training: Train object detection models on labeled data (bounding boxes, classes).
    3. Object Detection: For new images or frames, detect objects, generate bounding boxes, classify object types, and produce counts.
    4. Result Retrieval: Clients can retrieve the detection results, including bounding boxes, classes, confidence scores, and aggregated counts per class.
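
The aggregation in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration of how per-class counts might be derived from raw detections. The `Detection` record, the `count_by_class` helper, and the 0.5 default confidence threshold are assumptions made for this example and are not part of the API defined in this document.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical record for one detected object (not a defined API type)."""
    label: str                               # predicted class, e.g. "car"
    confidence: float                        # model score in [0, 1]
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels


def count_by_class(detections: List[Detection],
                   min_confidence: float = 0.5) -> Dict[str, int]:
    """Count detections per class, ignoring those below min_confidence.

    The 0.5 default is an assumed value; a real service would likely
    expose the threshold as a request parameter.
    """
    counts = Counter(
        d.label for d in detections if d.confidence >= min_confidence
    )
    return dict(counts)


if __name__ == "__main__":
    frame_detections = [
        Detection("car", 0.91, (12.0, 40.0, 180.0, 150.0)),
        Detection("car", 0.78, (200.0, 42.0, 360.0, 155.0)),
        Detection("person", 0.88, (400.0, 30.0, 450.0, 160.0)),
        Detection("dog", 0.30, (500.0, 100.0, 560.0, 150.0)),  # below threshold
    ]
    print(count_by_class(frame_detections))  # {'car': 2, 'person': 1}
```

Keeping the raw detections separate from the aggregated counts lets a client request either the full bounding-box list or only the per-class totals.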

2. Common Object Detection Algorithms

  1. Faster R-CNN
  2. YOLO (You Only Look Once)
  3. SSD (Single Shot MultiBox Detector)
  4. RetinaNet
  5. Transformer-based Approaches (e.g., DETR)

3. API Endpoints

3.1. Image/Video Ingestion (Optional)

Depending on your system architecture, you might store images/videos in a dedicated storage layer (S3, local file system, etc.) rather than sending them directly through the API. If you prefer to ingest them via the API:

Endpoint:

POST /api/v1/object-detection/data

Description: