Camera Image Ingestion — Overview

This document is the single entry point for how images get from a physical sensor into the central ROS 2 host in the PhotogrammetricWAAM system.

There are exactly three hardware pipelines and two roles each camera can serve. Everything else in this folder (and in PhotogrammetricWAAM-Edge/ and ros2_ws/launch/) is one concrete instance of that 3 × 2 matrix.


TL;DR — The 3 × 2 matrix

| Hardware | Edge host | Wire | Edge software | Role (1) HQ stills | Role (2) low-latency MJPG |
|---|---|---|---|---|---|
| IMX708 (Pi Cam 3) | Raspberry Pi (CSI) | WiFi/ETH | simple_picamera2_streamer/app.py: picamera2 + cv2.imencode → HTTP /stream /jpg /set | ✅ (planned) | ✅ (current default, 8 Hz) |
| OV2640 / OV5640 | XIAO ESP32-S3 Sense (DVP) | WiFi/ETH | CameraWebServer_for_esp-arduino_3.0.x.ino: esp_camera → MJPG on :81/stream | ✅ (planned) | ✅ (current default) |
| DSLR (Canon / Nikon / Sony) | Raspberry Pi (USB) | WiFi/ETH | mqtt__gphoto2_delegate.py: gphoto2 capture-and-download | ✅ (only role) | ✗ (not supported) |

The ROS 2 client side (the kernel host) is uniform across all three pipelines: it runs image_publisher_node either against an MJPG stream URL (role 2, IMX708 and ESP32-S3) or against the on-disk JPG/RAW that the gphoto2 delegate writes (role 1, DSLR and on-demand IMX708/ESP32-S3). See ros2_ws/launch/image_publisher_client/README.md.


The two roles, in detail

Every photogrammetry-grade camera in this system serves one of two roles at any given moment. The IMX708 and the ESP32-S3-attached OV2640/OV5640 are capable of either role; the DSLR is permanently locked to role (1).

Role (1) — Highest-fidelity still producer

Goal: Best possible JPG (or RAW) of one moment in time, on demand or at a slow cadence. Latency does not matter.

  • Maximum sensor resolution (e.g. IMX708 4608×2592, OV5640 2592×1944, DSLR ≥24 MP).
  • Highest JPG quality (low quantisation) — or RAW where available.
  • Capture is triggered, not free-running. One trigger → one (or one stack of) frames written to durable storage with a session ID.
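  • Consumed primarily by the photogrammetry / SfM pipeline downstream, not as a live RViz/Foxglove feed; a stored file can still be shown there by pointing image_publisher_node at the file path.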
  • Control plane: MQTT (recipient-based topics — see mqtt__gphoto2_delegate.spec.md for the canonical request/response shape).

Role (2) — Lowest-latency MJPG streamer

Goal: Real-time monitoring on the ROS 2 graph (visible in rqt_image_view, Foxglove, RViz). Per-frame fidelity is sacrificed for timeliness.

  • Reduced resolution and/or heavier JPG compression (e.g. IMX708 at 2304×1296 @ 8 Hz, ESP32-S3 OV5640 typically HD/SVGA).
  • Continuous MJPG over plain HTTP (multipart/x-mixed-replace).
  • Encoded once at the edge, decoded once on the ROS 2 client by image_publisher_node, republished as sensor_msgs/Image on a per-camera namespace (/cam0/image_raw, /xiao_143/image_raw, …).
  • Control plane: HTTP (/set on RPi today; open work for ESP32-S3 — see TODOs below).
                 role-(1)                          role-(2)
              "snapshot mode"                  "viewfinder mode"
        ┌─────────────────────────┐        ┌─────────────────────────┐
   any  │  highest quality JPG    │        │  smallest-possible JPG  │
   cam  │  on demand, slow rate   │        │  fast as possible,      │
        │  → durable storage      │        │  free-running           │
        │  → SfM / photogrammetry │        │  → ROS 2 image topic    │
        │  → RViz/Foxglove via    │        │  → live monitoring      │
        │     image_publisher of  │        │     (rqt_image_view)    │
        │     a *file* path       │        │                         │
        └─────────────────────────┘        └─────────────────────────┘
                  ▲                                     ▲
                  │ MQTT request/response               │ HTTP GET /stream
                  │ (gphoto2-style topics)              │ (multipart MJPG)
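
In role (2), the HTTP response is a multipart/x-mixed-replace body, which amounts to a sequence of complete JPEG images separated by part headers. The sketch below shows one way to recover individual frames from such a byte stream by scanning for the JPEG start (FF D8) and end (FF D9) markers. The function name split_jpeg_frames and the sample bytes in the usage note are illustrative assumptions, not code from this repository; image_publisher_node does its own decoding.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split a raw MJPG byte buffer into complete JPEG frames.

    Returns (frames, tail), where tail holds the bytes of a frame that has
    started but not finished yet and should be prepended to the next read.

    Assumption: frames carry no embedded thumbnail (a nested SOI/EOI pair
    would confuse this scan). A stricter client would honour the part's
    Content-Length header instead, when the server sends one.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        if start < 0:
            # No frame start yet; keep the last byte in case it is the
            # first half of a marker split across two reads.
            return frames, buffer[-1:]
        end = buffer.find(EOI, start + 2)
        if end < 0:
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

A caller would feed each chunk read from /stream through this function, prepending the returned tail to the next chunk, and hand every complete frame to a JPEG decoder.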

The three hardware pipelines, in detail

1. IMX708 on Raspberry Pi (CSI)

[ IMX708 sensor ]──CSI──▶[ Raspberry Pi ]──HTTP MJPG──▶[ ROS 2 host ]
                          libcamera/picamera2           image_publisher_node
                          + cv2.imencode JPG            → /camN/image_raw
                          app.py @ :8000

Edge: ros2_ws/edge/simple_picamera2_streamer/app.py. A single Python process that owns the camera, runs a capture thread at the configured FrameDurationLimits, and serves three endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| /stream | GET | multipart/x-mixed-replace MJPG, frame-rate-locked to the capture loop (currently 8 Hz) |
| /jpg | GET | One latest JPG frame (single-shot) |
| /set | GET | Set ExposureTime, AnalogueGain, or LensPosition (puts AF into manual when LensPosition is given) |
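
As an illustration of the /set endpoint, the following sketch builds a control URL for the three parameters listed above. The helper name build_set_url, the host name in the usage note, and the assumption that each control is passed as a plain query parameter named after the control are mine; check app.py for the exact value formats it accepts.

```python
from urllib.parse import urlencode

# Controls the /set endpoint is documented to accept.
ALLOWED_CONTROLS = ("ExposureTime", "AnalogueGain", "LensPosition")


def build_set_url(host: str, port: int = 8000, **controls) -> str:
    """Build a GET URL for the streamer's /set endpoint.

    host is a placeholder for the Pi's address; port 8000 is the
    streamer port given in this document.
    """
    unknown = set(controls) - set(ALLOWED_CONTROLS)
    if unknown:
        raise ValueError(f"unsupported controls: {sorted(unknown)}")
    if not controls:
        raise ValueError("at least one control is required")
    return f"http://{host}:{port}/set?{urlencode(controls)}"
```

For example, build_set_url("pi-cam.local", ExposureTime=10000, LensPosition=2.5) yields a URL that can be fetched with curl or urllib.request.urlopen; "pi-cam.local" is a hypothetical host name.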

Role today: running role (2) only — see TODO todo-imx708-fb-roles for the mode-switch work.

Client: see the two tmuxp launch styles under "ROS 2 client half" below.

2. OV2640 / OV5640 on XIAO ESP32-S3 Sense (DVP)

[ OV2640 / OV5640 ]──DVP──▶[ XIAO ESP32-S3 ]──HTTP MJPG──▶[ ROS 2 host ]
                            esp_camera + httpd            image_publisher_node
                            CameraWebServer_for_*.ino     → /xiao_NNN/image_raw
                            stream @ :81/stream
                            OTA    @ :8080/update
                            telemetry → MQTT broker

Edge: PhotogrammetricWAAM-Edge/photogrammetricWAAM_xiao_eyes_ov2640_ov5640/CameraWebServer_for_esp-arduino_3.0.x/.

Custom Arduino-ESP32 (3.0.x) firmware derived from Espressif's CameraWebServer. Each board is statically configured by a single #define DEVICE_ID 1xx which also drives:

  • Static IP 172.31.1.<DEVICE_ID>
  • MQTT topics esp32s3/<DEVICE_ID>/{log,temp,rssi}
  • MQTT client id esp32s3-<DEVICE_ID>

The MJPG stream is served on port 81 (the canonical Espressif port — not the same as the IMX708 streamer's :8000). OTA flashing lives on :8080/update.
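
Because everything about a board follows from DEVICE_ID, the client side can derive the same values instead of hard-coding them. The sketch below mirrors the rules stated above; the class name XiaoEndpoints is hypothetical, and the 100 to 199 range check is my reading of the "1xx" convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XiaoEndpoints:
    """Addresses derived from a board's DEVICE_ID (illustrative helper)."""

    device_id: int

    def __post_init__(self):
        # Assumption: "1xx" means IDs 100..199, which are also valid IP octets.
        if not 100 <= self.device_id <= 199:
            raise ValueError("DEVICE_ID is expected to be in the 1xx range")

    @property
    def ip(self) -> str:
        return f"172.31.1.{self.device_id}"

    @property
    def stream_url(self) -> str:
        return f"http://{self.ip}:81/stream"

    @property
    def ota_url(self) -> str:
        return f"http://{self.ip}:8080/update"

    @property
    def mqtt_client_id(self) -> str:
        return f"esp32s3-{self.device_id}"

    @property
    def mqtt_topics(self) -> tuple[str, ...]:
        return tuple(f"esp32s3/{self.device_id}/{leaf}" for leaf in ("log", "temp", "rssi"))

    @property
    def ros_namespace(self) -> str:
        return f"/xiao_{self.device_id}"
```

With device 143, stream_url is the address image_publisher_node would be pointed at, and ros_namespace matches the /xiao_143/image_raw topic mentioned earlier.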

Currently the firmware is hard-configured for role (1)-leaning settings (FRAMESIZE_5MP, set_quality(s, 6), set_aec_value(s, 800), awb=OFF, fb_count=1, CAMERA_GRAB_LATEST) — see TODO todo-esp32s3-fb-roles for the dynamic role-switching work.

Client: see the two tmuxp launch styles under "ROS 2 client half" below.

3. DSLR on Raspberry Pi (USB)

[ Canon/Nikon/Sony ]──USB──▶[ Raspberry Pi ]──gphoto2 capture──▶[ shared FS ]
                              mqtt__gphoto2_delegate.py        ─sync─▶ ROS 2 host
                              MQTT request/response                    image_publisher_node
                                                                      → /dslr_NN/image_raw

Edge: PhotogrammetricWAAM-Blender-UI/02__STILLS/_EDGE_CAMERA_DAEMON/.../mqtt__gphoto2_delegate.py.

A Python service that subscribes to {hostname}/gphoto2 (or ALL/gphoto2), shells out to gphoto2 --set-config … --capture-image-and-download …, writes the result into a session-ID'd directory, and publishes a structured {hostname}/gphoto2/response along with a photogrammetry/sync/available notification for the file-sync layer.
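
The topic layout and the shell-out described above can be sketched as follows. This is not the delegate's code: the helper names, the example config key, and the %f.%C filename pattern are assumptions for illustration, and the request/response payload is deliberately left out because its canonical shape lives in mqtt__gphoto2_delegate.spec.md.

```python
from pathlib import PurePosixPath


def request_topics(hostname: str) -> list[str]:
    """Topics a delegate on `hostname` listens to: its own and the broadcast one."""
    return [f"{hostname}/gphoto2", "ALL/gphoto2"]


def response_topic(hostname: str) -> str:
    """Topic on which the delegate publishes its structured response."""
    return f"{hostname}/gphoto2/response"


def build_gphoto2_argv(session_dir: PurePosixPath, config: dict[str, str]) -> list[str]:
    """Assemble a gphoto2 command line for one triggered capture.

    config maps gphoto2 config names to values; which names exist depends
    on the attached camera. The file lands in the session directory under
    the camera's own file name (%f) and suffix (%C).
    """
    argv = ["gphoto2"]
    for key, value in config.items():
        argv += ["--set-config", f"{key}={value}"]
    argv += ["--capture-image-and-download", "--filename", str(session_dir / "%f.%C")]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True) on the Pi that hosts the DSLR. Passing a list rather than a shell string avoids quoting problems in config values.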

Role today: role (1) only. DSLRs do not stream MJPG in this stack. (gphoto2 --capture-movie exists but is intentionally out of scope — the DSLR is the fidelity reference.)

SSH operator view: INBOX/TMUXP_VIEWS/DSLR.tmuxp.yml opens parallel SSH sessions to the DSLR-hosting Pis (id2-rpi4.local, pi3m50.local).

Batch coordination across many DSLRs + Pi cams is the job of batch_request_delegate.py — one MQTT batch request fans out to N services and aggregates N responses into a single batch response.
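
The fan-out/aggregate idea can be pictured with a small collector that waits until every expected service has answered. This is a hypothetical sketch of the pattern only; it does not reproduce batch_request_delegate.py, and the class name, the host-keyed dictionary, and the response values are assumptions.

```python
class BatchCollector:
    """Collect one response per expected host, then emit a single batch result."""

    def __init__(self, expected_hosts):
        self._pending = set(expected_hosts)
        self._responses = {}

    def add(self, host, response):
        """Record a response; return the full batch once all hosts have answered.

        Returns None while responses are still outstanding, and also for
        unknown hosts or duplicate responses, which are ignored.
        """
        if host not in self._pending:
            return None
        self._pending.discard(host)
        self._responses[host] = response
        if self._pending:
            return None
        return dict(self._responses)
```

A real aggregator would also need a timeout so that one unresponsive camera cannot hold up the batch indefinitely; that is omitted here for brevity.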


Edge ↔ ROS 2 contract — the two halves

Edge half (server)

| Pipeline | Server | Listens on | Output |
|---|---|---|---|
| IMX708 | simple_picamera2_streamer/app.py | TCP :8000 (HTTP) | MJPG /stream, JPG /jpg, control /set |
| ESP32-S3 | CameraWebServer_for_esp-arduino_3.0.x.ino | TCP :81 (httpd) + :8080 OTA | MJPG /stream, MQTT telemetry |
| DSLR | mqtt__gphoto2_delegate.py | MQTT {host}/gphoto2 | JPG/RAW file + MQTT response |

ROS 2 client half

The ROS 2 host runs image_publisher_node (one per camera URL or per file path), which:

  1. Decodes the MJPG / JPG into an OpenCV Mat.
  2. Publishes sensor_msgs/Image on <__ns>/image_raw (and camera_info if provided).
  3. Republishes at the rate set by publish_rate.

Critical empirical finding (see simple_picamera2_streamer/README.md): publish_rate MUST match the edge capture rate exactly, otherwise OpenCV internally buffers MJPG frames and rqt_image_view shows stale frames from seconds in the past. With the IMX708 streamer at 8 Hz, the client must be launched with publish_rate:=8. — not 7.9, not 10.
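
One way to make the rate rule hard to violate is to generate the client command from the edge capture rate and refuse any mismatch. The sketch below assumes the node is started through ros2 run with the upstream image_publisher package name and a positional source argument; the function name, the example URL, and that invocation form are assumptions, and the repository's tmuxp files remain the reference for the real launch lines.

```python
import math


def image_publisher_argv(source: str, namespace: str,
                         edge_capture_hz: float, publish_rate: float) -> list[str]:
    """Build a `ros2 run` command line, rejecting a rate mismatch up front."""
    if not math.isclose(publish_rate, edge_capture_hz, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError(
            f"publish_rate {publish_rate} must equal edge capture rate {edge_capture_hz}"
        )
    return [
        "ros2", "run", "image_publisher", "image_publisher_node", source,
        "--ros-args",
        "-r", f"__ns:={namespace}",
        # Formatted as a float so the value is parsed as a double, not an integer.
        "-p", f"publish_rate:={float(publish_rate)}",
    ]
```

For the IMX708 streamer at 8 Hz, image_publisher_argv("http://pi-cam.local:8000/stream", "/cam0", 8.0, 8.0) produces a command ending in publish_rate:=8.0, while passing 7.9 or 10 raises ValueError before anything is launched. The host name is a placeholder.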

Two tmuxp launch styles exist for this client side, depending on where you're running from:

There is also a parameterised Python launch file, xiao_sense_esp32s3_eyes.py, for the ESP32-S3 fleet, and esp32s3_eth.tmuxp.yml for the wired ESP32-S3 + Lepton thermal mix.

See ros2_ws/launch/image_publisher_client/README.md for the full namespace map and per-host IP allocation.


Open implementation work (tracked, not yet implemented)

These are intentional gaps — documented here so the architecture page is the source of truth, then mirrored in the project todo list.

todo-esp32s3-fb-roles — Runtime role switching on the ESP32-S3 XIAO

The OV2640/OV5640 firmware is currently hard-pinned to one operating point. Add a runtime mode-switch (over MQTT or HTTP) that reconfigures the camera without a reflash:

| Setting | Role (1) HQ stills | Role (2) low-latency MJPG |
|---|---|---|
| config.fb_count | 1 (max single-frame size in PSRAM) | 2 (pipeline encoder, hide latency) |
| config.frame_size | FRAMESIZE_5MP (2592×1944) | FRAMESIZE_HD or _SVGA |
| set_quality() | low number = high quality (≈ 4–6) | higher number (≈ 12–20) |
| config.grab_mode | CAMERA_GRAB_WHEN_EMPTY | CAMERA_GRAB_LATEST |
| set_exposure_ctrl | manual, locked AEC value | auto |
| set_whitebal | manual, locked WB | auto OK |

Rationale: with PSRAM at a premium, fb_count=1 lets a 5MP JPG actually fit; fb_count=2 hides JPG-encode latency for streaming.

todo-imx708-fb-roles — Mode switching in simple_picamera2_streamer/app.py

Today app.py is built around picam2.create_video_configuration(main={"size": (2304,1296)}, buffer_count=4) — fixed at role (2). Add an endpoint (e.g. GET /mode?role=stills / …?role=stream) that:

  • For role (1): picam2.switch_mode_and_capture_file(...) against a still configuration at full sensor resolution, optionally RAW+JPEG, then revert.
  • For role (2): keep the current free-running 8 Hz video path.

This makes one IMX708 host serve both the SfM batch capture and the live viewfinder without contention.
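
A possible shape for the proposed endpoint, as a sketch only: one function parses the role from the query string and another performs the role (1) capture on an existing Picamera2 instance. The names parse_role and capture_still are hypothetical, the /mode endpoint does not exist yet, and the still configuration is left at picamera2's defaults, which are expected to select the full sensor resolution.

```python
from urllib.parse import parse_qs

# Query values proposed in this TODO, mapped to the role numbers used in this document.
ROLES = {"stills": 1, "stream": 2}


def parse_role(query: str) -> int:
    """Parse the query string of a /mode request, e.g. 'role=stills' -> 1."""
    values = parse_qs(query).get("role", [])
    if len(values) != 1 or values[0] not in ROLES:
        raise ValueError(f"role must be one of {sorted(ROLES)}")
    return ROLES[values[0]]


def capture_still(picam2, path: str):
    """Role (1): take one still with the running camera, then resume streaming.

    picam2 is the already-started Picamera2 object owned by app.py.
    switch_mode_and_capture_file is expected to switch to the given
    configuration, write the file, and return to the previous (video)
    configuration afterwards.
    """
    still_config = picam2.create_still_configuration()
    return picam2.switch_mode_and_capture_file(still_config, path)
```

Since the capture thread and the still capture would share one camera object, a real implementation would need to serialise the two (for example with a lock) and accept that /stream pauses for the duration of the mode switch.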

todo-mqtt-bridge — Unify the control plane

Right now control is heterogeneous:

  • IMX708 is controlled by HTTP GET /set?ExposureTime=….
  • ESP32-S3 has only MQTT telemetry (log/temp/rssi) — control is via the Espressif web UI on :81/.
  • DSLR is fully MQTT (request/response, recipient-based).

Decide whether the RPi streamer and the ESP32-S3 firmware should adopt the same recipient-based MQTT contract as the gphoto2 delegate. If yes, the batch_request_delegate already aggregates across services and would Just Work.


See also