Built a telemetry → KPI → watchlist pipeline for a 24/7 robot fleet, with coverage-aware gating so missing data can’t masquerade as “healthy,” and a static report that operations can regenerate on a schedule.
Intern (End-to-end Owner) · Q1 2026 · Team of 1
What I Owned
Owned ingestion, raw/silver/gold tables, KPI dictionary, coverage semantics + trust gating, risk ranking/watchlist logic, and static report generation + operator runbook (public-safe representation; internals redacted).
Implemented ingestion workflow, raw snapshot layout, and failure artifact conventions.
Requirement: Must be replayable and safe to operate on a 24/7 fleet; must not rely on one-off exports.
Result: Re-running the pipeline for the same window reproduces the expected set of artifacts and stable row counts in downstream tables (validated via a verification checklist).
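The replayability property above rests on a deterministic snapshot layout: the same (controller, stat, interval, day) window always maps to the same path, so a re-run overwrites rather than duplicates. A minimal sketch of that idea, with hypothetical names (`snapshot_path`, `write_snapshot`) and a made-up partition layout standing in for the redacted internals:

```python
import json
from pathlib import Path


def snapshot_path(root: Path, controller: str, stat: str,
                  interval: str, day: str) -> Path:
    # Hypothetical partition layout: controller/stat/interval/day.json.
    # Deterministic: the same window always yields the same path.
    return root / controller / stat / interval / f"{day}.json"


def write_snapshot(root: Path, controller: str, stat: str,
                   interval: str, day: str, payload: dict) -> Path:
    """Idempotent write: re-running the same window overwrites the same
    file, so artifact presence and row counts are reproducible."""
    path = snapshot_path(root, controller, stat, interval, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys makes the serialized bytes stable across runs too
    path.write_text(json.dumps(payload, sort_keys=True))
    return path
```

Because the path function is pure, a verification checklist can compare expected vs. actual artifact paths for a window without re-reading payloads.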
Designed the tidy-row schema and implemented raw→silver transformation with lineage preserved.
Requirement: Must support downstream KPI computation without bespoke per-metric parsing; must preserve linkage to source day/controller.
Result: All telemetry stat streams conform to one schema, enabling automated KPI derivation without per-metric parsing branches (demonstrated via schema excerpt + synthetic sample rows).
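The raw→silver step can be pictured as recursive flattening: nested JSON leaves become one tidy row each, with dotted `metric_key` paths and the lineage columns named in the schema (controller, day, stat_type, interval, generated_at). A sketch under those assumptions; `flatten_snapshot` is a hypothetical name, not the actual implementation:

```python
def _leaves(prefix: str, obj):
    # Walk nested dicts, yielding (dotted_key, leaf_value) pairs.
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from _leaves(f"{prefix}.{k}" if prefix else k, v)
    else:
        yield prefix, obj


def flatten_snapshot(controller: str, day: str, stat_type: str,
                     interval: str, generated_at: str,
                     payload: dict) -> list[dict]:
    """One tidy row per leaf metric, lineage preserved on every row."""
    return [
        {
            "controller": controller,
            "day": day,
            "stat_type": stat_type,
            "metric_key": metric_key,
            "value": float(value),
            "interval": interval,
            "generated_at": generated_at,
        }
        for metric_key, value in _leaves("", payload)
    ]
```

Because every stream lands in this one shape, downstream KPI code can select on `metric_key` instead of branching per metric.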
Defined KPI dictionary and implemented silver→gold KPI generation + CSV contracts.
Requirement: KPIs must be interpretable by operations and comparable across controllers and days.
Result: A KPI dictionary + export contract exists, and KPIs regenerate consistently from the same inputs (verified via report artifact + KPI excerpt).
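A KPI dictionary driving silver→gold generation can be as simple as a mapping from KPI name to (numerator, denominator) metric keys, so adding a KPI is a data change rather than a code change. A minimal sketch; the metric keys and `compute_kpis` name are illustrative assumptions, not the real dictionary:

```python
KPI_DICTIONARY = {
    # kpi_name: (numerator metric_key, denominator metric_key) -- hypothetical keys
    "error_share": ("errors.count", "cycles.count"),
}


def compute_kpis(tidy_rows: list[dict],
                 kpi_dict: dict = KPI_DICTIONARY) -> list[dict]:
    """Aggregate tidy rows into one KPI row per (controller, day)."""
    sums: dict[tuple, dict[str, float]] = {}
    for r in tidy_rows:
        key = (r["controller"], r["day"])
        metrics = sums.setdefault(key, {})
        metrics[r["metric_key"]] = metrics.get(r["metric_key"], 0.0) + r["value"]
    out = []
    for (controller, day), metrics in sorted(sums.items()):
        row = {"controller": controller, "day": day}
        for kpi, (num, den) in kpi_dict.items():
            d = metrics.get(den, 0.0)
            # None (not 0) when the denominator is absent -- no silent zeros
            row[kpi] = metrics.get(num, 0.0) / d if d else None
        out.append(row)
    return out
```

The sorted iteration makes regeneration from the same inputs byte-for-byte consistent, matching the Result claim above.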
Implemented coverage computation and trust gating applied to KPIs and risk/watchlist logic.
Requirement: Missing telemetry must never be interpreted as zero errors/zero downtime; low-coverage windows must not drive decisions.
Result: A controlled missing-data injection yields “insufficient data / untrusted” states instead of zero-valued KPIs (validated via a missing-data test plan and example outputs).
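The gating semantics can be sketched as a wrapper that compares observed reporting intervals against an expected count and withholds the KPI value below a coverage floor. The expected-interval count and threshold below are assumptions for illustration (e.g. 96 fifteen-minute intervals per day), not the production values:

```python
def gate_kpi(value: float, intervals_observed: int,
             expected_intervals: int = 96,      # assumption: 15-min intervals/day
             min_coverage: float = 0.8) -> dict:
    """Return the KPI only when coverage clears the floor; otherwise emit
    an explicit 'insufficient_data' state so missing telemetry can never
    read as zero errors or zero downtime."""
    coverage = intervals_observed / expected_intervals
    if coverage < min_coverage:
        return {"value": None, "status": "insufficient_data",
                "coverage": round(coverage, 3)}
    return {"value": value, "status": "trusted",
            "coverage": round(coverage, 3)}
```

A missing-data injection test then just asserts that a low-coverage window yields `status == "insufficient_data"` and `value is None`, never a zero-valued KPI.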
Implemented risk ranking logic, gating, and watchlist row schema including trigger_reason + recommended_action.
Requirement: Outputs must be actionable (not just dashboards) and resistant to noisy denominators / “cry wolf” failure modes.
Result: Each run emits a ranked watchlist with explicit triggers/actions; rank ordering is stable when inputs are unchanged (verified via watchlist format + report artifact).
| From | To | Type | Notes |
|---|---|---|---|
| Robot controller stats API (redacted) | Raw telemetry snapshots | data | Pulled multiple telemetry stat streams per controller; persisted JSON snapshots partitioned by controller/stat/interval/day. Endpoints/auth/identifiers are [REDACTED]. |
| Raw telemetry snapshots | Silver tidy-row analytics table | data | Flattened nested JSON into a single row schema with preserved lineage (controller label + day + stat_type + metric_key + value + interval + generated_at). |
| Silver tidy-row table | Gold daily KPI + coverage + watchlist tables | data | Computed daily KPIs (throughput, downtime/error shares, intervention rates, finish/status distributions) plus coverage/trust flags; generated ranked watchlist with trigger_reason + recommended_action. |
| Gold tables | Ops-ready outputs (HTML + CSV bundle) | process | Rendered a static HTML report and exported CSVs (KPIs, coverage, watchlist) on a schedule; includes verification checklist/runbook for repeatability. |
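The last table row, gold tables → ops-ready outputs, needs no server: a scheduled job renders static HTML and writes a CSV bundle. A minimal stdlib-only sketch of that shape, with hypothetical function names and a pared-down column set:

```python
import csv
import io


def export_csv(rows: list[dict]) -> str:
    """Serialize one gold table (KPIs, coverage, or watchlist) as CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def render_html(watchlist: list[dict]) -> str:
    """Render the watchlist as a static HTML table -- safe to schedule,
    nothing interactive to operate."""
    header = ("<tr><th>controller</th><th>trigger_reason</th>"
              "<th>recommended_action</th></tr>")
    body = "".join(
        f"<tr><td>{w['controller']}</td><td>{w['trigger_reason']}</td>"
        f"<td>{w['recommended_action']}</td></tr>"
        for w in watchlist
    )
    return f"<html><body><table>{header}{body}</table></body></html>"
```

Keeping the report static means the runbook step is just “open the file,” and the CSVs remain diffable artifacts for the verification checklist.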
Risks & Limitations
Lessons & Next Steps