Built a telemetry → KPI → watchlist pipeline for a 24/7 robot fleet, with coverage-aware gating so missing data can’t masquerade as “healthy,” and a static report that operations can regenerate on a schedule.
Intern (End-to-end Owner) · Q1 2026 · Team of 1
What I Owned
Owned ingestion, raw/silver/gold tables, KPI dictionary, coverage semantics + trust gating, risk ranking/watchlist logic, and static report generation + operator runbook (public-safe representation; internals redacted).
Implemented ingestion workflow, raw snapshot layout, and failure artifact conventions.
Requirement: Must be replayable and safe to operate on a 24/7 fleet; must not rely on one-off exports.
Result: Re-running the pipeline for the same window reproduces the expected set of artifacts and stable row counts in downstream tables (validated via a verification checklist).
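The replayability property above rests on a deterministic snapshot layout: the same (controller, stat, interval, day) window always maps to the same path, so a re-run overwrites rather than duplicates. A minimal sketch of that idea, with hypothetical names (`snapshot_path`, `write_snapshot`) and a made-up partition layout standing in for the redacted internals:

```python
import json
from pathlib import Path


def snapshot_path(root: Path, controller: str, stat: str,
                  interval: str, day: str) -> Path:
    # Hypothetical partition layout: controller/stat/interval/day.json.
    # Deterministic: the same window always yields the same path.
    return root / controller / stat / interval / f"{day}.json"


def write_snapshot(root: Path, controller: str, stat: str,
                   interval: str, day: str, payload: dict) -> Path:
    """Idempotent write: re-running the same window overwrites the same
    file, so artifact presence and row counts are reproducible."""
    path = snapshot_path(root, controller, stat, interval, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys makes the serialized bytes stable across runs too
    path.write_text(json.dumps(payload, sort_keys=True))
    return path
```

Because the path function is pure, a verification checklist can compare expected vs. actual artifact paths for a window without re-reading payloads.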
Designed the tidy-row schema and implemented raw→silver transformation with lineage preserved.
Requirement: Must support downstream KPI computation without bespoke per-metric parsing; must preserve linkage to source day/controller.
Result: All telemetry stat streams conform to one schema, enabling automated KPI derivation without per-metric parsing branches (demonstrated via schema excerpt + synthetic sample rows).
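The raw→silver step can be pictured as recursive flattening: nested JSON leaves become one tidy row each, with dotted `metric_key` paths and the lineage columns named in the schema (controller, day, stat_type, interval, generated_at). A sketch under those assumptions; `flatten_snapshot` is a hypothetical name, not the actual implementation:

```python
def _leaves(prefix: str, obj):
    # Walk nested dicts, yielding (dotted_key, leaf_value) pairs.
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from _leaves(f"{prefix}.{k}" if prefix else k, v)
    else:
        yield prefix, obj


def flatten_snapshot(controller: str, day: str, stat_type: str,
                     interval: str, generated_at: str,
                     payload: dict) -> list[dict]:
    """One tidy row per leaf metric, lineage preserved on every row."""
    return [
        {
            "controller": controller,
            "day": day,
            "stat_type": stat_type,
            "metric_key": metric_key,
            "value": float(value),
            "interval": interval,
            "generated_at": generated_at,
        }
        for metric_key, value in _leaves("", payload)
    ]
```

Because every stream lands in this one shape, downstream KPI code can select on `metric_key` instead of branching per metric.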
Defined KPI dictionary and implemented silver→gold KPI generation + CSV contracts.
Requirement: KPIs must be interpretable by operations and comparable across controllers and days.
Result: A KPI dictionary + export contract exists, and KPIs regenerate consistently from the same inputs (verified via report artifact + KPI excerpt).
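A KPI dictionary driving silver→gold generation can be as simple as a mapping from KPI name to (numerator, denominator) metric keys, so adding a KPI is a data change rather than a code change. A minimal sketch; the metric keys and `compute_kpis` name are illustrative assumptions, not the real dictionary:

```python
KPI_DICTIONARY = {
    # kpi_name: (numerator metric_key, denominator metric_key) -- hypothetical keys
    "error_share": ("errors.count", "cycles.count"),
}


def compute_kpis(tidy_rows: list[dict],
                 kpi_dict: dict = KPI_DICTIONARY) -> list[dict]:
    """Aggregate tidy rows into one KPI row per (controller, day)."""
    sums: dict[tuple, dict[str, float]] = {}
    for r in tidy_rows:
        key = (r["controller"], r["day"])
        metrics = sums.setdefault(key, {})
        metrics[r["metric_key"]] = metrics.get(r["metric_key"], 0.0) + r["value"]
    out = []
    for (controller, day), metrics in sorted(sums.items()):
        row = {"controller": controller, "day": day}
        for kpi, (num, den) in kpi_dict.items():
            d = metrics.get(den, 0.0)
            # None (not 0) when the denominator is absent -- no silent zeros
            row[kpi] = metrics.get(num, 0.0) / d if d else None
        out.append(row)
    return out
```

The sorted iteration makes regeneration from the same inputs byte-for-byte consistent, matching the Result claim above.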
Implemented coverage computation and trust gating applied to KPIs and risk/watchlist logic.
Requirement: Missing telemetry must never be interpreted as zero errors/zero downtime; low-coverage windows must not drive decisions.
Result: A controlled missing-data injection yields “insufficient data / untrusted” states instead of zero-valued KPIs (validated via a missing-data test plan and example outputs).
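The gating semantics can be sketched as a wrapper that compares observed reporting intervals against an expected count and withholds the KPI value below a coverage floor. The expected-interval count and threshold below are assumptions for illustration (e.g. 96 fifteen-minute intervals per day), not the production values:

```python
def gate_kpi(value: float, intervals_observed: int,
             expected_intervals: int = 96,      # assumption: 15-min intervals/day
             min_coverage: float = 0.8) -> dict:
    """Return the KPI only when coverage clears the floor; otherwise emit
    an explicit 'insufficient_data' state so missing telemetry can never
    read as zero errors or zero downtime."""
    coverage = intervals_observed / expected_intervals
    if coverage < min_coverage:
        return {"value": None, "status": "insufficient_data",
                "coverage": round(coverage, 3)}
    return {"value": value, "status": "trusted",
            "coverage": round(coverage, 3)}
```

A missing-data injection test then just asserts that a low-coverage window yields `status == "insufficient_data"` and `value is None`, never a zero-valued KPI.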
Implemented risk ranking logic, gating, and watchlist row schema including trigger_reason + recommended_action.
Requirement: Outputs must be actionable (not just dashboards) and resistant to noisy denominators / “cry wolf” failure modes.
Result: Each run emits a ranked watchlist with explicit triggers/actions; rank ordering is stable when inputs are unchanged (verified via watchlist format + report artifact).
| From | To | Type | Notes |
|---|---|---|---|
| Robot controller stats API (redacted) | Raw telemetry snapshots | data | Pulled multiple telemetry stat streams per controller; persisted JSON snapshots partitioned by controller/stat/interval/day. Endpoints/auth/identifiers are [REDACTED]. |
| Raw telemetry snapshots | Silver tidy-row analytics table | data | Flattened nested JSON into a single row schema with preserved lineage (controller label + day + stat_type + metric_key + value + interval + generated_at). |
| Silver tidy-row table | Gold daily KPI + coverage + watchlist tables | data | Computed daily KPIs (throughput, downtime/error shares, intervention rates, finish/status distributions) plus coverage/trust flags; generated ranked watchlist with trigger_reason + recommended_action. |
| Gold tables | Ops-ready outputs (HTML + CSV bundle) | process | Rendered a static HTML report and exported CSVs (KPIs, coverage, watchlist) on a schedule; includes verification checklist/runbook for repeatability. |
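The last table row, gold tables → ops-ready outputs, needs no server: a scheduled job renders static HTML and writes a CSV bundle. A minimal stdlib-only sketch of that shape, with hypothetical function names and a pared-down column set:

```python
import csv
import io


def export_csv(rows: list[dict]) -> str:
    """Serialize one gold table (KPIs, coverage, or watchlist) as CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def render_html(watchlist: list[dict]) -> str:
    """Render the watchlist as a static HTML table -- safe to schedule,
    nothing interactive to operate."""
    header = ("<tr><th>controller</th><th>trigger_reason</th>"
              "<th>recommended_action</th></tr>")
    body = "".join(
        f"<tr><td>{w['controller']}</td><td>{w['trigger_reason']}</td>"
        f"<td>{w['recommended_action']}</td></tr>"
        for w in watchlist
    )
    return f"<html><body><table>{header}{body}</table></body></html>"
```

Keeping the report static means the runbook step is just “open the file,” and the CSVs remain diffable artifacts for the verification checklist.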
Risks & Limitations
Lessons & Next Steps