"Why isn't my data ready?" used to mean 30+ minutes switching between systems with no way to correlate logs and metrics. We built a unified telemetry lakehouse using OpenTelemetry, Iceberg, and correlation keys across three tables, reducing investigation time to ~8 seconds per query while enabling automated anomaly detection. Our three-layer detection stack catches various failure modes including silent drift and triggers automated ticketing and remediation. This session shares the architecture, bridge pattern, operational lessons learned, and design decisions we'd revisit.
What attendees will gain:
Attendees will learn how to unify logs, metrics, and traces in a queryable lakehouse using OpenTelemetry and Apache Iceberg. We'll demonstrate the bridge pattern for writing telemetry data into open table formats, a correlation key model for joining signals across systems, and a practical three-layer approach to anomaly detection that catches issues single methods miss.
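To make the correlation key model concrete, here is a minimal sketch of joining log, metric, and trace records on a shared key. The table contents, field names, and the `run_id` key are illustrative assumptions, not the actual schema from the talk.

```python
# Sketch of the correlation-key model: merge rows from three signal
# tables (logs, metrics, traces) under one shared key. In the real
# system these are Iceberg tables; here they are toy in-memory rows.
from collections import defaultdict

# Illustrative records; "run_id" is an assumed correlation key.
logs = [
    {"run_id": "r1", "level": "ERROR", "msg": "task failed"},
    {"run_id": "r2", "level": "INFO", "msg": "task ok"},
]
metrics = [
    {"run_id": "r1", "rows_written": 0},
    {"run_id": "r2", "rows_written": 1_204_553},
]
traces = [
    {"run_id": "r1", "span": "load_step", "duration_ms": 51_002},
]

def join_signals(*tables, key="run_id"):
    """Merge rows from each signal table under a shared correlation key."""
    merged = defaultdict(dict)
    for table in tables:
        for row in table:
            merged[row[key]].update(row)
    return dict(merged)

joined = join_signals(logs, metrics, traces)
# joined["r1"] now carries the log line, metric, and span for one run,
# so a single lookup answers "what happened to this pipeline run?".
```

In production the same join runs as SQL over the lakehouse tables; the point is that one key threads all three signals together.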
We'll also share our AIOps automation loop, from detection through auto-remediation, along with lessons from cross-functional collaboration between platform engineers and domain experts. All code and runbooks are open source and reproducible.
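As a flavor of the layered approach, here is a minimal sketch of a three-layer check: a static threshold, a z-score test against recent history, and a drift check on the mean. The layer choices, thresholds, and window sizes are illustrative assumptions, not the production configuration presented in the talk.

```python
# Layered anomaly detection sketch: each layer catches a failure mode
# the others miss. A non-empty result would drive the ticketing /
# remediation step of the automation loop.
from statistics import mean, stdev

def detect(history, value, floor=1, z_limit=3.0, drift_pct=0.5):
    """Return the list of layers that flag `value` against `history`."""
    reasons = []
    # Layer 1: hard threshold catches outright failures (e.g. zero rows).
    if value < floor:
        reasons.append("threshold")
    # Layer 2: z-score flags statistical outliers vs. recent history.
    if len(history) >= 2 and stdev(history) > 0:
        z = abs(value - mean(history)) / stdev(history)
        if z > z_limit:
            reasons.append("zscore")
    # Layer 3: relative-drift check catches slow shifts in level that a
    # fixed threshold alone would miss.
    if history and abs(value - mean(history)) > drift_pct * mean(history):
        reasons.append("drift")
    return reasons

history = [100, 102, 98, 101, 99]
detect(history, 0)    # all three layers fire
detect(history, 100)  # clean: no layer fires
```

The design choice is that each layer is cheap and independently tunable; combining them is what catches issues a single method misses.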

A data engineer with expertise in data platforms, pipeline engineering, and site reliability. Focuses on open-source tooling, vendor-neutral architecture, and cross-functional collaboration. Passionate about turning operational experience into practical patterns for the community.