Case Study

AI-Powered Pipeline Recommender for O11ySources

Transforming complex data pipeline creation from a manual process into an intelligent, one-click experience.

Role

Product Manager

Company

Vunet Systems

Date

July 2024


Project Summary

I led the design and delivery of an AI-driven recommendation engine for O11ySources (integrations) that converts raw telemetry samples into validated, deployable streaming pipeline blueprints. The recommender analyzes historical pipeline definitions and plugin configurations, then proposes ready-to-apply pipeline topologies for heterogeneous telemetry (logs, metrics, and traces) from infrastructure sources such as servers, databases, network devices, middleware, and cloud services.

The Problem

Customers had to manually author complex pipeline configurations, choosing block topologies, transform sequences, and detailed plugin parameters. This manual process was:

  • Time-Consuming: Required hours of effort and deep product knowledge.
  • Error-Prone: Simple syntax mistakes led to deployment failures.
  • A Bottleneck: Slowed down data onboarding and delayed time-to-value for customers.
  • Expert-Dependent: Often required on-call engineering effort to build and tune.

The Solution

The recommender follows a hybrid architecture combining retrieval, template-driven generation, and lightweight LLM assistance to automatically generate safe and accurate pipeline configurations.

1. Corpus & Index: Collect and anonymize historical pipeline JSONs to create a knowledge base.

2. Feature Extraction & Retrieval: Use KNN over embeddings to find the most similar prior configurations for a given data sample.

3. Template + LLM Generation: A template engine ensures valid output structure, while an LLM helps fill in specific parameter values (like grok patterns). A sketch of steps 2 and 3 follows this list.

4. Validation & Sandboxing: Statically validate the generated config and perform a dry-run with sample data to ensure correctness before deployment.
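
To make steps 2 and 3 concrete, here is a minimal Python sketch of the retrieve-then-fill flow. The embedding helper, corpus layout, and fill_params callable are illustrative assumptions rather than the production implementation; the key point it shows is that the template owns the structure, and the LLM is only asked for parameter values such as grok patterns.

import numpy as np

def embed(sample_text):
    # Illustrative embedding: hash character trigrams into a fixed-size vector.
    # The production recommender uses a learned embedding model instead.
    vec = np.zeros(256)
    for i in range(len(sample_text) - 2):
        vec[hash(sample_text[i:i + 3]) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def knn_retrieve(sample_vec, corpus_vecs, corpus_blueprints, k=3):
    # corpus_vecs: (N, 256) array of embedded historical samples (pre-normalized).
    # corpus_blueprints: list of N anonymized pipeline JSONs loaded as dicts.
    sims = corpus_vecs @ sample_vec
    top = np.argsort(-sims)[:k]
    return [(corpus_blueprints[i], float(sims[i])) for i in top]

def generate_suggestion(sample_text, corpus_vecs, corpus_blueprints, fill_params):
    # Retrieve the nearest blueprint, keep its structure as-is, and delegate only
    # the parameter slots (e.g. a grok pattern) to the LLM via fill_params.
    neighbors = knn_retrieve(embed(sample_text), corpus_vecs, corpus_blueprints)
    template, confidence = neighbors[0]
    suggestion = dict(template)
    suggestion["params"] = fill_params(template, sample_text)
    return suggestion, confidence, neighbors

The confidence score and neighbor count surfaced in the UI ("Based on 3 similar historical pipelines") map directly to the similarity value and k used in this retrieval step.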

My Role as Product Manager

Vision & Strategy

Defined product vision, scope, acceptance criteria, and KPIs for the AI Recommender.

User Discovery

Led discovery sessions with SREs, support teams, and pilot customers to build the data corpus.

Prioritization

Owned product prioritization for the entire ML pipeline, from data ingestion to the final UI wizard.

Safety & Validation

Designed a continuous evaluation (Evals) framework, validation rules, and rollback flows to keep generated pipelines safe.

Go-To-Market

Led UAT and adoption measurement, and created launch collateral such as in-product onboarding.

Post-Launch Monitoring

Tracked adoption rates and quality metrics from our Evals dashboard to inform future iterations and improvements.

Architecture & User Experience

High-Level Architecture

1. Input & Analysis: Sample Data → Feature Extraction

2. Recommendation Engine: Vector Index (Corpus) → KNN Retrieval → Hybrid Generator

3. Validation & Output: Validator → Sandbox Dry-Run → UI Suggestion (a sketch of the validation stage follows)
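
A hedged sketch of the Validator and Sandbox Dry-Run stages. It assumes a generated config is a dict with source, blocks, and sink keys, and that each plugin exposes a callable transform; those key names and the transform registry are illustrative, not the product's actual schema.

def validate_config(config):
    # Static checks before anything touches the sandbox.
    # Required keys ("source", "blocks", "sink") are assumed for illustration.
    errors = []
    for key in ("source", "blocks", "sink"):
        if key not in config:
            errors.append("missing required key: " + key)
    for block in config.get("blocks", []):
        if "plugin" not in block:
            errors.append("block without a plugin name: " + repr(block))
    return errors

def dry_run(config, sample_records, transforms):
    # transforms: hypothetical registry mapping plugin name -> callable(doc, params).
    # Fail closed: any unknown plugin or runtime exception rejects the suggestion.
    for record in sample_records:
        doc = {"message": record}
        for block in config.get("blocks", []):
            fn = transforms.get(block["plugin"])
            if fn is None:
                return False
            try:
                doc = fn(doc, block.get("params", {}))
            except Exception:
                return False
    return True

Only suggestions that pass both the static check and the dry-run on the customer's own sample data are surfaced in the UI.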

Suggestion UI Mockup

AI Recommended Pipeline

For your Apache Access Log sample. Confidence: 92%

Proposed Pipeline: apache-logs → GrokParser → GeoIP Enricher → UserAgent Parser → parsed-logs
Transformation Preview

BEFORE

127.0.0.1 - - [10/Oct/2025:13:55:36 +0000] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 (...)"

AFTER

{
  "response": 200,
  "verb": "GET",
  "geoip": { "country_name": "USA" },
  "user_agent": { "name": "Firefox" }
}

Based on 3 similar historical pipelines.
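
For reference, the GrokParser step in this preview corresponds roughly to the standard combined Apache access-log grok pattern. The parameter block below is an illustrative Python dict with assumed field names, not the exact shipped config.

grok_block = {
    "plugin": "GrokParser",
    "params": {
        "source_field": "message",
        # COMBINEDAPACHELOG is the standard Logstash-style pattern for combined access logs.
        "pattern": "%{COMBINEDAPACHELOG}",
    },
}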

Impact & Key Metrics

  • >2 hours to <5 minutes: median reduction in time-to-first-pipeline.
  • ~65%: pilot suggestion acceptance rate without edits.
  • ~90%: reduction in syntax/logic errors reaching runtime.
  • Full coverage across DB, Network, Middleware sources and more.

Roadmap & Next Steps

Active Learning

Capture user edits to suggestions to retrain and re-rank proposals over time, creating a self-improving system.

Template Marketplace

Create a library of shareable templates for common telemetry types (e.g., NGINX, MySQL, SNMP).

Performance Tuning

Use AI to analyze pipeline performance data and proactively suggest optimizations, like recommending more efficient plugins or identifying bottlenecks.