EventFlow Analytics Proposal
Observable State Machines with Automatic Funnel Detection
Version: Draft 0.1
Date: December 2024
1. Executive Summary
EventFlow machines process events and transition between states. Understanding how these flows behave in production - which paths users take, where they drop off, which guards are effective - requires analytics. This proposal introduces EventFlow Analytics - a natural language approach to observing, measuring, and understanding state machine behavior.
Core Philosophy
Numbers tell the story. Funnels reveal the journey.
Metrics are events. Events flow through queues.
Analytics never blocks business logic.
Key Features
- Near-zero overhead collection - Metrics are queued events that never block business logic
- In-flow analytics declaration - Metrics defined alongside workflows, readable by non-developers
- Automatic funnel detection - System discovers conversion paths from state graph topology
- Multi-level metrics - Event, state, guard, transition, and context measurement
- Near real-time alerts - Queued metrics evaluated within seconds
- CLI analytics - eventflow analytics and eventflow funnel commands
- Auto-generated insights - Dead code, bottlenecks, drop-off points
2. Motivation & Problem Statement
2.1 Current Situation
The Event Queue Proposal introduces queue-level metrics:
- queue.pending, queue.processing, queue.completed
- queue.processing_time, queue.wait_time
But there's no model for:
- Machine behavior analytics - How do instances flow through states?
- Conversion tracking - What percentage reach the success state?
- Guard effectiveness - Which conditions actually branch the flow?
- Dead code detection - Which paths are defined but never taken?
2.2 Real-World Challenges
Challenge 1: Invisible Funnels
──────────────────────────────
E-commerce orders go through: cart → checkout → payment → fulfillment
How many drop at each step? We don't know without manual instrumentation.
Challenge 2: Dead Guards
────────────────────────
Guard "customer is VIP" exists in code but always returns false.
Is it dead code or just missing test data?
Challenge 3: Hidden Bottlenecks
───────────────────────────────
Orders are slow. Is it payment processing? Stock reservation?
Which state has the longest duration?
Challenge 4: Disconnected Metrics
─────────────────────────────────
Business analysts define funnels in external tools.
When developers change the workflow, funnel definitions become stale.
2.3 Goals
- Automatic funnel discovery - Infer conversion paths from state transitions
- In-flow metric declaration - Keep metrics with workflows they measure
- Multi-stakeholder readability - Developers, PMs, analysts can all contribute
- Real-time + batch support - Alerts immediately, reports periodically
- Dead code detection - Identify unused paths and always-true/false guards
3. Core Metric Types
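EventFlow Analytics tracks metrics at five levels (event, state, guard, transition, and context), each revealing different insights about machine behavior. Timing metrics (3.6) add performance profiling across these levels.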
3.1 Event Metrics
Track event occurrences, rates, and processing latency.
analytics:
track :checkout
count
rate over 1 minute
latency: histogram
track :payment_failed as "Payment Failures"
count
rate over 5 minutes
| Metric | Type | Description |
|---|---|---|
| count | Counter | Total event occurrences |
| rate over <duration> | Gauge | Events per time window |
| latency: histogram | Histogram | Time from event arrival to handler completion |
Use cases:
- Monitor traffic patterns: "How many checkouts per minute?"
- Detect anomalies: "Payment failures spiked 3x in last hour"
- Performance monitoring: "p95 checkout latency is 2.3s"
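To make the difference between count and rate over <duration> concrete, here is a minimal sketch of a trailing-window counter. The class name RateWindow and its methods are hypothetical and not part of EventFlow; the sketch assumes timestamps in seconds and a window that excludes events exactly one window old.

```python
from collections import deque


class RateWindow:
    """Counts events and reports how many fell inside a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.count = 0              # counter: total occurrences, never reset
        self._timestamps = deque()  # arrival times still inside the window

    def record(self, now: float) -> None:
        self.count += 1
        self._timestamps.append(now)

    def rate(self, now: float) -> int:
        # Evict timestamps that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


# Hypothetical usage: four :checkout events, queried with a 1 minute window.
checkout = RateWindow(window_seconds=60)
for t in (0.0, 10.0, 50.0, 70.0):
    checkout.record(t)
print(checkout.count, checkout.rate(now=75.0))  # 4 2
```

The counter only grows, while the gauge reflects whatever is still inside the window at query time.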
3.2 State Metrics
Measure time spent in states and state entry/exit patterns.
analytics:
measure #awaiting_payment
duration: histogram
entry_count as "Payments Started"
exit_count as "Payments Resolved"
active_count
measure #fulfilled
entry_count as "Orders Completed"
| Metric | Type | Description |
|---|---|---|
| duration: histogram | Histogram | Time instances spend in state |
| entry_count | Counter | How many times state was entered |
| exit_count | Counter | How many times state was exited |
| active_count | Gauge | Current instances in state |
Use cases:
- Identify bottlenecks: "Orders spend 45s average in #awaiting_payment"
- Monitor capacity: "234 orders currently in #processing"
- Track completions: "8,725 orders fulfilled this week"
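As an illustration of how these four state metrics relate to each other, the sketch below derives them from paired enter and exit observations. StateMetrics and its method names are hypothetical; the proposal does not prescribe this structure, and times are assumed to be seconds.

```python
from collections import defaultdict


class StateMetrics:
    """Derives entry_count, exit_count, duration samples and active_count."""

    def __init__(self) -> None:
        self.entry_count = defaultdict(int)
        self.exit_count = defaultdict(int)
        self.durations = defaultdict(list)  # state -> seconds spent per visit
        self._entered_at = {}               # (state, instance_id) -> entry time

    def entered(self, state: str, instance_id: str, at: float) -> None:
        self.entry_count[state] += 1
        self._entered_at[(state, instance_id)] = at

    def exited(self, state: str, instance_id: str, at: float) -> None:
        started = self._entered_at.pop((state, instance_id), None)
        if started is None:
            return  # exit without a recorded entry (e.g. dropped metric): skip
        self.exit_count[state] += 1
        self.durations[state].append(at - started)

    def active_count(self, state: str) -> int:
        # Instances that entered the state and have not exited yet.
        return sum(1 for (s, _instance) in self._entered_at if s == state)


# Hypothetical usage: two orders enter #awaiting_payment, one leaves.
metrics = StateMetrics()
metrics.entered("#awaiting_payment", "order-1", at=0.0)
metrics.entered("#awaiting_payment", "order-2", at=5.0)
metrics.exited("#awaiting_payment", "order-1", at=12.0)
```

Because metric delivery is best-effort, a sketch like this has to tolerate an exit whose entry was never seen, which is why unmatched exits are skipped rather than treated as errors.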
3.3 Guard Metrics
Track guard evaluation patterns to detect dead code and understand branching.
analytics:
measure guard "cart is valid"
true_rate
false_rate
evaluation_count
measure guard "fraud detected"
true_rate
alert when true_rate > 5%
| Metric | Type | Description |
|---|---|---|
| true_rate | Gauge | Percentage of evaluations returning true |
| false_rate | Gauge | Percentage of evaluations returning false |
| evaluation_count | Counter | Total guard evaluations |
Auto-detected insights:
- Dead guard (always true): Guard "payment gateway available" is 100% true - consider removing
- Dead guard (always false): Guard "customer is VIP" is 0% true - dead code or missing data?
- Effective guard: Guard "cart is valid" is 92% true, 8% false - working as intended
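The three insights above amount to a simple classification over true_rate. The sketch below shows one way it could work; the function name and the min_evaluations threshold are assumptions added here so that a guard with only a handful of evaluations is not flagged prematurely.

```python
def classify_guard(true_count: int, evaluation_count: int,
                   min_evaluations: int = 1000) -> str:
    """Label a guard from its evaluation counts (hypothetical helper)."""
    # Assumption: too few samples should not produce a dead-code warning.
    if evaluation_count < min_evaluations:
        return "insufficient data"
    # Compare integer counts to avoid floating-point edge cases at 0% and 100%.
    if true_count == evaluation_count:
        return "dead guard (always true)"
    if true_count == 0:
        return "dead guard (always false)"
    return "effective guard"


print(classify_guard(10234, 10234))  # dead guard (always true)
print(classify_guard(0, 5432))       # dead guard (always false)
print(classify_guard(11454, 12450))  # effective guard
```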
3.4 Transition Metrics
Track state-to-state movement patterns and conversion rates.
analytics:
measure transition #cart -> #checkout
count as "Checkout Started"
conversion_rate from #cart
measure transition #awaiting_payment -> #paid
count as "Successful Payments"
conversion_rate from #awaiting_payment
| Metric | Type | Description |
|---|---|---|
| count | Counter | Transition occurrences |
| conversion_rate from <state> | Gauge | Percentage of source state entries that take this transition |
Use cases:
- Conversion tracking: "78% of #awaiting_payment reach #paid"
- Path analysis: "Most common path is cart → checkout → paid → fulfilled"
- Drop-off detection: "22% exit to #payment_failed at payment stage"
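For reference, conversion_rate is the transition count divided by the entries of the source state. The sketch below applies that definition step by step, using the counts from the example funnel in section 4.3; the function name step_conversions is hypothetical.

```python
def step_conversions(reached):
    """reached: instance counts at each funnel step, in path order."""
    rates = []
    for entered, proceeded in zip(reached, reached[1:]):
        # An empty source state yields 0% instead of a division by zero.
        rates.append(100.0 * proceeded / entered if entered else 0.0)
    return rates


# #cart, #checkout, #awaiting_payment, #paid, #fulfilled
reached = [12450, 11356, 11186, 8725, 8725]
rates = [round(r, 1) for r in step_conversions(reached)]
overall = round(100.0 * reached[-1] / reached[0], 1)
print(rates, overall)  # [91.2, 98.5, 78.0, 100.0] 70.1
```

The drop-off rate at each step is simply 100 minus the conversion rate, for example 22.0% at the payment stage.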
3.5 Context Metrics
Track context variable distributions and cardinality.
analytics:
measure $total
distribution: histogram
buckets: [0, 100, 500, 1000, 5000]
measure $payment_method
distribution: labels
cardinality
measure $items
cardinality as "Unique Products Ordered"
| Metric | Type | Description |
|---|---|---|
| distribution: histogram | Histogram | Numeric value distribution with buckets |
| distribution: labels | Labels | Categorical value distribution |
| buckets: [...] | Config | Custom histogram bucket boundaries |
| cardinality | Gauge | Count of unique values |
Use cases:
- Order value analysis: "65% of orders are between $100-500"
- Payment method breakdown: "Credit card 45%, PayPal 30%, Apple Pay 25%"
- Product diversity: "Average order contains 2.3 unique products"
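A short sketch of how bucket boundaries and cardinality might be evaluated. The proposal does not define bucket edge semantics, so this example assumes each boundary is an inclusive lower bound and that the last bucket is open-ended; bucket_counts is a hypothetical helper.

```python
from bisect import bisect_right


def bucket_counts(values, boundaries):
    """Count values per bucket; counts[i] covers [boundaries[i], boundaries[i+1])."""
    counts = [0] * len(boundaries)
    for value in values:
        index = bisect_right(boundaries, value) - 1
        if index >= 0:  # values below the first boundary are ignored
            counts[index] += 1
    return counts


totals = [40, 120, 250, 499, 500, 7200]          # sample $total values
counts = bucket_counts(totals, [0, 100, 500, 1000, 5000])
print(counts)  # [1, 3, 1, 0, 1]

payment_methods = ["card", "paypal", "card"]     # sample $payment_method values
cardinality = len(set(payment_methods))
print(cardinality)  # 2
```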
3.6 Timing Metrics (Performance Profiling)
Track execution time at the most granular level: individual guards, individual actions, state transitions, and API response times.
analytics:
// Individual guard timing
measure guard "cart is valid"
evaluation_time: histogram
alert when evaluation_time p95 > 10ms
measure guard "payment gateway available"
evaluation_time: histogram
measure guard "fraud check passed"
evaluation_time: histogram
alert when evaluation_time p95 > 500ms
// Individual action timing
measure action "validate cart"
execution_time: histogram
alert when execution_time p95 > 100ms
measure action "reserve stock"
execution_time: histogram
measure action "charge payment"
execution_time: histogram
alert when execution_time p99 > 5 seconds
measure action "send confirmation email"
execution_time: histogram
// State transition overhead
measure transition #cart -> #checkout
transition_time: histogram
measure transition #awaiting_payment -> #paid
transition_time: histogram
// API event end-to-end timing
track :checkout (api)
response_time: histogram // HTTP request → response (total)
processing_time: histogram // Business logic only (excludes HTTP overhead)
track :process_payment (api)
response_time: histogram
processing_time: histogram
| Metric | Type | Applies To | Description |
|---|---|---|---|
| evaluation_time: histogram | Histogram | Guard | Time to evaluate guard condition |
| execution_time: histogram | Histogram | Action | Time to execute a single action |
| transition_time: histogram | Histogram | Transition | Overhead of state transition itself |
| response_time: histogram | Histogram | API Event | End-to-end HTTP response time |
| processing_time: histogram | Histogram | API Event | Business logic execution time |
Use cases:
- Identify slow guards: "Fraud check guard takes 450ms p95 - needs optimization"
- Find slow actions: "Payment charging takes 1.8s p95 - consider async processing"
- Measure transition overhead: "State transitions are 0.1ms - negligible"
- API performance: "Checkout API responds in 120ms p95 - within SLA"
Metric Event Types:
Timing metrics emit the following metric events:
| Metric Event | Payload | When Emitted |
|---|---|---|
:metric.guard_evaluated | { guard, result, duration_ms, machine, instance_id } | Guard evaluation completes |
:metric.action_executed | { action, duration_ms, machine, instance_id } | Action execution completes |
:metric.transition_completed | { from_state, to_state, duration_ms, machine, instance_id } | State transition completes |
:metric.api_event_handled | { event, response_time_ms, processing_time_ms, machine, instance_id } | API event handler completes |
Performance Budget:
Define acceptable timing thresholds for automatic enforcement:
analytics:
performance_budget:
api_response_time p95: < 500ms
guard_evaluation p95: < 50ms
action_execution p95: < 200ms
transition_time p95: < 1ms
on budget_exceeded
notify @ops via slack
message "Performance budget exceeded: {metric} is {value}"
Auto-detected insights:
- Slow guard: Guard "fraud check" p95 > 100ms - consider caching or async pre-check
- Slow action: Action "charge payment" p95 > 1s - consider async processing
- Normal overhead: Transition times are sub-millisecond - healthy
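The performance budget described earlier in this section can be read as a p95 check per metric. The sketch below is one possible interpretation, not the proposal's implementation: it assumes a nearest-rank percentile over raw samples (a real histogram would approximate this from buckets), and the names percentile, BUDGET_MS and budget_violations are hypothetical.

```python
def percentile(samples, p: int):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Thresholds in milliseconds, taken from the performance_budget example.
BUDGET_MS = {
    "api_response_time": 500,
    "guard_evaluation": 50,
    "action_execution": 200,
    "transition_time": 1,
}


def budget_violations(samples_ms):
    """Return {metric: p95} for every metric whose p95 is not below budget."""
    violations = {}
    for metric, limit in BUDGET_MS.items():
        values = samples_ms.get(metric)
        if not values:
            continue  # nothing observed for this metric yet
        p95 = percentile(values, 95)
        if p95 >= limit:  # the budget requires p95 < limit
            violations[metric] = p95
    return violations


observed = {
    "guard_evaluation": list(range(1, 101)),  # 1ms .. 100ms
    "api_response_time": [100] * 20,
}
print(budget_violations(observed))  # {'guard_evaluation': 95}
```

Each entry in the result would correspond to one budget_exceeded notification, with {metric} and {value} filled from the key and the p95.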
4. Automatic Funnel Detection
The core innovation of EventFlow Analytics is automatic funnel discovery. The system analyzes the state machine topology to identify conversion paths without manual configuration.
4.1 Algorithm
Automatic Funnel Detection Algorithm
─────────────────────────────────────────────────────────────────
Step 1: Build State Graph
- Parse all state transitions from machine definition
- Create directed graph: states as nodes, transitions as edges
- Identify entry states (reachable via API events)
Step 2: Identify Terminal States
- Terminal state = node with out-degree 0 (no outgoing transitions)
- Or explicitly marked terminal states
Step 3: Classify Terminals
- SUCCESS patterns: #fulfilled, #completed, #paid, #shipped, #hired, #approved
- FAILURE patterns: #cancelled, #failed, #rejected, #expired, #declined
- NEUTRAL: Other terminals (#archived, #closed)
Step 4: Discover Paths
- For each terminal state T:
- Reverse BFS/DFS from T to entry states
- Record all paths leading to T
- Track the events that trigger each transition
Step 5: Compute Metrics
- For each path step:
- Calculate conversion rate (proceeding to next step)
- Calculate drop-off rate (exiting to other paths)
- Identify the drop-off destinations
4.2 Terminal State Classification
Terminal states can be classified in two ways:
Explicit Marking (Recommended)
Developers mark terminal states during Session 4 (Implementation):
// In the scenario, mark terminal states explicitly
#fulfilled (success) // terminal - success outcome
#cancelled (failure) // terminal - failure outcome
#archived (neutral) // terminal - neutral (neither success nor failure)
This is the recommended approach because:
- Clear intent - no ambiguity about state purpose
- Part of implementation workflow - developer decides during state derivation
- Self-documenting - terminal classification visible in flow file
Pattern-Based Inference (Fallback)
If not explicitly marked, the system infers from naming patterns:
| Classification | Patterns | Examples |
|---|---|---|
| SUCCESS | fulfilled, completed, paid, shipped, hired, approved, active, done | #fulfilled, #order_completed, #hired |
| FAILURE | cancelled, failed, rejected, expired, declined, denied, abandoned | #payment_failed, #application_rejected |
| NEUTRAL | archived, closed, suspended, inactive | #archived, #account_closed |
When to use pattern inference: Legacy flows or quick prototyping where explicit marking hasn't been added yet.
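Steps 1 to 4 of the algorithm in 4.1, combined with the classification rules above, can be sketched as follows. This is an illustrative reading rather than the reference implementation: the function names, the (from_state, event, to_state) tuple format, the whole-word pattern matching (so that "inactive" is not mistaken for "active") and the order in which pattern groups are checked are all assumptions.

```python
from collections import defaultdict

SUCCESS = ("fulfilled", "completed", "paid", "shipped", "hired", "approved", "active", "done")
FAILURE = ("cancelled", "failed", "rejected", "expired", "declined", "denied", "abandoned")
NEUTRAL = ("archived", "closed", "suspended", "inactive")


def classify_terminal(state, explicit=None):
    """Explicit marks win; naming patterns are only the fallback."""
    if explicit and state in explicit:
        return explicit[state]
    words = state.lstrip("#").split("_")
    for label, patterns in (("failure", FAILURE), ("neutral", NEUTRAL), ("success", SUCCESS)):
        if any(word in patterns for word in words):
            return label
    return "neutral"  # other terminals default to neutral


def discover_funnels(transitions, entry_states, explicit=None):
    """transitions: iterable of (from_state, event, to_state) tuples."""
    incoming = defaultdict(list)
    states, has_outgoing = set(), set()
    for src, event, dst in transitions:          # Step 1: build the graph
        incoming[dst].append((src, event))
        states.update((src, dst))
        has_outgoing.add(src)
    terminals = sorted(states - has_outgoing)    # Step 2: out-degree 0

    def paths_to(state, visited):
        # Step 4: walk edges backwards until an entry state is reached.
        if state in entry_states:
            return [[state]]
        found = []
        for src, event in incoming[state]:
            if src in visited:
                continue  # do not follow cycles
            for path in paths_to(src, visited | {src}):
                found.append(path + [event, state])
        return found

    return {
        t: {"class": classify_terminal(t, explicit), "paths": paths_to(t, {t})}
        for t in terminals                        # Step 3 happens per terminal
    }


transitions = [
    ("#cart", ":checkout", "#checkout"),
    ("#checkout", ":validate", "#awaiting_payment"),
    ("#awaiting_payment", ":payment_success", "#paid"),
    ("#awaiting_payment", ":payment_failed", "#payment_failed"),
    ("#paid", ":ship", "#fulfilled"),
]
funnels = discover_funnels(transitions, entry_states={"#cart"})
for terminal, info in funnels.items():
    print(terminal, info["class"], info["paths"])
```

Note that #paid matches a SUCCESS pattern but is not reported as a funnel end here, because classification only applies to states without outgoing transitions.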
4.3 Auto-Generated Funnel Output
Given an e-commerce order machine, the system automatically produces:
Auto-Detected Funnel: @order → #fulfilled
═══════════════════════════════════════════════════════════════
#cart (12,450 entered)
│
│ :checkout (91.2% proceed)
│ [8.8% never checkout - cart abandonment]
▼
#checkout (11,356 reached)
│
│ :validate (98.5% proceed)
│ [1.5% validation failed → exit]
▼
#awaiting_payment (11,186 reached)
│
│ :payment_success (78.0% proceed) ───────────► #paid
│ :payment_failed (22.0% exit) ───────────────► #payment_failed
▼
#paid (8,725 reached)
│
│ :ship (100% proceed)
│
▼
#fulfilled (8,725 reached) ✓ SUCCESS
═══════════════════════════════════════════════════════════════
Overall Conversion: 70.1% (8,725 / 12,450)
Primary Drop-off: Payment stage (22% → #payment_failed)
Secondary Drop-off: Cart abandonment (8.8%)
4.4 Multiple Funnels Per Machine
A machine may have multiple terminal states, generating multiple funnels:
@order Funnels (Auto-Detected)
─────────────────────────────────────────
Funnel 1: → #fulfilled (SUCCESS)
Conversion: 70.1%
Path: #cart → #checkout → #awaiting_payment → #paid → #fulfilled
Funnel 2: → #payment_failed (FAILURE)
Conversion: 15.4%
Path: #cart → #checkout → #awaiting_payment → #payment_failed
Funnel 3: → #cancelled (FAILURE)
Conversion: 5.6%
Path: #cart → #checkout → #awaiting_payment → #paid → #cancelled
Unclassified exits: 8.9% (cart abandonment - never reached #checkout)
4.5 Optional Funnel Hints
While detection is automatic, users can provide hints to label or customize funnels:
analytics:
funnel "Purchase Flow"
success: #fulfilled, #shipped
failure: #cancelled, #payment_failed
label: "E-Commerce Checkout"
funnel "Refund Process"
entry: #fulfilled
success: #refunded
failure: #refund_denied
| Option | Description |
|---|---|
| success: | States to classify as successful completion |
| failure: | States to classify as failure/drop-off |
| entry: | Override the funnel entry point (default: initial states) |
| label: | Human-readable funnel name |
5. DSL Syntax
5.1 Analytics Block
Analytics are declared in an analytics: block within a machine:
machine: @order
analytics:
// Event tracking
track :checkout as "Checkout Started"
count
rate over 1 minute
latency: histogram
// State measurement
measure #awaiting_payment
duration: histogram
alert when duration p95 > 30 seconds
// Guard tracking
measure guard "cart is valid"
true_rate
false_rate
// Context distribution
measure $total
distribution: histogram
buckets: [0, 100, 500, 1000, 5000]
// Alerts
alert "High Failure Rate"
when :payment_failed rate > 5% over 1 hour
severity: warning
notify: payments-team
// Funnel hints (optional)
funnel "Purchase Flow"
success: #fulfilled
failure: #cancelled, #payment_failed
scenario: order processing
// ... event handlers ...
5.2 Inline Tracking
For simpler cases, add tracking directly to event handlers:
on :checkout from @customer (api) track
// 'track' enables default event metrics (count, rate, latency)
? cart is valid
order moves to #awaiting_payment
on :payment_success from @payment track as "Payment Completed"
// Custom label for this event
order moves to #paid measure
// 'measure' enables default state metrics (duration, entry_count)
5.3 Alert Syntax
Alerts are defined within the analytics: block, alongside tracking and measurement declarations:
machine: @order
analytics:
// Tracking declarations
track :checkout
count
rate over 1 minute
track :payment_failed
count
// Measurement declarations
measure #awaiting_payment
duration: histogram
// Alert declarations (same block, typically at the end)
alert "High Failure Rate"
when :payment_failed rate > 5% of :checkout over 1 hour
severity: warning
notify: payments-team
alert "Stuck Orders"
when #awaiting_payment duration > 5 minutes for any instance
severity: critical
notify: on-call
Alert syntax structure:
alert "<name>"
when <condition>
severity: <level>
notify: <channel>
Condition patterns:
// Event rate conditions
when :payment_failed rate > 10 per minute
when :payment_failed rate > 5% of :checkout over 1 hour
// State duration conditions
when #awaiting_payment duration p95 > 30 seconds
when #awaiting_payment duration > 5 minutes for any instance
// Guard conditions
when guard "fraud detected" true_rate > 5%
// Count conditions
when #payment_failed active_count > 100
Severity levels:
| Level | Description |
|---|---|
| info | Informational, no action required |
| warning | Attention needed, not urgent |
| critical | Immediate action required |
6. Zero Overhead Architecture
The most critical architectural principle of EventFlow Analytics:
Analytics collection must have near-zero overhead on production workload.
Traditional analytics add latency to every operation. EventFlow Analytics takes a fundamentally different approach: metrics are events that flow through queues.
6.1 The Problem with Synchronous Analytics
Traditional analytics block business logic:
Traditional Approach (BAD):
─────────────────────────────────────────────────────────────────
:checkout event arrives
│
├──► Track event count (5ms)
├──► Write to metrics DB (10ms)
├──► Update histogram (2ms)
│
└──► Continue to business logic...
Total added latency: ~17ms per event ❌
This approach:
- Adds latency to every event handler
- Creates coupling between business logic and analytics storage
- Can cause cascading failures if analytics storage is slow/down
6.2 Metric Events Architecture
EventFlow Analytics treats every metric observation as a lightweight event:
EventFlow Approach (GOOD):
─────────────────────────────────────────────────────────────────
:checkout event arrives
│
├──► Emit :metric.event_received (fire-and-forget, ~0.01ms)
│
└──► Continue to business logic immediately
Total added latency: ~0ms ✓
Meanwhile (async):
─────────────────────────────────────────────────────────────────
:metric.event_received ──┐
:metric.state_entered ──┼──► Analytics Queue ──► Analytics Worker ──► Storage
:metric.guard_evaluated──┘ │
(low priority)
(batched writes)
6.3 Metric Event Types
Every analytics observation emits an internal metric event:
| Metric Event | Payload | When Emitted |
|---|---|---|
| :metric.event_received | { event, machine, instance_id, timestamp } | Event handler starts |
| :metric.event_completed | { event, duration_ms, success, error? } | Event handler completes |
| :metric.event_emitted | { event, to_machine, to_instance, timestamp } | Event emitted to another machine |
| :metric.state_entered | { state, instance_id, timestamp } | State transition (enter) |
| :metric.state_exited | { state, instance_id, duration_ms } | State transition (exit) |
| :metric.guard_evaluated | { guard, result, instance_id } | Guard condition checked |
| :metric.transition | { from, to, event, instance_id } | State transition recorded |
| :metric.context_changed | { variable, old_value, new_value } | Context variable modified |
:metric.event_emitted Use Cases:
- Cross-machine communication tracking - Monitor event flow between machines
- Event chain visualization - Trace causal chains across system
- Cascade failure detection - Identify failure propagation patterns
6.4 Analytics Queue
Metric events flow through a dedicated analytics queue:
system: e-commerce
analytics:
queue:
priority: bulk // lowest priority, never starves business events
concurrency: 10 // metrics are independent, high parallelism OK
batch_size: 100 // write 100 metrics per DB operation
flush_interval: 1 second // or flush every second, whichever comes first
buffer_size: 10000 // in-memory ring buffer
overflow: drop_oldest // if buffer full, drop oldest (never block)
| Option | Default | Description |
|---|---|---|
| priority | bulk | Queue priority (bulk = lowest) |
| concurrency | 10 | Parallel metric processors |
| batch_size | 100 | Metrics per storage write |
| flush_interval | 1 second | Maximum time before flush |
| buffer_size | 10000 | In-memory buffer capacity |
| overflow | drop_oldest | Behavior when buffer full |
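To illustrate how buffer_size, batch_size and overflow: drop_oldest interact, here is a minimal single-threaded sketch built on a bounded deque. MetricBuffer and its methods are hypothetical names; the flush_interval timer, queue priority and worker concurrency are deliberately left out.

```python
from collections import deque


class MetricBuffer:
    """In-memory ring buffer for metric events (illustrative only)."""

    def __init__(self, buffer_size: int = 10000, batch_size: int = 100) -> None:
        # A deque with maxlen silently discards its oldest item when full,
        # which matches the drop_oldest overflow policy.
        self._events = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.dropped = 0

    def emit(self, metric_event: dict) -> None:
        """Fire-and-forget: constant time, never blocks, never raises on overflow."""
        if len(self._events) == self._events.maxlen:
            self.dropped += 1  # the append below evicts the oldest event
        self._events.append(metric_event)

    def drain_batches(self):
        """Yield batches of up to batch_size events, oldest first."""
        while self._events:
            size = min(self.batch_size, len(self._events))
            yield [self._events.popleft() for _ in range(size)]


# Hypothetical usage with tiny limits so the overflow is visible.
buffer = MetricBuffer(buffer_size=5, batch_size=2)
for seq in range(7):
    buffer.emit({"type": ":metric.event_received", "seq": seq})
batches = [[event["seq"] for event in batch] for batch in buffer.drain_batches()]
print(batches, buffer.dropped)  # [[2, 3], [4, 5], [6]] 2
```

In this sketch a worker would call drain_batches once per flush and write each batch in a single storage operation; tracking dropped gives operators a way to see when best-effort delivery is losing data.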
6.5 Collection Modes
Configure how metrics are collected:
analytics:
collection: queued // default - async via queue (production)

analytics:
collection: sampled 10% // collect only 10% of metrics (high-traffic)

analytics:
collection: disabled // no collection (emergency/testing)

analytics:
collection: sync // synchronous (development only!)
| Mode | Overhead | Use Case |
|---|---|---|
| queued | ~0ms | Production (default) |
| sampled N% | ~0ms | Very high traffic, approximate metrics OK |
| disabled | 0ms | Emergency, load testing without metrics |
| sync | +15-50ms | Local development, debugging |
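The sampled mode trades accuracy for volume. A rough sketch of the per-observation decision and of how sampled counts could be scaled back up is shown below; the proposal does not say how sampling is implemented, so uniform random sampling, the function names and the injectable rng parameter are all assumptions.

```python
import random


def should_collect(mode: str, sample_percent: float = 100.0, rng=random.random) -> bool:
    """Decide whether a single metric observation is collected."""
    if mode == "disabled":
        return False
    if mode == "sampled":
        # rng() is uniform in [0, 1), so this keeps about sample_percent % of events.
        return rng() * 100.0 < sample_percent
    return True  # "queued" and "sync" collect every observation


def scale_count(sampled_count: int, sample_percent: float) -> float:
    """Estimate the true count from a sampled count."""
    return sampled_count * 100.0 / sample_percent


kept = sum(should_collect("sampled", 10.0) for _ in range(10000))
print(kept, scale_count(kept, 10.0))  # roughly 1000 kept, estimate near 10000
```

Scaled counts are estimates, which is why the table above lists sampled mode only where approximate metrics are acceptable.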
6.6 Performance Guarantees
| Metric | Guarantee |
|---|---|
| Business event latency impact | < 0.1ms (fire-and-forget emit) |
| Memory overhead per metric | ~100-200 bytes |
| Metric delivery | Best-effort (may drop under extreme load) |
| Metric latency | 1-5 seconds from observation to storage |
| Batch efficiency | 100+ metrics per DB write |
6.7 Graceful Degradation
Analytics never impacts business operations:
Scenario: Analytics storage is down
───────────────────────────────────
1. Metric events accumulate in buffer
2. Buffer reaches capacity (10,000 events)
3. Oldest metrics dropped (ring buffer)
4. Business events continue unaffected ✓
5. When storage recovers, remaining metrics flush
Scenario: Extreme traffic spike
───────────────────────────────
1. Metrics generated faster than processable
2. Buffer fills up
3. System switches to sampling mode automatically
4. Business events continue unaffected ✓
5. Approximate metrics still collected
6.8 Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│ BUSINESS EVENT FLOW │
│ │
│ API ──► Validation ──► Business Queue ──► Worker ──► State Change ──► Response
│ │ │
│ (fire-and-forget) │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ANALYTICS PIPELINE │ │
│ │ │ │
│ │ :metric.* ──► Ring Buffer ──► Analytics Queue ──► Analytics Worker │ │
│ │ events (in-memory) (low priority) (batch writes) │ │
│ │ │ │ │
│ │ ┌───────────────────┼────────┐ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ │ │ │
│ │ ┌─────────┐ ┌─────────────┐ │ │ │
│ │ │ Storage │ │ Alert Check │ │ │ │
│ │ └─────────┘ └─────────────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌────────────┐ │ │ │
│ │ │ Notify │ │ │ │
│ │ └────────────┘ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
7. Timing Model
Because metrics flow through queues, "real-time" actually means "near real-time" with a small delay (1-5 seconds).
7.1 Near Real-Time Analytics
Metrics are available within seconds of observation:
| Capability | Description | Latency |
|---|---|---|
| Live counters | Event counts, state entry/exit | 1-5 seconds |
| Rate calculation | Events per minute/hour | Rolling window, ~5s delay |
| Alert evaluation | Threshold breach detection | 1-5 seconds |
| Active gauges | Current instances per state | ~1 second |
Why not truly real-time?
- Metrics are queued (fire-and-forget)
- Buffer flushes every 1 second
- Analytics worker processes batch
- Total pipeline latency: 1-5 seconds
This is acceptable because:
- Business logic is not blocked
- Alerts within seconds are fast enough for most use cases
- True sub-second alerting requires external streaming infrastructure
7.2 Batch Analytics
Processed periodically for comprehensive analysis:
| Frequency | Analysis | Output |
|---|---|---|
| Hourly | Funnel conversion rates | Trend updates |
| Daily | Full funnel analysis, path frequencies | Daily report |
| Weekly | Dead code detection, guard effectiveness | Coverage report |
Use cases:
- Funnel reports with accurate conversion rates
- Historical trend comparison
- Dead code and test coverage analysis
7.3 Alert Evaluation
Alerts are evaluated by the Analytics Worker, not inline:
Metric Events ──► Analytics Queue ──► Analytics Worker
│
┌───────────┴───────────┐
│ │
▼ ▼
Write to Storage Evaluate Alert Rules
│
▼
(if threshold breached)
│
▼
Send Notification
This means alerts fire 1-5 seconds after the triggering event, not immediately. For most monitoring scenarios, this is acceptable.
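As a sketch of what evaluating an alert rule on each batch could look like, the example below handles the "rate > N% of another event" condition pattern from section 5.3. RatioAlert and the shape of window_counts are hypothetical; the sketch assumes the worker already maintains per-event counts for the alert's time window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RatioAlert:
    """Rule of the form: when <numerator> rate > N% of <denominator>."""
    name: str
    numerator_event: str
    denominator_event: str
    threshold_percent: float
    severity: str
    notify: str

    def evaluate(self, window_counts: dict) -> Optional[dict]:
        """Return a notification payload if the threshold is breached, else None."""
        total = window_counts.get(self.denominator_event, 0)
        if total == 0:
            return None  # no traffic in the window, nothing to compare against
        current = 100.0 * window_counts.get(self.numerator_event, 0) / total
        if current > self.threshold_percent:
            return {
                "alert": self.name,
                "severity": self.severity,
                "notify": self.notify,
                "current_percent": round(current, 1),
            }
        return None


alert = RatioAlert("High Failure Rate", ":payment_failed", ":checkout",
                   threshold_percent=5.0, severity="warning", notify="payments-team")
fired = alert.evaluate({":checkout": 1000, ":payment_failed": 62})
quiet = alert.evaluate({":checkout": 1000, ":payment_failed": 40})
print(fired)
print(quiet)  # None
```

Because the rule runs only when a batch arrives, the delay between the triggering event and the notification is bounded by the pipeline latency described above rather than by the rule itself.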
7.4 Configuration
analytics:
timing:
flush_interval: 1 second // how often to flush buffer to queue
alerts: near-realtime // evaluated on each batch (1-5s)
dashboards: near-realtime // updated on each batch (1-5s)
funnels: hourly // aggregated hourly
coverage: weekly // full analysis weekly
8. Auto-Generated Insights
Beyond raw metrics, EventFlow Analytics automatically generates actionable insights.
8.1 Path Analysis
Path Analysis: @order (Last 7 Days)
═══════════════════════════════════════════════════════════════
Most Common Paths:
──────────────────
1. #cart → #checkout → #paid → #fulfilled (68.2% of orders)
Average duration: 4.2 hours
2. #cart → #checkout → #payment_failed (15.4% of orders)
Average duration: 12 minutes
3. #cart → #checkout → #paid → #cancelled (5.6% of orders)
Average duration: 18 hours
4. #cart → (abandoned) (8.9% of sessions)
Never reached #checkout
Rare Paths (< 1%):
──────────────────
- #cart → #checkout → #paid → #fraud_review → #cancelled (0.3%)
- #cart → #checkout → #paid → #shipped → #returned (0.8%)
8.2 Bottleneck Detection
Bottleneck Analysis: @order
═══════════════════════════════════════════════════════════════
State Duration Ranking:
───────────────────────
1. #paid p50: 2.1h p95: 18h ← Longest
2. #awaiting_payment p50: 12s p95: 45s ⚠ Warning threshold
3. #checkout p50: 2s p95: 8s ✓ Healthy
4. #cart p50: 5m p95: 30m ✓ Healthy
Recommendations:
────────────────
- #paid has high p95 (18h) - investigate shipping delays
- #awaiting_payment p95 (45s) exceeds 30s warning threshold
8.3 Drop-off Analysis
Drop-off Analysis: @order
═══════════════════════════════════════════════════════════════
Significant Drop-offs (> 5%):
─────────────────────────────
1. #awaiting_payment → #payment_failed (22.0%)
Cause: Payment processing failures
Events: :payment_failed, :card_declined, :insufficient_funds
2. #cart → (abandoned) (8.9%)
Cause: Cart abandonment
Events: (no checkout event received)
3. #paid → #cancelled (5.6%)
Cause: Post-payment cancellations
Events: :cancel_order, :out_of_stock
Trend Comparison (vs Last Week):
────────────────────────────────
- Payment failures: 22.0% (+2.3%) ⚠ Increasing
- Cart abandonment: 8.9% (-0.5%) ✓ Improving
- Cancellations: 5.6% (+0.1%) ─ Stable
8.4 Dead Code Detection
Dead Code Analysis: @order
═══════════════════════════════════════════════════════════════
Dead Guards (Always Same Result):
─────────────────────────────────
⚠ "payment gateway available" 100% true (10,234 evaluations)
→ Consider removing - gateway always available
⚠ "customer is VIP" 0% true (5,432 evaluations)
→ Either dead code or missing VIP customers in production
Unused Transitions (0 Occurrences in 30 Days):
──────────────────────────────────────────────
✗ #paid → #fraud_review
→ Defined but never taken - verify fraud detection is working
✗ #fulfilled → #disputed
→ Defined but never taken - may be dead code
Recommendations:
────────────────
1. Review "payment gateway available" guard - likely removable
2. Verify VIP detection logic or add test data
3. Test fraud detection path manually
4. Consider removing #disputed state if unused
8.5 Test Coverage Integration
Test Coverage: @order
═══════════════════════════════════════════════════════════════
States:
───────
#cart [✓ tested: entry, exit]
#checkout [✓ tested: entry, exit]
#awaiting_payment [✓ tested: entry, exit]
#paid [✓ tested: entry, exit]
#fulfilled [⚠ tested: entry only]
#payment_failed [✓ tested: entry, exit]
#fraud_review [✗ NOT TESTED]
Guards:
───────
"cart is valid" [✓ tested: both branches]
"fraud detected" [⚠ tested: false branch only]
"gateway available" [✗ NOT TESTED]
Transitions:
────────────
#cart → #checkout [✓ tested]
#awaiting_payment → #paid [✓ tested]
#awaiting_payment → #payment_failed [✓ tested]
#paid → #fraud_review [✗ NOT TESTED]
Coverage Summary:
─────────────────
States: 85% (6/7)
Guards: 67% (4/6 branches)
Transitions: 88% (7/8)
Overall: 80%
Recommendations:
────────────────
- Add test for #fraud_review entry
- Add test for "fraud detected" = true branch
- Add test for #paid → #fraud_review transition
9. CLI Commands
9.1 Analytics Dashboard
# Live dashboard
eventflow analytics @order --live
# Period-based analytics
eventflow analytics @order --period 7d
eventflow analytics @order --from 2024-01-01 --to 2024-01-31
# Filter by state or event
eventflow analytics @order --state=#awaiting_payment
eventflow analytics @order --event=:payment_failed
# Cohort filtering
eventflow analytics @order --where "$customer_type = 'premium'"
eventflow analytics @order --where "$total > 1000"
Output:
@order Analytics (Last 7 Days)
═══════════════════════════════════════════════════════════════
Events:
┌──────────────────────┬─────────┬──────────┬───────────┬─────────┐
│ Event │ Count │ Rate/min │ p50 lat │ p95 lat │
├──────────────────────┼─────────┼──────────┼───────────┼─────────┤
│ :checkout │ 12,450 │ 1.24 │ 45ms │ 120ms │
│ :payment_success │ 9,711 │ 0.97 │ 890ms │ 2.1s │
│ :payment_failed │ 2,739 │ 0.27 │ 650ms │ 1.8s │
│ :ship │ 9,234 │ 0.92 │ 12ms │ 45ms │
└──────────────────────┴─────────┴──────────┴───────────┴─────────┘
States:
┌─────────────────────┬──────────┬──────────┬───────────┬─────────┐
│ State │ Entries │ Active │ p50 dur │ p95 dur │
├─────────────────────┼──────────┼──────────┼───────────┼─────────┤
│ #awaiting_payment │ 12,450 │ 234 │ 12s │ 45s │
│ #paid │ 9,711 │ 456 │ 2.1h │ 18h │
│ #fulfilled │ 9,234 │ - │ terminal │ - │
│ #payment_failed │ 2,739 │ 89 │ terminal │ - │
└─────────────────────┴──────────┴──────────┴───────────┴─────────┘
Guards:
┌────────────────────────────┬─────────┬──────────┬─────────────┐
│ Guard │ True % │ False % │ Evaluations │
├────────────────────────────┼─────────┼──────────┼─────────────┤
│ "cart is valid" │ 92% │ 8% │ 12,450 │
│ "fraud detected" │ 2% │ 98% │ 9,711 │
│ "payment gateway available"│ 100% │ 0% │ 12,450 │ ⚠
└────────────────────────────┴─────────┴──────────┴─────────────┘
9.2 Funnel Analysis
# Auto-detected funnels
eventflow funnel @order
# Specific terminal state
eventflow funnel @order --to=#fulfilled
# Compare periods
eventflow funnel @order --compare --before 2024-01-01 --after 2024-01-01
# Cohort comparison
eventflow funnel @order --where "$customer_type = 'premium'" --compare-to "$customer_type = 'standard'"
# Export funnel data
eventflow funnel @order --format=csv > funnel.csv
eventflow funnel @order --format=json > funnel.json
Output:
Funnel: @order → #fulfilled
Period: Last 7 Days
═══════════════════════════════════════════════════════════════
#cart (12,450 entered)
│
│ :checkout (91.2% proceed)
│ [8.8% abandon - never checkout]
▼
#checkout (11,356 reached)
│
│ :validate (98.5% proceed)
│ [1.5% validation failed]
▼
#awaiting_payment (11,186 reached)
│
│ :payment_success (78.0%) ──────► #paid
│ :payment_failed (22.0%) ───────► #payment_failed
▼
#paid (8,725 reached)
│
│ :ship (100% proceed)
│
▼
#fulfilled (8,725 reached) ✓
═══════════════════════════════════════════════════════════════
Overall Conversion: 70.1%
Primary Drop-off: Payment (22% → #payment_failed)
Comparison (vs Previous 7 Days):
Overall: 70.1% (+2.3%) ▲
Payment success: 78.0% (+1.8%) ▲
Cart abandonment: 8.8% (-0.5%) ▼
9.3 Coverage Analysis
# Full coverage report
eventflow coverage @order
# Guards only
eventflow coverage @order --guards
# Dead code detection
eventflow coverage @order --dead-code
# Verbose output
eventflow coverage @order --verbose
# Compare with production data
eventflow coverage @order --production-data
Output:
Coverage Report: @order
═══════════════════════════════════════════════════════════════
Overall Coverage: 80% (16/20 branches)
Untested:
─────────
✗ State #fraud_review entry
✗ Guard "fraud detected" = true
✗ Guard "gateway available" (both branches)
✗ Transition #paid → #fraud_review
Dead Code (Production Data):
────────────────────────────
⚠ Guard "gateway available" always true (100%)
⚠ Guard "customer is VIP" always false (0%)
⚠ Transition #paid → #fraud_review (0 occurrences)
Recommendations:
────────────────
1. Add test: fraud detection true path
2. Review: "gateway available" guard (always true)
3. Review: "customer is VIP" guard (always false)
9.4 Alerts
# Active alerts
eventflow alerts
# Alert history
eventflow alerts --history --period=7d
# Filter by severity
eventflow alerts --severity=critical
# Acknowledge alert
eventflow alerts ack <alert-id>
# Silence alert temporarily
eventflow alerts silence <alert-id> --duration=1h
Output:
Active Alerts
═══════════════════════════════════════════════════════════════
⚠ [warning] High Payment Failure Rate
Machine: @order
Condition: :payment_failed rate > 5% over 1 hour
Current: 6.2%
Triggered: 14 minutes ago
ID: alert-abc123
✓ No critical alerts
History (Last 24h):
───────────────────
[resolved] High Payment Failure Rate (2h ago, duration: 45m)
[resolved] Stuck in #awaiting_payment (6h ago, duration: 12m)
9.5 Insights
# All auto-generated insights
eventflow insights @order
# Specific insight types
eventflow insights @order --bottlenecks
eventflow insights @order --drop-offs
eventflow insights @order --dead-code
eventflow insights @order --paths
Output:
$ eventflow insights @order
@order Insights (Last 7 Days)
═══════════════════════════════════════════════════════════════
BOTTLENECKS
───────────
#paid has high p95 duration (18h)
→ Consider: Add shipping automation or parallel processing
#awaiting_payment p95 (45s) exceeds warning threshold
→ Consider: Optimize payment gateway integration
DROP-OFFS
─────────
22% drop at payment stage (#awaiting_payment → #payment_failed)
→ Consider: Add payment retry, alternative payment methods
8.8% cart abandonment (never reach #checkout)
→ Consider: Cart recovery emails, simplify checkout
DEAD CODE
─────────
Guard "customer is VIP" always false (0% true rate)
→ Either dead code or missing VIP customers in production
Guard "payment gateway available" always true (100%)
→ Consider removing - provides no branching value
Transition #paid → #fraud_review never taken (0 occurrences)
→ Verify fraud detection is working correctly
PATH INSIGHTS
─────────────
68% take happy path: #cart → #checkout → #paid → #fulfilled
15% fail at payment: #cart → #checkout → #payment_failed
8.8% abandon cart: #cart → (no further events)
5.6% cancel after payment: #cart → #checkout → #paid → #cancelled
RECOMMENDATIONS
───────────────
1. [High Priority] Investigate payment gateway failures (22% drop-off)
2. [Medium Priority] Implement cart abandonment recovery
3. [Low Priority] Remove or fix "customer is VIP" guard
4. [Low Priority] Add test coverage for fraud_review path
───────────────────────────────────────────────────────────────
Run 'eventflow insights @order --verbose' for detailed analysis
9.6 Performance Profiling
# Full performance profile
eventflow perf @order
# Filter by component
eventflow perf @order --guards # Guard evaluation times only
eventflow perf @order --actions # Action execution times only
eventflow perf @order --api # API event response times only
eventflow perf @order --transitions # State transition overhead only
# Time range
eventflow perf @order --range 24h # Last 24 hours (default)
eventflow perf @order --range 7d # Last 7 days
eventflow perf @order --from 2024-12-01 --to 2024-12-08
# Filter slow components
eventflow perf @order --slow # Only show items exceeding thresholds
eventflow perf @order --slow --threshold 100ms
# Export
eventflow perf @order --export json > perf.json
eventflow perf @order --export csv > perf.csv
Output:
$ eventflow perf @order
╔═══════════════════════════════════════════════════════════════════════════════╗
║ Performance Profile: @order ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║ Time Range: Last 24 hours │ Samples: 12,450 ║
╚═══════════════════════════════════════════════════════════════════════════════╝
┌─ API Event Response Times ────────────────────────────────────────────────────┐
│ Event │ p50 │ p95 │ p99 │ Max │ Calls │
├────────────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ :checkout │ 45ms │ 120ms │ 350ms │ 1.2s │ 8,234 │
│ :add_item │ 12ms │ 35ms │ 80ms │ 450ms │ 24,567 │
│ :process_payment │ 890ms │ 2.1s │ 4.5s │ 12s │ 6,890 │
└────────────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘
┌─ Guard Evaluation Times ──────────────────────────────────────────────────────┐
│ Guard │ p50 │ p95 │ Evals │ Status │
├────────────────────────────────┼──────────┼──────────┼───────────┼───────────┤
│ "cart is valid" │ 0.2ms │ 0.8ms │ 12,450 │ ✓ healthy │
│ "payment gateway available" │ 45ms │ 180ms │ 6,890 │ ⚠ SLOW │
│ "stock is available" │ 2ms │ 8ms │ 8,234 │ ✓ healthy │
│ "fraud check passed" │ 120ms │ 450ms │ 6,890 │ ⚠ SLOW │
└────────────────────────────────┴──────────┴──────────┴───────────┴───────────┘
┌─ Action Execution Times ──────────────────────────────────────────────────────┐
│ Action │ p50 │ p95 │ Calls │ Status │
├────────────────────────────────┼──────────┼──────────┼───────────┼───────────┤
│ "validate cart" │ 1ms │ 3ms │ 12,450 │ ✓ healthy │
│ "calculate totals" │ 0.5ms │ 1.2ms │ 12,450 │ ✓ healthy │
│ "reserve stock" │ 15ms │ 45ms │ 8,234 │ ✓ healthy │
│ "charge payment" │ 780ms │ 1.8s │ 6,890 │ ⚠ SLOW │
│ "send confirmation email" │ 120ms │ 350ms │ 5,234 │ ✓ healthy │
└────────────────────────────────┴──────────┴──────────┴───────────┴───────────┘
┌─ State Transition Overhead ───────────────────────────────────────────────────┐
│ Transition │ p50 │ p95 │ Count │
├────────────────────────────────┼──────────┼──────────┼───────────┤
│ #cart → #checkout │ 0.1ms │ 0.3ms │ 8,234 │
│ #checkout → #awaiting_payment │ 0.1ms │ 0.2ms │ 8,100 │
│ #awaiting_payment → #paid │ 0.1ms │ 0.3ms │ 6,320 │
│ #paid → #fulfilled │ 0.1ms │ 0.2ms │ 6,320 │
└────────────────────────────────┴──────────┴──────────┴───────────┘
┌─ Bottleneck Analysis ─────────────────────────────────────────────────────────┐
│ ⚠ Top 3 Performance Bottlenecks (by p95 impact): │
│ │
│ 1. Action "charge payment" - 1.8s p95 │
│ └─ Recommendation: Consider async processing or timeout optimization │
│ │
│ 2. Guard "fraud check passed" - 450ms p95 │
│ └─ Recommendation: Cache results or use async pre-check │
│ │
│ 3. Guard "payment gateway available" - 180ms p95 │
│ └─ Recommendation: Use circuit breaker pattern │
└───────────────────────────────────────────────────────────────────────────────┘
───────────────────────────────────────────────────────────────────────────────
Run 'eventflow perf @order --slow' to see only slow components
Run 'eventflow perf @order --guards' for detailed guard analysis
Slow Components View:
$ eventflow perf @order --slow
@order Slow Components (p95 > threshold)
═══════════════════════════════════════════════════════════════
⚠ GUARDS (threshold: 50ms)
"fraud check passed" p95: 450ms (+800% over threshold)
"payment gateway available" p95: 180ms (+260% over threshold)
⚠ ACTIONS (threshold: 200ms)
"charge payment" p95: 1.8s (+800% over threshold)
⚠ API EVENTS (threshold: 500ms)
:process_payment p95: 2.1s (+320% over threshold)
✓ TRANSITIONS
All transitions within threshold (< 1ms)
═══════════════════════════════════════════════════════════════
Total slow components: 4
Recommendation: Focus on "charge payment" action for biggest impact
10. Visualization Integration
Analytics integrate with EventFlow's diagram generation to produce annotated visualizations.
10.1 Annotated State Diagrams
eventflow diagram @order --type=state --analytics
Produces a state diagram with:
- State nodes annotated with duration (p50/p95)
- Transition edges annotated with conversion rates
- Color coding: green (healthy), yellow (warning), red (critical)
- Dead paths shown as dashed/gray lines
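The color coding above could be derived by comparing each state's p95 duration with a warning and a critical limit. The following is a minimal Python sketch, not the proposal's implementation: the `Thresholds` type, the helper names, and the 30 second / 5 minute limits are assumptions chosen for illustration (they echo the sample alerts elsewhere in this proposal, but no diagram thresholds are actually specified).

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical per-state limits in seconds; not defined by the proposal."""
    warning: float
    critical: float


def status_color(p95_seconds: float, limits: Thresholds) -> str:
    """Map a state's p95 duration to the color of its diagram node."""
    if p95_seconds >= limits.critical:
        return "red"
    if p95_seconds >= limits.warning:
        return "yellow"
    return "green"


def edge_style(occurrences: int) -> str:
    """Transitions never observed in the period are drawn as dead paths."""
    return "dashed-gray" if occurrences == 0 else "solid"


# Assumed limits: warn at 30 seconds, critical at 5 minutes.
limits = Thresholds(warning=30.0, critical=300.0)
print(status_color(45.0, limits))  # a 45s p95 falls between the two limits
print(edge_style(0))               # a transition with 0 occurrences
```

Keeping the classification separate from rendering would let the same rules drive both the annotated diagram and the text output of the CLI.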
10.2 Funnel Diagrams
eventflow diagram @order --type=funnel
Produces a funnel visualization:
- Horizontal bars for each state
- Bar width proportional to volume
- Drop-off percentages between stages
- Color intensity by conversion rate
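As an illustration of how bar widths and drop-off percentages relate to raw stage counts, here is a small Python sketch. It is an assumption about one reasonable way to compute them, not the renderer's actual logic; `funnel_rows` and `max_width` are hypothetical names. The counts are the ones from the sample funnel output in section 9.2.

```python
from __future__ import annotations


def funnel_rows(stages: list[tuple[str, int]], max_width: int = 40) -> list[dict]:
    """Compute bar, share of entries, and step conversion for each funnel stage."""
    if not stages or stages[0][1] <= 0:
        return []
    entered = stages[0][1]
    rows = []
    previous = None
    for name, count in stages:
        rows.append({
            "state": name,
            "count": count,
            # Bar width is proportional to volume relative to the entry stage.
            "bar": "█" * round(max_width * count / entered),
            "of_entered": round(100 * count / entered, 1),
            # Step conversion is relative to the previous stage, not the entry.
            "from_previous": None if not previous else round(100 * count / previous, 1),
        })
        previous = count
    return rows


# Stage counts from the sample output in section 9.2.
stages = [
    ("#cart", 12450),
    ("#checkout", 11356),
    ("#awaiting_payment", 11186),
    ("#paid", 8725),
    ("#fulfilled", 8725),
]
rows = funnel_rows(stages)
for row in rows:
    print(f'{row["state"]:<18} {row["bar"]:<40} {row["count"]:>6} ({row["of_entered"]}%)')
```

The drop-off between two stages is simply 100 minus `from_previous`, which is how the 22% payment drop-off follows from the 78.0% payment success rate.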
10.3 Heat Maps
eventflow diagram @order --type=heatmap --metric=duration
eventflow diagram @order --type=heatmap --metric=volume
eventflow diagram @order --type=heatmap --metric=errors
Produces a state diagram colored by metric intensity.
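One plausible way to turn a metric into a color intensity is min-max normalization across states. The Python sketch below assumes that approach; the proposal does not specify the scaling, and `heat_intensity` is a hypothetical helper. The example values are the active counts shown in the sample dashboard.

```python
from __future__ import annotations


def heat_intensity(values: dict[str, float]) -> dict[str, float]:
    """Min-max normalize per-state metric values to a 0.0-1.0 intensity."""
    if not values:
        return {}
    low, high = min(values.values()), max(values.values())
    if high == low:
        # All states share one value, so there is no contrast to show.
        return {state: 0.0 for state in values}
    return {state: (value - low) / (high - low) for state, value in values.items()}


# Active instance counts per state, as in the sample dashboard.
volume = {"#awaiting_payment": 234, "#paid": 456, "#shipping": 123}
intensity = heat_intensity(volume)
for state, level in intensity.items():
    print(f"{state}: {level:.2f}")
```

Long-tailed metrics such as duration may read better on a logarithmic scale; that choice is left open here.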
11. A/B Testing & Experimentation
Note: A/B testing is covered in a separate proposal. See A/B Testing Proposal.
For basic cohort comparison, use segment by with any context variable:
analytics:
track :checkout
segment by $ab_variant
funnel "Purchase Flow"
segment by $ab_variant
$ eventflow funnel @order --segment-by=$ab_variant
This provides basic variant comparison via CLI. For advanced experimentation features (statistical significance, experiment lifecycle), see the dedicated proposal.
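Conceptually, segmenting a funnel by a context variable means grouping instances by that variable's value and computing the conversion rate per group. The Python sketch below illustrates this under assumed data shapes: the record fields (`context`, `final_state`) and the function name are hypothetical, and the four records are made-up illustrative data, not results.

```python
from __future__ import annotations

from collections import defaultdict


def conversion_by_segment(
    instances: list[dict],
    segment_key: str,
    success_states: set[str],
) -> dict[str, float]:
    """Return the share of instances per segment that ended in a success state."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for instance in instances:
        segment = instance["context"].get(segment_key, "unknown")
        totals[segment] += 1
        if instance["final_state"] in success_states:
            successes[segment] += 1
    return {segment: successes[segment] / totals[segment] for segment in totals}


# Hypothetical instance records; the field names are illustrative only.
instances = [
    {"context": {"ab_variant": "A"}, "final_state": "#fulfilled"},
    {"context": {"ab_variant": "A"}, "final_state": "#payment_failed"},
    {"context": {"ab_variant": "B"}, "final_state": "#fulfilled"},
    {"context": {"ab_variant": "B"}, "final_state": "#fulfilled"},
]
rates = conversion_by_segment(instances, "ab_variant", {"#fulfilled"})
for variant, rate in sorted(rates.items()):
    print(f"{variant}: {rate:.0%}")
```

This only compares raw rates. It says nothing about whether a difference is statistically significant, which is why that analysis is deferred to the dedicated proposal.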
12. Configuration Location
Analytics are declared inline within the machine file, in an analytics: block at the top level:
machine: @order
analytics:
collection: queued
track :checkout as "Checkout Started"
count
rate over 1 minute
measure #awaiting_payment
duration: histogram
funnel "Purchase Flow"
success: #fulfilled
failure: #cancelled, #payment_failed
scenario: order lifecycle
on :checkout from @customer (api)
? cart is valid
order moves to #awaiting_payment
Why Inline?
- Single source of truth - Analytics and behavior in one place
- Easy to maintain - Changes are localized
- Best tooling support - IDE navigation, syntax highlighting
- Natural for EventFlow - "Documentation is code" philosophy
Future: Web-Based Analytics Builder
A future web interface could enable non-developers to:
- Visually define funnels by selecting states
- Configure alerts with form-based UI
- Preview metrics before committing to flow files
This would generate valid EventFlow syntax that developers can review and merge.
This is a future consideration, not part of the initial implementation.
13. Complete Example
13.1 E-Commerce Order with Analytics
machine: @order
analytics:
// Zero-overhead collection configuration
collection: queued // async via queue (default)
queue:
priority: bulk // lowest priority
batch_size: 100 // metrics per DB write
flush_interval: 1 second // flush buffer every second
buffer_size: 10000 // in-memory ring buffer
// Event tracking
track :checkout as "Checkout Started"
count
rate over 1 minute
latency: histogram
track :add_item as "Items Added"
count
track :payment_success as "Payments Succeeded"
count
track :payment_failed as "Payments Failed"
count
alert when rate > 10% of :checkout over 1 hour
// State measurement
measure #awaiting_payment
duration: histogram
active_count
alert when duration p95 > 30 seconds
measure #paid
duration: histogram
entry_count as "Paid Orders"
measure #fulfilled
entry_count as "Fulfilled Orders"
// Guard tracking
measure guard "cart is valid"
true_rate
false_rate
measure guard "fraud detected"
true_rate
alert when true_rate > 5%
// Context distribution
measure $total
distribution: histogram
buckets: [0, 100, 500, 1000, 5000]
measure $payment_method
distribution: labels
// Alerts
alert "High Payment Failure Rate"
when :payment_failed rate > 5% of :checkout over 1 hour
severity: warning
notify: payments-team
alert "Stuck in Payment"
when #awaiting_payment duration > 5 minutes for any instance
severity: critical
notify: on-call
alert "High Fraud Rate"
when guard "fraud detected" true_rate > 5%
severity: critical
notify: security-team
// Funnel hints
funnel "Purchase Flow"
success: #fulfilled, #shipped
failure: #cancelled, #payment_failed
label: "E-Commerce Checkout"
scenario: order lifecycle
given:
@customer is logged in
cart has items
on :add_item from @customer (api) track
$items adds $product
$total increases by $product.price
on :checkout from @customer (api) track
? cart is valid
emit :payment_request to @payment
order moves to #awaiting_payment measure
? cart is empty
emit :error to @customer
on :payment_success from @payment track
? fraud detected
order moves to #fraud_review
emit :fraud_alert to @security
?
order moves to #paid measure
on :payment_failed from @payment track
order moves to #payment_failed
emit :payment_error to @customer
on :stock_reserved from @inventory
order moves to #fulfilled measure as "Order Complete"
emit :order_confirmed to @customer
13.2 Job Application with Analytics
machine: @application
analytics:
// Stage tracking
track :submit as "Applications Submitted"
count
rate over 1 day
track :screen as "Screenings Performed"
count
track :schedule_interview as "Interviews Scheduled"
count
// Stage duration
measure #pending
duration: histogram
alert when duration p95 > 7 days
measure #interview_scheduled
duration: histogram
measure #offer_extended
duration: histogram
alert when duration > 7 days for any instance
// Outcome tracking
measure #hired
entry_count as "Candidates Hired"
measure #rejected
entry_count as "Candidates Rejected"
// Guard effectiveness
measure guard "qualifications match"
true_rate as "Screening Pass Rate"
false_rate
// Funnel
funnel "Hiring Pipeline"
success: #hired
failure: #rejected, #offer_declined, #expired
label: "Recruitment Funnel"
scenario: hiring process
// ... event handlers ...
14. Sample Dashboard
┌─────────────────────────────────────────────────────────────────────────────┐
│ @order Analytics Dashboard │
│ Period: 2024-12-01 to 2024-12-08 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ OPERATIONAL │
│ ─────────── │
│ │
│ Throughput: 1,234 orders/hour ▲ 12% vs last week │
│ Latency p50: 890ms ▼ 5% improvement │
│ Latency p95: 2.1s ─ stable │
│ Error Rate: 2.3% ▼ 0.8% improvement │
│ │
│ Active by State: │
│ #awaiting_payment ████████████████ 234 │
│ #paid ██████████████████████████ 456 │
│ #shipping ████████ 123 │
│ │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ FUNNEL: Purchase Flow │
│ ───────────────────── │
│ │
│ #cart ██████████████████████████████████████ 12,450 (100%) │
│ │ 91.2% │
│ #checkout █████████████████████████████████████ 11,356 (91%) │
│ │ 98.5% │
│ #awaiting_payment ████████████████████████████████████ 11,186 (90%) │
│ │ 78.0% │
│ #paid ████████████████████████████ 8,725 (70%) │
│ │ 100% │
│ #fulfilled ████████████████████████████ 8,725 (70%) │
│ │
│ Conversion: 70.1% Drop-off: Payment 22% │
│ │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ TESTING & QUALITY │
│ ───────────────── │
│ │
│ Coverage: 80% │
│ │
│ Guard Effectiveness: │
│ "cart is valid" true 92% │ false 8% ✓ │
│ "fraud detected" true 2% │ false 98% ✓ │
│ "gateway available" true 100%│ false 0% ⚠ always true │
│ │
│ Dead Code: 2 candidates │
│ - Guard "customer is VIP" (always false) │
│ - Transition #paid → #fraud_review (0 occurrences) │
│ │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ALERTS │
│ ────── │
│ │
│ ⚠ [warning] High Payment Failure Rate (6.2% > 5%) │
│ 14 minutes ago │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
15. Free vs Pro Features
EventFlow Analytics follows a freemium model: CLI tools are free, while advanced web features require Pro.
15.1 Feature Matrix
| Feature | Free (CLI) | Pro (Web Dashboard) |
|---|---|---|
| Analytics Dashboard | Text-based output | Rich interactive charts |
| Funnel Analysis | ASCII funnel diagram | Visual funnel with drag-drop |
| Coverage Analysis | Text coverage report | Interactive treemap |
| Alerts | CLI notifications | Web UI + Slack/Teams/PagerDuty |
| Insights | Text recommendations | AI-powered suggestions |
| Historical Data | 7 days | Unlimited retention |
| Export | JSON, CSV | PDF reports, API access |
| Real-time Updates | Manual refresh | Live streaming |
| Team Sharing | File-based sharing | Collaborative dashboards |
15.2 Visualization Tiers
| Diagram Type | Free | Pro |
|---|---|---|
| --type=state | Basic state diagram | Basic state diagram |
| --type=state --analytics | - | Annotated with metrics |
| --type=funnel | - | Visual funnel diagram |
| --type=heatmap | - | State heatmap by metric |
| --type=flow | - | Animated event flow |
15.3 CLI Examples (Free)
The core analytics commands are all available in the free CLI; only the graphical diagram types listed in 15.2 require Pro:
# Free: Text dashboard
$ eventflow analytics @order --period=7d
# Free: ASCII funnel
$ eventflow funnel @order
# Free: Text coverage report
$ eventflow coverage @order
# Free: Text insights
$ eventflow insights @order
# Free: JSON export
$ eventflow analytics @order --format=json > analytics.json
15.4 Pro Web Dashboard
The Pro tier provides a web-based dashboard with:
- Real-time streaming - Metrics update live without refresh
- Interactive charts - Zoom, pan, drill-down into data
- Collaborative features - Share dashboards, annotations, comments
- Advanced integrations - Slack alerts, PagerDuty, custom webhooks
- AI-powered insights - Pattern detection, anomaly prediction
- Unlimited history - No 7-day retention limit
- PDF reports - Scheduled and on-demand reporting
- API access - Programmatic access to all analytics data
15.5 Recommendation
Start with Free CLI - It provides full functionality for most teams:
- All analytics data is accessible
- Automation-friendly (JSON/CSV export)
- No vendor lock-in
Upgrade to Pro when you need:
- Non-technical stakeholder access (visual dashboards)
- Real-time monitoring screens
- Third-party integrations (Slack, PagerDuty)
- Compliance requirements (unlimited history, audit logs)
16. Keywords Reference
| Keyword | Context | Description |
|---|---|---|
| analytics: | machine | Analytics configuration block |
| track | event handler | Enable default event metrics inline |
| track :event | analytics | Track specific event |
| as "label" | track/measure | Human-readable label |
| count | track | Counter metric |
| rate over <duration> | track | Events per time window |
| latency: histogram | track | Processing latency histogram |
| measure | state/transition | Enable metrics inline |
| measure #state | analytics | Measure specific state |
| measure guard "..." | analytics | Measure guard effectiveness |
| measure transition | analytics | Measure state transition |
| measure $var | analytics | Measure context variable |
| duration: histogram | state | Time in state histogram |
| entry_count | state | State entry counter |
| exit_count | state | State exit counter |
| active_count | state | Current instances gauge |
| true_rate | guard | Percentage returning true |
| false_rate | guard | Percentage returning false |
| evaluation_count | guard | Total evaluations |
| evaluation_time: histogram | guard | Guard evaluation duration |
| execution_time: histogram | action | Action execution duration |
| transition_time: histogram | transition | State transition overhead |
| response_time: histogram | API event | End-to-end HTTP response time |
| processing_time: histogram | API event | Business logic execution time |
| performance_budget: | analytics | Performance threshold definitions |
| measure action "..." | analytics | Measure action execution time |
| conversion_rate from | transition | Conversion percentage |
| distribution: histogram | context | Numeric distribution |
| distribution: labels | context | Categorical distribution |
| buckets: [...] | histogram | Custom bucket boundaries |
| cardinality | context | Unique value count |
| alert | analytics | Alert definition |
| when | alert | Alert condition |
| severity: | alert | info/warning/critical |
| notify: | alert | Notification channel |
| funnel | analytics | Funnel configuration |
| success: | funnel | Success terminal states |
| failure: | funnel | Failure terminal states |
| entry: | funnel | Override entry point |
| label: | funnel | Human-readable name |
| timing: | analytics | Timing configuration |
| collection: | analytics | Collection mode (queued/sampled/disabled/sync) |
| queue: | analytics | Analytics queue configuration |
| priority: | queue | Queue priority (bulk = lowest) |
| batch_size: | queue | Metrics per storage write |
| flush_interval: | queue | Time between buffer flushes |
| buffer_size: | queue | In-memory buffer capacity |
| overflow: | queue | Behavior when buffer full (drop_oldest) |
17. Implementation Notes
17.1 Metric Event Emission (Zero Overhead)
The core implementation principle: metric observation emits a fire-and-forget event.
class MetricEmitter
{
private RingBuffer $buffer;
public function emit(MetricEvent $event): void
{
// Fire-and-forget: ~0.01ms, never blocks
if (!$this->buffer->isFull()) {
$this->buffer->push($event);
}
// If buffer full, metric is silently dropped (never block business logic)
}
}
Key guarantees:
- emit() completes in < 0.1ms
- Never waits for I/O
- Never throws exceptions
- Gracefully handles buffer overflow
17.2 Ring Buffer
In-memory circular buffer for metric events:
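class RingBuffer
{
    private array $buffer;
    private int $head = 0;   // index of the oldest unread event
    private int $tail = 0;   // index where the next event is written
    private int $count = 0;  // events currently stored; tells a full buffer from an empty one
    private int $size;
    public function __construct(int $size = 10000)
    {
        $this->size = $size;
        $this->buffer = array_fill(0, $size, null);
    }
    public function isFull(): bool
    {
        return $this->count === $this->size;
    }
    public function fillRatio(): float
    {
        return $this->count / $this->size;
    }
    public function push(MetricEvent $event): bool
    {
        if ($this->isFull()) {
            return false; // Buffer full: reject the incoming event, never block
        }
        $this->buffer[$this->tail] = $event;
        $this->tail = ($this->tail + 1) % $this->size;
        $this->count++;
        return true;
    }
    public function flush(): array
    {
        $events = [];
        // Loop on the count, not head !== tail: the two indexes are also equal when the buffer is full
        while ($this->count > 0) {
            $events[] = $this->buffer[$this->head];
            $this->buffer[$this->head] = null; // release the reference
            $this->head = ($this->head + 1) % $this->size;
            $this->count--;
        }
        return $events;
    }
}
17.3 Analytics Queue Integration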
Metric events flow through a dedicated analytics queue:
class AnalyticsQueueFlusher
{
private RingBuffer $buffer;
private Queue $analyticsQueue;
private int $flushInterval = 1000; // milliseconds (1 second)
private int $batchSize = 100;
public function flush(): void
{
$events = $this->buffer->flush();
// Batch events for efficient queue operations
foreach (array_chunk($events, $this->batchSize) as $batch) {
$this->analyticsQueue->push(new MetricBatch($batch), priority: 'bulk');
}
}
}
The analytics queue uses:
- Priority: bulk (lowest, never starves business events)
- High concurrency (metrics are independent)
- Batch processing (100+ metrics per job)
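The "bulk never starves business events" property can be pictured with a simple priority queue: lower-priority jobs are only dequeued when nothing more urgent is waiting. This Python sketch is illustrative only. The class name and the "high" and "normal" levels are assumptions (this proposal names only "bulk"), and the real Event Queue may order work differently.

```python
import heapq
import itertools

# Hypothetical priority levels; the proposal only names "bulk" as the lowest.
PRIORITY = {"high": 0, "normal": 1, "bulk": 2}


class PriorityJobQueue:
    """Dequeues the most urgent job first, FIFO within one priority level."""

    def __init__(self) -> None:
        self._heap: list = []
        # Monotonic counter breaks ties so jobs themselves are never compared.
        self._sequence = itertools.count()

    def push(self, job: object, priority: str = "normal") -> None:
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._sequence), job))

    def pop(self) -> object:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityJobQueue()
queue.push({"type": "metric_batch", "size": 100}, priority="bulk")
queue.push({"type": "business_event", "name": ":checkout"})
first = queue.pop()
print(first["type"])  # the business event is served before the older metric batch
```

The flip side of strict priorities is that metric batches can be delayed indefinitely under sustained business load, which is consistent with treating metrics as droppable.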
17.4 Analytics Worker
Processes metric batches and writes to storage:
class AnalyticsWorker
{
public function process(MetricBatch $batch): void
{
// 1. Bulk insert to storage (efficient)
$this->storage->bulkInsert($batch->events);
// 2. Update in-memory aggregates for alerts
foreach ($batch->events as $event) {
$this->aggregator->update($event);
}
// 3. Evaluate alert rules
$this->alertEvaluator->checkThresholds($this->aggregator);
}
}
17.5 Metric Event Schemas
Each metric event type has a defined schema:
// Event tracking
class EventReceivedMetric {
string $event; // :checkout
string $machine; // @order
string $instance_id; // order-abc123
float $timestamp; // 1702000000.123
?string $correlation_id; // for tracing
}
class EventCompletedMetric {
string $event;
float $duration_ms; // 45.2
bool $success;
?string $error;
}
// State tracking
class StateEnteredMetric {
string $state; // #awaiting_payment
string $instance_id;
float $timestamp;
?string $previous_state;
}
class StateExitedMetric {
string $state;
string $instance_id;
float $duration_ms;
string $next_state;
}
// Guard tracking (with timing)
class GuardEvaluatedMetric {
string $guard; // "cart is valid"
bool $result; // true
float $duration_ms; // 0.8 (evaluation time)
string $instance_id;
float $timestamp;
}
// Action tracking (timing)
class ActionExecutedMetric {
string $action; // "validate cart"
float $duration_ms; // 3.2 (execution time)
string $machine; // @order
string $instance_id;
float $timestamp;
}
// Transition tracking (with timing)
class TransitionMetric {
string $from_state; // #cart
string $to_state; // #checkout
float $duration_ms; // 0.1 (transition overhead)
string $event; // :checkout
string $instance_id;
}
// API Event tracking (end-to-end timing)
class ApiEventHandledMetric {
string $event; // :checkout
float $response_time_ms; // 120.5 (total HTTP response time)
float $processing_time_ms; // 45.2 (business logic only)
string $machine; // @order
string $instance_id;
float $timestamp;
}
17.6 Storage Backend
| Environment | Backend | Notes |
|---|---|---|
| Development | SQLite / In-memory | Simple, no setup |
| Production | TimescaleDB | Time-series optimized PostgreSQL |
| Production | InfluxDB | Purpose-built time-series DB |
| Production | Prometheus | Pull-based, great for dashboards |
Bulk writes are critical for performance:
-- Instead of 100 individual INSERTs:
INSERT INTO metrics (event, machine, instance_id, timestamp, value)
VALUES
(':checkout', '@order', 'abc', 1702000000.1, 1),
(':checkout', '@order', 'def', 1702000000.2, 1),
-- ... 98 more rows
;
17.7 Funnel Computation
Auto-funnel detection runs at startup and on definition changes:
class FunnelDetector
{
public function detect(Machine $machine): array
{
$graph = $this->buildStateGraph($machine);
$terminals = $this->findTerminals($graph);
$classified = $this->classifyTerminals($terminals);
$funnels = [];
foreach ($classified as $terminal => $type) {
$paths = $this->discoverPaths($graph, $terminal);
$funnels[] = new Funnel($terminal, $type, $paths);
}
return $funnels;
}
}
17.8 PHP Binding
#[Machine('@order')]
#[Analytics(collection: 'queued', bufferSize: 10000)]
class OrderMachine
{
#[Track(':checkout', label: 'Checkout Started')]
public function onCheckout(Event $event): void
{
// MetricEmitter::emit() called automatically (fire-and-forget)
// Business logic continues immediately
}
#[Measure('#awaiting_payment', duration: true)]
public function enterAwaitingPayment(State $state): void
{
// StateEnteredMetric emitted automatically
}
}
17.9 Graceful Degradation Strategies
class AdaptiveCollector
{
private RingBuffer $buffer; // must expose isFull(), fillRatio() and push()
private float $lastFlushTime;
private int $droppedCount = 0;
public function emit(MetricEvent $event): void
{
// Strategy 1: Drop if buffer full
if ($this->buffer->isFull()) {
$this->droppedCount++;
return; // Silently drop, never block
}
// Strategy 2: Sample under pressure
if ($this->isUnderPressure() && !$this->shouldSample($event)) {
return; // Skip this metric (sampled)
}
$this->buffer->push($event);
}
private function isUnderPressure(): bool
{
return $this->buffer->fillRatio() > 0.8; // 80% full
}
private function shouldSample(MetricEvent $event): bool
{
// Keep 10% of metrics under pressure
return crc32($event->instance_id) % 10 === 0;
}
}
17.10 Timing Metrics Implementation
Timing metrics wrap each measurable component with a stopwatch:
class TimingMetricCollector
{
private MetricEmitter $emitter;
/**
* Measure guard evaluation time
*/
public function measureGuard(string $guard, callable $evaluator): bool
{
$stopwatch = Stopwatch::start();
$result = $evaluator();
$elapsed = $stopwatch->elapsedMs();
$this->emitter->emit(new GuardEvaluatedMetric(
guard: $guard,
result: $result,
duration_ms: $elapsed,
instance_id: $this->context->instanceId(),
timestamp: microtime(true),
));
return $result;
}
/**
* Measure action execution time
*/
public function measureAction(string $action, callable $executor): void
{
$stopwatch = Stopwatch::start();
$executor();
$elapsed = $stopwatch->elapsedMs();
$this->emitter->emit(new ActionExecutedMetric(
action: $action,
duration_ms: $elapsed,
machine: $this->context->machineName(),
instance_id: $this->context->instanceId(),
timestamp: microtime(true),
));
}
/**
* Measure state transition overhead
*/
public function measureTransition(
string $fromState,
string $toState,
string $event,
callable $transitioner
): void {
$stopwatch = Stopwatch::start();
$transitioner();
$elapsed = $stopwatch->elapsedMs();
$this->emitter->emit(new TransitionMetric(
from_state: $fromState,
to_state: $toState,
duration_ms: $elapsed,
event: $event,
instance_id: $this->context->instanceId(),
));
}
/**
* Measure API event end-to-end timing
*/
public function measureApiEvent(
string $event,
float $httpStartTime,
callable $handler
): mixed {
$processingStart = microtime(true);
$result = $handler();
$processingEnd = microtime(true);
$this->emitter->emit(new ApiEventHandledMetric(
event: $event,
response_time_ms: ($processingEnd - $httpStartTime) * 1000,
processing_time_ms: ($processingEnd - $processingStart) * 1000,
machine: $this->context->machineName(),
instance_id: $this->context->instanceId(),
timestamp: $processingEnd,
));
return $result;
}
}
Integration with Event Handler:
class EventHandler
{
public function handle(Event $event): void
{
// Measure each guard
foreach ($this->guards as $guard) {
$passed = $this->timing->measureGuard(
$guard->name(),
fn() => $guard->evaluate($this->context)
);
if (!$passed) return;
}
// Measure each action
foreach ($this->actions as $action) {
$this->timing->measureAction(
$action->name(),
fn() => $action->execute($this->context)
);
}
// Measure state transition
if ($this->transition) {
$this->timing->measureTransition(
$this->currentState,
$this->transition->targetState(),
$event->name(),
fn() => $this->state->transitionTo($this->transition->targetState())
);
}
}
}
PHP Binding with Timing Attributes:
#[Machine('@order')]
class OrderMachine
{
#[MeasureGuard(name: 'cart is valid')]
public function guardCartIsValid(Context $ctx): bool
{
// Guard logic - timing measured automatically
return $ctx->cart->isValid();
}
#[MeasureAction(name: 'charge payment')]
public function actionChargePayment(Context $ctx): void
{
// Action logic - timing measured automatically
$this->paymentGateway->charge($ctx->amount);
}
#[Track(':checkout', timing: true)]
public function onCheckout(Event $event): void
{
// API event timing measured automatically
// response_time includes HTTP overhead
// processing_time is business logic only
}
}
Performance Budget Enforcement:
class PerformanceBudgetChecker
{
private array $budgets = [
'api_response_time' => ['p95' => 500], // 500ms
'guard_evaluation' => ['p95' => 50], // 50ms
'action_execution' => ['p95' => 200], // 200ms
'transition_time' => ['p95' => 1], // 1ms
];
public function check(PercentileAggregator $aggregator): array
{
$violations = [];
foreach ($this->budgets as $metric => $thresholds) {
foreach ($thresholds as $percentile => $limit) {
$value = $aggregator->getPercentile($metric, $percentile);
if ($value > $limit) {
$violations[] = new BudgetViolation(
metric: $metric,
percentile: $percentile,
limit: $limit,
actual: $value,
);
}
}
}
return $violations;
}
}
18. Schema Evolution & Migration
When flow definitions change, analytics data needs to remain consistent and queryable.
18.1 The Challenge
When you change a machine definition:
Problem: Event renamed
────────────────────────
Before: :checkout
After: :start_checkout
Historical data has :checkout
New data has :start_checkout
Funnel reports break!
Problem: State removed
──────────────────────
Before: #cart → #checkout → #awaiting_payment → #paid → #fulfilled
After: #cart → #payment → #paid → #fulfilled
Funnel path changed, historical data has old state names
18.2 Migration Syntax
Declare migrations in the analytics block:
machine: @order
analytics:
migrations:
// Event renames
:checkout -> :start_checkout // alias old → new
:payment_received -> :payment_success
// State renames
#awaiting_payment -> #payment_pending
#processing -> #in_progress
// Removed (mark as deprecated, preserve history)
:legacy_checkout deprecated // historical only
#old_state deprecated
18.3 Migration Behavior
| Change Type | Behavior |
|---|---|
| Event rename | Historical data aliased, queries work with either name |
| State rename | Historical state durations remain, funnel paths updated |
| Event removed | Marked deprecated, historical data preserved, new metrics stop |
| State removed | Marked deprecated, not included in new funnels |
| Guard renamed | Guard effectiveness combined under new name |
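The aliasing behavior in the table can be thought of as resolving every stored name to its current canonical name at query time, so that a query for either the old or the new name matches the same rows. A minimal Python sketch of that idea follows; it is an assumption about how aliasing might work, and `canonical_name` and `count_events` are hypothetical helpers. The rename map reuses the examples from section 18.2.

```python
from __future__ import annotations


def canonical_name(name: str, renames: dict[str, str]) -> str:
    """Follow old -> new rename chains to the current name, stopping on cycles."""
    seen: set[str] = set()
    while name in renames and name not in seen:
        seen.add(name)
        name = renames[name]
    return name


def count_events(stored: list[str], query: str, renames: dict[str, str]) -> int:
    """Count stored events that match the query under either old or new name."""
    target = canonical_name(query, renames)
    return sum(1 for event in stored if canonical_name(event, renames) == target)


# Renames taken from the migration syntax example in section 18.2.
renames = {
    ":checkout": ":start_checkout",
    ":payment_received": ":payment_success",
}
# Historical rows keep the old name; new rows use the new one.
stored = [":checkout", ":start_checkout", ":add_item", ":checkout"]
print(count_events(stored, ":checkout", renames))
print(count_events(stored, ":start_checkout", renames))
```

Resolving at query time leaves historical rows untouched, which matches the stated goal of preserving history rather than rewriting it.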
18.4 CLI Commands
# Validate analytics schema against current flow
$ eventflow analytics:validate @order
@order Analytics Validation
═══════════════════════════════════════════════════════════════
✓ Schema consistent with flow definition
Migrations Applied:
:checkout → :start_checkout (1,234 historical events aliased)
#awaiting_payment → #payment_pending (456 state entries aliased)
Deprecated (Historical Only):
:legacy_checkout (890 events, last seen: 2024-01-15)
#old_state (12 entries, last seen: 2024-01-10)
Warnings:
⚠ Funnel "Purchase Flow" references #awaiting_payment (migrated)
→ Auto-updated to #payment_pending
# Preview migration impact
$ eventflow analytics:migrate @order --dry-run
Migration Preview: @order
═══════════════════════════════════════════════════════════════
Changes Detected:
1. State #processing not in flow definition
→ Mark as deprecated? [y/n]
2. Event :old_event tracked but not in flow
→ Mark as deprecated? [y/n]
3. Guard "legacy check" no longer exists
→ Historical data will be preserved
Run 'eventflow analytics:migrate @order' to apply changes.
18.5 Funnel Invalidation
When schema changes affect funnels:
Funnel Invalidation Warning
═══════════════════════════════════════════════════════════════
Funnel "Purchase Flow" affected by schema change:
Before: #cart → #checkout → #awaiting_payment → #paid → #fulfilled
After: #cart → #checkout → #payment_pending → #paid → #fulfilled
Historical data:
- 12,450 instances through old path
- Path step #awaiting_payment aliased to #payment_pending
Action Required: None (auto-migrated)
18.6 Best Practices
Always add migrations before deploying flow changes
- Historical data remains queryable
- Funnel reports don't break
Use deprecated for removed elements
- Preserves audit trail
- Allows historical analysis
Run analytics:validate in CI/CD
- Catch schema drift early
- Ensure migrations are complete
Document migration reasons
migrations:
  :checkout -> :start_checkout // renamed for clarity in v2.0
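The "migrations are complete" check that a CI step would enforce can be sketched as a set comparison: every name found in analytics data should be defined in the current flow, renamed, or marked deprecated. This is an illustrative sketch only. `findUnmigratedNames` and its parameters are hypothetical, and the real `analytics:validate` command may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Returns tracked names that the flow no longer defines and that no
 * migration accounts for (neither renamed nor marked deprecated).
 *
 * @param list<string>          $tracked    names present in analytics data
 * @param list<string>          $defined    names in the current flow definition
 * @param array<string, string> $renames    old name => new name
 * @param list<string>          $deprecated names kept for history only
 * @return list<string>
 */
function findUnmigratedNames(array $tracked, array $defined, array $renames, array $deprecated): array
{
    $unmigrated = [];
    foreach ($tracked as $name) {
        $covered = in_array($name, $defined, true)
            || array_key_exists($name, $renames)
            || in_array($name, $deprecated, true);
        if (!$covered) {
            $unmigrated[] = $name;
        }
    }
    return $unmigrated;
}

$missing = findUnmigratedNames(
    tracked: [':checkout', ':start_checkout', ':old_event', '#old_state'],
    defined: [':start_checkout'],
    renames: [':checkout' => ':start_checkout'],
    deprecated: ['#old_state'],
);
// Here only ':old_event' is unaccounted for; a CI step could fail
// the build whenever $missing is non-empty.
```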
19. Event Sourcing vs Metric Tables
This section clarifies the relationship between event sourcing (the foundation of EventFlow) and metric tables (used for analytics).
19.1 The Question
"EventFlow is already event-sourced. Every event is stored. Why do we need separate metric tables?"
19.2 Event Sourcing Data
Event sourcing stores every event that occurs:
Event Store (Raw Events)
─────────────────────────────────────────────────────────────
| id | machine | instance_id | event | timestamp | payload |
|--------|---------|-------------|------------|-------------|------------|
| evt-1 | @order | order-abc | :checkout | 1702000000 | {cart: ..} |
| evt-2 | @order | order-abc | :pay | 1702000045 | {amount:99}|
| evt-3 | @order | order-def | :checkout | 1702000050 | {cart: ..} |
| ... | ... | ... | ... | ... | ... |
Pros:
- Complete audit trail
- Can replay to any point in time
- No data loss
Cons for Analytics:
- To answer "How many checkouts today?" → scan ALL events
- To calculate "Average time in #awaiting_payment" → replay state transitions
- Query performance degrades with data size
- Real-time dashboards impractical
19.3 Metric Tables
Metric tables are pre-aggregated derived views:
Metric Tables (Pre-Aggregated)
─────────────────────────────────────────────────────────────
event_counts:
| machine | event | period_start | count |
|---------|------------|--------------|-------|
| @order | :checkout | 2024-12-01 | 1,234 |
| @order | :checkout | 2024-12-02 | 1,456 |
state_durations:
| machine | state | p50_ms | p95_ms | p99_ms |
|---------|--------------------|--------|--------|--------|
| @order | #awaiting_payment | 12000 | 45000 | 120000 |
guard_effectiveness:
| machine | guard | true_count | false_count |
|---------|-------------------|------------|-------------|
| @order | "cart is valid" | 11,234 | 1,016 |
Pros:
- O(1) query time for "How many checkouts today?"
- Dashboard-ready aggregates
- Supports real-time alerting
Cons:
- Cannot replay (aggregates only)
- Potential data loss if aggregation misses events
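The trade-off between the two stores can be seen in a small sketch that answers "How many checkouts on a given day?" both ways. The row shape mirrors the event store table above, with timestamps taken as Unix seconds. `countByScan` and `aggregate` are hypothetical names, and the composite string key is an assumption made for brevity, not the proposed storage schema.

```php
<?php
declare(strict_types=1);

// Raw events, shaped like the event store rows above (subset of columns).
$events = [
    ['machine' => '@order', 'event' => ':checkout', 'timestamp' => 1702000000],
    ['machine' => '@order', 'event' => ':pay',      'timestamp' => 1702000045],
    ['machine' => '@order', 'event' => ':checkout', 'timestamp' => 1702000050],
];

// Event-store style: every query walks all events, O(n) per question.
function countByScan(array $events, string $machine, string $event, string $day): int
{
    $count = 0;
    foreach ($events as $e) {
        if ($e['machine'] === $machine
            && $e['event'] === $event
            && gmdate('Y-m-d', $e['timestamp']) === $day) {
            $count++;
        }
    }
    return $count;
}

// Metric-table style: fold each event into a keyed counter once,
// so later reads are a single array lookup.
function aggregate(array $events): array
{
    $counts = [];
    foreach ($events as $e) {
        $key = $e['machine'] . '|' . $e['event'] . '|' . gmdate('Y-m-d', $e['timestamp']);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    return $counts;
}

$day = gmdate('Y-m-d', 1702000000); // UTC day of the first sample event
$counts = aggregate($events);

$fromScan = countByScan($events, '@order', ':checkout', $day);
$fromTable = $counts['@order|:checkout|' . $day] ?? 0;
```

Both paths give the same answer, but the counter cannot be replayed or re-sliced by a dimension it did not capture, which is why the event store remains the source of truth.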
19.4 Hybrid Architecture
EventFlow Analytics uses both:
┌─────────────────────────────────────────────────────────────────────┐
│                             EVENT STORE                             │
│                          (Source of Truth)                          │
│                                                                     │
│                   Every event stored permanently                    │
│                          Full audit trail                           │
│                  Can rebuild everything from here                   │
└───────────────────────────┬─────────────────────────────────────────┘
                            │
            ┌───────────────┴───────────────┐
            │                               │
            ▼                               ▼
┌───────────────────────┐       ┌───────────────────────┐
│    REAL-TIME PATH     │       │      BATCH PATH       │
│                       │       │                       │
│      Ring Buffer      │       │      Nightly Job      │
│           ↓           │       │           ↓           │
│    Analytics Queue    │       │  Rebuild Aggregates   │
│           ↓           │       │           ↓           │
│     Metric Tables     │       │     Metric Tables     │
│     (incremental)     │       │    (full refresh)     │
└───────────────────────┘       └───────────────────────┘
            │                               │
            └───────────────┬───────────────┘
                            │
                            ▼
                ┌───────────────────────┐
                │   ANALYTICS QUERIES   │
                │                       │
                │  - Dashboard          │
                │  - Funnels            │
                │  - Alerts             │
                └───────────────────────┘
19.5 Why Both?
| Use Case | Event Store | Metric Tables |
|---|---|---|
| Audit trail | ✓ Primary | - |
| Replay/debug | ✓ Primary | - |
| Real-time dashboard | - | ✓ Primary |
| Funnel reports | - | ✓ Primary |
| Alert evaluation | - | ✓ Primary |
| Historical analysis | ✓ Fallback | ✓ Primary |
| Data recovery | ✓ Source | Rebuild from events |
19.6 Implementation
Metric tables are materialized views of the event store:
- Real-time updates: Metric events → Queue → Increment counters
- Batch rebuild: Scheduled job replays event store → Rebuild aggregates
- Consistency: Batch job fixes any real-time drift
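// Conceptual: Metric table is derivable from event store
// EventStore, Aggregator, and MetricTable are placeholder types for illustration.
class MetricTableRebuilder
{
    public function __construct(
        private EventStore $eventStore,
        private Aggregator $aggregator,
        private MetricTable $metricTable,
    ) {}

    public function rebuild(DateRange $range): void
    {
        $events = $this->eventStore->query($range);
        foreach ($events as $event) {
            // Same logic as real-time, but from historical data
            $this->aggregator->process($event);
        }
        // Replace the stored aggregates with the recomputed results
        $this->metricTable->replaceAggregates($this->aggregator->results());
    }
}
19.7 Rebuild Triggers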
Metric table rebuilds can be triggered in several ways:
| Trigger | When | Use Case |
|---|---|---|
| Scheduled | Nightly (configurable) | Regular consistency check |
| On Demand | CLI command | After data recovery, migration |
| Automatic | Drift detected | Real-time vs batch mismatch |
| Schema Change | Migration applied | Event/state renames |
CLI Commands:
# Full rebuild (all time)
$ eventflow analytics:rebuild @order
Rebuilding metric tables for @order...
Processing: 2024-01-01 to 2024-12-08
Events processed: 1,234,567
✓ Rebuild complete (took 3m 42s)
# Partial rebuild (specific range)
$ eventflow analytics:rebuild @order --from=2024-12-01
# Check for drift without rebuilding
$ eventflow analytics:verify @order
Verifying metric consistency for @order...
Checking event counts... ✓ match
Checking state durations... ⚠ drift detected
- #awaiting_payment: real-time 45.2s vs batch 44.8s (0.9% diff)
Checking guard rates... ✓ match
Recommendation: Run 'eventflow analytics:rebuild @order' to fix drift
Configuration:
analytics:
  rebuild:
    schedule: "0 2 * * *"    // 2 AM daily (cron syntax)
    drift_threshold: 1%      // auto-rebuild if drift > 1%
    retention: 90 days       // rebuild window
Automatic Drift Detection:
The analytics worker compares real-time aggregates with batch results. If drift exceeds the threshold:
- Log warning with drift details
- If drift_threshold is configured, trigger automatic rebuild
- Alert operations team if drift persists
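The drift comparison above can be sketched as a relative difference checked against the configured threshold. This is a minimal illustration under stated assumptions: `driftPercent` and `shouldRebuild` are hypothetical names, the batch result is treated as the reference value, and a missing threshold is modeled as `null`.

```php
<?php
declare(strict_types=1);

// Relative drift of the real-time value against the batch value, in percent.
// Assumption: the batch result is the reference, since it is rebuilt from
// the event store.
function driftPercent(float $realtime, float $batch): float
{
    if ($batch == 0.0) {
        return $realtime == 0.0 ? 0.0 : INF;
    }
    return abs($realtime - $batch) / abs($batch) * 100.0;
}

// Decides whether an automatic rebuild should be triggered.
function shouldRebuild(float $realtime, float $batch, ?float $thresholdPercent): bool
{
    // No threshold configured: drift is only logged, never auto-rebuilt.
    if ($thresholdPercent === null) {
        return false;
    }
    return driftPercent($realtime, $batch) > $thresholdPercent;
}

// Sample values from the verify output: 45.2s real-time vs 44.8s batch.
$drift = driftPercent(45.2, 44.8);          // roughly 0.89 percent
$rebuild = shouldRebuild(45.2, 44.8, 1.0);  // below the 1% threshold
```

With these sample values the drift stays under the 1% threshold, so it would be reported but not rebuilt automatically.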
19.8 Summary
| Aspect | Event Store | Metric Tables |
|---|---|---|
| Role | Source of truth | Derived views |
| Persistence | Permanent | Rebuildable |
| Query speed | O(n) | O(1) |
| Use case | Audit, replay | Dashboard, alerts |
| Relationship | Parent | Child (derived) |
Key insight: Metric tables don't replace event sourcing—they're an optimization layer. If metric tables are lost, they can be rebuilt from the event store.
20. Related Proposals
- Event Queue Proposal - Queue-level metrics (pending, processing, failed) - Analytics extends this with machine behavior metrics
- Test Scenarios Proposal - Test coverage tracking integrates with analytics coverage reports
- Data Validation Proposal - Validation failures can be tracked as analytics events
- Machine Response Proposal - Response metrics (status codes, latency) complement event metrics
21. Summary
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Zero Overhead Architecture                                     │
│  ──────────────────────────                                     │
│  - Metrics are events (fire-and-forget)                         │
│  - Ring buffer + async queue = ~0ms latency impact              │
│  - Batch writes for storage efficiency                          │
│  - Graceful degradation under load                              │
│                                                                 │
│  Metric Types                                                   │
│  ────────────                                                   │
│  - Events: count, rate, latency                                 │
│  - States: duration, entry/exit, active                         │
│  - Guards: true/false rate, dead detection, evaluation time     │
│  - Actions: execution time per action                           │
│  - Transitions: count, conversion rate, transition time         │
│  - Context: distribution, cardinality                           │
│  - Timing: API response time, processing time                   │
│                                                                 │
│  Automatic Funnel Detection                                     │
│  ──────────────────────────                                     │
│  - Terminal states discovered from graph topology               │
│  - SUCCESS/FAILURE classification by naming patterns            │
│  - All paths traced, conversion rates computed                  │
│  - No manual configuration required                             │
│                                                                 │
│  Timing Model                                                   │
│  ────────────                                                   │
│  - Near real-time: alerts, dashboards (1-5s delay)              │
│  - Batch: funnel reports, coverage, dead code                   │
│                                                                 │
│  CLI Tools                                                      │
│  ─────────                                                      │
│  - eventflow analytics: metrics dashboard                       │
│  - eventflow funnel: conversion analysis                        │
│  - eventflow coverage: test coverage + dead code                │
│  - eventflow alerts: alert management                           │
│  - eventflow insights: auto-generated recommendations           │
│  - eventflow perf: performance profiling                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Philosophy
Numbers tell the story. Funnels reveal the journey.
Metrics are events. Events flow through queues.
Analytics never blocks business logic.
EventFlow Analytics follows the natural language philosophy: analytics declarations are as readable as the workflows they measure, enabling collaboration between developers, product managers, and business analysts. The zero-overhead architecture ensures production workloads are never impacted by metric collection.