SLA Monitoring
Kiket tracks workflow SLAs directly in the runtime so teams can spot issues before they miss a promise. SLA definitions live alongside workflows inside definition repositories and are evaluated continuously by the Risk Scanner job.
Defining SLAs
Create workflow_sla_definitions.yml (or embed the block inside a project manifest) to describe the states and guardrails that matter:
- id: ops-critical-in-progress
  status: in_progress
  issue_type: Incident
  priority: high
  max_duration_minutes: 120
  warning_buffer_minutes: 30
  project: platform-core
- `status` – workflow state to watch. Optional filters (`issue_type`, `priority`, `project`) scope the definition.
- `max_duration_minutes` – breach threshold. When an issue stays in the state longer than this value, the SLA is marked `breached`.
- `warning_buffer_minutes` – buffer before the breach. When elapsed time exceeds `max_duration_minutes - warning_buffer_minutes` (90 minutes in the example above), the SLA enters the `imminent` state and notifications fire.
- Definitions are tenant-scoped: customers can check them in with the rest of their `.kiket` repo and swap templates at any time. Validation runs during definition sync so invalid assets never activate.
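The threshold rules above can be sketched in a few lines of Python. This is an illustration only, not the runtime's actual evaluator: the `SlaDefinition` class, the `classify` function, and the `"ok"` label for an issue that is still within limits are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaDefinition:
    """Illustrative subset of one entry in workflow_sla_definitions.yml."""
    id: str
    status: str
    max_duration_minutes: int
    warning_buffer_minutes: int = 0


def classify(definition: SlaDefinition, elapsed_minutes: float) -> str:
    """Return 'ok', 'imminent', or 'breached' for time spent in the watched state."""
    warning_threshold = (
        definition.max_duration_minutes - definition.warning_buffer_minutes
    )
    if elapsed_minutes > definition.max_duration_minutes:
        return "breached"
    if elapsed_minutes > warning_threshold:
        return "imminent"
    return "ok"  # hypothetical label: the source only names imminent/breached


ops_critical = SlaDefinition(
    id="ops-critical-in-progress",
    status="in_progress",
    max_duration_minutes=120,
    warning_buffer_minutes=30,
)

print(classify(ops_critical, 60))   # ok
print(classify(ops_critical, 105))  # imminent
print(classify(ops_critical, 121))  # breached
```

Both comparisons are strict, following the wording "longer than" and "exceeds": an issue at exactly 120 minutes is still `imminent` in this sketch.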
Detection Pipeline
1. `RiskScanner::Analyzer` runs on a schedule for every organization (and per project when needed).
2. `Workflow::SlaEvaluator` loads active definitions, scans the matching issues, and either opens or resolves `workflow_sla_events`.
3. `NotificationService.send_sla_event` notifies the assignee, project lead, and admins with contextual details and links.
4. `Extensions::WorkflowExtensionDispatcher` emits `workflow.sla_status` events so inline code and external webhooks can react.
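One way to picture the "opens or resolves" step is as a small transition function. The sketch below is an assumption about how such logic could look, not a description of `Workflow::SlaEvaluator` itself: `next_event` is a hypothetical name, and `"ok"` is a hypothetical label for an issue that is within limits or no longer matches the definition.

```python
from typing import Optional


def next_event(open_state: Optional[str], current: str) -> Optional[str]:
    """Decide which event state, if any, to record for one issue.

    open_state: state of the currently open event ('imminent' or 'breached'),
                or None when no event is open.
    current:    fresh classification ('ok', 'imminent', or 'breached').
    """
    if current == "ok":
        # The issue is back within limits: resolve whatever was open.
        return "recovered" if open_state is not None else None
    if current != open_state:
        # A new warning, or an escalation from imminent to breached.
        return current
    return None  # Nothing changed, so no duplicate event is recorded.


print(next_event(None, "imminent"))        # imminent
print(next_event("imminent", "breached"))  # breached
print(next_event("breached", "ok"))        # recovered
```

Returning `None` when the state is unchanged keeps a scheduled scan from firing the same notification on every run.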
Each workflow_sla_event records:
| Field | Description |
|---|---|
| `state` | `imminent`, `breached`, or `recovered`. |
| `triggered_at` | When the SLA entered the current state. |
| `duration_minutes` | Minutes spent in the monitored state. |
| `overdue_minutes` | Minutes over the SLA (only for breaches). |
| `definition` | Snapshot of the definition (status, limits, filters). |
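The two duration fields can be derived from a single timestamp. The following is a minimal sketch under the assumption that the caller knows when the issue entered the monitored state; `sla_metrics` and its parameters are hypothetical names, and the rounding rule (whole minutes, rounded down) is a guess rather than documented behaviour.

```python
from datetime import datetime, timedelta, timezone


def sla_metrics(entered_state_at: datetime, now: datetime,
                max_duration_minutes: int) -> dict:
    """Compute duration and overdue minutes for one issue."""
    duration = int((now - entered_state_at).total_seconds() // 60)
    # overdue_minutes is only populated once the limit has been passed.
    overdue = (
        duration - max_duration_minutes
        if duration > max_duration_minutes
        else None
    )
    return {"duration_minutes": duration, "overdue_minutes": overdue}


entered = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(sla_metrics(entered, entered + timedelta(minutes=105), 120))
print(sla_metrics(entered, entered + timedelta(minutes=150), 120))
```

With a 120-minute limit, the first call leaves `overdue_minutes` empty and the second reports 30.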
Surfaces & Alerts
- In-app notifications – assignees, project leads, and org admins receive alerts with direct links to the offending issue.
- Command palette – open the palette (`Cmd/Ctrl + K`) inside a project and run Review SLA alerts. The action summarises the five most recent `imminent`/`breached` events and links you to the issues.
- CLI – `kiket sla events --org acme --project 42 --state imminent` fetches the stream directly from `/api/v1/sla_events`. Use `--format json` when piping into other tooling.
- Extensions – call `GET /api/v1/ext/sla/events?project_id=...` with an extension API key or subscribe to the `workflow.sla_status` webhook. SDK helpers expose this as `context.endpoints.sla_events(project_id)` across Python, Node.js, Ruby, Java, and .NET.
- Dashboards – SLA metrics join the analytics warehouse so cost dashboards and operations overviews can chart warning trends alongside throughput.
API Reference
| Method | Path | Notes |
|---|---|---|
| GET | `/api/v1/sla_events` | Organization authenticated endpoint (requires `organization` param when scripting). Supports `project_id`, `issue_id`, `state`, `limit` query params. |
| GET | `/api/v1/ext/sla/events` | Extension-scoped endpoint. Requires `project_id` and the `read:issues` scope on the extension API key. |
| Webhook | `workflow.sla_status` | Fired for each state transition. Payload includes `state`, `issue`, `sla.definition`, and `metrics` (duration/overdue minutes). |
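When scripting against the organization endpoint, the query string can be assembled from the parameters listed in the table. This sketch only builds the URL; the host `kiket.example.com` is a placeholder, `sla_events_url` is a hypothetical helper, and authentication is left out because the source does not describe the header format.

```python
from urllib.parse import urlencode

BASE_URL = "https://kiket.example.com"  # assumption: placeholder host


def sla_events_url(organization, *, project_id=None, issue_id=None,
                   state=None, limit=None):
    """Build a GET /api/v1/sla_events URL, skipping filters left as None."""
    params = {
        "organization": organization,
        "project_id": project_id,
        "issue_id": issue_id,
        "state": state,
        "limit": limit,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/api/v1/sla_events?{query}"


print(sla_events_url("acme", project_id=42, state="imminent"))
# https://kiket.example.com/api/v1/sla_events?organization=acme&project_id=42&state=imminent
```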
Webhook payload example:
{
  "event": "workflow.sla_status",
  "state": "imminent",
  "issue": {
    "id": 42,
    "status": "in_progress",
    "title": "Escalated onboarding incident",
    "project_id": 17
  },
  "sla": {
    "definition_id": 7,
    "status": "in_progress",
    "max_duration_minutes": 120,
    "warning_buffer_minutes": 30
  },
  "metrics": {
    "duration_minutes": 105,
    "overdue_minutes": null
  }
}
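A webhook receiver might turn this payload into a one-line alert. The sketch below relies only on the fields shown in the example payload; `summarize_sla_webhook` is a hypothetical function name, and the message format is an arbitrary choice.

```python
import json


def summarize_sla_webhook(raw_body: str) -> str:
    """Turn a workflow.sla_status payload into a short alert line."""
    payload = json.loads(raw_body)
    if payload.get("event") != "workflow.sla_status":
        raise ValueError("not an SLA status event")

    issue = payload["issue"]
    metrics = payload["metrics"]
    limit = payload["sla"]["max_duration_minutes"]

    line = (
        f"[{payload['state']}] #{issue['id']} {issue['title']}: "
        f"{metrics['duration_minutes']} of {limit} min in {issue['status']}"
    )
    # overdue_minutes is null until the SLA is actually breached.
    overdue = metrics.get("overdue_minutes")
    if overdue is not None:
        line += f" ({overdue} min overdue)"
    return line
```

For the example payload this returns `[imminent] #42 Escalated onboarding incident: 105 of 120 min in in_progress`.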
CLI Usage
- `--project` and `--issue` filters can be combined.
- `--state recovered` is helpful for auditing recent recoveries without paging every notification.
- `--format human` (default) prints a table with columns for duration and overdue minutes.
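Output from `--format json` can be post-processed with a short script. The shape of that output is not documented here, so this sketch assumes a JSON array of event objects carrying the `state`, `duration_minutes`, and `overdue_minutes` fields; `top_overdue` is a hypothetical helper, and the sample data is made up for illustration.

```python
import json


def top_overdue(json_text: str, top: int = 3) -> list:
    """Return the most overdue breached events from CLI JSON output."""
    events = json.loads(json_text)
    breached = [e for e in events if e.get("state") == "breached"]
    # Treat a missing or null overdue value as zero when sorting.
    breached.sort(key=lambda e: e.get("overdue_minutes") or 0, reverse=True)
    return breached[:top]


# Made-up sample standing in for the CLI output (assumed shape).
sample = json.dumps([
    {"state": "breached", "duration_minutes": 150, "overdue_minutes": 30},
    {"state": "imminent", "duration_minutes": 105, "overdue_minutes": None},
    {"state": "breached", "duration_minutes": 200, "overdue_minutes": 80},
])

for event in top_overdue(sample):
    print(event["overdue_minutes"], event["duration_minutes"])
```

In a pipeline, the same function could be fed `sys.stdin.read()` instead of the sample string.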
Extension Examples
All SDKs expose the SLA helper:
- Python: `context.endpoints.sla_events(project_id).list(limit=5)`
- Node.js: `context.endpoints.slaEvents(projectId).list({ state: 'imminent' })`
- Ruby: `context[:endpoints].sla_events(project_id).list`
- Java: `context.getEndpoints().slaEvents(projectId).list(options)`
- .NET: `await context.Endpoints.SlaEvents(projectId).ListAsync()`

Use it to:
- Post custom alerts in Slack/Teams.
- Gate deployments when breaches exist.
- Populate downstream analytics pipelines.
Operational Notes
- SLA evaluation reuses `column_changed_at` timestamps from boards. If your workflow mutates state outside the standard board orchestrator, be sure to update that field.
- Definitions support tenant overrides. When a repo fails validation (duplicate IDs, invalid values) the platform surfaces actionable errors instead of silently falling back.
- Marketplace bundles can include SLA definition files so partners ship ready-to-use guardrails.
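A pre-commit check can catch the validation failures mentioned above before a repo is synced. The sketch below is not the platform's validator: `validate_definitions` is a hypothetical function, and the exact rules (positive integer limit, buffer smaller than the limit) are assumptions extrapolated from "duplicate IDs, invalid values".

```python
def validate_definitions(definitions):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen_ids = set()
    for index, definition in enumerate(definitions):
        def_id = definition.get("id")
        label = def_id or f"entry {index}"
        if not def_id:
            errors.append(f"{label}: missing id")
        elif def_id in seen_ids:
            errors.append(f"{label}: duplicate id")
        else:
            seen_ids.add(def_id)

        max_minutes = definition.get("max_duration_minutes")
        buffer_minutes = definition.get("warning_buffer_minutes", 0)
        if not isinstance(max_minutes, int) or max_minutes <= 0:
            errors.append(
                f"{label}: max_duration_minutes must be a positive integer"
            )
        elif (not isinstance(buffer_minutes, int)
              or not 0 <= buffer_minutes < max_minutes):
            errors.append(
                f"{label}: warning_buffer_minutes must be between 0 "
                "and max_duration_minutes"
            )
    return errors


example = {
    "id": "ops-critical-in-progress",
    "status": "in_progress",
    "max_duration_minutes": 120,
    "warning_buffer_minutes": 30,
}
print(validate_definitions([example]))                 # []
print(validate_definitions([example, dict(example)]))  # one duplicate id error
```

Collecting every problem before returning, rather than stopping at the first, mirrors the idea of surfacing actionable errors for the whole file at once.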