Changelog Evaluation Metrics

Kiket records evaluation summaries each time an automated changelog draft is produced (or promoted) alongside a human-authored release note. These evaluations are exported to the docs-site data directory via the analytics changelog export automation.

The export task emits docs-site/data/changelog_evaluations.json, containing the latest 200 evaluation entries. Each record includes the following fields (a short reading sketch follows the list):

  • id, release_id, project_id, organization_id
  • evaluator – currently draft_generation
  • metrics – hash with overlap ratio, jaccard similarity, word counts, sentence similarity, bullet overlap, and a token diff sample
  • summary – human-readable comparison string
  • generated_at – timestamp for the evaluation snapshot
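
A minimal sketch of reading the export, assuming the file is a plain JSON array of record objects (the field names come from the list above; ordering within the array is not guaranteed):

```python
import json

# Load the exported evaluation snapshot from the docs-site data directory.
with open("docs-site/data/changelog_evaluations.json") as f:
    evaluations = json.load(f)

# Each entry carries identifiers, the evaluator name, the metrics hash,
# a human-readable summary, and a generated_at timestamp.
for entry in evaluations[:3]:
    print(entry["release_id"], entry["evaluator"], entry["metrics"]["overlap_ratio"])
```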

Interpreting the Metrics

  • overlap_ratio – vocabulary overlap between the automated draft and the human changelog (0–1).
  • jaccard_similarity – set similarity of unique words, highlighting overall coverage.
  • sentence_similarity – proportion of automated sentences with a close human analogue (string similarity >= 0.75).
  • bullet_overlap_ratio – proportion of the draft's bullet points that the human author preserved.
  • token_diff – sample of tokens present in one version but not the other.
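
The definitions above map onto simple set and string operations. The sketch below is an approximation, not Kiket's implementation; in particular, the denominators for overlap_ratio and bullet_overlap_ratio, the sentence splitting, and the diff sample size are assumptions:

```python
import difflib

def tokenize(text):
    return set(text.lower().split())

def evaluate_draft(draft, human):
    draft_words, human_words = tokenize(draft), tokenize(human)
    shared = draft_words & human_words

    # overlap_ratio: shared vocabulary relative to the draft's vocabulary
    # (the exact denominator Kiket uses is an assumption here).
    overlap_ratio = len(shared) / len(draft_words) if draft_words else 0.0

    # jaccard_similarity: shared vocabulary over the combined vocabulary.
    union = draft_words | human_words
    jaccard_similarity = len(shared) / len(union) if union else 0.0

    # sentence_similarity: share of draft sentences with a close human
    # analogue, using difflib string similarity at the documented 0.75 bar.
    draft_sents = [s.strip() for s in draft.split(".") if s.strip()]
    human_sents = [s.strip() for s in human.split(".") if s.strip()]
    close = sum(
        1 for d in draft_sents
        if any(difflib.SequenceMatcher(None, d, h).ratio() >= 0.75
               for h in human_sents)
    )
    sentence_similarity = close / len(draft_sents) if draft_sents else 0.0

    # bullet_overlap_ratio: draft bullets that survive into the human text.
    draft_bullets = {l.strip("-*• ").strip() for l in draft.splitlines()
                     if l.lstrip().startswith(("-", "*", "•"))}
    kept = {b for b in draft_bullets if b in human}
    bullet_overlap_ratio = len(kept) / len(draft_bullets) if draft_bullets else 0.0

    # token_diff: sample of tokens present in one version but not the other.
    token_diff = sorted(draft_words ^ human_words)[:20]

    return {
        "overlap_ratio": round(overlap_ratio, 3),
        "jaccard_similarity": round(jaccard_similarity, 3),
        "sentence_similarity": round(sentence_similarity, 3),
        "bullet_overlap_ratio": round(bullet_overlap_ratio, 3),
        "token_diff": token_diff,
    }
```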

These metrics help determine when automated drafts need extra curation versus when they are ready for direct promotion.
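As one illustration, a triage rule over the exported metrics might look like the following; the 0.8 and 0.5 thresholds are hypothetical placeholders, not values Kiket ships with:

```python
def triage(metrics):
    """Route a draft based on its evaluation metrics (thresholds are hypothetical)."""
    if metrics["overlap_ratio"] >= 0.8 and metrics["sentence_similarity"] >= 0.8:
        return "promote"   # the human note closely mirrored the draft
    if metrics["overlap_ratio"] < 0.5:
        return "curate"    # heavy rewriting: the draft needs extra attention
    return "review"        # borderline: worth a human look before promotion
```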

Next Steps

  • Feed the exported JSON into internal dashboards for trend tracking (e.g., markdown promotion success rates; see the sketch after this list).
  • Expand the harness with reviewer feedback once pilot teams begin grading drafts inside the admin analytics console.
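
A hedged sketch of the first step, aggregating overlap_ratio by ISO week from the exported file. It assumes generated_at is an ISO 8601 timestamp; everything else uses the documented field names:

```python
import json
from collections import defaultdict
from datetime import datetime

with open("docs-site/data/changelog_evaluations.json") as f:
    evaluations = json.load(f)

# Average overlap_ratio per ISO week: a simple trend series for a dashboard.
weekly = defaultdict(list)
for entry in evaluations:
    # Normalize a trailing Z so fromisoformat accepts the timestamp.
    stamp = datetime.fromisoformat(entry["generated_at"].replace("Z", "+00:00"))
    year, week, _ = stamp.isocalendar()
    weekly[f"{year}-W{week:02d}"].append(entry["metrics"]["overlap_ratio"])

for week in sorted(weekly):
    scores = weekly[week]
    print(week, round(sum(scores) / len(scores), 3))
```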