Block · Principal Content Designer · 2026

Building a risk classification engine

Driving data governance and semantic analysis to improve AI model outputs

Designed and implemented an AI-driven classification system that analyzes internal content standards documentation and assigns priority levels (critical, high, medium, low) to individual sections. This system enables downstream AI writing tools to distinguish between rigid requirements and flexible guidance, improving both compliance and output quality.

At a glance

Goal

Enable AI-generated content to adhere to content standards with appropriate levels of strictness, balancing compliance with creative flexibility.

Challenges

  • Ambiguity in content standards language and structure
  • Lack of labeled training data
  • Pressure to move faster
  • Need to balance AI system needs and human user needs

Approach

Create a way to take a disparate set of standards and build in risk assessment that is readable by both AI and humans.

Impact

  • Reinforced regulatory requirements for non-writers
  • Increased generation flexibility by allowing non-critical guidance to be applied more adaptively
  • Established a foundation for AI governance by operationalizing policy into machine-readable logic
Process
1

Defined the priority framework

Established clear criteria for critical, high, medium, and low classifications

2

Developed a semantic classification approach

Built a system based on natural language processing to interpret meaning and intent in standards
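The production system used NLP models to interpret meaning and intent; as a minimal sketch of the underlying idea, the example below scores a guideline against cue phrases for each tier. All cue lists and the `classify` function are invented for illustration, not the actual implementation.

```python
# Illustrative sketch only: cue-phrase matching stands in for the
# semantic (NLP-based) classification used in the real system.
TIER_CUES = {
    "critical": ["must", "required by law", "never", "regulatory"],
    "high": ["should", "brand", "tone", "voice"],
    "medium": ["prefer", "recommended", "by default"],
    "low": ["optional", "when practical", "suggestion"],
}

def classify(guideline: str) -> str:
    """Return the highest-priority tier whose cues appear in the guideline."""
    text = guideline.lower()
    for tier in ["critical", "high", "medium", "low"]:
        if any(cue in text for cue in TIER_CUES[tier]):
            return tier
    return "low"  # default when no cue matches

print(classify("All pages must include the FDIC disclaimer."))  # critical
```

A real system would replace the cue lists with embeddings or a fine-tuned classifier, but the interface (guideline text in, tier out) is the same.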

3

Created a training and evaluation set

Generated labeled examples to bootstrap model performance, focusing on regulatory required language
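A bootstrapping set like the one described above pairs guideline text with a tier label and supports a simple evaluation loop. The examples and the `accuracy` helper below are hypothetical, invented to show the shape of the data:

```python
# Hypothetical labeled examples; regulatory-required language is
# deliberately over-represented, as in the real training set.
labeled_examples = [
    ("All investment pages must carry the FDIC disclaimer.", "critical"),
    ("Never promise guaranteed returns.", "critical"),
    ("Avoid speculative language in market outlooks.", "high"),
    ("Use active voice in user-facing responses.", "medium"),
    ("The Oxford comma is preferred but not required.", "low"),
]

def accuracy(predict, examples):
    """Fraction of examples where the classifier matches the label."""
    hits = sum(1 for text, label in examples if predict(text) == label)
    return hits / len(examples)
```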

4

Iterated on model calibration

Tuned classification thresholds to strike the right balance between strict compliance and generative flexibility

5

Designed structured output for integration

Standardized outputs into a format consumable by AI writing systems, enabling dynamic enforcement of content rules during generation
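A structured output for a classified section might look like the JSON below. The field names and values are illustrative assumptions, not the actual production schema:

```python
import json

# Hypothetical record for one classified section of a content standard;
# field names are invented for illustration.
classified_section = {
    "section_id": "disclaimers.fdic",
    "tier": "critical",
    "rule": "Append the FDIC disclaimer to all financial content.",
    "enforcement": "hard_guardrail",  # vs. "default_instruction"
    "rationale": "Required by regulation; zero tolerance.",
}

print(json.dumps(classified_section, indent=2))
```

A machine-readable record like this is what lets a downstream writing system treat "critical" rules as hard constraints while applying lower tiers as softer defaults.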

What shipped

Delivered a scalable classification system that transforms static content standards into actionable inputs for AI systems, enabling more reliable and context-aware content generation.


LLM Content Risk Assessment Framework

Use this framework to evaluate guidelines from content standards and style guides, and assign each a risk tier based on its potential impact if violated by an LLM.

How to apply this framework
1
Identify the guideline
Extract each rule or requirement from the style guide or content standard being assessed.
2
Test against criteria
Match the guideline to the criteria in each tier, starting at critical and moving down.
3
Assign the tier
Assign the highest tier where the guideline meets at least two criteria. Document your reasoning.
4
Set enforcement
Use tier assignments to prioritize model instructions, guardrails, and QA test coverage.
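The assignment rule in steps 2 and 3 (take the highest tier where the guideline meets at least two criteria) can be sketched as follows; the criteria counts are assumed to come from a manual or automated check against each tier's list:

```python
TIERS = ["critical", "high", "medium", "low"]  # highest to lowest

def assign_tier(criteria_met: dict) -> str:
    """Return the highest tier where the guideline meets at least two
    of that tier's criteria; default to 'low' if none qualifies."""
    for tier in TIERS:
        if criteria_met.get(tier, 0) >= 2:
            return tier
    return "low"

# A guideline matching 1 critical criterion and 3 high criteria:
print(assign_tier({"critical": 1, "high": 3}))  # high
```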
Critical: Must never violate
Regulatory, legal, and safety requirements — zero tolerance
  • Violation creates legal liability or regulatory penalty
  • Noncompliance could cause direct harm to users or third parties
  • Required by law, regulation, or binding contract (GDPR, HIPAA, FTC, etc.)
  • Breach could trigger a product recall, ban, or enforcement action
  • No acceptable workaround or contextual exception exists
These guidelines represent absolute floors — requirements that the model must follow in every output, in every context, with no exceptions. They are typically codified in law, enforced by regulators, or tied to binding agreements. Failure here is not a quality issue; it is a compliance and safety failure. These should be embedded as hard guardrails in system prompts, model instructions, and output validation.
A financial content standard requires that all pages include a disclaimer footer that states exactly "Spring Bank is FDIC insured."
Guideline in practice
When the model responds to any financial-related query, it must append a financial disclaimer — regardless of how the user frames the request, and regardless of output length or format. Omitting this in a user-facing app creates regulatory exposure and possible monetary liability.
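A hard guardrail like this is often enforced as output validation rather than relying on the prompt alone. The sketch below checks for the exact disclaimer from the example above and appends it when missing; the function name and the `is_financial` flag are hypothetical:

```python
DISCLAIMER = "Spring Bank is FDIC insured."  # exact string from the standard

def enforce_disclaimer(response: str, is_financial: bool) -> str:
    """Hard guardrail: append the required disclaimer to any financial
    response that does not already end with it."""
    if is_financial and not response.rstrip().endswith(DISCLAIMER):
        return response.rstrip() + "\n\n" + DISCLAIMER
    return response

out = enforce_disclaimer("Our savings rates start at 4.1% APY.", True)
print(out.endswith(DISCLAIMER))  # True
```

Validating the final output this way makes the requirement hold regardless of how the user frames the request, which is the defining property of a critical-tier rule.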
High: Should consistently follow
Brand trust, credibility, and audience expectations
  • Violation would meaningfully damage brand reputation or user trust
  • Inconsistency here undermines the product's core value proposition
  • Required by an internal style guide with organizational authority
  • Closely tied to audience expectations for tone, voice, or format
  • Exceptions exist but must be deliberately approved, not accidental
These guidelines don't carry legal weight, but violating them consistently erodes trust, creates a disjointed user experience, or signals that the product is low-quality or unreliable. They reflect how an organization has chosen to present itself and what its audience has come to expect. The model should follow these in the vast majority of outputs, with rare, intentional exceptions.
A financial services brand guide specifies that the model must never use speculative or emotionally charged language (e.g., "guaranteed returns," "sure thing") in any investment-related content.
Guideline in practice
When describing fund performance or market outlooks, the model uses measured, factual language — "historically performed well" rather than "consistently beats the market." Violations here wouldn't necessarily trigger a regulator, but they would erode trust with a sophisticated audience and create brand risk.
Medium: Follow by default
Quality, clarity, and content consistency
  • Violation reduces output quality but does not cause user harm
  • Inconsistency creates friction but not a fundamental trust breakdown
  • Guideline reflects best practice or house style preference
  • Reasonable exceptions exist based on context or user request
  • Deviation is noticeable to an informed reviewer, not a casual user
Medium-tier guidelines shape the quality and consistency of outputs without being non-negotiable. These are often editorial preferences, formatting standards, or structural conventions. They matter for professional-grade output and for maintaining a coherent voice across a product — but occasional, context-appropriate deviations are acceptable. These should appear as model instructions and default behaviors, not hard constraints.
A content style guide specifies that responses should use active voice and avoid nominalization (e.g., "we analyzed" not "an analysis was conducted").
Guideline in practice
For standard user-facing responses, the model writes in active voice. In formal report generation or when a user requests a specific passive structure, the model may deviate. The quality difference is real but not brand-threatening.
Low: Apply when practical
Stylistic preferences and contextual suggestions
  • Violation has no measurable impact on trust or user outcome
  • Guideline reflects personal, team, or regional preference
  • Inconsistency is unlikely to be noticed by most users
  • Context-dependent — correct behavior varies by use case
  • Would only be flagged in a detailed editorial review, not production QA
Low-tier guidelines are preferences that improve polish but do not meaningfully affect user trust, product quality, or brand integrity. They're worth knowing and applying when there is no competing consideration — but they should not drive heavy enforcement, prompt real estate, or QA cycles. These are useful for training data curation or fine-tuning feedback but should not block deployment.
A writing guide notes that the Oxford comma is preferred but not required, and that em dashes should use a space on each side in web copy.
Guideline in practice
The model uses Oxford commas by default and formats em dashes with spaces when generating blog-style content. In structured outputs, forms, or tables, punctuation conventions are deprioritized. A reviewer may flag these in an audit, but no user-facing impact occurs either way.