SAMDAM Mission Partners
Research · Technical Paper

Deterministic-First Generation for Protected-Field Integrity in Federal Compliance Documentation

Author
David McCaskill, SAMDAM Mission Partners, LLC
Domain
Federal compliance documentation · zero-tolerance generation
Standards
NIST SP 800-53 Rev 5 · SP 800-171 Rev 3 · CNSSI 1253
Evidence
Observational production deployment · 18 months · 10,000 artifacts
Status
Working paper — controlled baseline comparison specified, reserved for future work
Abstract

Large language models hallucinate persistently, which makes them unsuitable for zero-tolerance domains — contexts where a factual error becomes an audit finding, a legal liability, or a mission-impacting defect. Retrieval-augmented generation reduces hallucination but cannot eliminate it, because output formation stays stochastic.

Deterministic-First inverts that flow. Complete responses — including citations, identifiers, and quantitative values — are composed programmatically from authoritative, version-controlled repositories before any optional model invocation. The model is reduced from a primary generator to a constrained refiner, permitted only to improve style under an explicit prohibition on introducing new facts, citations, identifiers, or quantitative values, and gated by a deterministic, fail-closed validation layer. Within its protected-field scope, the pattern eliminates LLM-sourced factual hallucination by construction — not by reducing its probability, but by removing the model’s ability to author a fact at all.

The problem

Why “better models” and RAG don’t close the gap.

In zero-tolerance domains, plausible language without traceable provenance is equivalent to being wrong. What these environments demand is forensic auditability: decomposable proof that each factual claim came from an authoritative record. RAG leaves output formation stochastic (retrieval misses, synthesis errors, misattribution, fabrication). Fine-tuning doesn’t guarantee citation correctness. Constrained decoding shapes structure, not truth. Post-hoc verification yields a confidence score, not an audit trail.

Motivating production failure

A user asked a routine question: how often should critical systems be scanned? An intent misclassification routed it to the generative path, and the model produced fluent, authoritative-sounding guidance — “6–12 months” — for a requirement that did not exist in any authoritative source. The failure wasn’t missing context. It was an architecture that let a probabilistic generator author a fact-bearing answer at all.

The approach

Invert the flow: facts precede the model.

Three constraints define the pattern: authoritative source binding (no fact selection inside a model call), programmatic assembly first (the complete template is built before any refinement), and constrained refinement with validation (the model may rewrite for clarity; a deterministic gate rejects any refinement that touches a protected field and falls back to the template).
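The three constraints can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production implementation (which the paper describes as proprietary): the names `Template`, `assemble`, `passes_gate`, and `generate`, the record layout, and the two gate checks are all hypothetical. The gate here checks only that every protected value occurs as often as in the template and that the multiset of numeric tokens is unchanged.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Template:
    text: str                    # complete response, assembled without a model
    protected: tuple[str, ...]   # protected values exactly as they appear in text


def assemble(record: dict) -> Template:
    """Compose the full response from an authoritative record (no model call)."""
    text = (
        f"Per {record['citation']}, control {record['control_id']} "
        f"sets a scan interval of {record['interval']}."
    )
    return Template(text, (record["citation"], record["control_id"], record["interval"]))


def passes_gate(template: Template, refined: str) -> bool:
    """Deterministic check that the refinement left protected content untouched."""
    # Every protected value must occur exactly as often as in the template.
    same_fields = all(
        refined.count(value) == template.text.count(value)
        for value in template.protected
    )
    # No numeric token may be added, removed, or altered.
    same_numbers = sorted(re.findall(r"\d+", refined)) == sorted(
        re.findall(r"\d+", template.text)
    )
    return same_fields and same_numbers


def generate(record: dict, refine: Optional[Callable[[str], str]] = None) -> str:
    """Template first; refinement is optional and fails closed to the template."""
    template = assemble(record)
    if refine is None:              # model invocation suppressed entirely
        return template.text
    try:
        refined = refine(template.text)
    except Exception:               # any refiner failure falls back to the template
        return template.text
    return refined if passes_gate(template, refined) else template.text
```

With an invented record such as `{"control_id": "EX-1", "interval": "30 days", "citation": "Example Policy v1.2 §4"}`, a refiner that only rewords the sentence is accepted, while one that changes "30 days" or appends a second interval is discarded and the template is returned unchanged.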

Conventional · RAG

The model authors the answer

1
Retrieve passages: relevant context is injected
2
Model synthesizes the response: facts, citations, numbers all generated
3
Hope the output stays grounded: verification is probabilistic, after the fact
Output formation is stochastic — fabrication remains possible.
Deterministic-First

The model never authors a fact

1
Bind facts to authorized sources: version-controlled repositories only
2
Assemble the complete template: citations, IDs, values composed programmatically
3
Optional style-only refinement: protected fields are invariant
4
Validate (fail-closed) & commit audit: any protected-field change → discard, fall back
Factual content is a subset of the authoritative repository, by construction.
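For identifier-shaped tokens, the subset property above can be checked mechanically. The sketch below is illustrative only: it assumes control identifiers follow the SP 800-53 style (two-letter family, hyphen, number, optional enhancement in parentheses, as in "RA-5(2)"), and the `registry` argument is a hypothetical stand-in for the authorized registry. Other protected-field types, such as citations or SP 800-171 identifiers, would need their own patterns.

```python
import re

# Matches SP 800-53-style control identifiers such as "AC-2" or "RA-5(2)".
CONTROL_ID = re.compile(r"\b[A-Z]{2}-\d+(?:\(\d+\))?")


def unregistered_ids(text: str, registry: set[str]) -> set[str]:
    """Return identifier-shaped tokens in text that are absent from the registry."""
    return set(CONTROL_ID.findall(text)) - registry


def subset_holds(text: str, registry: set[str]) -> bool:
    """True when every control identifier in text is in the registry."""
    return not unregistered_ids(text, registry)
```

A validator built this way would reject any refinement for which `subset_holds` is false, regardless of how plausible the added identifier looks.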
Findings

Eighteen months in federal production.

An observational deployment supporting NIST SP 800-53 Rev 5, SP 800-171 Rev 3, and CNSSI 1253 control documentation across 10,000 compliance artifacts.

0
accepted outputs with an LLM-sourced protected-field delta, across the deployment.
0
SHA-256 audit-hash verification failures detected across all 10,000 artifacts.
1.0%
of refinement attempts rejected for introducing entities outside the authorized registry (40 of 4,000).
80%
of queries resolved with model invocation suppressed entirely by the pre-refinement gate.
18 mo
continuous observational production window the findings are drawn from.
10k
federal compliance artifacts generated and audited over the study.

These are operational findings within a bounded protected-field scope — not a controlled comparison. A head-to-head evaluation against RAG, guardrailed-RAG, and verifier-loop baselines is specified as a protocol and reserved for future work. Adversarial tests exercised representative injection and validation-evasion cases; no accepted protected-field modifications were observed in the tested suite.
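The summary does not say what the SHA-256 audit hash covers or how artifacts are serialized, so the following is only one plausible shape for such a check. It assumes each artifact is a JSON-serializable dictionary and that the hash is taken over a canonical encoding (sorted keys, fixed separators); `audit_hash` and `verify` are hypothetical names.

```python
import hashlib
import json


def audit_hash(artifact: dict) -> str:
    """SHA-256 hex digest over a canonical JSON encoding of the artifact."""
    canonical = json.dumps(
        artifact,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # fixed separators, no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(artifact: dict, recorded_hash: str) -> bool:
    """Recompute the hash and compare it with the value stored at commit time."""
    return audit_hash(artifact) == recorded_hash
```

Canonicalizing before hashing matters because two semantically identical artifacts with different key order would otherwise produce different digests and register as false verification failures.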

Scope of the guarantee

What the pattern does — and does not — claim.

Six commitments bound the contribution, and should be read with every use of the word “elimination.”

01

Protected fields, not general truth

The guarantee covers a defined protected-field scope — identifiers, citations, control tokens, quantitative values — not the general semantic truth of every sentence.

02

Sources must be governed

The pattern assumes version-controlled authoritative sources. If a source is wrong, the system faithfully reproduces the error. Source quality is a customer governance responsibility.

03

The model is untrusted

The LLM is treated as a component outside the trusted computing base — useful for refinement, never granted factual authority.

04

Accepted outputs, after validation

The guarantee concerns outputs that pass the deterministic validation gate — not arbitrary, unvalidated model output.

05

Selector misbinding is a residual risk

Selecting the wrong authoritative record remains a first-class risk — measured separately, and distinct from hallucination.

06

Accuracy, not confidentiality

The pattern addresses factual accuracy. Data sovereignty is a function of deployment topology (local vs. cloud inference) and is orthogonal to the architectural guarantee.

Method & reproducibility

Honest about evidence, open for replication.

Evidence type

Observational, not controlled

Production telemetry over 18 months — validator-rejection, audit-integrity, and adversarial analyses — not a randomized baseline comparison.

Future work

Controlled protocol specified

A head-to-head protocol against RAG, guardrailed-RAG, and verifier-loop baselines is defined and reserved for a subsequent study.

Replication

Sanitized artifacts provided

Reference algorithms, schema definitions, and a synthetic evaluation-corpus design support independent replication; the production implementation remains proprietary.

Beyond federal compliance

Any domain that needs forensic auditability.

The pattern generalizes wherever a factual error carries externally enforced consequences and outputs must be traceable to an authoritative record.

Healthcare documentation · Legal discovery · Financial reporting · Regulatory compliance · Safety-critical engineering
Request the paper

Read the full study, or see it running in Sourcine.

We share the paper — including the validation-gate specification, threat model, and the synthetic evaluation-corpus design — with federal and regulated-industry teams on request.