Capsa Studio · for clinical informatics & MLOps

The coding AI, built and proven in the open.

No black box. Every rule is human-readable and versioned — no model fine-tuning. Accuracy is measured against what coders actually billed, and every improvement is proposed, reviewed, and approved by people.


Real precision & recall vs what your coders billed
Every code reproducible from its version + evidence

Analyzer correctness for the Vaccines guideline: 96.8% right when it recommends, 96.1% catches what's billed, 96.4% overall, with a confusion matrix showing 390 code-and-modifier matches, 13 over-fired, and 16 missed
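The three percentages can be recomputed from the confusion-matrix counts quoted above. The sketch below assumes "right when it recommends" is precision, "catches what's billed" is recall, and "overall" is the F1 score; the page does not name the overall formula, though F1 does reproduce the 96.4% figure. The function name `score` is illustrative, not part of Capsa.

```python
def score(matches: int, over_fired: int, missed: int) -> dict[str, float]:
    """Precision, recall, and F1 from code-level confusion counts."""
    recommended = matches + over_fired  # every code the analyzer recommended
    billed = matches + missed           # every code the coders billed
    precision = matches / recommended if recommended else 0.0
    recall = matches / billed if billed else 0.0
    total = precision + recall
    # Assumption: "overall" is the harmonic mean of precision and recall (F1).
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


result = score(matches=390, over_fired=13, missed=16)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
# precision: 96.8%
# recall: 96.1%
# f1: 96.4%
```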

The trust problem

You can't bill what you can't defend — or improve what you can't measure.

“Trust our AI” isn't good enough in coding. The logic has to be inspectable, the accuracy provable, and the changes governed.

Black box

Logic you can't audit or explain

A model nobody can open can't answer “why this code?” at appeal — and can't be trusted to bill in your name.

Unmeasured

No proof, no version control

No way to prove accuracy, catch a regression before it ships, or track the logic that picks codes as it changes.

Takeaway: Capsa makes coding glass-box, with readable rules, measured accuracy, and governed change.

The improvement cycle

A measured loop: run, score, iterate, apply.

Run cohorts against a guideline version, score them against what coders billed, find the recurring misses, and apply approved fixes — every stage instrumented.

Run

Run a locked cohort of real visits against the current guideline version.

Score

Compare Capsa's codes to what coders billed — precision and recall on the codes that matter.

Iterate

Find the recurring miss patterns and propose rule fixes from real disagreements.

Apply

A reviewer approves the change; it lands as a versioned diff.

Capsa Studio home: the improvement cycle and coverage by guideline bundle — **The loop, instrumented.** Coverage by guideline bundle, the next action, and every stage of the cycle in one place.

Build & run

Test on the same visits, every time.

Build cohorts of real, de-identified visits and lock them so results are comparable across versions — then run any cohort against any guideline version and track improvements or regressions.
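One way to picture a locked cohort is an immutable set of visit IDs with a content fingerprint, so two runs can be checked for having used exactly the same visits. This is a minimal sketch under that assumption; `LockedCohort`, its fields, and the fingerprint scheme are hypothetical and not taken from Capsa.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedCohort:
    """Hypothetical locked cohort: a fixed set of de-identified visit IDs."""
    name: str
    visit_ids: tuple[str, ...]

    @property
    def fingerprint(self) -> str:
        # Sort so the fingerprint depends on membership, not on ordering.
        joined = "\n".join(sorted(self.visit_ids))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def same_visits(a: LockedCohort, b: LockedCohort) -> bool:
    """Two runs are comparable only if they used the same visits."""
    return a.fingerprint == b.fingerprint


balanced = LockedCohort("balanced", ("v-001", "v-002", "v-003"))
reordered = LockedCohort("balanced", ("v-003", "v-001", "v-002"))
print(same_visits(balanced, reordered))
```

Because the dataclass is frozen, reassigning `visit_ids` after creation raises an error, which is the property that keeps results comparable across guideline versions.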

Cohorts: locked sets of real visits for consistent testing — **Locked cohorts.** Balanced, triage-stress, or analyzer-stress by design — comparable across every version.

Runs: a history of cohort runs against guideline versions with status, cost, and outcomes — **Every run, on the record.** Run a cohort against any version; track improvements and regressions with full history.

Score vs billed

Proof, not vibes.

Scored against the codes your coders actually billed, on two axes — did Capsa triage the visit right, and code it right — with per-CPT and per-rule breakdowns. Then drill from any number to the exact chart words behind a code.

Scorecard: the AI matched the coder on 96% of in-scope codes, with triage and analyzer correctness — **Measured against your coders.** Plain-language metrics — “right when it recommends” and “catches what's billed” — with per-CPT and per-rule detail.

Why each code: the verbatim chart quote and the rule that produced it, expanded for one code — **…to the chart words.** Each code → the rule + the verbatim quote + a defensibility tag. This is how you defend a code at audit.

Improve, governed

The system tells you where it's weak — people approve the fix.

Compare a run to its baseline, review the recurring miss patterns, and apply approved rule edits — versioned like code, with the reviewer's reason on the record.
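The run-vs-baseline comparison can be sketched as two small steps: subtract baseline metrics from the run's metrics, then count which misses recur. The data shapes here (metric dictionaries, misses as `(rule, code)` pairs) and the rule and code names are assumptions for illustration only.

```python
from collections import Counter


def metric_deltas(baseline: dict[str, float], run: dict[str, float]) -> dict[str, float]:
    """Change in each metric shared by both runs (run minus baseline)."""
    return {name: run[name] - baseline[name] for name in baseline if name in run}


def recurring_misses(misses: list[tuple[str, str]], min_count: int = 2) -> list[tuple[tuple[str, str], int]]:
    """Group misses by (rule, code) and keep patterns seen at least min_count times."""
    counts = Counter(misses)
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_count]


deltas = metric_deltas(
    baseline={"precision": 0.95, "recall": 0.90},
    run={"precision": 0.96, "recall": 0.88},
)
for name, change in deltas.items():
    print(f"{name}: {change:+.2f}")

# Hypothetical miss log: (rule that should have fired, code the coder billed).
patterns = recurring_misses([
    ("rule-a", "CODE-1"),
    ("rule-a", "CODE-1"),
    ("rule-b", "CODE-2"),
    ("rule-a", "CODE-1"),
])
print(patterns)
```

In this example recall drops while precision rises, which is the kind of regression a reviewer would want to see before approving a rule edit.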

Iterate: run-vs-baseline precision/recall deltas and the improvement pipeline — **Where it's weak.** Run-vs-baseline deltas and recurring miss patterns, grouped for review.

Review history: applied and rejected changes with tickets, version bumps, commits, and diffs — **People approve every change.** Applied or rejected, with the reason — version bumps, commits, and diffs on the record.

Guideline detail header: a version badge, twelve versions on record, a view-version picker, compare-versions, and the rules, codes, evidence patterns, examples, and version-history tabs — **Versioned, like code.** Twelve versions on record, a version picker, and side-by-side compare — rules, codes, evidence patterns, and examples, all versioned together.

No black box

Glass box, end to end.

Human-readable rules

All logic is explicit rule sets your team can open, read, and change — no opaque model weights, no fine-tuning.

Unsupported codes are dropped

Every quote is checked word-for-word against the chart; if the words aren't there, the code never makes it into the set.
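A verbatim check of this kind can be sketched as a substring test of each quote against the chart text. The version below assumes a case-sensitive match after collapsing whitespace; the page does not say how Capsa normalizes text, and `Candidate`, `drop_unsupported`, and the code values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    code: str   # illustrative code value, not a real billing code
    quote: str  # text the rule claims appears in the chart


def _normalize(text: str) -> str:
    # Assumption: only whitespace differences (line breaks, double spaces) are forgiven.
    return " ".join(text.split())


def drop_unsupported(candidates: list[Candidate], chart_text: str) -> list[Candidate]:
    """Keep only candidates whose quote appears verbatim in the chart."""
    chart = _normalize(chart_text)
    kept = []
    for candidate in candidates:
        quote = _normalize(candidate.quote)
        if quote and quote in chart:  # an empty quote never supports a code
            kept.append(candidate)
    return kept


chart = "Influenza vaccine administered IM in the\nleft deltoid. Patient tolerated well."
candidates = [
    Candidate("CODE-A", "Influenza vaccine administered IM in the left deltoid"),
    Candidate("CODE-B", "second vaccine administered"),
    Candidate("CODE-C", ""),
]
print([c.code for c in drop_unsupported(candidates, chart)])
# ['CODE-A']
```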

Reproducible by version

Every code is reproducible from the exact guideline version that ran plus its evidence — nothing hidden, nothing un-auditable.

Proven, and it transfers

35% → 94%

a new coding category, from cold start to maturity, on the same framework that proved vaccines (96.9% / 95.5% vs billed).

“New categories inherit the machinery instead of starting over.”

Scope-aware measurement

Precision and recall are measured against the codes each guideline actually owns, so the numbers mean what they say and are not diluted by out-of-scope codes.
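The effect of scoping can be shown with sets: restrict both the predicted and the billed codes to the codes a guideline owns before computing precision and recall. This is a sketch of the idea under assumed data shapes, with made-up code values; it is not Capsa's scoring code.

```python
def scoped_precision_recall(
    predicted: set[str], billed: set[str], owned: set[str]
) -> tuple[float, float]:
    """Precision and recall over only the codes the guideline owns."""
    predicted_in_scope = predicted & owned
    billed_in_scope = billed & owned
    hits = len(predicted_in_scope & billed_in_scope)
    precision = hits / len(predicted_in_scope) if predicted_in_scope else 0.0
    recall = hits / len(billed_in_scope) if billed_in_scope else 0.0
    return precision, recall


owned = {"A1", "A2"}           # codes this guideline is responsible for
predicted = {"A1", "A2"}       # what the analyzer recommended
billed = {"A1", "A2", "Z9"}    # coders also billed Z9, owned by another guideline

print(scoped_precision_recall(predicted, billed, owned))
# (1.0, 1.0)

# Without scoping, the out-of-scope Z9 counts as a miss and recall falls to 2/3.
everything = predicted | billed
print(scoped_precision_recall(predicted, billed, everything))
```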

Internal, validated results across two live coding categories (vaccines and health screening), measured on cases your team already coded. Not an external certification.

One engine, three apps

Studio builds the guidelines the other two run.

The same Capsa engine powers the coder's cockpit and the control tower that oversees the whole pipeline.

Capsa Workbench: the coder's review cockpit — Capsa Workbench
Where coders finalize

Every code cited to the chart, the provider a click away, the claim finalized in one place.
Explore Workbench →

Capsa Pipeline: monitor the charge-to-claim lifecycle — Capsa Pipeline
Oversee the whole lifecycle

Throughput, billing lags, bottlenecks, and the charges Capsa captures — for revenue-cycle leaders.
Explore Pipeline →

Find out more

See how we prove every code.

Tell us about your team and we'll walk you through the loop on real cohorts — the scorecard, the evidence trail, and the human-approved improvement cycle.

Real precision & recall vs billed claims
Every code traced to verbatim chart text
Versioned guidelines, full audit trail
No fine-tuning, no black box

[email protected] · capsacoding.com

Get in touch

We'll get back to you within one business day.
