Engine features.
Five concepts you'll meet across the CLI and the Action. One engine, one implementation of the rules, every surface.
§ 01 · Learned profile
Every dismissed finding and applied fix is recorded into a per-repo profile stored in .codetitan/learned-profile.json — inside your repo, traveling with the git history. The mechanism is implemented and locally verified; what it accumulates is specific to your codebase, not the language.
What the profile stores:
- Dismissal patterns — dismiss the same finding three times (--dismiss) and it is auto-suppressed for this repo from then on.
- Code conventions — the idioms and patterns your existing code uses, recorded per repo.
- AI-drift baselines — what your code normally looks like, so AI-generated code that diverges gets flagged.
The profile is private to your repo. Never shared, never mined for training data.
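As a rough illustration of the three-dismissal rule, here is a minimal TypeScript sketch. The LearnedProfile shape, the dismissals field, and the finding-key format are assumptions made for this example; the actual schema of learned-profile.json is not described here.

```typescript
// Hypothetical shape: the real learned-profile.json schema may differ.
interface LearnedProfile {
  dismissals: Record<string, number>; // finding key -> times dismissed
}

const SUPPRESS_AFTER = 3; // "three times", per the description above

// Record one dismissal and return an updated profile (the input is not mutated).
function recordDismissal(profile: LearnedProfile, findingKey: string): LearnedProfile {
  const count = (profile.dismissals[findingKey] ?? 0) + 1;
  return { dismissals: { ...profile.dismissals, [findingKey]: count } };
}

// A finding counts as suppressed once its dismissal count reaches the threshold.
function isSuppressed(profile: LearnedProfile, findingKey: string): boolean {
  return (profile.dismissals[findingKey] ?? 0) >= SUPPRESS_AFTER;
}
```

Keeping the counter keyed per finding is what would let the suppression stay specific to one repo's judgment rather than to a language-wide rule.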
§ 02 · PR Risk Score
A 0–100 composite with a letter grade — Risk: 60 (high / C) — that weighs the severity mix of the findings against the repo's learned profile. The same diff scores differently in a repo with history than in a fresh one; that is the point.
It appears in the console output, in report.json as prRiskScore, and in the PR comment the Action posts. Gate on it with --risk-threshold <number> (exit 1 at or above; default 80), or gate on severities directly with --fail-on.
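The gating rule can be sketched in a few lines of TypeScript. This is an illustration of the "exit 1 at or above; default 80" behaviour only: the function name is hypothetical, and it assumes prRiskScore is a plain number, which the text above does not confirm.

```typescript
const DEFAULT_RISK_THRESHOLD = 80; // default stated above

// Returns the exit code such a gate would produce:
// 1 when the score is at or above the threshold, 0 otherwise.
// Hypothetical helper; assumes prRiskScore is a number in the 0-100 range.
function riskGateExitCode(
  prRiskScore: number,
  threshold: number = DEFAULT_RISK_THRESHOLD,
): 0 | 1 {
  if (!Number.isFinite(prRiskScore) || prRiskScore < 0 || prRiskScore > 100) {
    throw new RangeError(`prRiskScore must be within 0-100, got ${prRiskScore}`);
  }
  return prRiskScore >= threshold ? 1 : 0;
}
```

Under these assumptions, the example score of 60 passes the default gate, while the same score fails a stricter gate set to 50.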
§ 03 · AI-drift detector
AI-generated code drifts. Copilot and Claude produce code that looks like the surrounding file but subtly diverges — a different logger, a newly-added dependency, a parseInt without a radix where the rest of the file uses radix 10.
The drift detector catches:
- Imports added that your codebase has never imported before
- Type regressions — the any type where TS-strict would reject it
- Helper reimplementation — AI rewrites fs.readFile inline instead of using your existing readFile helper
- Style drift — arrow functions where the file uses named functions, etc.
These are MEDIUM findings — not blocking by default, but visible so you know what the AI changed.
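To make the first check concrete, here is a minimal TypeScript sketch of flagging imports the codebase has never used before. It is not CodeTitan's implementation: the function names are hypothetical, the baseline is assumed to be a simple set of module specifiers, and a regex stands in for the parser a real detector would more likely use.

```typescript
// Collect module specifiers from ES import statements and require() calls.
// Regex-based for brevity; this is a simplification, not a full parser.
function importedModules(source: string): Set<string> {
  const pattern =
    /\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]|\brequire\(\s*['"]([^'"]+)['"]\s*\)/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1] || match[2];
    if (specifier) found.add(specifier);
  }
  return found;
}

// Return modules imported by the added code that the baseline has never seen.
// `baseline` is a hypothetical stand-in for what the learned profile records.
function newImports(baseline: ReadonlySet<string>, addedCode: string): string[] {
  return Array.from(importedModules(addedCode))
    .filter((mod) => !baseline.has(mod))
    .sort();
}
```

With a baseline containing only express, added code that imports both express and pino would report pino alone.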
§ 04 · 3-pass cross-file taint analysis
Taint analysis traces untrusted input (HTTP request, cookie, header) through function calls and variable assignments until it reaches a sink (database query, exec, file write). If the path doesn't pass through a sanitizer, you have a data-flow vulnerability.
The three passes:
- Pass 1 — identify sources and sinks. Scan every file; list entry points (e.g. req.query, req.body) and dangerous functions (e.g. db.query, child_process.exec).
- Pass 2 — build the call graph. Resolve imports across files. Build a directed graph of "this variable flows into that parameter".
- Pass 3 — trace. For each source, walk the graph. If a path reaches a sink without going through a sanitizer, emit a finding with the full trace (source file + line → intermediate hops → sink file + line).
The 3-pass approach is what makes cross-file reachability possible. A regex linter sees db.query(q) in isolation. CodeTitan sees that q was built with untrusted input three files away.
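The trace step can be illustrated with a small TypeScript sketch over a hand-built flow graph. It assumes passes 1 and 2 have already produced the sources, sinks, sanitizers, and edges. All node labels, file names, and the escapeSql sanitizer below are hypothetical, and the depth-first walk is one plausible way to do the trace, not necessarily the one the engine uses.

```typescript
// Assumed output of pass 2: a directed graph, "the value at A flows into B".
type FlowGraph = Record<string, string[]>;

interface TaintFinding {
  trace: string[]; // source -> intermediate hops -> sink
}

// Pass 3 sketch: walk from every source; a path that reaches a sink
// without crossing a sanitizer becomes a finding with its full trace.
function traceTaint(
  graph: FlowGraph,
  sources: string[],
  sinks: ReadonlySet<string>,
  sanitizers: ReadonlySet<string>,
): TaintFinding[] {
  const findings: TaintFinding[] = [];

  const walk = (node: string, path: string[]): void => {
    if (sanitizers.has(node)) return; // taint is cleaned here, stop this path
    if (path.includes(node)) return; // cycle guard
    const trace = [...path, node];
    if (sinks.has(node)) {
      findings.push({ trace });
      return;
    }
    for (const next of graph[node] ?? []) walk(next, trace);
  };

  for (const source of sources) walk(source, []);
  return findings;
}

// Hypothetical example: one unsanitized path and one sanitized path.
const exampleGraph: FlowGraph = {
  "routes/user.ts:12 req.query.id": ["services/user.ts:8 id"],
  "services/user.ts:8 id": ["db/user.ts:21 q"],
  "db/user.ts:21 q": ["db/user.ts:22 db.query(q)"],
  "routes/search.ts:5 req.body.term": ["lib/escape.ts:3 escapeSql(term)"],
  "lib/escape.ts:3 escapeSql(term)": ["db/search.ts:9 db.query(q)"],
};

const exampleFindings = traceTaint(
  exampleGraph,
  ["routes/user.ts:12 req.query.id", "routes/search.ts:5 req.body.term"],
  new Set(["db/user.ts:22 db.query(q)", "db/search.ts:9 db.query(q)"]),
  new Set(["lib/escape.ts:3 escapeSql(term)"]),
);
// exampleFindings holds one finding: the four-hop trace from req.query.id
// to db.query(q). The search path is dropped at the sanitizer.
```

Stopping the walk at a sanitizer node, rather than filtering findings afterwards, keeps every reported trace a genuinely unsanitized path.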
§ 05 · False-positive suppression
False positives destroy trust, so several mechanisms work against them in layers rather than one magic filter:
- Confidence scoring — every finding carries a 0–100 confidence, shown inline. Filter with --min-confidence.
- Structural guards — rules carry file- and line-level guards (for example, command-injection taint only fires when the file actually imports child_process).
- The learned profile — three dismissals of the same finding and it is suppressed for this repo. Your judgment accumulates instead of repeating.
- Optional AI filter — with your own Anthropic key, an LLM pass re-examines candidate findings before they reach you.
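The first two layers can be sketched together in TypeScript. This is an illustration under stated assumptions, not the engine's code: the Finding shape, the rule names, the guard table, and the text-match test for child_process are all hypothetical.

```typescript
// Hypothetical finding shape for this sketch.
interface Finding {
  rule: string;
  file: string;
  confidence: number; // 0-100, as described above
}

// Hypothetical guard table: a guarded rule only fires when the file's
// source passes the check. A plain text match stands in for real import analysis.
const structuralGuards: Record<string, (fileSource: string) => boolean> = {
  "command-injection": (src) => /\bchild_process\b/.test(src),
};

// Layer 1: drop findings below the confidence floor.
// Layer 2: drop findings whose structural guard does not hold for the file.
function filterFindings(
  findings: Finding[],
  fileSources: Record<string, string>,
  minConfidence: number,
): Finding[] {
  return findings.filter((finding) => {
    if (finding.confidence < minConfidence) return false;
    const guard = structuralGuards[finding.rule];
    return guard === undefined || guard(fileSources[finding.file] ?? "");
  });
}
```

Each layer only ever removes findings, so they can run in any order and a finding must survive all of them to be shown.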
The proof style we prefer: cold audits against real repos at pinned commits, where every surfaced finding can be checked by cloning the repo yourself.