CodeTitan

Article · Engineering

Pre-launch notes: what we know works, and what we don't yet

Most AI code review tools claim more than they have shown; this post says exactly what CodeTitan has shown and what it has not.

#Beta · #Transparency · #False Positives · #Per-Repo Profile · #Engineering Honesty

Update · 2026-06-12. This post is a dated snapshot and stays as written. Since publication: the current release is CLI 2.1.8 on engine core 1.1.5, and the false-positive sweep promised below shipped across the 1.1.x engine line — see the changelog. One change of policy: rather than publishing a single false-positive percentage, we point at confirmed findings at pinned commits you can verify yourself.


CodeTitan is pre-launch. It has shipped code paths, public GitHub artifacts, and repeatable local evidence. It does not have design partners, customer outcomes, or month-long proof that the repo profile compounds in production.

That gap is the product right now. The useful version of CodeTitan is not a polished story about being further along than it is. It is a tool with some real working surfaces, some measured weaknesses, and a narrow beta where the next evidence has to come from real repositories.

What works today

The release path is real. Version 1.0.4 was live on npm, and the GitHub Action mirror has a published v1 release tag: Noa-Lia/codetitan-action v1.

The demo PR path is also real. CodeTitan posted a review on codetitan-sarif-demo pull request #1, and the corresponding GitHub Actions run is public. Those links matter because screenshots are cheap. A PR comment, an Action run, and a release tag are harder to wave away.

The current engine has 266 deterministic rules for JavaScript and TypeScript, plus 3-pass source-to-sink taint analysis. The taint analysis is what takes CodeTitan beyond matching local syntax shapes: it can trace data as it moves from a source toward a dangerous sink across multiple steps.
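To make "source to sink across multiple steps" concrete, here is a minimal sketch of bounded, multi-pass taint propagation. It is not CodeTitan's engine: the `Stmt` shape, the `req.query.file` source, the `child_process.exec` sink, and the rule that each pass advances taint by exactly one assignment are assumptions chosen for illustration. The toy program is a three-step shell-injection chain.

```typescript
// Hypothetical sketch of bounded multi-pass taint analysis. Not CodeTitan's
// implementation: the IR, source, sink, and pass semantics are assumptions.

type Stmt =
  | { kind: "assign"; target: string; from: string[] } // target = f(from...)
  | { kind: "call"; callee: string; args: string[] }; // callee(args...)

const SOURCES = new Set(["req.query.file"]); // assumed untrusted input
const SINKS = new Set(["child_process.exec"]); // assumed dangerous sink

interface TaintFinding {
  sink: string;
  arg: string;
}

function analyze(stmts: Stmt[], passes = 3): TaintFinding[] {
  const tainted = new Set<string>(SOURCES);
  for (let i = 0; i < passes; i++) {
    // Snapshot so each pass moves taint exactly one assignment further.
    const before = new Set(tainted);
    for (const s of stmts) {
      if (s.kind === "assign" && s.from.some((v) => before.has(v))) {
        tainted.add(s.target);
      }
    }
  }
  // Report any sink call that receives a tainted argument.
  const findings: TaintFinding[] = [];
  for (const s of stmts) {
    if (s.kind === "call" && SINKS.has(s.callee)) {
      for (const a of s.args) {
        if (tainted.has(a)) findings.push({ sink: s.callee, arg: a });
      }
    }
  }
  return findings;
}

// Toy program: file = req.query.file; name = file.trim(); cmd = `cat ${name}`; exec(cmd)
const program: Stmt[] = [
  { kind: "assign", target: "file", from: ["req.query.file"] },
  { kind: "assign", target: "name", from: ["file"] },
  { kind: "assign", target: "cmd", from: ["name"] },
  { kind: "call", callee: "child_process.exec", args: ["cmd"] },
];

console.log(analyze(program).length); // 1
console.log(analyze(program, 2).length); // 0
```

With three passes the chain from `req.query.file` to `exec` is found; with two it is missed. A fixed pass count keeps cost predictable but misses chains longer than the bound, which is the trade-off this sketch makes visible.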

The narrow security result we can point at is 3/3 true positives on a public set of shell-injection cases from mitre/saf. That is a specific result on a specific set of cases: we ran the tool against a public repo, nothing more. It is not third-party validation, and we are not treating it as one.

The self-scan result is clean. Running CodeTitan against its own codebase produces no findings we consider wrong. The green canary gate is also passing. Those are useful checks because they catch obvious regressions before we ask anyone else to run the tool, but they are still internal checks.

The phase status is better than vapor. Phase 0, Phase 1, Phase 1.5a, and Phase 1.5b are done. Phase 1.5c is mostly done. That is project status, not customer validation.

The repo-specific work also exists. Phase 4 Tier 1 is implemented and locally verified: learned profile, confidence scoring, and PR Risk Score. In plain English, CodeTitan can keep repo-local context, use confidence scoring to change how findings are ranked, and produce a PR Risk Score instead of only posting pass/fail findings.
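One way to picture confidence-based ranking and a single PR-level score is the hypothetical sketch below. The severity weights, the `priority` product, and the noisy-OR `prRiskScore` on a 0 to 100 scale are illustrative assumptions, not CodeTitan's formula. The point is only that confidence changes the ordering of findings and that many findings can collapse into one number.

```typescript
// Hypothetical sketch: weights, scale, and formulas are assumptions,
// not CodeTitan's actual scoring.

type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  ruleId: string;
  severity: Severity;
  confidence: number; // assumed to be in [0, 1]
}

// Hypothetical weights on a 0-10 scale.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  low: 1,
  medium: 3,
  high: 7,
  critical: 10,
};

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// How bad the finding would be if real, times how likely it is real.
function priority(f: Finding): number {
  return SEVERITY_WEIGHT[f.severity] * clamp01(f.confidence);
}

// Highest-priority findings first; returns a new array, input untouched.
function rank(findings: Finding[]): Finding[] {
  return [...findings].sort((a, b) => priority(b) - priority(a));
}

// Noisy-OR: treat each finding's scaled priority (0..1) as an independent
// chance that the PR has a real problem, and report P(at least one) as 0-100.
// The score saturates toward 100 instead of growing without bound.
function prRiskScore(findings: Finding[]): number {
  let pNone = 1;
  for (const f of findings) {
    pNone *= 1 - priority(f) / 10;
  }
  return Math.round(100 * (1 - pNone));
}

const findings: Finding[] = [
  { ruleId: "no-console", severity: "low", confidence: 0.3 },
  { ruleId: "shell-injection", severity: "critical", confidence: 0.9 },
];

console.log(rank(findings).map((f) => f.ruleId).join(", ")); // shell-injection, no-console
console.log(prRiskScore(findings)); // 90
```

The rule ids here are hypothetical. A multiplicative priority means a low-confidence critical finding can rank below a confident high one, which is the behavior "confidence changes ranking" implies.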

That last paragraph needs a guardrail. The learned profile exists. It has local verification. No external repo has used it long enough to prove that the learning matters over time. We can say the mechanism works locally. We cannot yet say it has improved a team's merge quality after a month.

What doesn't yet

There are zero design partners. No real team outside the development environment has used CodeTitan on a production repository. That means there are no customer outcomes, no testimonials, no partner case studies, and no evidence from a month of real PRs.

We have one promising outreach thread with a security-tooling organization, but one conversation is not a pipeline. It does not become validation until there is a real onboarding, a real repo, and results we can point to without stretching the language.

The moat is unproven in the market. CodeTitan's bet is that a per-repo learned profile will become more valuable than a generic scanner as it watches more PRs, dismissed findings, accepted fixes, and repo-specific conventions. The code path for that exists. The market proof does not. No repo has used the profile long enough to show measurable improvement over time.
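The bet above depends on feedback changing future behavior. A minimal sketch, assuming the profile simply keeps per-rule counts of accepted and dismissed findings and turns them into a smoothed acceptance rate, could look like this. `RepoProfile`, the `Feedback` values, the `no-raw-sql` rule id, and the Laplace prior are all hypothetical. A number like this could feed the confidence scoring described earlier, but nothing in the sketch shows that such learning improves outcomes over time; that is exactly the unproven part.

```typescript
// Hypothetical sketch of a per-repo learned profile. Names, event types,
// and the smoothing prior are assumptions for illustration only.

type Feedback = "accepted" | "dismissed";

interface RuleStats {
  accepted: number;
  dismissed: number;
}

class RepoProfile {
  private readonly stats = new Map<string, RuleStats>();
  private readonly prior: number;

  // `prior` is a pseudo-count added to both outcomes (Laplace smoothing),
  // so a rule with no feedback starts at confidence 0.5.
  constructor(prior = 1) {
    this.prior = prior;
  }

  record(ruleId: string, feedback: Feedback): void {
    const s = this.stats.get(ruleId) ?? { accepted: 0, dismissed: 0 };
    if (feedback === "accepted") {
      s.accepted += 1;
    } else {
      s.dismissed += 1;
    }
    this.stats.set(ruleId, s);
  }

  // Smoothed share of this rule's findings that the repo accepted.
  confidence(ruleId: string): number {
    const s = this.stats.get(ruleId) ?? { accepted: 0, dismissed: 0 };
    return (s.accepted + this.prior) / (s.accepted + s.dismissed + 2 * this.prior);
  }
}

// Hypothetical history: a repo where a raw-SQL rule is mostly noise.
const profile = new RepoProfile();
for (let i = 0; i < 8; i++) profile.record("no-raw-sql", "dismissed");
profile.record("no-raw-sql", "accepted");

console.log(profile.confidence("no-raw-sql").toFixed(2)); // 0.18
console.log(profile.confidence("unseen-rule")); // 0.5
```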

The false-positive rate is too high on representative framework and ORM repositories. After the latest sweep, the FP rate on Hono, Remix, Drizzle, and Prisma is still the active work item. It is not close to acceptable, and reducing it takes priority over adding new detection rules. The next post will publish the exact number, the breakdown by framework, and the reduction work.

The FP rate also explains why beta access is paced. A noisy reviewer does not become useful because its comments look good in GitHub. It becomes useful when it is right often enough that engineers keep reading it. Right now, CodeTitan has strong internal and narrow-case evidence, plus a visible FP problem on exactly the kind of ecosystem code it needs to understand better.

Several launch pieces are incomplete. The Phase 2.5 telemetry bridge is not done. Launch artifacts are not done. Report-service hardening is not done. Full OS and runtime coverage is not done. These are not cosmetic gaps. They are part of the difference between a working pre-launch tool and a broad public launch.

There is also a language discipline problem to keep solving. It is easy to say "the profile learns your repo" and let readers infer more than the evidence supports. The accurate version is narrower: the learned profile is implemented and locally verified, but no real team has used it long enough to prove that the learning changes outcomes.

Why we're publishing this

First, transparency is the differentiator. The AI code review category has too much launch copy that sounds finished before the product has survived contact with real repositories. We would rather publish the actual ledger: working artifacts, measured strengths, open gaps, and the failure modes we are fixing now.

Second, the people we want as design partners value engineers who admit what they do not know. A useful beta partner does not need a victory lap. They need to know which parts are solid enough to test, which parts are still risky, and where their feedback will change the product.

That is also how we want CodeTitan itself to behave. A review comment should point to the line, explain the evidence, and avoid pretending that a guess is a fact. The blog should follow the same rule.

The beta is open while we reduce false positives and watch the learned profile on real repositories. If you want to test CodeTitan with that level of honesty attached, sign up free.

About the author

CodeTitan Team

Product Team

The CodeTitan team is dedicated to building the best code quality analysis platform for developers worldwide. We combine expertise in AI, static analysis, and developer tools.


Give your repo a memory.

CodeTitan learns your repo over time. Ships as a GitHub Action and CLI. Scout tier free forever.