Style Graph vs. Universal Rulesets: Why Generic Rules Fail Enterprise Codebases

By Tariq Osei · 10 min read
[Figure: Comparison diagram showing a rigid, uniform ruleset grid versus an organic, adaptive style graph network]

There's a moment in most engineering organizations' growth where the linter config becomes a political document. Someone wants to enforce camelCase for interface properties; someone else is working in a subsystem that's been using PascalCase for years and doesn't want to change. The linter rule gets an exception, then another exception, then a separate ESLint config for that directory, then a third config when a new team joins. After three years of growth, the "universal" ruleset has eight override files and everyone has mentally checked out of enforcing it.

This is a predictable outcome, not a failure of the teams involved. Universal rulesets break at scale because the premise — that a single set of rules can accurately describe good style for all parts of a large codebase — is false for any organization that has been growing for more than two years.

What universal rulesets are actually good at

ESLint, Pylint, SonarQube, and similar tools are genuinely valuable. They catch issues that are objectively wrong regardless of codebase context: unused variables, unreachable code, dangerous type coercions, known security anti-patterns with clear remediation. These rules hold for any TypeScript, Python, or Go codebase, whatever your team's local conventions look like.

The problem isn't that these tools exist; it's that they get asked to do more than they were designed for. Enforcing project-specific naming conventions through ESLint rules requires maintaining custom rules and keeping them synchronized with how the convention actually evolves in the code. Enforcing architecture boundaries through a linter means encoding the architecture explicitly in configuration rather than deriving it from the code itself. Both are high-maintenance approaches to problems that could be solved differently.

How large codebases diverge intentionally

A monorepo at a 300-person engineering organization contains deliberate divergence. The payments service may use a different error handling convention from the notifications service because payments was built first, influenced by the team's Python background, while notifications was built later by a team coming from Go. Both conventions are internally consistent and well-documented. Neither is wrong.

A universal ruleset has three options, none of them good: enforce one convention everywhere (breaking one team's established patterns), allow both (making the rule useless as an enforcement mechanism), or create a per-directory override (which negates the "universal" property of the ruleset).

Consider a real scenario: a fintech data platform team with about 180 engineers. Their core financial calculation engine uses a very strict pattern for how validation functions are named and composed — a pattern that evolved from regulatory requirements and has been stable for three years. Their newer real-time analytics module uses a different, more functional composition pattern. Both teams have clear reasons for their conventions. A universal naming rule would force an artificial choice between the two patterns, or produce a linter configuration so full of exceptions that it stops providing useful signal.

What the style graph captures that rulesets can't

A codebase style graph doesn't start with rules. It starts with the code as it actually exists and derives patterns from it. This reverses the direction of the problem: instead of writing rules that describe how code should look, the graph observes how code actually looks and uses that as the baseline for deviation detection.

This means the graph naturally handles intentional divergence. If the payments service and the notifications service have been using different error handling patterns for 200 commits each, the graph builds two separate subgraphs — one for each subsystem — and reviews code in each context against its own local conventions. A PR touching the payments service is reviewed against payments conventions. A PR touching notifications is reviewed against notifications conventions. Neither is expected to conform to a single universal standard.
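To make the per-subsystem idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular style graph product works: the pattern labels (`raise_exception`, `return_error_value`), the function names, and the "dominant pattern wins" rule are all hypothetical, and a real graph would model far richer structure than a counter per subsystem.

```python
from collections import Counter, defaultdict

def build_baselines(observations):
    """Group observed (subsystem, pattern) pairs into one Counter per subsystem."""
    baselines = defaultdict(Counter)
    for subsystem, pattern in observations:
        baselines[subsystem][pattern] += 1
    return baselines

def review(baselines, subsystem, pattern):
    """Compare a change against its own subsystem's dominant pattern.

    Returns None when the change matches local convention (or when the
    subsystem has no history yet), otherwise a short message.
    """
    counts = baselines.get(subsystem)
    if not counts:
        return None  # no local baseline to compare against
    dominant, _ = counts.most_common(1)[0]
    if pattern == dominant:
        return None
    return f"{subsystem}: expected '{dominant}', found '{pattern}'"

# Hypothetical history: 200 observations per subsystem, each internally consistent.
observations = (
    [("payments", "raise_exception")] * 200
    + [("notifications", "return_error_value")] * 200
)
baselines = build_baselines(observations)

# Each change is judged only against its own subsystem.
print(review(baselines, "payments", "raise_exception"))
print(review(baselines, "payments", "return_error_value"))
```

The design point is that there is no global "correct" pattern anywhere in this code. The same `return_error_value` change passes in notifications and is flagged in payments.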

The specific categories where this makes a practical difference:

  • Naming patterns: Conventions like "all exported functions in this module use verb-first naming" or "type aliases in this package follow the XxxT pattern" are invisible to a global ruleset but clearly visible in a style graph built from the module's actual symbols.
  • Abstraction layer patterns: Which code belongs at which layer — service, repository, domain model — is a codebase-specific concern. The graph can detect when new code is introducing patterns at the wrong layer based on how similar patterns are distributed across the existing codebase.
  • Import and dependency conventions: Whether a module imports from an index barrel or directly from implementation files, and whether dependencies flow in one direction or cut across layers, is detectable from the codebase structure, not from a ruleset.
  • Test structure patterns: How tests are named and structured within a specific module is often highly local. Some teams use BDD-style describe/it blocks; others use function-per-test with explicit naming. The graph captures what a specific module actually does, not what a project-wide testing guide says.
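
As one illustration of the naming-pattern category, the sketch below infers whether a module's exports are predominantly verb-first and checks a new symbol against that. Everything here is an assumption made for the example: the `KNOWN_VERBS` list, the 0.9 threshold, and the function names are hypothetical, and a real implementation would need a much better notion of "verb" than a hard-coded set.

```python
import re

# Hypothetical verb list for illustration only.
KNOWN_VERBS = {"get", "set", "create", "update", "delete",
               "validate", "parse", "build", "send", "load"}

def split_words(identifier):
    """Split camelCase, PascalCase, or snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", identifier)
    return [part.lower() for part in parts]

def is_verb_first(name):
    """True when the identifier's first word is in KNOWN_VERBS."""
    words = split_words(name)
    return bool(words) and words[0] in KNOWN_VERBS

def verb_first_share(exported_names):
    """Fraction of a module's exported names that are verb-first."""
    if not exported_names:
        return 0.0
    return sum(is_verb_first(n) for n in exported_names) / len(exported_names)

def check_new_export(existing_exports, new_name, threshold=0.9):
    """Flag new_name only if the module clearly follows verb-first naming."""
    if verb_first_share(existing_exports) < threshold:
        return None  # no clear local convention, so nothing to enforce
    if is_verb_first(new_name):
        return None
    return f"'{new_name}' is not verb-first, unlike most exports in this module"

# Hypothetical exports of one module.
module_exports = ["validateAmount", "parseLedgerEntry", "buildInvoice",
                  "getBalance", "sendReceipt"]

print(check_new_export(module_exports, "createRefund"))
print(check_new_export(module_exports, "invoiceTotal"))
```

Note that the check stays silent when the module has no clear convention, which is the behavior you want from a tool that derives its baseline from the code rather than from a rule.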

The case for running both

We're not saying universal rulesets should be replaced. We're saying they address different categories of issues, and treating one as a substitute for the other creates gaps.

Universal rulesets are the right tool for: language-level correctness, known security anti-patterns, and cross-codebase hygiene issues (no console.log in production code, no hardcoded credentials, no dangling promises). These are issues where "it's wrong for everyone" is true.

Codebase style graphs are the right tool for: project-specific naming conventions, subsystem-specific patterns, architecture boundary enforcement, and convention consistency within a module. These are issues where "it diverges from how this part of the codebase actually works" is the right framing.

The teams that get the most value from automated review run both. The linter handles the objective layer; the style graph handles the local-convention layer. They're complementary, not competing.

When the style graph signal gets noisy

The style graph approach has a failure mode worth naming: if the codebase has accumulated inconsistent patterns over time — if the code itself is noisy — the graph will faithfully represent that noise. High-noise codebases produce style graph suggestions that are inconsistent and confusing, because the "conventions" the graph derives are themselves inconsistent.

This doesn't mean the tool is broken. It means the codebase needs cleanup before the graph will produce reliable signal. The graph is a mirror — if the reflection is messy, that's information about the codebase's actual state. Some teams find this useful as a diagnostic: the areas of the codebase where style graph confidence is lowest are often the areas with the most accumulated technical debt.
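
One way to picture "confidence" as a diagnostic is to score how concentrated a module's observed patterns are. The sketch below uses one minus normalized Shannon entropy. This is a stand-in metric chosen for illustration, not a description of how any style graph actually computes confidence, and the module names and counts are made up.

```python
import math
from collections import Counter

def convention_confidence(pattern_counts):
    """Score in [0, 1]: 1.0 when one pattern dominates completely,
    0.0 when observed patterns are evenly split."""
    counts = [c for c in pattern_counts.values() if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    if len(counts) == 1:
        return 1.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return 1.0 - entropy / math.log2(len(counts))

# Hypothetical per-module pattern counts.
modules = {
    "billing/core": Counter({"verb_first": 98, "noun_first": 2}),
    "legacy/reports": Counter({"verb_first": 40, "noun_first": 35, "mixed": 25}),
    "analytics/stream": Counter({"pipeline_style": 60}),
}

# Lowest-confidence modules first: candidates for cleanup.
ranked = sorted(modules, key=lambda m: convention_confidence(modules[m]))
for name in ranked:
    print(f"{name}: {convention_confidence(modules[name]):.2f}")
```

A score like this ignores sample size, so a module with a handful of perfectly consistent observations would look deceptively confident. It is only meaningful once a module has enough history behind it.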

For new codebases or recently refactored modules, the graph needs sufficient history to build reliable patterns: typically 30–50 substantial commits touching the module before pattern confidence reaches a useful threshold. In practice, the style graph is most valuable for mature codebases and least valuable for brand-new ones, whereas universal rulesets are valuable immediately, regardless of codebase age. The two tools complement each other across the codebase lifecycle, not just across issue categories.
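
The lifecycle point can be sketched as a simple gate: the universal ruleset always runs, and the style graph layer switches on once a module has enough history. The threshold of 30 is taken from the lower end of the 30–50 range above; the `ModuleHistory` type, the check names, and the module paths are hypothetical.

```python
from dataclasses import dataclass

# Lower bound of the 30-50 commit range mentioned above; tune as an assumption.
MIN_COMMITS = 30

@dataclass
class ModuleHistory:
    path: str
    substantial_commits: int

def active_checks(module, min_commits=MIN_COMMITS):
    """Return which review layers apply to a module, given its history."""
    checks = ["universal_ruleset"]  # useful from the first commit
    if module.substantial_commits >= min_commits:
        checks.append("style_graph")  # enough history for local conventions
    return checks

print(active_checks(ModuleHistory("services/new-gateway", 4)))
print(active_checks(ModuleHistory("services/payments", 200)))
```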