Why PR Queues Age Past 48 Hours (And What Actually Fixes It)

By Priya Nair · November 14, 2024 · 8 min read

The 48-hour mark is where the cost of a waiting pull request starts to compound. You open a PR on Monday afternoon, it's still sitting there Wednesday morning, and by the time a reviewer finally picks it up the author has mentally moved on to three other things. Context-switching cost alone, estimated at 20–30 minutes per context restore for non-trivial code, means that a PR that waited 60 hours doesn't just cost 60 hours. It costs 60 hours plus the context tax on every person who has to re-enter the problem.

We've been collecting data across engineering organizations that use Replixa, and the shape of the problem is consistent: PR age skews heavily right. Most PRs do close in under 24 hours — the small, obvious, low-risk ones. The tail is where the real cost lives, and the tail is long.

What the distribution actually looks like

When you plot PR age at an organization with 200–400 engineers, you don't get a normal curve. You get something closer to a log-normal distribution with an ugly right tail. Roughly 60–70% of PRs close within 24 hours. The remaining 30–40% take anywhere from a day or two to multiple weeks. That remaining slice isn't a minor inconvenience: it typically represents the highest-stakes code changes, exactly the ones where you most need timely, high-quality review.

A backend team at a mid-size logistics software company ran a retrospective on their past two quarters of PR activity. Their median PR cycle time was 14 hours — seemingly healthy. But their 90th percentile was 6.2 days. The engineers whose PRs aged into that tail were consistently the ones working on cross-service changes, database migrations, and API contract modifications. Exactly the code that most benefits from experienced review.
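The gap between a healthy median and an ugly 90th percentile is easy to check on your own data. Here is a minimal sketch in Python, assuming you can export an opened and closed timestamp for each PR. The function names `cycle_hours` and `summarize` are ours for illustration, not part of any tool.

```python
from datetime import datetime
from statistics import median, quantiles


def cycle_hours(opened_at: datetime, closed_at: datetime) -> float:
    """Hours between a PR being opened and being merged or closed."""
    return (closed_at - opened_at).total_seconds() / 3600


def summarize(hours: list[float]) -> dict[str, float]:
    """Median and 90th percentile of PR cycle times, in hours.

    quantiles(n=10) returns nine cut points; the last one is the 90th
    percentile. It needs at least two data points on most Python versions.
    """
    deciles = quantiles(hours, n=10, method="inclusive")
    return {"median": median(hours), "p90": deciles[-1]}
```

Reporting both numbers side by side is the point: a median on its own hides exactly the tail this article is about.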

The reviewer concentration problem

The most common cause of queue aging isn't that engineers are lazy reviewers. It's that review responsibility is concentrated in a small number of people. At most organizations, 15–25% of engineers account for 60–70% of review activity, and within that group there's typically a further concentration: 4–5 senior engineers who are on every CODEOWNERS path that matters.
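To see how concentrated review is in your own organization, you can compute the share of reviews handled by the busiest slice of reviewers. The sketch below assumes you already have a count of completed reviews per engineer; `top_share` is a hypothetical helper name.

```python
import math


def top_share(review_counts: dict[str, int], top_fraction: float) -> float:
    """Fraction of all reviews done by the busiest `top_fraction` of reviewers."""
    counts = sorted(review_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always include at least one reviewer, even for very small teams.
    k = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:k]) / total
```

Calling `top_share(counts, 0.25)` on a team where one of four engineers did 70 of 100 reviews returns 0.7, which is the kind of number worth tracking quarter over quarter.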

CODEOWNERS files are useful — they ensure that domain experts are in the loop. But when you combine CODEOWNERS with a team that's grown from 50 to 300 engineers over three years, you often end up with CODEOWNERS paths that were set up for a smaller org and never pruned. A staff engineer who should be focused on architecture decisions ends up required-reviewer on 40+ PRs per week.

We're not saying CODEOWNERS is the problem. We're saying the combination of concentrated review ownership plus undifferentiated review tasks — where the same senior engineer is required to review a trivial variable rename and a database schema change in the same queue — is where the system breaks down. The fix isn't to remove the senior engineer from the important reviews; it's to stop asking them to spend time on the routine ones.
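One way to differentiate review tasks is to route on the paths a PR touches. This is a rough sketch, not a description of how any particular tool does it: the patterns in `HIGH_RISK_PATTERNS` are made-up placeholders, and a real list would come from your own repository layout.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for changes that warrant a senior reviewer.
HIGH_RISK_PATTERNS = ["migrations/*", "*/schema.sql", "api/contracts/*"]


def needs_senior_review(changed_paths: list[str]) -> bool:
    """True if any changed file matches a high-risk pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in HIGH_RISK_PATTERNS
    )
```

A variable rename in `src/util/names.py` would return False and could go to any qualified reviewer, while a PR that also touches `migrations/0042_add_index.sql` would return True and stay with the domain expert.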

What actually causes a PR to sit

PR aging happens for a few distinct reasons, and conflating them leads to bad fixes:

Reviewer availability: Required reviewers are in meetings, on PTO, or simply overloaded. This is the most visible cause and the one most engineers identify first.
PR size and complexity: Large PRs (500+ lines of diff) get deprioritized. Reviewers look at the diff size, estimate the time investment, and reach for something smaller first. This is rational behavior, not negligence.
Review quality uncertainty: When a PR touches unfamiliar code paths, reviewers spend 30–40 minutes just building context before they can write a single useful comment. If they're not confident they understand the full impact, they may add comments asking for more context — which initiates a back-and-forth cycle that can extend PR age by days.
Automated check noise: If CI status checks are flaky or unreliable, reviewers learn to wait for the "real" failure signal rather than act on the first red CI run. A PR that fails CI three times before passing has typically waited 6–8 additional hours while the author re-ran workflows.
Comment thread ambiguity: "Consider refactoring this" is not an actionable review comment. Comment threads that are advisory rather than specific create decision loops: the author doesn't know if the comment is blocking, the reviewer doesn't follow up, the thread stays open, the PR stays open.

The compounding cost model

The naive cost model for a delayed PR is: hours waiting × engineer hourly rate. That model understates the real cost by roughly 3–5x.

The real model has to account for:

Merge conflict probability: Every hour a PR is open increases the probability that another PR will be merged to the same files. At an org with 200+ active engineers and a common codebase, the probability of a conflict-generating merge to your target files within 72 hours is non-trivial — often 15–25% for active modules. Resolving a significant merge conflict, re-running CI, and re-requesting review adds another cycle to the PR lifecycle.
Blocked downstream work: Much engineering work is sequential. If a feature branch depends on an infrastructure change that's been sitting in review for four days, the downstream feature work is in limbo: the engineer is either context-switching to other things (fragmentation cost) or reworking in isolation and risking future merge conflicts.
Context degradation: After 48 hours, the author's mental model of why they made specific decisions starts to fade. If a reviewer comes back with substantive questions on day three, the author's ability to respond quickly and accurately has degraded. The quality of the subsequent discussion goes down.
Reviewer attrition on comment threads: In organizations where comment threads are the primary review medium, threads opened on day one often have no activity by day three even if the PR is still open. Reviewers forget which open PRs they've commented on, authors stop checking for updates — the review becomes a formality rather than a signal.
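The factors above can be folded into a back-of-the-envelope estimate. The sketch below is illustrative only: every field and both function names are our own assumptions, the terms are simply added together, and it does not attempt to reproduce the 3–5x multiplier. It only shows where the extra hours come from.

```python
from dataclasses import dataclass


@dataclass
class DelayCostInputs:
    hours_waiting: float           # time the PR sat in the queue
    hourly_rate: float             # loaded engineer cost per hour
    conflict_probability: float    # chance of a conflicting merge while waiting
    conflict_rework_hours: float   # assumed hours to resolve, re-run CI, re-request review
    context_restores: int          # times someone has to re-enter the problem
    context_restore_hours: float   # hours per restore (the text cites 20-30 minutes)
    blocked_engineer_hours: float  # assumed downstream hours lost to waiting


def naive_cost(i: DelayCostInputs) -> float:
    """The naive model: hours waiting times hourly rate."""
    return i.hours_waiting * i.hourly_rate


def compounded_cost(i: DelayCostInputs) -> float:
    """Naive cost plus expected conflict rework, context restores, and blocked work."""
    expected_conflict_hours = i.conflict_probability * i.conflict_rework_hours
    context_hours = i.context_restores * i.context_restore_hours
    extra_hours = expected_conflict_hours + context_hours + i.blocked_engineer_hours
    return naive_cost(i) + extra_hours * i.hourly_rate
```

Plugging in your own numbers is more useful than trusting ours. The value of writing the model down is that it forces a conversation about which term dominates for your team.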

Where the fixes actually land

There are genuinely useful structural changes, and there are changes that look like fixes but mostly shift the problem around.

PR size limits: Enforcing PR size limits (e.g., no PR over 400 lines except for mechanical changes) is one of the highest-impact interventions. It compresses review cycles because smaller PRs are triaged faster. The practical friction is that some genuinely large changes are hard to decompose — the team has to build the habit of stacking smaller PRs and using draft PRs for work-in-progress.
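A size limit is straightforward to enforce in CI. The sketch below assumes you feed it the text output of `git diff --numstat`, which prints added lines, deleted lines, and the path, separated by tabs, with `-` in both count columns for binary files. The function names and the 400-line default are illustrative, and the exception for mechanical changes is not handled here.

```python
def changed_lines(numstat: str) -> int:
    """Total added plus deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def exceeds_limit(numstat: str, limit: int = 400) -> bool:
    """True if the diff is larger than the allowed number of changed lines."""
    return changed_lines(numstat) > limit
```

A check like this works best as a warning with an override label rather than a hard block, so that genuinely mechanical changes don't get stuck behind it.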

Review load balancing: Replacing manual reviewer assignment with code-ownership-aware load balancing reduces the concentration problem. This doesn't mean removing experts from required-reviewer lists; it means distributing the routine reviews across a broader set of qualified engineers.
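The simplest version of load balancing is "least loaded eligible reviewer." Here is a minimal sketch, assuming you already know who is qualified for the touched code and how many reviews each person has open. `pick_reviewer` is a hypothetical helper, not an API from any review tool.

```python
from typing import Optional


def pick_reviewer(
    eligible: list[str],
    open_reviews: dict[str, int],
    author: str,
) -> Optional[str]:
    """Pick the eligible reviewer with the fewest open reviews.

    The author is never assigned to their own PR. Ties are broken
    alphabetically so the result is deterministic.
    """
    candidates = [name for name in eligible if name != author]
    if not candidates:
        return None
    return min(candidates, key=lambda name: (open_reviews.get(name, 0), name))
```

A production version would also weigh time zone, PTO, and recent review latency, but even this simple rule stops the same few names from absorbing every routine review.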

Automated pre-triage: If an automated tool handles the first pass — checking for style consistency, obvious patterns, missing error handling — reviewers spend their attention on what the automation can't catch: architectural intent, correctness of complex logic, risk assessment on data migrations. The review latency on the human-meaningful parts of a PR is lower when reviewers aren't also scanning for trailing whitespace and inconsistent naming.

Fixing CI reliability: Flaky tests are a silent PR age multiplier. An organization that invests in test stability often sees PR cycle time drop more than they expect, because a significant fraction of PR aging was caused by engineers waiting out flaky CI runs. This is less glamorous than tooling investments but often has higher ROI on PR throughput.
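A practical starting point is to find tests that both passed and failed on the same commit, since the code did not change between those runs. The sketch below assumes you can export CI results as (commit SHA, test name, passed) records. That record shape and the function name are our assumptions.

```python
from collections import defaultdict
from typing import Iterable


def flaky_tests(runs: Iterable[tuple[str, str, bool]]) -> list[str]:
    """Names of tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for sha, test_name, passed in runs:
        outcomes[(sha, test_name)].add(passed)
    # Two distinct outcomes for one (commit, test) pair means the result flipped.
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)
```

A test that fails on one commit and passes on a different one is not flagged, because that may be a real fix rather than flakiness.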

The signal in the tail

The 48-hour cutoff matters because it correlates with behavioral shifts. PRs that are still open after 48 hours are disproportionately likely to: be abandoned rather than merged (the author moved on or the work became stale), accumulate unresolved comment threads, and require a fresh review pass because the codebase shifted under them.

If you track your own PR age distribution and find a bump or plateau in the 48–96 hour range, that's usually pointing to a specific bottleneck: a required reviewer who's overloaded, a CI job that regularly takes 2+ hours, or a class of PRs (cross-service changes, for example) that your process isn't designed to handle efficiently. The 48-hour mark is a diagnostic signal, not just an inconvenience.
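Spotting that bump only requires bucketing PR ages. Below is a small sketch, assuming ages are already expressed in hours. The bucket edges are our choice for illustration and should be adjusted to your own review cadence.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges in hours; an age equal to an edge falls into the next bucket up.
EDGES = [24, 48, 96, 168]
LABELS = ["<24h", "24-48h", "48-96h", "96-168h", ">=168h"]


def bucket(age_hours: float) -> str:
    """Label of the age bucket a PR falls into."""
    return LABELS[bisect_right(EDGES, age_hours)]


def age_histogram(ages_hours: list[float]) -> dict[str, int]:
    """Count of PRs per age bucket, in bucket order, including empty buckets."""
    counts = Counter(bucket(age) for age in ages_hours)
    return {label: counts.get(label, 0) for label in LABELS}
```

If the "48-96h" bucket is larger than the "24-48h" bucket, that is the plateau described above, and the PRs in it are the ones to inspect for a shared reviewer, CI job, or change type.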

The fix path is different for each root cause, but they share a common thread: reducing the time each PR sits waiting for action that doesn't require human judgment. Automated first-pass review, load-balanced reviewer assignment, PR size discipline, and CI reliability together can move the median down significantly, and the same changes have an outsized effect on the right tail, where the real cost compounds.