PRD Sizing: Process & Methodology

Andi Lamprecht · 11 min read · Draft

This page explains how PRD sizing estimates are produced, what the numbers mean, and how to use them for planning.


What Is PRD Sizing?

PRD Sizing is an architecture-driven estimation process that produces calibrated time and cost estimates for product features described in a PRD. It uses the PERT (Program Evaluation and Review Technique) method combined with codebase exploration and historic data from our Jira and GitHub activity.

Unlike story-point-based estimation, PRD Sizing estimates directly in days of active engineering work. Each story gets three estimates, and the math produces a range with a confidence level.


Process Overview

The full estimation process runs through five stages with mandatory human checkpoints at each decision point:

    flowchart TD
    A[PRD Input] --> B[Stage 1: Query Expansion]
    B --> CP1{Checkpoint 1\nConfirm interpretation}
    CP1 -->|Approved| C[Stage 2: Repo Discovery]
    CP1 -->|Adjust| B
    C --> D[Stage 3: Architecture Sketch]
    D --> CP2{Checkpoint 2\nConfirm architecture}
    CP2 -->|Approved| E[Stage 4: Story Breakdown]
    CP2 -->|Explore deeper| D
    E --> CP3{Checkpoint 3\nConfirm stories & PERT}
    CP3 -->|Approved| F[Stage 5: Estimation]
    CP3 -->|Adjust| E
    F --> CP4{Checkpoint 4\nFinal estimate card}
    CP4 -->|Approved| G[Save & Publish]
    CP4 -->|Adjust| E

    style CP1 fill:#f9f,stroke:#333
    style CP2 fill:#f9f,stroke:#333
    style CP3 fill:#f9f,stroke:#333
    style CP4 fill:#f9f,stroke:#333
    style G fill:#9f9,stroke:#333
  

The PERT Method

For each story, three time estimates are provided:

| Estimate    | Symbol | Meaning                                               |
|-------------|--------|-------------------------------------------------------|
| Optimistic  | O      | Best case – everything goes smoothly, no surprises    |
| Likely      | L      | Normal case – typical friction, some back-and-forth   |
| Pessimistic | P      | Worst case – blockers, rework, unexpected complexity  |

From these, two values are computed:

    flowchart LR
    O["O\n(Optimistic)"] --> PERT["Expected = (O + 4L + P) / 6"]
    L["L\n(Likely)"] --> PERT
    P["P\n(Pessimistic)"] --> PERT
    O --> SD["StdDev = (P - O) / 6"]
    P --> SD
    PERT --> R["Expected Duration\nper story"]
    SD --> R2["Uncertainty\nper story"]

    style L fill:#ffd,stroke:#333
    style PERT fill:#ddf,stroke:#333
    style SD fill:#fdd,stroke:#333
  

Expected duration weights the likely estimate most heavily (4x) while incorporating best and worst cases. It produces a slightly pessimistic-leaning average, which is intentional – engineering work tends to take longer than expected.

Standard deviation measures how uncertain the estimate is. A story with O=2, P=4 has low uncertainty (StdDev=0.33). A story with O=2, P=14 has high uncertainty (StdDev=2.0).
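
The two formulas as a minimal Python sketch. The O and P values match the examples above; the L values are illustrative picks:

    def pert(o: float, l: float, p: float) -> tuple[float, float]:
        """Return (expected, stddev) in days for a three-point estimate."""
        expected = (o + 4 * l + p) / 6  # likely estimate weighted 4x
        stddev = (p - o) / 6            # wider O..P spread = more uncertainty
        return expected, stddev

    print(pert(2, 3, 4))   # (3.0, 0.333...) -- low uncertainty
    print(pert(2, 4, 14))  # (5.333..., 2.0) -- high uncertainty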


How Stories Are Aggregated

Individual story estimates are combined using statistical aggregation:

    flowchart TD
    subgraph Stories
        S1["Story 1\nE=4.2d, σ=0.83"]
        S2["Story 2\nE=5.5d, σ=1.17"]
        S3["Story 3\nE=7.7d, σ=1.67"]
        SN["Story N\n..."]
    end

    S1 --> SUM["Total Expected\n= sum of all E"]
    S2 --> SUM
    S3 --> SUM
    SN --> SUM

    S1 --> VAR["Total StdDev\n= sqrt(sum of σ²)"]
    S2 --> VAR
    S3 --> VAR
    SN --> VAR

    SUM --> CI["95% Confidence Interval"]
    VAR --> CI

    CI --> LOW["Low = Expected - 2 × StdDev"]
    CI --> HIGH["High = Expected + 2 × StdDev"]

    style CI fill:#ddf,stroke:#333
    style LOW fill:#dfd,stroke:#333
    style HIGH fill:#fdd,stroke:#333
  

The 95% confidence interval means: there is a 95% probability the actual duration falls within this range, assuming risks are independent.

Important caveat: The 95% CI assumes risks are independent (one story blowing up doesn’t cause others to blow up). In practice, systemic risks – like an external dependency being undocumented, or a technology choice not working out – can affect multiple stories simultaneously. The “Comparable Epics” section in each estimate provides a reality check against actual historic durations.
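
A minimal sketch of the aggregation, using the story values from the diagram:

    import math

    # Expected values add linearly; uncertainties add in quadrature,
    # which is where the independence assumption comes in.
    stories = [(4.2, 0.83), (5.5, 1.17), (7.7, 1.67)]  # (E, sigma) per story

    total_expected = sum(e for e, _ in stories)
    total_stddev = math.sqrt(sum(s ** 2 for _, s in stories))

    low = total_expected - 2 * total_stddev   # 95% CI lower bound
    high = total_expected + 2 * total_stddev  # 95% CI upper bound
    print(f"{total_expected:.1f}d expected, 95% CI [{low:.1f}, {high:.1f}]")
    # 17.4d expected, 95% CI [13.0, 21.8]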

Reading the Estimate Card

Here is how to interpret each field in the estimate summary:

| Field             | What It Means                                                      |
|-------------------|--------------------------------------------------------------------|
| Stories           | Number of discrete work items identified                           |
| Expected Duration | Most probable total duration in engineering days                   |
| Range (95% CI)    | Statistical bounds – 95% chance the actual falls within this range |
| Weeks             | Expected duration divided by 5 working days per week               |
| Confidence        | How certain the estimate is (see below)                            |
| Expected Cost     | Expected days multiplied by the fully-loaded daily rate            |
| Cost Range        | 95% CI range multiplied by the daily rate                          |

Confidence Levels

| Level  | StdDev / Expected | Interpretation                                              |
|--------|-------------------|-------------------------------------------------------------|
| High   | < 0.3             | Narrow range. Estimate is well-bounded.                     |
| Medium | 0.3 – 0.6         | Moderate uncertainty. Plan for contingency.                 |
| Low    | > 0.6             | Wide range. Treat the estimate as directional, not precise. |

High confidence does not mean “definitely accurate.” It means the statistical spread is narrow. If the estimate is systematically biased (e.g., every story underestimated by 2x), the confidence level won’t catch that. This is why we include comparable epics and historic accuracy data as a cross-check.
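
As a sketch, the banding is just a ratio check (thresholds from the table above):

    def confidence(expected: float, stddev: float) -> str:
        """Bucket the StdDev/Expected ratio into High / Medium / Low."""
        ratio = stddev / expected
        if ratio < 0.3:
            return "High"
        if ratio <= 0.6:
            return "Medium"
        return "Low"

    print(confidence(17.4, 2.2))   # High -- narrow spread
    print(confidence(17.4, 12.0))  # Low -- directional only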

Understanding Cost Estimates

Costs are derived from a fully-loaded daily rate:

    flowchart LR
    SAL["Average Salary\n$200k"] --> FL["Fully Loaded\n× 1.4 overhead"]
    FL --> ANN["$280k/year"]
    ANN --> DR["÷ 230 effective\nworking days"]
    DR --> RATE["$1,200/day"]
    RATE --> COST["× Expected Days\n= Total Cost"]

    style SAL fill:#ddf,stroke:#333
    style RATE fill:#ffd,stroke:#333
    style COST fill:#dfd,stroke:#333
  
| Component              | Typical Value | What It Includes                                                            |
|------------------------|---------------|-----------------------------------------------------------------------------|
| Average salary         | Team-specific | Base compensation                                                           |
| Overhead multiplier    | 1.3x – 1.5x   | Benefits, payroll taxes, tools/licenses, office/infra, management overhead  |
| Effective working days | ~230/year     | 260 weekdays minus holidays, PTO, sick days, company events                 |
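
Putting the rate math together in a short sketch; the salary, overhead, and expected-days figures are the illustrative values from the diagrams, not fixed inputs:

    # Illustrative figures from the diagram above.
    salary = 200_000     # average base salary
    overhead = 1.4       # benefits, payroll taxes, tools, management
    working_days = 230   # 260 weekdays minus holidays/PTO/sick days

    daily_rate = salary * overhead / working_days   # ~$1,217, shown as ~$1,200
    expected_days = 167.7                           # from the estimate card
    print(f"${daily_rate:,.0f}/day -> ${daily_rate * expected_days:,.0f}")
    # $1,217/day -> ~$204k expected cost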

What Cost Estimates Do NOT Include

  • Calendar time – 168 engineering days does not mean 168 calendar days. See “Converting Days to Calendar Time” below.
  • Non-engineering costs – product management, design, QA, project management, travel, hardware procurement.
  • Opportunity cost – what else the team could be building instead.
  • External dependencies – waiting on third-party documentation, hardware deliveries, partner APIs.

The Five Stages in Detail

Stage 1: Query Expansion

The PRD text is analyzed to extract key terms, risk flags, and search queries. The estimator confirms interpretation with the requester before proceeding.

    flowchart LR
    PRD["PRD Text"] --> KT["Extract\nKey Terms"]
    PRD --> RF["Detect\nRisk Flags"]
    PRD --> SQ["Generate\nSearch Queries"]
    KT --> CP["Checkpoint 1:\nConfirm interpretation"]
    RF --> CP
    SQ --> CP

    style CP fill:#f9f,stroke:#333
  

Risk flags are detected from keywords in the PRD:

| Pattern                      | Flag                        | Significance                                 |
|------------------------------|-----------------------------|----------------------------------------------|
| mavlink, shim, onboard       | hard_repo_mavlink           | C++ firmware work, specialized skills needed |
| clojure, gcs, ground control | hard_repo_clojure           | Clojure codebase, smaller contributor pool   |
| faa, federal aviation        | faa_related                 | Regulatory implications                      |
| airdex, air dex              | external_integration_airdex | External service dependency                  |
| dss, utm, deconfliction      | airspace_deconfliction      | Airspace management complexity               |
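
The real detection lives in the skill's scripts; a minimal sketch of the keyword matching looks like this:

    import re

    # Patterns from the table above; matching is case-insensitive.
    RISK_PATTERNS = {
        "hard_repo_mavlink": r"\b(mavlink|shim|onboard)\b",
        "hard_repo_clojure": r"\b(clojure|gcs|ground control)\b",
        "faa_related": r"\b(faa|federal aviation)\b",
        "external_integration_airdex": r"\bair\s?dex\b",
        "airspace_deconfliction": r"\b(dss|utm|deconfliction)\b",
    }

    def detect_risk_flags(prd_text: str) -> list[str]:
        text = prd_text.lower()
        return sorted(f for f, pat in RISK_PATTERNS.items() if re.search(pat, text))

    print(detect_risk_flags("Integrate AirDex telemetry via the MAVLink shim"))
    # ['external_integration_airdex', 'hard_repo_mavlink']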

Stage 2: Repo Discovery

Search queries are run across the entire DroneUp GitHub org using `gh search code`. Affected repos are identified by hit count, and missing repos are cloned locally. Repos are ranked and the top 5 are selected for deep exploration.
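
A sketch of how the ranking could be scripted; the org name, `--json` field names, and limit are assumptions, not the skill's actual implementation:

    import json
    import subprocess
    from collections import Counter

    def rank_repos(queries: list[str], org: str = "DroneUp", top_n: int = 5):
        """Tally `gh search code` hits per repo and keep the top N."""
        hits: Counter[str] = Counter()
        for query in queries:
            out = subprocess.run(
                ["gh", "search", "code", query, "--owner", org,
                 "--json", "repository", "--limit", "100"],
                capture_output=True, text=True, check=True,
            ).stdout
            for item in json.loads(out):
                hits[item["repository"]["nameWithOwner"]] += 1
        return hits.most_common(top_n)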

Stage 3: Architecture Sketch

Parallel Explore agents are dispatched into the top affected repos. Each agent maps the repo’s architecture, identifies extension points, and documents integration patterns.

    flowchart TD
    TOP["Top 5 Affected Repos"] --> A1["Agent 1\nExplore Repo A"]
    TOP --> A2["Agent 2\nExplore Repo B"]
    TOP --> A3["Agent 3\nExplore Repo C"]
    TOP --> A4["Agent 4\nExplore Repo D"]
    TOP --> A5["Agent 5\nExplore Repo E"]

    A1 --> SK["Architecture Sketch\n+ Complexity Assessment"]
    A2 --> SK
    A3 --> SK
    A4 --> SK
    A5 --> SK

    SK --> CP["Checkpoint 2:\nConfirm architecture"]

    style A1 fill:#ddf,stroke:#333
    style A2 fill:#ddf,stroke:#333
    style A3 fill:#ddf,stroke:#333
    style A4 fill:#ddf,stroke:#333
    style A5 fill:#ddf,stroke:#333
    style CP fill:#f9f,stroke:#333
  

Each agent identifies: tech stack, data models, API surfaces, extension points, testing patterns, and dependencies. This ensures estimates are grounded in what the code actually looks like, not abstract guesses.

Stage 4: Story Breakdown

Each “change needed” from the architecture sketch becomes one or more stories. Stories are:

  • Assigned a domain (backend-go, frontend-react, firmware-cpp, etc.)
  • Given PERT estimates (O/L/P) calibrated against domain baselines from historic data
  • Traced back to PRD user stories for requirements coverage
  • Cross-referenced with comparable epics from Jira

Stage 5: Estimation

PERT aggregation produces the final estimate card with totals, confidence, domain subtotals, and cost projections. The estimate is saved as a structured YAML file for future re-runs.
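
The exact schema isn't documented here; a hypothetical sketch of what the saved YAML might contain, with all field names and values illustrative:

    # Hypothetical shape -- field names are illustrative, not the real schema.
    slug: example-feature
    stories:
      - title: Add ingestion endpoint
        domain: backend-go
        pert: { o: 2, l: 4, p: 8 }
    totals:
      expected_days: 167.7
      stddev_days: 12.4
      confidence: medium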


Calibration: Where the Baselines Come From

Estimates are calibrated against real historic data from DroneUp’s Jira and GitHub:

    flowchart TD
    subgraph Data Sources
        J["Jira Epics\n& Stories"]
        G["GitHub PRs\n& Reviews"]
    end

    J --> F["prd-sizing refresh"]
    G --> F

    F --> S["stats.json"]

    subgraph Calibration Outputs
        BL["Baseline\nMedian days/story"]
        DB["Domain Baselines\nPer tech stack"]
        CE["Comparable Epics\nSimilar past work"]
        EA["Estimation Accuracy\nActual vs estimated"]
        VL["Velocity\nCompletion rate"]
    end

    S --> BL
    S --> DB
    S --> CE
    S --> EA
    S --> VL

    BL --> EST["Used during\nStory Breakdown"]
    DB --> EST
    CE --> EST
    EA --> EST

    style F fill:#ffd,stroke:#333
    style S fill:#dfd,stroke:#333
  
| Data Source          | What It Provides                                                                             |
|----------------------|----------------------------------------------------------------------------------------------|
| Jira epics + stories | Cycle times (In Progress -> Done), story counts per epic, story titles for pattern matching  |
| GitHub PRs           | Merge times, review durations, code volume                                                   |
| Domain baselines     | Median days per story by tech domain (e.g., frontend-react = 8.7d median)                    |
| Comparable epics     | Real duration data for similar past work                                                     |
| Estimation accuracy  | How accurate past estimates were (actuals vs estimates ratio)                                |

Keeping Data Fresh

Calibration data is refreshed by running:

    cd .claude/skills/prd-sizing/scripts
    .venv/bin/prd-sizing refresh

This fetches the latest Jira and GitHub data incrementally and recomputes baselines. The skill warns if calibration data is older than 30 days.
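
The 30-day warning could be approximated by checking the age of stats.json; the path here is an assumption based on the calibration diagram:

    import time
    from pathlib import Path

    # Assumed location of the calibration output shown in the diagram.
    stats = Path(".claude/skills/prd-sizing/scripts/stats.json")
    age_days = (time.time() - stats.stat().st_mtime) / 86_400
    if age_days > 30:
        print(f"warning: calibration data is {age_days:.0f} days old -- run refresh")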


Re-Running an Estimate

Estimates are living documents. As work progresses, they can be re-run to incorporate actual data:

    flowchart LR
    subgraph "First Run"
        FR["All stories\nestimated O/L/P"]
    end

    subgraph "Re-Run"
        JF["Fetch Jira\nstatus + actuals"]
        RC["Reconcile:\nold vs new"]
        RE["Recompute\nwith actuals"]
    end

    FR --> JF
    JF --> RC
    RC --> CP["Checkpoint:\nReview delta"]
    CP --> RE
    RE --> UP["Updated\nestimate card"]

    style CP fill:#f9f,stroke:#333
    style UP fill:#dfd,stroke:#333
  

A re-run:

  • Fetches Jira status updates for stories with keys
  • Pulls actual cycle times for completed stories (replacing estimates with real data)
  • Discovers new stories added to the epic during implementation
  • Recomputes the aggregate with a mix of actuals and remaining estimates
  • Shows a delta of what changed

Trigger a re-run with:

    /prd-sizing re-run <slug-or-path>
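
A sketch of the recompute step: completed stories contribute their actual cycle time with no remaining uncertainty, while open stories keep their PERT values. The story fields here are illustrative:

    import math

    def recompute(stories: list[dict]) -> tuple[float, float]:
        """Mix actuals and remaining estimates into (expected, stddev)."""
        expected, variance = 0.0, 0.0
        for s in stories:
            if s.get("actual_days") is not None:   # done: use the real number
                expected += s["actual_days"]
            else:                                  # open: keep the estimate
                o, l, p = s["o"], s["l"], s["p"]
                expected += (o + 4 * l + p) / 6
                variance += ((p - o) / 6) ** 2
        return expected, math.sqrt(variance)

    print(recompute([{"actual_days": 6.0}, {"o": 2, "l": 4, "p": 8}]))
    # (10.33..., 1.0)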

Using Estimates for Planning

For Product Managers

  • Use the Expected Duration as the primary planning number
  • Use the 95% CI Range to communicate uncertainty to stakeholders
  • Use Phase Mapping to sequence work and identify what ships first
  • Use Cost Estimates for budgeting and ROI analysis
  • Check Comparable Epics to gut-check against similar past work

For Engineering Managers

  • Use the Story Breakdown to plan sprints and assign work
  • Use Domain Subtotals to understand staffing needs (e.g., “91 days of Go work, 38 days of React work”)
  • Use Risk Flags to identify where spikes or de-risking should happen first
  • Use the Architecture Sketch to understand cross-repo dependencies

For Executives

  • Use the Estimate Summary box for a one-glance view
  • Use the Cost Range for budget allocation (pad to the high end)
  • Compare the Expected Duration against the Comparable Epics actual durations
  • Read the Historic Accuracy warning – if past estimates undershot by 4x, factor that into expectations

Converting Days to Calendar Time

The estimate is in engineering days (active work time). To convert to calendar time:

    flowchart LR
    ED["Expected Days"] --> DIV["÷ Team Size\n÷ 5 days/week\n÷ Completion Rate"]
    DIV --> CW["Calendar Weeks"]

    style ED fill:#ddf,stroke:#333
    style CW fill:#dfd,stroke:#333
  
Calendar Weeks = Expected Days / (Team Size × 5 × Completion Rate)

| Team Size   | Completion Rate | 167.7 expected days becomes… |
|-------------|-----------------|------------------------------|
| 1 engineer  | 77%             | 43.5 weeks (~10 months)      |
| 2 engineers | 77%             | 21.8 weeks (~5 months)       |
| 3 engineers | 77%             | 14.5 weeks (~3.5 months)     |
| 4 engineers | 77%             | 10.9 weeks (~2.5 months)     |
| 5 engineers | 77%             | 8.7 weeks (~2 months)        |

The completion rate (77%) comes from historic data and accounts for meetings, code review, context switching, on-call duties, and other non-coding work. It means engineers spend about 77% of their time on feature development. Your team’s rate may differ.
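
The same conversion as a one-line helper:

    def calendar_weeks(expected_days: float, team_size: int,
                       completion_rate: float = 0.77) -> float:
        return expected_days / (team_size * 5 * completion_rate)

    print(f"{calendar_weeks(167.7, 3):.1f} weeks")  # ~14.5 weeks for 3 engineers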

Common Questions

Why not just use story points?

Story points measure relative effort but don’t translate directly to time or cost. Different teams have different velocity, and points don’t account for specific codebase complexity. PERT estimates in days are more actionable for planning and budgeting.

How do you account for unknown unknowns?

Three ways:

  1. Pessimistic estimates explicitly model worst-case scenarios per story
  2. Comparable epics from real historic data show what similar work actually took
  3. Historic accuracy data reveals systematic bias (e.g., if actuals typically run 4x estimates)

What if the PRD changes after estimation?

Re-run the estimate. The re-run mode preserves existing data and layers in changes. Stories can be added, removed, or re-estimated.

Can this replace detailed sprint planning?

No. This is a macro estimate for budgeting, roadmap planning, and resource allocation. Sprint-level planning still needs to happen, ideally using the story breakdown as a starting point.

Why is “High confidence” not always reassuring?

Because confidence measures statistical spread, not accuracy. If you’re consistently wrong in the same direction (always underestimating), the spread can be narrow (high confidence) while the central estimate is still off. The comparable epics and accuracy data are the corrective lens.


Glossary

| Term                | Definition                                                                                             |
|---------------------|--------------------------------------------------------------------------------------------------------|
| PERT                | Program Evaluation and Review Technique – a statistical estimation method using three-point estimates  |
| 95% CI              | 95% Confidence Interval – the range within which the actual value falls with 95% probability           |
| StdDev              | Standard Deviation – a measure of how spread out the estimates are                                     |
| Fully-loaded rate   | The total cost of an engineer including salary, benefits, taxes, tools, and overhead                   |
| Completion rate     | The fraction of working time spent on feature development (vs meetings, reviews, etc.)                 |
| Domain baseline     | The historic median days per story for a given technology domain                                       |
| Comparable epic     | A past Jira epic with similar scope, used as a sanity check                                            |
| Scope clarity       | How well-defined the requirements are: A (fully defined), B (partially), C (early idea)                |
| Architecture-driven | Estimates derived from exploring actual codebases, not just reading requirements                       |