PRD Sizing: Process & Methodology
This page explains how PRD sizing estimates are produced, what the numbers mean, and how to use them for planning.
What Is PRD Sizing?
PRD Sizing is an architecture-driven estimation process that produces calibrated time and cost estimates for product features described in a PRD. It uses the PERT (Program Evaluation and Review Technique) method combined with codebase exploration and historic data from our Jira and GitHub activity.
Unlike story-point-based estimation, PRD Sizing estimates directly in days of active engineering work. Each story receives three time estimates (optimistic, likely, pessimistic), and the aggregation produces a range with a stated confidence level.
Process Overview
The full estimation process runs through five stages, with a mandatory human checkpoint at each decision point:
flowchart TD
A[PRD Input] --> B[Stage 1: Query Expansion]
B --> CP1{Checkpoint 1\nConfirm interpretation}
CP1 -->|Approved| C[Stage 2: Repo Discovery]
CP1 -->|Adjust| B
C --> D[Stage 3: Architecture Sketch]
D --> CP2{Checkpoint 2\nConfirm architecture}
CP2 -->|Approved| E[Stage 4: Story Breakdown]
CP2 -->|Explore deeper| D
E --> CP3{Checkpoint 3\nConfirm stories & PERT}
CP3 -->|Approved| F[Stage 5: Estimation]
CP3 -->|Adjust| E
F --> CP4{Checkpoint 4\nFinal estimate card}
CP4 -->|Approved| G[Save & Publish]
CP4 -->|Adjust| E
style CP1 fill:#f9f,stroke:#333
style CP2 fill:#f9f,stroke:#333
style CP3 fill:#f9f,stroke:#333
style CP4 fill:#f9f,stroke:#333
style G fill:#9f9,stroke:#333
The PERT Method
For each story, three time estimates are provided:
| Estimate | Symbol | Meaning |
|---|---|---|
| Optimistic | O | Best case – everything goes smoothly, no surprises |
| Likely | L | Normal case – typical friction, some back-and-forth |
| Pessimistic | P | Worst case – blockers, rework, unexpected complexity |
From these, two values are computed:
flowchart LR
O["O\n(Optimistic)"] --> PERT["Expected = (O + 4L + P) / 6"]
L["L\n(Likely)"] --> PERT
P["P\n(Pessimistic)"] --> PERT
O --> SD["StdDev = (P - O) / 6"]
P --> SD
PERT --> R["Expected Duration\nper story"]
SD --> R2["Uncertainty\nper story"]
style L fill:#ffd,stroke:#333
style PERT fill:#ddf,stroke:#333
style SD fill:#fdd,stroke:#333
Expected duration weights the likely estimate most heavily (4x) while incorporating best and worst cases. It produces a slightly pessimistic-leaning average, which is intentional – engineering work tends to take longer than expected.
Standard deviation measures how uncertain the estimate is. A story with O=2, P=4 has low uncertainty (StdDev=0.33). A story with O=2, P=14 has high uncertainty (StdDev=2.0).
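These two formulas are easy to check by hand. A minimal Python sketch of the per-story math (the likely values of 3 and 5 below are assumed for illustration, since the examples above specify only O and P):

```python
def pert(o: float, l: float, p: float) -> tuple[float, float]:
    """Return (expected, std_dev) for one story's three-point estimate."""
    expected = (o + 4 * l + p) / 6  # likely estimate weighted 4x
    std_dev = (p - o) / 6           # wider O..P spread = more uncertainty
    return expected, std_dev

print(pert(2, 3, 4))   # (3.0, 0.33...) -- low uncertainty
print(pert(2, 5, 14))  # (6.0, 2.0)     -- high uncertainty
```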
How Stories Are Aggregated
Individual story estimates are combined using statistical aggregation:
flowchart TD
subgraph Stories
S1["Story 1\nE=4.2d, σ=0.83"]
S2["Story 2\nE=5.5d, σ=1.17"]
S3["Story 3\nE=7.7d, σ=1.67"]
SN["Story N\n..."]
end
S1 --> SUM["Total Expected\n= sum of all E"]
S2 --> SUM
S3 --> SUM
SN --> SUM
S1 --> VAR["Total StdDev\n= sqrt(sum of σ²)"]
S2 --> VAR
S3 --> VAR
SN --> VAR
SUM --> CI["95% Confidence Interval"]
VAR --> CI
CI --> LOW["Low = Expected - 2 × StdDev"]
CI --> HIGH["High = Expected + 2 × StdDev"]
style CI fill:#ddf,stroke:#333
style LOW fill:#dfd,stroke:#333
style HIGH fill:#fdd,stroke:#333
The 95% confidence interval means there is a 95% probability the actual duration falls within this range, assuming story risks are independent and the aggregate total is approximately normal.
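A sketch of this aggregation in Python, using the three example stories from the diagram:

```python
import math

# (expected, std_dev) per story -- the example values from the diagram
stories = [(4.2, 0.83), (5.5, 1.17), (7.7, 1.67)]

total_expected = sum(e for e, _ in stories)                   # 17.4 days
total_std_dev = math.sqrt(sum(sd ** 2 for _, sd in stories))  # ~2.2 days

# 95% confidence interval on the total
low = total_expected - 2 * total_std_dev   # ~13.0 days
high = total_expected + 2 * total_std_dev  # ~21.8 days
```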
Reading the Estimate Card
Here is how to interpret each field in the estimate summary:
| Field | What It Means |
|---|---|
| Stories | Number of discrete work items identified |
| Expected Duration | Most probable total duration in engineering days |
| Range (95% CI) | Statistical bounds – 95% chance the actual falls within this range |
| Weeks | Expected duration divided by 5 working days per week |
| Confidence | How certain the estimate is (see below) |
| Expected Cost | Expected days multiplied by the fully-loaded daily rate |
| Cost Range | 95% CI range multiplied by the daily rate |
Confidence Levels
| Level | StdDev / Expected | Interpretation |
|---|---|---|
| High | < 0.3 | Narrow range. Estimate is well-bounded. |
| Medium | 0.3 – 0.6 | Moderate uncertainty. Plan for contingency. |
| Low | > 0.6 | Wide range. Treat the estimate as directional, not precise. |
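The thresholds map directly to a ratio check; a minimal sketch:

```python
def confidence(expected: float, std_dev: float) -> str:
    """Classify estimate confidence by relative spread."""
    ratio = std_dev / expected
    if ratio < 0.3:
        return "High"
    if ratio <= 0.6:
        return "Medium"
    return "Low"

print(confidence(17.4, 2.2))  # "High" -- ratio is about 0.13
```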
Understanding Cost Estimates
Costs are derived from a fully-loaded daily rate:
flowchart LR
SAL["Average Salary\n$200k"] --> FL["Fully Loaded\n× 1.4 overhead"]
FL --> ANN["$280k/year"]
ANN --> DR["÷ 230 effective\nworking days"]
DR --> RATE["≈ $1,200/day"]
RATE --> COST["× Expected Days\n= Total Cost"]
style SAL fill:#ddf,stroke:#333
style RATE fill:#ffd,stroke:#333
style COST fill:#dfd,stroke:#333
| Component | Typical Value | What It Includes |
|---|---|---|
| Average salary | Team-specific | Base compensation |
| Overhead multiplier | 1.3x – 1.5x | Benefits, payroll taxes, tools/licenses, office/infra, management overhead |
| Effective working days | ~230/year | 260 weekdays minus holidays, PTO, sick days, company events |
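A sketch of the cost math, reusing the worked example from the diagram ($200k salary, 1.4x overhead, 230 working days) and the 17.4-day total from the earlier aggregation sketch; all values are illustrative:

```python
def daily_rate(avg_salary: float, overhead: float = 1.4,
               effective_days: int = 230) -> float:
    """Fully-loaded cost per engineering day."""
    return avg_salary * overhead / effective_days

rate = daily_rate(200_000)               # ~$1,217/day (the diagram rounds to $1,200)
expected_cost = 17.4 * rate              # expected days x daily rate
cost_range = (13.0 * rate, 21.8 * rate)  # 95% CI bounds x daily rate
```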
What Cost Estimates Do NOT Include
- Calendar time – 168 engineering days does not mean 168 calendar days. See “Converting Days to Calendar Time” below.
- Non-engineering costs – product management, design, QA, project management, travel, hardware procurement.
- Opportunity cost – what else the team could be building instead.
- External dependencies – waiting on third-party documentation, hardware deliveries, partner APIs.
The Five Stages in Detail
Stage 1: Query Expansion
The PRD text is analyzed to extract key terms, risk flags, and search queries. The estimator confirms interpretation with the requester before proceeding.
flowchart LR
PRD["PRD Text"] --> KT["Extract\nKey Terms"]
PRD --> RF["Detect\nRisk Flags"]
PRD --> SQ["Generate\nSearch Queries"]
KT --> CP["Checkpoint 1:\nConfirm interpretation"]
RF --> CP
SQ --> CP
style CP fill:#f9f,stroke:#333
Risk flags are detected from keywords in the PRD:
| Pattern | Flag | Significance |
|---|---|---|
| mavlink, shim, onboard | hard_repo_mavlink | C++ firmware work, specialized skills needed |
| clojure, gcs, ground control | hard_repo_clojure | Clojure codebase, smaller contributor pool |
| faa, federal aviation | faa_related | Regulatory implications |
| airdex, air dex | external_integration_airdex | External service dependency |
| dss, utm, deconfliction | airspace_deconfliction | Airspace management complexity |
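A minimal sketch of how that keyword matching might work (the skill's actual matching rules may be stricter, e.g. about word boundaries):

```python
# Keyword patterns -> risk flags, mirroring the table above
RISK_PATTERNS = {
    ("mavlink", "shim", "onboard"): "hard_repo_mavlink",
    ("clojure", "gcs", "ground control"): "hard_repo_clojure",
    ("faa", "federal aviation"): "faa_related",
    ("airdex", "air dex"): "external_integration_airdex",
    ("dss", "utm", "deconfliction"): "airspace_deconfliction",
}

def detect_risk_flags(prd_text: str) -> set[str]:
    """Return every flag whose keywords appear in the PRD text."""
    text = prd_text.lower()
    return {flag for keywords, flag in RISK_PATTERNS.items()
            if any(kw in text for kw in keywords)}
```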
Stage 2: Repo Discovery
Search queries are run across the entire DroneUp GitHub org using `gh search code`. Affected repos are identified by hit count, and any repos not already present locally are cloned. Repos are then ranked, and the top 5 are selected for deep exploration.
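A sketch of the discovery step, assuming a gh CLI recent enough to support `--json` on `gh search code`; the skill's actual ranking may weigh more than raw hit counts:

```python
import json
import subprocess
from collections import Counter

def rank_repos(queries: list[str], org: str = "DroneUp", top_n: int = 5) -> list[str]:
    """Count code-search hits per repo across all queries; return the top N."""
    hits: Counter[str] = Counter()
    for query in queries:
        out = subprocess.run(
            ["gh", "search", "code", query, "--owner", org,
             "--limit", "100", "--json", "repository"],
            capture_output=True, text=True, check=True,
        ).stdout
        for match in json.loads(out):
            hits[match["repository"]["nameWithOwner"]] += 1
    return [repo for repo, _ in hits.most_common(top_n)]
```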
Stage 3: Architecture Sketch
Parallel Explore agents are dispatched into the top affected repos. Each agent maps the repo’s architecture, identifies extension points, and documents integration patterns.
flowchart TD
TOP["Top 5 Affected Repos"] --> A1["Agent 1\nExplore Repo A"]
TOP --> A2["Agent 2\nExplore Repo B"]
TOP --> A3["Agent 3\nExplore Repo C"]
TOP --> A4["Agent 4\nExplore Repo D"]
TOP --> A5["Agent 5\nExplore Repo E"]
A1 --> SK["Architecture Sketch\n+ Complexity Assessment"]
A2 --> SK
A3 --> SK
A4 --> SK
A5 --> SK
SK --> CP["Checkpoint 2:\nConfirm architecture"]
style A1 fill:#ddf,stroke:#333
style A2 fill:#ddf,stroke:#333
style A3 fill:#ddf,stroke:#333
style A4 fill:#ddf,stroke:#333
style A5 fill:#ddf,stroke:#333
style CP fill:#f9f,stroke:#333
Each agent identifies: tech stack, data models, API surfaces, extension points, testing patterns, and dependencies. This ensures estimates are grounded in what the code actually looks like, not abstract guesses.
Stage 4: Story Breakdown
Each “change needed” from the architecture sketch becomes one or more stories (a sketch of the resulting record follows this list). Stories are:
- Assigned a domain (backend-go, frontend-react, firmware-cpp, etc.)
- Given PERT estimates (O/L/P) calibrated against domain baselines from historic data
- Traced back to PRD user stories for requirements coverage
- Cross-referenced with comparable epics from Jira
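A sketch of what a story record might carry (field names are illustrative, not the skill's exact schema):

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    title: str
    domain: str         # e.g. "backend-go", "frontend-react", "firmware-cpp"
    optimistic: float   # O, in engineering days
    likely: float       # L
    pessimistic: float  # P
    prd_user_stories: list[str] = field(default_factory=list)  # traceability
    comparable_epics: list[str] = field(default_factory=list)  # Jira keys

    @property
    def expected(self) -> float:
        return (self.optimistic + 4 * self.likely + self.pessimistic) / 6

    @property
    def std_dev(self) -> float:
        return (self.pessimistic - self.optimistic) / 6
```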
Stage 5: Estimation
PERT aggregation produces the final estimate card with totals, confidence, domain subtotals, and cost projections. The estimate is saved as a structured YAML file for future re-runs.
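Building on the Story sketch above, assembling and saving the card might look like this (the YAML schema shown is hypothetical; the skill defines its own format):

```python
import math
import yaml  # PyYAML

def estimate_card(stories: list[Story]) -> dict:
    """Aggregate stories into a card dict (illustrative field names)."""
    expected = sum(s.expected for s in stories)
    std_dev = math.sqrt(sum(s.std_dev ** 2 for s in stories))
    subtotals: dict[str, float] = {}
    for s in stories:
        subtotals[s.domain] = subtotals.get(s.domain, 0.0) + s.expected
    return {
        "stories": len(stories),
        "expected_days": round(expected, 1),
        "range_95ci": [round(expected - 2 * std_dev, 1),
                       round(expected + 2 * std_dev, 1)],
        "domain_subtotals": {d: round(v, 1) for d, v in subtotals.items()},
    }

# Illustrative single-story card; a real card holds the full breakdown
card = estimate_card([Story("Add telemetry endpoint", "backend-go", 2, 4, 8)])
with open("estimate.yaml", "w") as f:
    yaml.safe_dump(card, f, sort_keys=False)
```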
Calibration: Where the Baselines Come From
Estimates are calibrated against real historic data from DroneUp’s Jira and GitHub:
flowchart TD
subgraph Data Sources
J["Jira Epics\n& Stories"]
G["GitHub PRs\n& Reviews"]
end
J --> F["prd-sizing refresh"]
G --> F
F --> S["stats.json"]
subgraph Calibration Outputs
BL["Baseline\nMedian days/story"]
DB["Domain Baselines\nPer tech stack"]
CE["Comparable Epics\nSimilar past work"]
EA["Estimation Accuracy\nActual vs estimated"]
VL["Velocity\nCompletion rate"]
end
S --> BL
S --> DB
S --> CE
S --> EA
S --> VL
BL --> EST["Used during\nStory Breakdown"]
DB --> EST
CE --> EST
EA --> EST
style F fill:#ffd,stroke:#333
style S fill:#dfd,stroke:#333
| Data Source | What It Provides |
|---|---|
| Jira epics + stories | Cycle times (In Progress → Done), story counts per epic, story titles for pattern matching |
| GitHub PRs | Merge times, review durations, code volume |
| Domain baselines | Median days per story by tech domain (e.g., frontend-react = 8.7d median) |
| Comparable epics | Real duration data for similar past work |
| Estimation accuracy | How accurate past estimates were (actuals vs estimates ratio) |
Keeping Data Fresh
Calibration data is refreshed by running:
cd .claude/skills/prd-sizing/scripts
.venv/bin/prd-sizing refresh

This fetches the latest Jira and GitHub data incrementally and recomputes baselines. The skill warns if calibration data is older than 30 days.
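The staleness warning amounts to checking the age of the calibration file; a minimal sketch (the stats.json path here is an assumption):

```python
import time
from pathlib import Path

STATS = Path(".claude/skills/prd-sizing/scripts/stats.json")  # assumed location

def warn_if_stale(max_age_days: int = 30) -> None:
    """Print the staleness warning described above."""
    age_days = (time.time() - STATS.stat().st_mtime) / 86_400
    if age_days > max_age_days:
        print(f"WARNING: calibration data is {age_days:.0f} days old; "
              "run `prd-sizing refresh`.")
```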
Re-Running an Estimate
Estimates are living documents. As work progresses, they can be re-run to incorporate actual data:
flowchart LR
subgraph "First Run"
FR["All stories\nestimated O/L/P"]
end
subgraph "Re-Run"
JF["Fetch Jira\nstatus + actuals"]
RC["Reconcile:\nold vs new"]
RE["Recompute\nwith actuals"]
end
FR --> JF
JF --> RC
RC --> CP["Checkpoint:\nReview delta"]
CP --> RE
RE --> UP["Updated\nestimate card"]
style CP fill:#f9f,stroke:#333
style UP fill:#dfd,stroke:#333
A re-run (sketched after this list):
- Fetches Jira status updates for stories with keys
- Pulls actual cycle times for completed stories (replacing estimates with real data)
- Discovers new stories added to the epic during implementation
- Recomputes the aggregate with a mix of actuals and remaining estimates
- Shows a delta of what changed
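A sketch of the reconcile step, reusing the Story record from Stage 4. Treating actuals as certain (zero standard deviation) is an assumption about how the skill mixes them:

```python
def reconcile(existing: dict[str, Story], actuals: dict[str, float],
              new_stories: list[Story]) -> list[tuple[float, float]]:
    """Return (expected, std_dev) pairs mixing actuals with open estimates."""
    mixed = []
    for jira_key, story in existing.items():
        if jira_key in actuals:
            # Completed: replace the estimate with the real cycle time
            mixed.append((actuals[jira_key], 0.0))
        else:
            # Still open: keep the original PERT estimate
            mixed.append((story.expected, story.std_dev))
    # Stories discovered during implementation enter as fresh estimates
    mixed.extend((s.expected, s.std_dev) for s in new_stories)
    return mixed  # feed into the same aggregation as a first run
```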
To re-run an existing estimate:

/prd-sizing re-run <slug-or-path>

Using Estimates for Planning
For Product Managers
- Use the Expected Duration as the primary planning number
- Use the 95% CI Range to communicate uncertainty to stakeholders
- Use Phase Mapping to sequence work and identify what ships first
- Use Cost Estimates for budgeting and ROI analysis
- Check Comparable Epics to gut-check against similar past work
For Engineering Managers
- Use the Story Breakdown to plan sprints and assign work
- Use Domain Subtotals to understand staffing needs (e.g., “91 days of Go work, 38 days of React work”)
- Use Risk Flags to identify where spikes or de-risking should happen first
- Use the Architecture Sketch to understand cross-repo dependencies
For Executives
- Use the Estimate Summary box for a one-glance view
- Use the Cost Range for budget allocation (pad to the high end)
- Compare the Expected Duration against the Comparable Epics actual durations
- Read the Historic Accuracy warning – if past estimates undershot by 4x, factor that into expectations
Converting Days to Calendar Time
The estimate is in engineering days (active work time). To convert to calendar time:
flowchart LR
ED["Expected Days"] --> DIV["÷ Team Size\n÷ 5 days/week\n÷ Completion Rate"]
DIV --> CW["Calendar Weeks"]
style ED fill:#ddf,stroke:#333
style CW fill:#dfd,stroke:#333
Calendar Weeks = Expected Days / (Team Size × 5 × Completion Rate)

| Team Size | Completion Rate | 167.7 expected days becomes… |
|---|---|---|
| 1 engineer | 77% | 43.5 weeks (~10 months) |
| 2 engineers | 77% | 21.8 weeks (~5 months) |
| 3 engineers | 77% | 14.5 weeks (~3.5 months) |
| 4 engineers | 77% | 10.9 weeks (~2.5 months) |
| 5 engineers | 77% | 8.7 weeks (~2 months) |
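The same conversion as a one-line function (77% is the completion rate used throughout the table):

```python
def calendar_weeks(expected_days: float, team_size: int,
                   completion_rate: float = 0.77) -> float:
    """Convert engineering days to calendar weeks, per the formula above."""
    return expected_days / (team_size * 5 * completion_rate)

print(round(calendar_weeks(167.7, 3), 1))  # 14.5 weeks (~3.5 months)
```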
Common Questions
Why not just use story points?
Story points measure relative effort but don’t translate directly to time or cost. Different teams have different velocity, and points don’t account for specific codebase complexity. PERT estimates in days are more actionable for planning and budgeting.
How do you account for unknown unknowns?
Three ways:
- Pessimistic estimates explicitly model worst-case scenarios per story
- Comparable epics from real historic data show what similar work actually took
- Historic accuracy data reveals systematic bias (e.g., if actuals typically run 4x estimates)
What if the PRD changes after estimation?
Re-run the estimate. The re-run mode preserves existing data and layers in changes. Stories can be added, removed, or re-estimated.
Can this replace detailed sprint planning?
No. This is a macro estimate for budgeting, roadmap planning, and resource allocation. Sprint-level planning still needs to happen, ideally using the story breakdown as a starting point.
Why is “High confidence” not always reassuring?
Because confidence measures statistical spread, not accuracy. If you’re consistently wrong in the same direction (always underestimating), the spread can be narrow (high confidence) while the central estimate is still off. The comparable epics and accuracy data are the corrective lens.
Glossary
| Term | Definition |
|---|---|
| PERT | Program Evaluation and Review Technique – a statistical estimation method using three-point estimates |
| 95% CI | 95% Confidence Interval – the range within which the actual value falls with 95% probability |
| StdDev | Standard Deviation – a measure of how spread out the estimates are |
| Fully-loaded rate | The total cost of an engineer including salary, benefits, taxes, tools, and overhead |
| Completion rate | The fraction of working time spent on feature development (vs meetings, reviews, etc.) |
| Domain baseline | The historic median days per story for a given technology domain |
| Comparable epic | A past Jira epic with similar scope, used as a sanity check |
| Scope clarity | How well-defined the requirements are: A (fully defined), B (partially), C (early idea) |
| Architecture-driven | Estimates derived from exploring actual codebases, not just reading requirements |