PRD Sizing: Process & Methodology
This page explains how PRD sizing estimates are produced, what the numbers mean, and how to use them for planning.
What Is PRD Sizing?
PRD Sizing is an architecture-driven estimation process that produces calibrated time and cost estimates for product features described in a PRD. It uses the PERT (Program Evaluation and Review Technique) method combined with codebase exploration and historic data from our Jira and GitHub activity.
Unlike story-point-based estimation, PRD Sizing estimates directly in days of active engineering work. Each story receives three time estimates (optimistic, likely, pessimistic), and the aggregation produces a range with a stated confidence level.
Process Overview
The full estimation process runs through five stages, with a mandatory human checkpoint at each decision point:
flowchart TD
A[PRD Input] --> B[Stage 1: Query Expansion]
B --> CP1{Checkpoint 1\nConfirm interpretation}
CP1 -->|Approved| C[Stage 2: Repo Discovery]
CP1 -->|Adjust| B
C --> D[Stage 3: Architecture Sketch]
D --> CP2{Checkpoint 2\nConfirm architecture}
CP2 -->|Approved| E[Stage 4: Story Breakdown]
CP2 -->|Explore deeper| D
E --> CP3{Checkpoint 3\nConfirm stories & PERT}
CP3 -->|Approved| F[Stage 5: Estimation]
CP3 -->|Adjust| E
F --> CP4{Checkpoint 4\nFinal estimate card}
CP4 -->|Approved| G[Save & Publish]
CP4 -->|Adjust| E
style CP1 fill:#f9f,stroke:#333
style CP2 fill:#f9f,stroke:#333
style CP3 fill:#f9f,stroke:#333
style CP4 fill:#f9f,stroke:#333
style G fill:#9f9,stroke:#333
The PERT Method
For each story, three time estimates are provided:
| Estimate | Symbol | Meaning |
|---|---|---|
| Optimistic | O | Best case – everything goes smoothly, no surprises |
| Likely | L | Normal case – typical friction, some back-and-forth |
| Pessimistic | P | Worst case – blockers, rework, unexpected complexity |
From these, two values are computed:
flowchart LR
O["O\n(Optimistic)"] --> PERT["Expected = (O + 4L + P) / 6"]
L["L\n(Likely)"] --> PERT
P["P\n(Pessimistic)"] --> PERT
O --> SD["StdDev = (P - O) / 6"]
P --> SD
PERT --> R["Expected Duration\nper story"]
SD --> R2["Uncertainty\nper story"]
style L fill:#ffd,stroke:#333
style PERT fill:#ddf,stroke:#333
style SD fill:#fdd,stroke:#333
Expected duration weights the likely estimate most heavily (4x) while incorporating best and worst cases. It produces a slightly pessimistic-leaning average, which is intentional – engineering work tends to take longer than expected.
Standard deviation measures how uncertain the estimate is. A story with O=2, P=4 has low uncertainty (StdDev=0.33). A story with O=2, P=14 has high uncertainty (StdDev=2.0).
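These two formulas are easy to check by hand. A minimal Python sketch of the per-story math (the likely values of 3 and 5 below are assumed for illustration, since the examples above specify only O and P):

```python
def pert(o: float, l: float, p: float) -> tuple[float, float]:
    """Return (expected, std_dev) for one story's three-point estimate."""
    expected = (o + 4 * l + p) / 6  # likely estimate weighted 4x
    std_dev = (p - o) / 6           # wider O..P spread = more uncertainty
    return expected, std_dev

print(pert(2, 3, 4))   # (3.0, 0.33...) -- low uncertainty
print(pert(2, 5, 14))  # (6.0, 2.0)     -- high uncertainty
```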
How Stories Are Aggregated
Individual story estimates are combined using statistical aggregation:
flowchart TD
subgraph Stories
S1["Story 1\nE=4.2d, σ=0.83"]
S2["Story 2\nE=5.5d, σ=1.17"]
S3["Story 3\nE=7.7d, σ=1.67"]
SN["Story N\n..."]
end
S1 --> SUM["Total Expected\n= sum of all E"]
S2 --> SUM
S3 --> SUM
SN --> SUM
S1 --> VAR["Total StdDev\n= sqrt(sum of σ²)"]
S2 --> VAR
S3 --> VAR
SN --> VAR
SUM --> CI["95% Confidence Interval"]
VAR --> CI
CI --> LOW["Low = Expected - 2 × StdDev"]
CI --> HIGH["High = Expected + 2 × StdDev"]
style CI fill:#ddf,stroke:#333
style LOW fill:#dfd,stroke:#333
style HIGH fill:#fdd,stroke:#333
The 95% confidence interval means there is a 95% probability the actual duration falls within this range, assuming story risks are independent and the aggregate total is approximately normal.
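A sketch of this aggregation in Python, using the three example stories from the diagram:

```python
import math

# (expected, std_dev) per story -- the example values from the diagram
stories = [(4.2, 0.83), (5.5, 1.17), (7.7, 1.67)]

total_expected = sum(e for e, _ in stories)                   # 17.4 days
total_std_dev = math.sqrt(sum(sd ** 2 for _, sd in stories))  # ~2.2 days

# 95% confidence interval on the total
low = total_expected - 2 * total_std_dev   # ~13.0 days
high = total_expected + 2 * total_std_dev  # ~21.8 days
```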
Reading the Estimate Card
Here is how to interpret each field in the estimate summary:
| Field | What It Means |
|---|---|
| Stories | Number of discrete work items identified |
| Expected Duration | Most probable total duration in engineering days |
| Range (95% CI) | Statistical bounds – 95% chance the actual falls within this range |
| Weeks | Expected duration divided by 5 working days per week |
| Confidence | How certain the estimate is (see below) |
| Expected Cost | Expected days multiplied by the fully-loaded daily rate |
| Cost Range | 95% CI range multiplied by the daily rate |
Confidence Levels
| Level | StdDev / Expected | Interpretation |
|---|---|---|
| High | < 0.3 | Narrow range. Estimate is well-bounded. |
| Medium | 0.3 – 0.6 | Moderate uncertainty. Plan for contingency. |
| Low | > 0.6 | Wide range. Treat the estimate as directional, not precise. |
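The thresholds map directly to a ratio check; a minimal sketch:

```python
def confidence(expected: float, std_dev: float) -> str:
    """Classify estimate confidence by relative spread."""
    ratio = std_dev / expected
    if ratio < 0.3:
        return "High"
    if ratio <= 0.6:
        return "Medium"
    return "Low"

print(confidence(17.4, 2.2))  # "High" -- ratio is about 0.13
```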
Understanding Cost Estimates
Costs are derived from a fully-loaded daily rate:
flowchart LR
SAL["Average Salary\n$200k"] --> FL["Fully Loaded\n× 1.4 overhead"]
FL --> ANN["$280k/year"]
ANN --> DR["÷ 230 effective\nworking days"]
DR --> RATE["≈ $1,200/day"]
RATE --> COST["× Expected Days\n= Total Cost"]
style SAL fill:#ddf,stroke:#333
style RATE fill:#ffd,stroke:#333
style COST fill:#dfd,stroke:#333
| Component | Typical Value | What It Includes |
|---|---|---|
| Average salary | Team-specific | Base compensation |
| Overhead multiplier | 1.3x – 1.5x | Benefits, payroll taxes, tools/licenses, office/infra, management overhead |
| Effective working days | ~230/year | 260 weekdays minus holidays, PTO, sick days, company events |
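A sketch of the cost math, reusing the worked example from the diagram ($200k salary, 1.4x overhead, 230 working days) and the 17.4-day total from the earlier aggregation sketch; all values are illustrative:

```python
def daily_rate(avg_salary: float, overhead: float = 1.4,
               effective_days: int = 230) -> float:
    """Fully-loaded cost per engineering day."""
    return avg_salary * overhead / effective_days

rate = daily_rate(200_000)               # ~$1,217/day (the diagram rounds to $1,200)
expected_cost = 17.4 * rate              # expected days x daily rate
cost_range = (13.0 * rate, 21.8 * rate)  # 95% CI bounds x daily rate
```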
What Cost Estimates Do NOT Include
- Calendar time – 168 engineering days does not mean 168 calendar days. See “Converting Days to Calendar Time” below.
- Non-engineering costs – product management, design, QA, project management, travel, hardware procurement.
- Opportunity cost – what else the team could be building instead.
- External dependencies – waiting on third-party documentation, hardware deliveries, partner APIs.
The Five Stages in Detail
Stage 1: Query Expansion
The PRD text is analyzed to extract key terms, risk flags, and search queries. The estimator confirms interpretation with the requester before proceeding.
flowchart LR
PRD["PRD Text"] --> KT["Extract\nKey Terms"]
PRD --> RF["Detect\nRisk Flags"]
PRD --> SQ["Generate\nSearch Queries"]
KT --> CP["Checkpoint 1:\nConfirm interpretation"]
RF --> CP
SQ --> CP
style CP fill:#f9f,stroke:#333
Risk flags are detected from keywords in the PRD:
| Pattern | Flag | Significance |
|---|---|---|
| mavlink, shim, onboard | hard_repo_mavlink | C++ firmware work, specialized skills needed |
| clojure, gcs, ground control | hard_repo_clojure | Clojure codebase, smaller contributor pool |
| faa, federal aviation | faa_related | Regulatory implications |
| airdex, air dex | external_integration_airdex | External service dependency |
| dss, utm, deconfliction | airspace_deconfliction | Airspace management complexity |
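A minimal sketch of how that keyword matching might work (the skill's actual matching rules may be stricter, e.g. about word boundaries):

```python
# Keyword patterns -> risk flags, mirroring the table above
RISK_PATTERNS = {
    ("mavlink", "shim", "onboard"): "hard_repo_mavlink",
    ("clojure", "gcs", "ground control"): "hard_repo_clojure",
    ("faa", "federal aviation"): "faa_related",
    ("airdex", "air dex"): "external_integration_airdex",
    ("dss", "utm", "deconfliction"): "airspace_deconfliction",
}

def detect_risk_flags(prd_text: str) -> set[str]:
    """Return every flag whose keywords appear in the PRD text."""
    text = prd_text.lower()
    return {flag for keywords, flag in RISK_PATTERNS.items()
            if any(kw in text for kw in keywords)}
```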
Stage 2: Repo Discovery
Search queries are run across the entire DroneUp GitHub org using `gh search code`. Affected repos are identified by hit count, and any repos not already present locally are cloned. Repos are then ranked, and the top 5 are selected for deep exploration.
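A sketch of the discovery step, assuming a gh CLI recent enough to support `--json` on `gh search code`; the skill's actual ranking may weigh more than raw hit counts:

```python
import json
import subprocess
from collections import Counter

def rank_repos(queries: list[str], org: str = "DroneUp", top_n: int = 5) -> list[str]:
    """Count code-search hits per repo across all queries; return the top N."""
    hits: Counter[str] = Counter()
    for query in queries:
        out = subprocess.run(
            ["gh", "search", "code", query, "--owner", org,
             "--limit", "100", "--json", "repository"],
            capture_output=True, text=True, check=True,
        ).stdout
        for match in json.loads(out):
            hits[match["repository"]["nameWithOwner"]] += 1
    return [repo for repo, _ in hits.most_common(top_n)]
```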
Stage 3: Architecture Sketch
Parallel Explore agents are dispatched into the top affected repos. Each agent maps the repo’s architecture, identifies extension points, and documents integration patterns.
flowchart TD
TOP["Top 5 Affected Repos"] --> A1["Agent 1\nExplore Repo A"]
TOP --> A2["Agent 2\nExplore Repo B"]
TOP --> A3["Agent 3\nExplore Repo C"]
TOP --> A4["Agent 4\nExplore Repo D"]
TOP --> A5["Agent 5\nExplore Repo E"]
A1 --> SK["Architecture Sketch\n+ Complexity Assessment"]
A2 --> SK
A3 --> SK
A4 --> SK
A5 --> SK
SK --> CP["Checkpoint 2:\nConfirm architecture"]
style A1 fill:#ddf,stroke:#333
style A2 fill:#ddf,stroke:#333
style A3 fill:#ddf,stroke:#333
style A4 fill:#ddf,stroke:#333
style A5 fill:#ddf,stroke:#333
style CP fill:#f9f,stroke:#333
Each agent identifies: tech stack, data models, API surfaces, extension points, testing patterns, and dependencies. This ensures estimates are grounded in what the code actually looks like, not abstract guesses.
Stage 4: Story Breakdown
Each “change needed” from the architecture sketch becomes one or more stories (a sketch of the resulting record follows this list). Stories are:
- Assigned a domain (backend-go, frontend-react, firmware-cpp, etc.)
- Given PERT estimates (O/L/P) calibrated against domain baselines from historic data
- Traced back to PRD user stories for requirements coverage
- Cross-referenced with comparable epics from Jira
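A sketch of what a story record might carry (field names are illustrative, not the skill's exact schema):

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    title: str
    domain: str         # e.g. "backend-go", "frontend-react", "firmware-cpp"
    optimistic: float   # O, in engineering days
    likely: float       # L
    pessimistic: float  # P
    prd_user_stories: list[str] = field(default_factory=list)  # traceability
    comparable_epics: list[str] = field(default_factory=list)  # Jira keys

    @property
    def expected(self) -> float:
        return (self.optimistic + 4 * self.likely + self.pessimistic) / 6

    @property
    def std_dev(self) -> float:
        return (self.pessimistic - self.optimistic) / 6
```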
Stage 5: Estimation
PERT aggregation produces the final estimate card with totals, confidence, domain subtotals, and cost projections. The estimate is saved as a structured YAML file for future re-runs.
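Building on the Story sketch above, assembling and saving the card might look like this (the YAML schema shown is hypothetical; the skill defines its own format):

```python
import math
import yaml  # PyYAML

def estimate_card(stories: list[Story]) -> dict:
    """Aggregate stories into a card dict (illustrative field names)."""
    expected = sum(s.expected for s in stories)
    std_dev = math.sqrt(sum(s.std_dev ** 2 for s in stories))
    subtotals: dict[str, float] = {}
    for s in stories:
        subtotals[s.domain] = subtotals.get(s.domain, 0.0) + s.expected
    return {
        "stories": len(stories),
        "expected_days": round(expected, 1),
        "range_95ci": [round(expected - 2 * std_dev, 1),
                       round(expected + 2 * std_dev, 1)],
        "domain_subtotals": {d: round(v, 1) for d, v in subtotals.items()},
    }

# Illustrative single-story card; a real card holds the full breakdown
card = estimate_card([Story("Add telemetry endpoint", "backend-go", 2, 4, 8)])
with open("estimate.yaml", "w") as f:
    yaml.safe_dump(card, f, sort_keys=False)
```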
Calibration: Where the Baselines Come From
Estimates are calibrated against real historic data from DroneUp’s Jira and GitHub:
flowchart TD
subgraph Data Sources
J["Jira Epics\n& Stories"]
G["GitHub PRs\n& Reviews"]
end
J --> F["prd-sizing refresh"]
G --> F
F --> S["stats.json"]
subgraph Calibration Outputs
BL["Baseline\nMedian days/story"]
DB["Domain Baselines\nPer tech stack"]
CE["Comparable Epics\nSimilar past work"]
EA["Estimation Accuracy\nActual vs estimated"]
VL["Velocity\nCompletion rate"]
end
S --> BL
S --> DB
S --> CE
S --> EA
S --> VL
BL --> EST["Used during\nStory Breakdown"]
DB --> EST
CE --> EST
EA --> EST
style F fill:#ffd,stroke:#333
style S fill:#dfd,stroke:#333
| Data Source | What It Provides |
|---|---|
| Jira epics + stories | Cycle times (In Progress → Done), story counts per epic, story titles for pattern matching |
| GitHub PRs | Merge times, review durations, code volume |
| Domain baselines | Median days per story by tech domain (e.g., frontend-react = 8.7d median) |
| Comparable epics | Real duration data for similar past work |
| Estimation accuracy | How accurate past estimates were (actuals vs estimates ratio) |
Keeping Data Fresh
Calibration data is refreshed by running:
cd .claude/skills/prd-sizing/scripts
.venv/bin/prd-sizing refresh

This fetches the latest Jira and GitHub data incrementally and recomputes baselines. The skill warns if calibration data is older than 30 days.
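The staleness warning amounts to checking the age of the calibration file; a minimal sketch (the stats.json path here is an assumption):

```python
import time
from pathlib import Path

STATS = Path(".claude/skills/prd-sizing/scripts/stats.json")  # assumed location

def warn_if_stale(max_age_days: int = 30) -> None:
    """Print the staleness warning described above."""
    age_days = (time.time() - STATS.stat().st_mtime) / 86_400
    if age_days > max_age_days:
        print(f"WARNING: calibration data is {age_days:.0f} days old; "
              "run `prd-sizing refresh`.")
```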
Re-Running an Estimate
Estimates are living documents. As work progresses, they can be re-run to incorporate actual data:
flowchart LR
subgraph "First Run"
FR["All stories\nestimated O/L/P"]
end
subgraph "Re-Run"
JF["Fetch Jira\nstatus + actuals"]
RC["Reconcile:\nold vs new"]
RE["Recompute\nwith actuals"]
end
FR --> JF
JF --> RC
RC --> CP["Checkpoint:\nReview delta"]
CP --> RE
RE --> UP["Updated\nestimate card"]
style CP fill:#f9f,stroke:#333
style UP fill:#dfd,stroke:#333
A re-run (sketched after this list):
- Fetches Jira status updates for stories with keys
- Pulls actual cycle times for completed stories (replacing estimates with real data)
- Discovers new stories added to the epic during implementation
- Recomputes the aggregate with a mix of actuals and remaining estimates
- Shows a delta of what changed
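A sketch of the reconcile step, reusing the Story record from Stage 4. Treating actuals as certain (zero standard deviation) is an assumption about how the skill mixes them:

```python
def reconcile(existing: dict[str, Story], actuals: dict[str, float],
              new_stories: list[Story]) -> list[tuple[float, float]]:
    """Return (expected, std_dev) pairs mixing actuals with open estimates."""
    mixed = []
    for jira_key, story in existing.items():
        if jira_key in actuals:
            # Completed: replace the estimate with the real cycle time
            mixed.append((actuals[jira_key], 0.0))
        else:
            # Still open: keep the original PERT estimate
            mixed.append((story.expected, story.std_dev))
    # Stories discovered during implementation enter as fresh estimates
    mixed.extend((s.expected, s.std_dev) for s in new_stories)
    return mixed  # feed into the same aggregation as a first run
```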
To re-run an existing estimate:

/prd-sizing re-run <slug-or-path>

Using Estimates for Planning
For Product Managers
- Use the Expected Duration as the primary planning number
- Use the 95% CI Range to communicate uncertainty to stakeholders
- Use Phase Mapping to sequence work and identify what ships first
- Use Cost Estimates for budgeting and ROI analysis
- Check Comparable Epics to gut-check against similar past work
For Engineering Managers
- Use the Story Breakdown to plan sprints and assign work
- Use Domain Subtotals to understand staffing needs (e.g., “91 days of Go work, 38 days of React work”)
- Use Risk Flags to identify where spikes or de-risking should happen first
- Use the Architecture Sketch to understand cross-repo dependencies
For Executives
- Use the Estimate Summary box for a one-glance view
- Use the Cost Range for budget allocation (pad to the high end)
- Compare the Expected Duration against the Comparable Epics actual durations
- Read the Historic Accuracy warning – if past estimates undershot by 4x, factor that into expectations
Converting Days to Calendar Time
The estimate is in engineering days (active work time). To convert to calendar time:
flowchart LR
ED["Expected Days"] --> DIV["÷ Team Size\n÷ 5 days/week\n÷ Completion Rate"]
DIV --> CW["Calendar Weeks"]
style ED fill:#ddf,stroke:#333
style CW fill:#dfd,stroke:#333
Calendar Weeks = Expected Days / (Team Size × 5 × Completion Rate)

| Team Size | Completion Rate | 167.7 expected days becomes… |
|---|---|---|
| 1 engineer | 77% | 43.5 weeks (~10 months) |
| 2 engineers | 77% | 21.8 weeks (~5 months) |
| 3 engineers | 77% | 14.5 weeks (~3.5 months) |
| 4 engineers | 77% | 10.9 weeks (~2.5 months) |
| 5 engineers | 77% | 8.7 weeks (~2 months) |
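The same conversion as a one-line function (77% is the completion rate used throughout the table):

```python
def calendar_weeks(expected_days: float, team_size: int,
                   completion_rate: float = 0.77) -> float:
    """Convert engineering days to calendar weeks, per the formula above."""
    return expected_days / (team_size * 5 * completion_rate)

print(round(calendar_weeks(167.7, 3), 1))  # 14.5 weeks (~3.5 months)
```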
Common Questions
Why not just use story points?
Story points measure relative effort but don’t translate directly to time or cost. Different teams have different velocity, and points don’t account for specific codebase complexity. PERT estimates in days are more actionable for planning and budgeting.
How do you account for unknown unknowns?
Three ways:
- Pessimistic estimates explicitly model worst-case scenarios per story
- Comparable epics from real historic data show what similar work actually took
- Historic accuracy data reveals systematic bias (e.g., if actuals typically run 4x estimates)
What if the PRD changes after estimation?
Re-run the estimate. The re-run mode preserves existing data and layers in changes. Stories can be added, removed, or re-estimated.
Can this replace detailed sprint planning?
No. This is a macro estimate for budgeting, roadmap planning, and resource allocation. Sprint-level planning still needs to happen, ideally using the story breakdown as a starting point.
Why is “High confidence” not always reassuring?
Because confidence measures statistical spread, not accuracy. If you’re consistently wrong in the same direction (always underestimating), the spread can be narrow (high confidence) while the central estimate is still off. The comparable epics and accuracy data are the corrective lens.
Glossary
| Term | Definition |
|---|---|
| PERT | Program Evaluation and Review Technique – a statistical estimation method using three-point estimates |
| 95% CI | 95% Confidence Interval – the range within which the actual value falls with 95% probability |
| StdDev | Standard Deviation – a measure of how spread out the estimates are |
| Fully-loaded rate | The total cost of an engineer including salary, benefits, taxes, tools, and overhead |
| Completion rate | The fraction of working time spent on feature development (vs meetings, reviews, etc.) |
| Domain baseline | The historic median days per story for a given technology domain |
| Comparable epic | A past Jira epic with similar scope, used as a sanity check |
| Scope clarity | How well-defined the requirements are: A (fully defined), B (partially), C (early idea) |
| Architecture-driven | Estimates derived from exploring actual codebases, not just reading requirements |