Brave New IA World
Post #004 — The AIRUP Series

SDD Meets RUP

When the hottest AI-native spec format accidentally reinvents artifacts that Rational catalogued in 1998. Convergent evolution is a beautiful thing.

Ricardo Costa ~14 min read

In the SDD post, I described the three-phase spec pipeline: requirements.md → design.md → tasks.md. In the AIRUP post, I mapped RUP roles to AI agents. The careful reader might have noticed something suspicious: the SDD pipeline looks a lot like a compressed version of the RUP workflow. That's not a coincidence. It's convergent evolution — and unpacking it is the key to understanding why AIRUP works.

Today I'm going to show you the exact mapping between modern SDD spec formats and classical RUP artifacts. Not as a historical curiosity, but because this mapping reveals something profound: twenty-five years of process engineering didn't produce waste — it produced a scaffold that was simply waiting for the right executors.

The Mapping Nobody Drew

Let's start with the punchline. Here's how the core SDD artifacts map to their RUP ancestors:

| SDD Artifact | RUP Artifact(s) | RUP Discipline |
|---|---|---|
| requirements.md | Software Requirements Specification (SRS), Use-Case Model, Supplementary Specification | Requirements |
| design.md | Software Architecture Document (SAD), Design Model, Architecture Decision Records | Analysis & Design |
| tasks.md | Iteration Plan, Work Breakdown Structure, Implementation Model (subset) | Project Management + Implementation |
| test-plan.md | Test Plan, Test-Case Specification | Test |
| vision.md / problem statement | Vision Document, Business Case | Business Modeling |

If you've worked with SDD tools — Kiro, Spec-Kit, or any internal framework that generates requirements.md → design.md → tasks.md — you've been producing a compressed, machine-readable version of what RUP practitioners spent weeks writing in Rational Rose and IBM RequisitePro.

The question is: is this just a surface-level naming coincidence, or is there something deeper going on?

Convergent Evolution in Process Design

In biology, convergent evolution is when unrelated species independently develop similar traits because they solve the same problem. Dolphins and sharks both have streamlined bodies and dorsal fins — not because they share an ancestor, but because water imposes the same physics on anything trying to swim fast through it.

Software development imposes similar "physics." Any process that tries to get from "I have a problem" to "I have a deployed solution" must answer the same fundamental questions, in roughly the same order:

  1. What should the system do? (Requirements)
  2. How should it be structured? (Design)
  3. Who does what, in what order? (Task decomposition)
  4. Did we build the right thing? (Verification)

RUP answered these questions with formal artifacts, ceremony, and tooling. SDD answers them with markdown files and AI agents. The answers are shaped differently, but the questions are identical — because the underlying problem is identical.

"SDD didn't reinvent the wheel. It reinvented the axle — same function, different material, dramatically lower friction."

This matters for AIRUP because it means we're not forcing an artificial marriage between unrelated concepts. SDD and RUP are naturally compatible — they express the same epistemic structure in different formats. AIRUP simply makes the compatibility explicit and exploits it.

Artifact by Artifact: What Changes and What Doesn't

Let's go deeper. For each SDD artifact, I'll show what it shares with its RUP counterpart, what it does differently, and what AIRUP gains from the combination.

requirements.md ↔ Software Requirements Specification

The RUP Software Requirements Specification (SRS) was a heavyweight document. IEEE 830 format. Sections for scope, definitions, system features, external interface requirements, performance requirements, design constraints. A fully-fleshed SRS for a medium project could run 50-100 pages.

An SDD requirements.md contains the same information categories — functional requirements, non-functional constraints, acceptance criteria — but in a format designed for machine consumption. The key innovations: stable requirement IDs (REQ-N), EARS-syntax statements that can be checked mechanically, and BDD-style acceptance scenarios (SC-N) referenced inline rather than tracked in a separate document.

What AIRUP gains

In AIRUP, requirements.md is produced by the @requirements agent during the Elaboration phase and refined in Construction. Because the format is structured, the AI Governor can run deterministic validations — checking ID uniqueness, EARS syntax, scenario coverage — without spending a single token on an LLM. The RUP SRS needed a human reviewer to catch formatting inconsistencies. The SDD requirements.md catches them with a regex.
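To make the claim concrete, here is a minimal sketch of what such a deterministic check might look like. It assumes requirements are written one per line as `REQ-NNN: <EARS sentence>`; the ID format and the EARS regexes are illustrative simplifications, not AIRUP's actual validation rules.

```python
import re

# Illustrative patterns -- real EARS templates are richer than these.
REQ_LINE = re.compile(r"^(REQ-\d{3}):\s*(.+)$")
EARS_SHAPES = [
    re.compile(r"^The \w[\w ]* shall .+", re.IGNORECASE),              # ubiquitous
    re.compile(r"^When .+, the \w[\w ]* shall .+", re.IGNORECASE),     # event-driven
    re.compile(r"^While .+, the \w[\w ]* shall .+", re.IGNORECASE),    # state-driven
    re.compile(r"^If .+, then the \w[\w ]* shall .+", re.IGNORECASE),  # unwanted behaviour
]

def validate_requirements(lines):
    """Return a list of findings; zero LLM tokens spent."""
    findings, seen = [], set()
    for n, line in enumerate(lines, 1):
        m = REQ_LINE.match(line.strip())
        if not m:
            continue
        req_id, sentence = m.groups()
        if req_id in seen:
            findings.append(f"line {n}: duplicate ID {req_id}")
        seen.add(req_id)
        if not any(p.match(sentence) for p in EARS_SHAPES):
            findings.append(f"line {n}: {req_id} does not match an EARS template")
    return findings

demo = [
    "REQ-001: When the user submits the form, the system shall persist the draft.",
    "REQ-001: The system shall log every request.",   # duplicate ID
    "REQ-002: Persist drafts quickly.",               # not EARS-shaped
]
print(validate_requirements(demo))
```

A few dozen lines of regex replace a review step that previously consumed human attention — and they never get tired or skim.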

design.md ↔ Software Architecture Document

The RUP Software Architecture Document (SAD) was perhaps the most valuable artifact in the entire process — and the one that hurt the most to produce. It documented the system's architectural views (logical, process, deployment, implementation), key design decisions, and their rationale. Philippe Kruchten's 4+1 architectural view model found its canonical home here.

An SDD design.md compresses this into a focused document with: identified design decisions (DES-N) traced back to the requirements they satisfy, code-like interface contracts in place of free-form diagrams, and explicit cross-references throughout.

The philosophical difference is subtle but important: the RUP SAD described architecture for human understanding. It used diagrams, prose, and visual models because humans are visual thinkers. The SDD design.md describes architecture for machine execution. It uses structured text, code-like contracts, and explicit cross-references because agents need unambiguous instructions, not elegant diagrams.

Neither format is inherently better. They optimize for different readers. AIRUP's insight is that you can have both: the structured machine-readable format as the source of truth, with human-friendly diagrams generated from the structured data whenever a human needs to review.

tasks.md ↔ Iteration Plan + Work Breakdown

This is where the mapping gets interesting, because RUP didn't have a single artifact that cleanly corresponds to tasks.md. The functionality was spread across the Iteration Plan (what to build in this time box), the Work Breakdown Structure (how the work divides among people), and a subset of the Implementation Model (how the work maps onto code).

SDD collapses all of this into a single tasks.md: an ordered list of atomic, implementable tasks, each referencing the design decisions (DES-N) and requirements (REQ-N) it implements. Each task is small enough to be completed in a single commit — essentially a work unit designed for TDD cycles.

This collapse is one of SDD's genuinely original contributions. RUP's separation of iteration planning from work breakdown from implementation modeling made sense when a project manager, a tech lead, and a developer were three different humans with three different perspectives. When the "executor" is an agent pipeline, the separation creates overhead without insight. One artifact, with full traceability, is better.

| Aspect | RUP Approach | SDD Approach | AIRUP Synthesis |
|---|---|---|---|
| Granularity | Work packages, sometimes vague | Atomic tasks, always specific | Atomic, with phase-aware ordering |
| Traceability | Separate matrix document | Inline references (TASK→DES→REQ) | Inline + validated by Governor |
| Ordering | Iteration-based (time-boxed) | Dependency-based (topological) | Layer-based (domain→app→infra→entry) |
| Verification | Iteration review (human) | TDD cycle (automated) | TDD + quality gates + Governor oversight |
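The "layer-based" ordering in the synthesis row can be sketched in a few lines: a topological sort over task dependencies, breaking ties by architectural layer. The layer names come from the table above; the task structure and IDs are invented for illustration.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Layer rank: inner layers are implemented before outer ones.
LAYERS = {"domain": 0, "app": 1, "infra": 2, "entry": 3}

# Hypothetical tasks.md parsed into id -> {layer, dependencies}.
tasks = {
    "TASK-001": {"layer": "domain", "deps": []},
    "TASK-002": {"layer": "app",    "deps": ["TASK-001"]},
    "TASK-003": {"layer": "entry",  "deps": ["TASK-002"]},
    "TASK-004": {"layer": "infra",  "deps": ["TASK-001"]},
}

def execution_order(tasks):
    """Dependency-respecting order; among ready tasks, inner layers first."""
    ts = TopologicalSorter({tid: t["deps"] for tid, t in tasks.items()})
    ts.prepare()
    order = []
    while ts.is_active():
        ready = sorted(ts.get_ready(),
                       key=lambda tid: (LAYERS[tasks[tid]["layer"]], tid))
        order.extend(ready)
        ts.done(*ready)
    return order

print(execution_order(tasks))
# ['TASK-001', 'TASK-002', 'TASK-004', 'TASK-003']
```

The point is not the algorithm — topological sort is decades old — but that tasks.md carries enough structure to make the ordering a computation rather than a planning meeting.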

The Traceability Chain: SDD's Gift to RUP

If there's one concept that makes the SDD-RUP marriage productive rather than ceremonial, it's the traceability chain.

RUP always preached traceability. The Rational Unified Process handbook dedicates entire sections to it. But in practice, traceability was the artifact that died first. Maintaining a separate traceability matrix — a spreadsheet mapping requirements to design elements to test cases to code modules — was mind-numbing work that nobody wanted to do and that rotted the moment someone forgot to update it.

SDD makes traceability structural rather than documentary. It's not a matrix you maintain — it's a property of the format:

The Traceability Chain:

SC-N (BDD Scenario) → REQ-N (Requirement) → DES-N (Design Decision) → TASK-N (Implementation) → TEST-N (Verification)

Every element references its upstream justification. TASK-007 implements DES-003, which satisfies REQ-012, which derives from SC-002. The chain is embedded in the artifacts themselves — not in a separate document that can drift.

And here's what makes this transformative for AIRUP: the chain is machine-verifiable. The AI Governor can run a deterministic check — zero LLM tokens — that answers questions like: Does every requirement trace back to a scenario? Does every design decision satisfy at least one requirement? Is any requirement left without an implementing task, or any task without a verifying test?

In the RUP era, these questions required a human analyst to manually cross-check documents. In AIRUP, they're graph traversals on structured data. The answer comes in milliseconds, not hours.
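A minimal sketch of that graph traversal, assuming each artifact element has been parsed into a mapping from its ID to the upstream ID it justifies itself by. The specific IDs echo the example above; everything else is illustrative.

```python
# element -> its upstream justification, parsed from the spec files.
CHAIN = {
    "REQ-012": "SC-002",
    "REQ-013": "SC-002",     # derived, but never designed for -- a gap
    "DES-003": "REQ-012",
    "TASK-007": "DES-003",
    "TEST-019": "TASK-007",
}

def trace(element_id):
    """Walk an element back to its root scenario."""
    path = [element_id]
    while path[-1] in CHAIN:
        path.append(CHAIN[path[-1]])
    return path

def unimplemented_requirements(chain):
    """Requirements no design decision points at."""
    reqs = {e for e in list(chain) + list(chain.values()) if e.startswith("REQ-")}
    implemented = {up for el, up in chain.items() if el.startswith("DES-")}
    return sorted(reqs - implemented)

print(trace("TEST-019"))
# ['TEST-019', 'TASK-007', 'DES-003', 'REQ-012', 'SC-002']
print(unimplemented_requirements(CHAIN))
# ['REQ-013']
```

A dictionary lookup and a set difference — that is the entire cost of an audit that used to take a human analyst hours of cross-checking.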

"RUP told you traceability was important. SDD made it cheap. AIRUP made it automatic."

What RUP Gives SDD (That SDD Can't Give Itself)

If SDD already provides structured specs with inline traceability, why bring RUP into the picture at all? Because SDD is a format, not a process. And formats without processes are recipes without a kitchen.

Here are the four things RUP contributes that SDD alone cannot provide:

1. Progressive Elaboration Through Phases

SDD, as typically practiced, generates specs in one pass: describe the problem, generate requirements, derive design, decompose tasks, implement. It's a pipeline — linear, unidirectional, and brittle in the face of uncertainty.

RUP's four phases (Inception, Elaboration, Construction, Transition) provide a maturity model for specifications. During Inception, the requirements.md is deliberately incomplete — it captures the top-level scenarios and known constraints, nothing more. During Elaboration, the architecturally significant requirements get fleshed out. During Construction, the remaining requirements are detailed iteration by iteration.

This progressive refinement addresses the core criticism of SDD — that specs become stale or over-specified. In AIRUP, the spec is never "done"; it's "at the right level of detail for the current phase."

2. Quality Gates Between Artifacts

In vanilla SDD, the pipeline flows from requirements to design to tasks to code, and the only validation is "does the code pass the tests?" But what if the design is internally inconsistent? What if the requirements contradict each other? What if the tasks miss an entire domain?

RUP introduces milestone reviews at the boundary of each phase. In AIRUP, these become automated quality gates: a post-requirements gate (ID uniqueness, EARS compliance, scenario coverage), a post-design gate (every requirement mapped to a design decision, no contradictory decisions), and a post-tasks gate (full traceability, no orphaned tasks).

These gates catch structural problems before any code is written. In my experiments with AIRUP, the post-design gate alone catches an average of 3-4 issues per feature that would otherwise propagate into code and require expensive rework.

3. Risk-Driven Iteration Order

SDD pipelines are typically feature-driven: implement feature A, then feature B, then feature C. RUP is risk-driven: implement the hardest, most uncertain parts first (during Elaboration), and save the routine work for Construction.

This matters enormously for AI-driven development. If you let an agent implement the easy features first and leave the architectural uncertainty for later, you end up with a codebase that works for simple cases but has a flawed foundation. RUP's risk-driven approach, combined with the AI Governor's ability to assess confidence per requirement, ensures that AIRUP tackles the structurally important work before the cosmetic work.

4. The Governor's Process-Aware Governance

A standalone SDD pipeline has no concept of "where am I in the process?" It knows that requirements come before design, and design before tasks, but it doesn't know whether it's in an early exploratory phase or a late stabilization phase. The pipeline treats every invocation with the same urgency.

RUP gives the AI Governor phase context. During Inception, the Governor knows to be tolerant of ambiguity and brief in its specs. During Construction, it knows to demand precision and completeness. During Transition, it shifts focus from creation to validation. The same Governor, the same strategies — but calibrated to the lifecycle phase.
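One way to picture that calibration is a per-phase policy table the Governor consults before gating a spec. The phase names come from RUP; the policy knobs, the `[TBD]` marker convention, and the thresholds are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhasePolicy:
    allow_open_questions: bool      # tolerate "[TBD]" markers in specs?
    require_full_traceability: bool
    focus: str

# Hypothetical calibration: same Governor, different tolerance per phase.
POLICIES = {
    "inception":    PhasePolicy(True,  False, "scope and business alignment"),
    "elaboration":  PhasePolicy(True,  False, "architecturally significant risk"),
    "construction": PhasePolicy(False, True,  "precision and completeness"),
    "transition":   PhasePolicy(False, True,  "validation against the spec"),
}

def gate_spec(phase, spec_text, chain_complete):
    policy = POLICIES[phase]
    if not policy.allow_open_questions and "[TBD]" in spec_text:
        return f"reject: open questions not allowed in {phase}"
    if policy.require_full_traceability and not chain_complete:
        return f"reject: traceability chain incomplete in {phase}"
    return f"accept: focus on {policy.focus}"

print(gate_spec("inception", "Goal: ship MVP [TBD: pricing model]", False))
print(gate_spec("construction", "REQ-001: ...", False))
```

The same `gate_spec` function runs in every phase; only the policy row changes. That is the "same Governor, same strategies, calibrated to the lifecycle phase" idea in about twenty lines.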


What SDD Gives RUP (That RUP Could Never Achieve)

Fair is fair. If RUP gives SDD governance, SDD gives RUP something equally important: executability.

Machine-Readable Artifacts

RUP artifacts were designed for human consumption. They used prose, UML diagrams, and structured templates that a person could read, interpret, and act on. But "interpret" is the operative word — different humans would interpret the same artifact differently, leading to the coordination problems that agile used as evidence against RUP.

SDD artifacts are designed for machine consumption. Structured IDs, formal syntax (EARS, TOON), inline references, parseable frontmatter. An AI agent reading a requirements.md can extract every requirement, check its syntax, resolve its references, and begin implementing — all without ambiguity.

This isn't a minor improvement. It's the difference between a blueprint that a human reads and translates into action, and a program that a machine executes directly. SDD makes RUP artifacts executable.

Deterministic Validation

RUP's quality gates required human judgment: "Is this architecture document complete? Does this requirements specification cover the scope?" These are subjective questions that different reviewers answer differently.

SDD's structured format enables a significant portion of validation to be deterministic. As discussed in the Governor post, checks like ID uniqueness, EARS syntax compliance, cross-reference completeness, and traceability coverage can run as code — no LLM required, no human judgment needed.

In AIRUP, we estimate that 40-50% of the validation that previously required human reviewers can be automated deterministically. The remaining 50-60% — judgment calls about design quality, requirement relevance, and architectural fitness — goes to AI agents, with the Governor routing only truly ambiguous cases to humans.

Specs as Code

I touched on this in the SDD post, but it's worth emphasizing here. RUP artifacts lived in Rational Rose, RequisitePro, ClearCase — proprietary tools with proprietary formats. They were difficult to version, impossible to diff meaningfully, and completely disconnected from the codebase.

SDD artifacts are markdown files in a git repository. They sit next to the code. They go through pull requests. They have a commit history. When a developer changes a function signature and doesn't update the corresponding design.md, it's as visible as a broken test — and can be as enforceable.

This solves RUP's most persistent practical problem: documentation drift. Not by asking humans to be more disciplined (that failed for 25 years), but by making specs first-class citizens of the development workflow, subject to the same automation and enforcement as code.
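As a sketch of that enforcement, here is a drift check one could run in CI. It assumes the SDD convention of inline IDs like `REQ-001` and `DES-001`; the file contents and the exact ID grammar are illustrative.

```python
import re

ANY_ID = re.compile(r"\b(?:REQ|DES|TASK)-\d{3}\b")
DEFINED_ID = re.compile(r"^(?:REQ|DES|TASK)-\d{3}", re.MULTILINE)

def defined_ids(text):
    """IDs introduced at the start of a line, e.g. 'REQ-001: ...'."""
    return {m.group(0) for m in DEFINED_ID.finditer(text)}

def dangling_references(requirements_md, design_md):
    """Design references pointing at requirements that no longer exist."""
    defined = defined_ids(requirements_md)
    return sorted(r for r in (m.group(0) for m in ANY_ID.finditer(design_md))
                  if r.startswith("REQ-") and r not in defined)

# Someone deleted REQ-002 from requirements.md but not from design.md:
reqs = "REQ-001: The system shall export reports.\n"
design = "DES-001: Export pipeline (implements REQ-001, REQ-002)."
print(dangling_references(reqs, design))
# ['REQ-002']
```

Wire this into the same CI job that runs the test suite, and spec drift fails the build exactly the way a broken test does — which is the whole point of making specs first-class citizens.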

The AIRUP Synthesis

Let me pull this together with a concrete picture of what the AIRUP artifact lifecycle looks like in practice:

| Phase | Artifacts Produced | Level of Detail | Validation |
|---|---|---|---|
| Inception | vision.md, initial requirements.md (top scenarios only) | Broad strokes — 70% of scope, 30% of detail | Human review of business alignment |
| Elaboration | Full requirements.md, design.md with ADRs, architecture prototype | Architecturally significant — deep where risk is high | Deterministic gates + AI review + human approval at milestone |
| Construction | Refined requirements.md, detailed tasks.md, code, tests, test-plan.md | Complete — every requirement, every task, every test | TDD cycles + traceability chain validation + Governor oversight |
| Transition | Deployment spec, release notes, final traceability report | Verification-focused — does it match the spec? | Full chain audit: SC→REQ→DES→TASK→TEST→CODE |

Notice how the artifacts evolve across phases. The requirements.md that exists in Inception is not the same requirements.md in Construction — it's been enriched, refined, and validated through multiple iterations. But it's the same file, in the same repo, with a full git history of every change.

This is what neither SDD alone nor RUP alone could achieve. SDD alone would produce a static spec in one pass and never look back. RUP alone would produce rich artifacts that nobody wants to maintain. Together, they produce living specs that evolve iteratively and are maintained by machines.


The Uncomfortable Question

There's a question I keep getting, and it deserves an honest answer: "Isn't this just over-engineering? Can't an AI agent just... write the code?"

Yes, it can. And for a TODO app, it should. Give an agent a prompt, get code back, ship it. No specs. No phases. No traceability. That's perfectly fine.

But consider what happens when the project outlives its first author: a requirement changes six months after the code shipped, a new maintainer asks why a design decision was made, or an audit asks which code implements which requirement.

Without structured specs, traceability, and phase-gated reviews, the answer to every one of these questions is: "Let me read through 200 files of code and try to reconstruct the intent."

With AIRUP, the answer is: "Let me query the traceability chain."

The overhead of structured specs is real. AIRUP doesn't deny that. It argues that the overhead is worth it when the project's complexity exceeds a threshold — and that AI agents make that overhead cheap enough to be practical. RUP was right about the value of structure. It was just too expensive for humans. SDD makes it cheap for machines. AIRUP puts the two together.

Twenty-Five Years Later

When I started researching the overlap between SDD and RUP, I expected to find superficial similarities — naming conventions, structural parallels, the kind of thing you can squeeze into a conference paper intro. What I found instead was a deep structural isomorphism: SDD and RUP are solving the same problem, decomposing it in the same way, and producing artifacts with the same epistemic function.

The difference is the executor. RUP assumed humans would produce and maintain the artifacts. SDD assumes machines will. AIRUP bridges the gap: a human-designed process, executed by machines, governed by a hybrid human-AI control loop.

Philippe Kruchten, Ivar Jacobson, and the Rational Software team didn't know they were designing a framework for AI agents. They were designing the best process they could for human teams. But in doing so, they accidentally described the optimal artifact structure for multi-agent systems — twenty-five years before those systems existed.

That's not irony. That's good engineering surviving long enough to find its moment.

"The best frameworks aren't the ones designed for a specific technology. They're the ones designed for a specific problem — because problems outlast every technology that tries to solve them."

Next up: Benchmarking AIRUP — how to design experiments that compare AIRUP against unstructured multi-agent pipelines, what metrics matter, and why "number of tokens" is the most misleading benchmark in AI-driven development.


Ricardo Costa

Software engineer exploring the intersection of classical software processes and AI-driven development. Currently pursuing a master's degree researching AIRUP — an AI-first approach to the Rational Unified Process.