REALIZE — From Design to Reality
Building, testing, and proving value quickly
Module 5A: REALIZE — Theory
R — Reveal
Case Study: The Perfect System That Never Shipped
The implementation at Cascade Legal Partners should have been a success story.
Nathan Okafor had done everything by the book. As Director of Practice Technology, he had spent six months on assessment—observing how attorneys and paralegals actually conducted client intake, documenting the friction in the current process, cataloging the shadow systems that had accumulated over years of inadequate tooling. His Opportunity Portfolio identified the central problem: intake coordination required manual handoffs across five different systems, creating delays that cost the firm an estimated $1.8 million annually in delayed billing and lost client conversions.
The business case was airtight. Nathan had measured baselines with rigor: 4.3 hours average intake time, 23% of prospects abandoning during the process, $340 average administrative cost per new client. His value model projected $1.1 million in annual savings with a 62% reduction in intake time, plus capacity recovery that would allow the intake team to handle 40% more volume without additional headcount.
The workflow design had been exemplary. Nathan's team had mapped the current state in granular detail, identified friction points through practitioner observation, and designed a future state that preserved attorney judgment while automating information flow. The blueprint had been validated with attorneys, paralegals, and intake coordinators who would actually use the system. They had concerns—everyone has concerns—but they also saw the potential.
The executive committee approved full funding in February. Implementation began in March with adequate budget, visible sponsorship from the managing partner, and a target go-live of September.
By November of the following year, more than a year and a half after approval, the system existed only in a test environment that three people used. The September deadline had been pushed to December, then March, then "when it's ready." The project had consumed $400,000—more than the original budget—and had yet to process a single real client.
The intake process still ran on the same five disconnected systems. The shadow workarounds persisted. And Nathan's most enthusiastic early supporters had stopped attending project meetings.
What Went Wrong
The system that Nathan's team built was impressive. It did everything the blueprint specified—and considerably more.
In the months between design approval and September's original target, the scope had expanded in ways that seemed reasonable at each decision point.
The original design called for automated intake form routing. During development, someone realized that if they were routing forms, they could also generate conflict check requests automatically. Adding that feature took three weeks but eliminated a manual step. It seemed like a clear win.
Then the conflicts team asked: if the system was generating conflict requests, could it also track conflict responses and flag overdue checks? Another three weeks. Another clear win.
The billing team noticed the project and requested integration with their time-entry system, so intake data could pre-populate client matter records. Four weeks. Another clear win.
Each addition made sense in isolation. Each addressed a real friction point. Each was justified by someone with legitimate authority to make requests. And each pushed the timeline further out while the core functionality remained untested.
By September, the original six-week implementation plan had expanded to cover twenty-three distinct feature sets. The system could do remarkable things—things the original blueprint never contemplated. What it couldn't do was ship.
The Testing Trap
Nathan had planned for pilot testing. The project timeline included a four-week pilot with a small group of users before full rollout.
But the pilot never happened as designed. Every time the team approached pilot readiness, someone identified another gap.
"We can't pilot without the conflict integration—attorneys won't trust the system if it doesn't handle conflicts."
"We can't pilot without the billing connection—the intake team will have to double-enter everything."
"We can't pilot without the client portal—that's what prospects will actually see."
Each objection was valid. Each pushed the pilot date further out. And each revealed a fundamental confusion about what the pilot was for.
Nathan's team believed the pilot was supposed to test whether the system worked. They kept finding things that didn't work yet, so they kept delaying the pilot.
What they didn't understand: the pilot was supposed to reveal what didn't work. That was the point. A pilot that tests a complete system isn't a pilot—it's a soft launch. And a soft launch requires a complete system, which they were never going to have.
The team had confused "ready for testing" with "ready for production." They kept waiting for perfection before subjecting the system to reality.
The Patience Problem
Nine months into development, Managing Partner Elena Reyes asked Nathan for a status update.
"We're making excellent progress," he told her. "The system architecture is sophisticated, the integrations are complex, and we're working through the edge cases. We want to make sure we get this right."
Elena nodded. She trusted Nathan. But she also had partners asking why the firm had spent $300,000 on technology that no one was using. She had an intake team wondering if the promised improvements would ever arrive. She had client acquisition metrics that hadn't improved despite the investment.
"When will we see results?" she asked.
"The pilot is targeted for March," Nathan said. "Full rollout by June."
By March, Elena had moved on to other priorities. She had hired a new operations director whose mandate included "getting technology projects under control." The intake improvement budget was frozen pending review. When Nathan scheduled a meeting to discuss pilot launch, Elena's assistant responded that the managing partner was focused on other initiatives but wished the project well.
The executive sponsor hadn't been lost to conflict or opposition. She had been lost to time. Eighteen months of progress reports with no visible results had exhausted her political capital and attention. By the time the system was "ready," the organization had stopped caring.
The Hidden Costs
While Nathan's team built in isolation, the practitioners they were supposed to serve developed their own solutions.
Rachel Torres, the senior intake coordinator, had been one of Nathan's early champions. She had spent hours in design sessions, contributed expertise to the workflow mapping, and advocated for the project with skeptical colleagues. In the early months, she checked in regularly, eager to see progress.
By month eight, Rachel had stopped asking. She had work to do. Clients were waiting. The current system was terrible, but it was the system she had.
When Nathan finally reached out to schedule pilot testing, Rachel hesitated. "I've built my own workarounds at this point," she said. "The new system would have to be significantly better than what I've cobbled together, or the transition cost isn't worth it."
Her workarounds were inefficient by design standards—spreadsheets and email folders and a color-coded calendar system that made sense only to her. But they worked. She had adapted to the friction rather than waiting for the friction to be solved.
Rachel wasn't resisting change. She was surviving. And survival had made her less available to test something that might or might not eventually help.
The champions hadn't turned hostile. They had simply moved on.
The Moment of Clarity
The intervention came from an unlikely source.
Marcus Webb was a third-year associate who had joined the firm after the project began. He had no investment in the system's success or failure, no stake in the decisions that had brought it here. He had simply been assigned to help with testing and noticed something that insiders couldn't see.
"What problem are we testing for?" Marcus asked during a project review meeting.
"What do you mean?" Nathan replied.
"I've been using the test system for a week. It does a lot of things. But what's the one thing that proves it works? If we deployed this tomorrow and I could show you one number that proved value, what would that number be?"
The room was quiet. Nathan realized he didn't have a clear answer. The system did many things. He couldn't point to the one thing that mattered most.
"Intake time," Rachel said from the back of the room. "That's what started this. 4.3 hours average. If the new system cuts that in half, everything else follows—better conversion, lower cost, happier clients. But we've been so focused on features that we forgot about the original problem."
Marcus nodded. "So what's the smallest version of this system that proves intake time goes down? That's the pilot. Everything else is Phase 2."
Nathan started to object—there were dependencies, integrations, features that users expected. But he stopped himself.
Twenty-three feature sets. Eighteen months. Four hundred thousand dollars. And the original problem—4.3 hours average intake time—remained unsolved.
"What would that minimal version look like?" he asked.
"Form routing," Rachel said. "That's where the delay starts. If forms move automatically to the right person, intake time drops. The conflict integration is nice. The billing connection is nice. The client portal is nice. But form routing is the problem we set out to solve."
Nathan looked at the project plan. Form routing had been complete for seven months. It had been sitting in test while the team built features around it.
"How long to deploy just the form routing to your team?" he asked.
"Two weeks," said the lead developer. "Maybe less. It's done. We just never turned it on."
The One Visible Win
Nathan made the call that afternoon.
The project would split into two phases. Phase 1 was form routing—just form routing—deployed to Rachel's intake team within two weeks. No conflict integration. No billing connection. No client portal. Just the original problem, solved.
Phase 2 would include everything else. But Phase 2 would wait until Phase 1 proved value.
The pushback was immediate. The conflicts team had been promised integration. The billing team had been promised data flow. Other stakeholders had been waiting eighteen months for features that were now being deferred.
"We've already built it," the billing manager pointed out. "Why not include it?"
"Because including it means not shipping," Nathan said. "And not shipping means we keep running on the old system while the new system sits in test. We've proven we can build complex software. We haven't proven we can improve intake time. That's what has to happen first."
Two weeks later, Rachel's team started using the form routing system.
The results were immediate and measurable. Intake time dropped from 4.3 hours to 2.1 hours—not because the system was sophisticated, but because forms that previously sat in email queues now moved automatically to the right person. The bottleneck had been simple; the solution was simple.
Rachel sent Nathan a message after the first week: "This is what we needed eighteen months ago. More is coming, right?"
More was coming. But now "more" would be added to a working system, not a theoretical one. Each new feature would prove value before the next was added. The team would ship, measure, learn, and iterate—not build, build, build and hope.
When Nathan presented the Phase 1 results to Elena Reyes, she had a single question: "Why did this take so long?"
Nathan didn't have a good answer. But he had a better approach now.
"It won't happen again," he said. "From now on, we ship small and prove value before we build big."
The system that saved Cascade Legal Partners $1.1 million annually started with a two-week deployment that did one thing well. Everything else came later—justified by results, not promises.
The Lesson
Nathan's team had confused building with progress.
They had spent eighteen months constructing an impressive system that solved many problems, tested few assumptions, and delivered no results. Every decision to add scope, every delay waiting for completeness, every extension of the timeline had felt like progress. The system grew more capable each week.
But capability isn't value. Features aren't outcomes. And a system in test isn't a system at work.
The pilot that finally shipped tested one assumption: that automated form routing would reduce intake time. It did. That single validated assumption earned the right to continue. Everything that followed was built on proof, not projection.
The goal isn't a perfect system. It's a working system that proves value quickly enough to earn the right to continue. One visible win buys time, builds trust, and creates the foundation for everything that comes next.
Nathan's eighteen-month journey could have been a six-week sprint—if he had understood from the beginning that progress isn't measured in features built. It's measured in value delivered.
End of Case Study
Module 5A: REALIZE — Theory
O — Observe
Core Principles of Rapid Implementation
Module 5's anchor principle: One visible win earns the right to continue.
The business case secured approval. The workflow design earned validation. But approval and validation don't create value. Building creates value—and building requires a different mindset than planning.
The Cascade Legal Partners case illustrates the trap: eighteen months of building, zero months of learning. The team confused construction with progress, capability with value, completeness with readiness. They built an impressive system that solved many problems while proving nothing.
Module 5 provides the discipline of implementation: how to move from validated design to working prototype to production deployment, creating value at each step rather than waiting until everything is complete.
The Prototype Mindset
A Prototype Is a Learning Vehicle
A prototype is not a finished product with rough edges. It's a tool for testing assumptions—a vehicle for learning whether the design actually works when it meets reality.
This distinction matters because it changes what "good" looks like. A prototype that reveals the design is wrong has succeeded. A prototype that hides problems until production has failed. The goal isn't to build something impressive; it's to learn something true.
Nathan's team at Cascade built impressive software. They didn't learn whether automated form routing would reduce intake time until month eighteen—when the answer could have been known in month two.
Validated Learning Over Comprehensive Functionality
Every design embeds assumptions: practitioners will use the system this way; the technology will perform at this speed; the workflow will reduce friction at this point. These assumptions can be stated with confidence during design. They can only be validated through building and testing.
The prototype's purpose is to validate the assumptions that matter most. Not all assumptions—the most important ones. The ones the business case depends on. The ones that will determine success or failure.
For R-01 (Returns Bible), the critical assumption is that automated policy lookup will reduce representative time from 14.2 minutes to under 5 minutes. A prototype that tests this assumption—even a rough one—creates more value than a polished system that tests everything except this.
Speed Beats Completeness
When testing assumptions, speed matters more than completeness. A quick test that reveals a wrong assumption saves months of building on a flawed foundation. A slow test that confirms a right assumption arrives too late to matter.
This is counterintuitive for teams trained in quality: "We should do it right the first time." But "right" in prototype means "fast enough to learn while we still have time to adjust."
The Cascade team spent seven months with working form routing in test. They delayed learning because they wanted to learn everything at once. The result: they learned nothing until it was almost too late.
Permission to Build Something Imperfect
Prototyping requires organizational permission to build imperfect things. Teams trained on production quality standards struggle with this. They know how to build things right; they don't know how to build things fast and iterate toward right.
This permission must be explicit. Without it, teams will default to quality standards that make prototyping impossible. They will add features to avoid shipping something incomplete. They will delay testing to avoid showing something flawed.
"Perfect is the enemy of good" is a cliché. In prototyping, it's a survival rule.
The One Visible Win Principle
Early Value Earns Continuation
Organizations fund projects based on projected value. They continue funding based on demonstrated value. The gap between projection and demonstration is where projects die.
Nathan had executive support in February. By November, that support had evaporated—not because of conflict, but because of time. Eighteen months of progress reports with no visible results exhausted stakeholder patience. When results finally arrived, the stakeholders had moved on.
A visible win early in implementation changes this dynamic. It converts projection into evidence. It gives stakeholders something to point to when questions arise. It builds momentum that carries the project through inevitable setbacks.
Stakeholder Patience Is Finite
Organizations have limited attention. Executives sponsor many initiatives. Every project competes for mindshare with every other project.
A project that takes months to show results must compete for attention the entire time. It must justify its continued existence against alternatives that might deliver faster. It must survive leadership changes, budget reviews, and shifting priorities—all before proving it deserves survival.
The one visible win shortens the window of vulnerability. It moves the project from "promising but unproven" to "proven and expanding." That transition happens not when the system is complete, but when it delivers measurable value.
Small Success Builds Momentum
A working system that does one thing well creates more organizational energy than a promised system that does many things eventually.
Rachel Torres stopped advocating for the Cascade project around month eight. By the time form routing shipped, she had built her own workarounds and lost interest. The project's strongest champion became a skeptic—not from opposition, but from exhaustion.
The form routing deployment in month eighteen created immediate enthusiasm. "This is what we needed." That enthusiasm fueled Phase 2 engagement. The momentum came not from promises, but from proof.
What Counts as a Visible Win
A visible win must be:
- Measurable: Not "things feel better" but "intake time dropped from 4.3 hours to 2.1 hours"
- Attributable: Clearly connected to the new system, not to other changes
- Meaningful: Addressing a problem practitioners actually care about
- Communicable: Easy to explain to stakeholders who aren't deeply involved
For R-01, a visible win might be: representatives can now answer policy questions in 3 minutes instead of 15 minutes—measurable, attributable, meaningful, communicable.
Iteration Over Perfection
First Version Will Be Wrong
No design survives contact with reality unchanged. Users will behave differently than expected. Technology will perform differently than specified. Edge cases will emerge that no one anticipated.
This isn't failure—it's normal. The question isn't whether the first version will need adjustment. The question is how quickly adjustments can be made.
Teams that expect perfection on first release treat every problem as evidence of inadequate planning. They respond to problems by retreating to more planning. Teams that expect iteration treat every problem as information. They respond to problems by adjusting and retesting.
Problems in Prototype Are Learning
The Cascade team found problems during testing and delayed launch. They treated problems as evidence the system wasn't ready.
The correct interpretation: problems discovered in testing are problems discovered cheaply. Problems that emerge in production are problems discovered expensively. The prototype's job is to find problems—as many as possible, as quickly as possible—while they can still be addressed without damaging live operations.
A prototype that runs for weeks without revealing problems isn't well-built. It's under-tested.
Build-Measure-Learn Cycles
Each iteration follows a cycle:
- Build: Implement the next increment
- Measure: Collect data on what happened
- Learn: Interpret data and decide next action
The speed of this cycle determines learning velocity. A team that completes one cycle per month learns twelve things per year. A team that completes one cycle per week learns fifty things per year.
Cascade's team completed something like one-third of a cycle in eighteen months. They built extensively, measured minimally, and learned almost nothing.
The Cost of Being Wrong
Being wrong early is cheap. The form routing assumption could have been tested in week three with a small group of users. If wrong, the team would have learned it with minimal investment. If right, they would have had sixteen months to build on a proven foundation.
Being wrong late is expensive. Cascade spent $400,000 building features around a core assumption that remained untested. If form routing hadn't worked, most of that investment would have been wasted.
The prototype de-risks implementation by being wrong early, often, and cheaply.
Module 5A: REALIZE — Theory
O — Observe
Prototype Construction
The blueprint specifies what to build. This section addresses how to build it—the methodology of translating design into working prototype while maintaining the discipline of speed over completeness.
Minimum Viable Prototype
What "Minimum" Means
Minimum is not "as little as possible." It's "the smallest scope that tests the core assumption."
The core assumption is the one the business case depends on. For R-01, the core assumption is that automated policy lookup reduces representative time. A minimum viable prototype tests this assumption—not every assumption, not every feature, not every edge case.
To identify minimum scope, ask: "What is the one thing that must prove true for this opportunity to deliver value?" Everything that tests this assumption is in scope. Everything else is out of scope for the first prototype.
This is harder than it sounds. Teams identify many things that seem essential:
- "We can't test without X because users expect it."
- "We can't deploy without Y because it's part of the workflow."
- "We need Z or the data won't be accurate."
Each may be true for production. None is necessarily true for prototype. The prototype's job is to learn, not to impress.
What "Viable" Means
Viable means functional enough to generate real feedback. A prototype that doesn't work isn't viable. A prototype that works but can't be used by real people on real tasks isn't viable.
The threshold is usability, not polish. Can practitioners complete actual work using this prototype? Will the experience generate meaningful feedback about whether the design works?
For R-01, a viable prototype would:
- Accept return attributes from representatives
- Match attributes to policy rules
- Display relevant policy information
- Allow representatives to make decisions based on displayed information
It would not need:
- Perfect policy matching accuracy (learning will improve this)
- Integration with every downstream system
- Polished user interface
- Complete exception handling
The Discipline of Cutting Scope
Scope cutting requires discipline because every omitted feature has an advocate. The conflicts team wants integration. The billing team wants data flow. The training team wants onboarding support.
These requests are legitimate. They will eventually be addressed. But addressing them now delays learning about the core assumption.
The discipline: "Not no, but not yet." Every feature request gets categorized:
- Phase 1 (MVP): Tests core assumption
- Phase 2: Enhances validated solution
- Future: Valuable but not urgent
This categorization must be visible and respected. Scope creep begins when categories blur.
Features to Include vs. Defer vs. Never Build
| Category | Criteria | Example (R-01) |
|---|---|---|
| Include | Tests core assumption | Policy lookup and display |
| Include | Required for testing to function | Basic CRM integration |
| Defer | Valuable but not required for test | Billing system integration |
| Defer | Edge case handling | Complex exception workflows |
| Never | Requested but unnecessary | Individual override tracking |
"Never build" requires courage. Some requested features shouldn't exist—they add complexity without value, or they conflict with design principles. Identifying these early prevents scope creep later.
Build vs. Buy vs. Configure
When to Build Custom
Build custom when:
- Requirements are unique to your organization
- No existing tool addresses the core workflow
- Integration requirements make external tools impractical
- Long-term ownership and flexibility matter
Building provides maximum control but maximum cost. Custom solutions require development resources, ongoing maintenance, and organizational capability to support.
For R-01: Building custom might mean developing a policy engine specifically for Lakewood Outdoor's returns policies. This provides exact fit but requires sustained investment.
When to Purchase Existing Tools
Buy when:
- Standard solutions address 80%+ of requirements
- Time-to-value matters more than perfect fit
- Vendor ecosystem provides ongoing innovation
- Internal capability to build and maintain is limited
Purchasing provides faster deployment but less flexibility. The organization adapts to the tool rather than the tool adapting to the organization.
For R-01: Purchasing might mean acquiring a customer service knowledge base tool with policy matching capabilities. Faster deployment, but may require workflow adaptation.
When to Configure Existing Platforms
Configure when:
- Platforms already in use have relevant capabilities
- Configuration provides adequate functionality
- Integration is simplified by staying within platform
- Total cost of ownership favors leverage over purchase
Configuration provides the fastest path when platforms are capable. Many organizations have tools with untapped features that address current needs.
For R-01: Configuration might mean extending the existing CRM to display policy information through custom fields and automation rules. Fastest path if the CRM platform supports it.
Decision Framework
| Factor | Build | Buy | Configure |
|---|---|---|---|
| Time to prototype | Slowest | Medium | Fastest |
| Fit to requirements | Exact | Approximate | Variable |
| Ongoing cost | Highest | Medium | Lowest |
| Flexibility | Highest | Limited | Limited |
| Internal capability required | Highest | Low | Medium |
The right choice depends on context. A team with strong development capability might build. A team with limited resources might configure. Neither is universally correct.
The R-01 Example
R-01 could be implemented through any path:
Option A: Configure existing CRM
- Add policy database as custom object
- Create automation rules to match return attributes to policies
- Display policy information in customer service interface
- Timeline: 3-4 weeks to prototype
Option B: Purchase knowledge management tool
- Acquire tool designed for policy/knowledge management
- Integrate with existing CRM through API
- Configure matching rules within new tool
- Timeline: 6-8 weeks to prototype
Option C: Build custom integration layer
- Develop policy engine with custom matching logic
- Build integration layer connecting Order Management, CRM, and policy database
- Create custom interface for policy display
- Timeline: 10-12 weeks to prototype
For MVP purposes, Option A is likely preferred—it's fastest to prototype and tests the core assumption. If prototype validates the assumption, later phases might evolve toward Option C for greater capability.
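To make Option A concrete, the sketch below shows the core of a configure-style policy match in plain Python rather than CRM automation rules. Everything in it is illustrative: the rule set, the field names (product_category, days_since_purchase, has_receipt), and the match_policy function are hypothetical stand-ins, not Lakewood Outdoor's actual policies or CRM objects.

```python
# Minimal sketch of the Option A core: match return attributes to a policy rule.
# All rules and field names are illustrative placeholders.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PolicyRule:
    name: str
    product_category: str          # category this rule applies to
    max_days_since_purchase: int
    requires_receipt: bool
    summary: str                   # text shown to the representative

# Tiny illustrative rule set; in a real configuration these would live in the
# CRM as custom objects or in a policy database.
POLICY_RULES = [
    PolicyRule("Standard apparel return", "apparel", 60, False,
               "Full refund within 60 days, no receipt required."),
    PolicyRule("Footwear return", "footwear", 30, True,
               "Refund within 30 days with receipt; exchange only after."),
    PolicyRule("Camping gear return", "camping", 90, True,
               "Refund within 90 days with receipt if unused."),
]

def match_policy(product_category: str,
                 days_since_purchase: int,
                 has_receipt: bool) -> Optional[PolicyRule]:
    """Return the first rule matching the return attributes, or None."""
    for rule in POLICY_RULES:
        if (rule.product_category == product_category
                and days_since_purchase <= rule.max_days_since_purchase
                and (has_receipt or not rule.requires_receipt)):
            return rule
    return None  # no match: representative falls back to manual lookup

if __name__ == "__main__":
    rule = match_policy("apparel", days_since_purchase=45, has_receipt=False)
    print(rule.summary if rule else "No automatic match - look up manually.")
```

The point is the shape of the logic, not the platform: if a lookup this simple cuts representative time, the core assumption holds, and later phases can move the same logic into whatever platform the organization ultimately chooses.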
Module 5A: REALIZE — Theory
O — Observe
Testing Frameworks
Building the prototype is half the work. Testing it effectively—gathering the data that validates or refutes assumptions—is the other half. This section covers how to test prototypes in ways that generate actionable learning.
Testing Human-AI Workflows
Different from Testing Pure Software
Software testing asks: "Does the system function as specified?" Human-AI workflow testing asks: "Does the workflow produce the intended outcomes when humans and systems work together?"
The distinction matters because the system can function perfectly while the workflow fails. The technology may perform as designed, but:
- Humans may not use it as intended
- The interaction may create friction the design didn't anticipate
- Trust may not develop as assumed
- Behavior may not change as predicted
Testing human-AI workflows requires observing the entire interaction, not just the system's behavior.
The Human Element
Human behavior in testing includes:
- Adoption patterns: Do practitioners use the system when they could?
- Usage patterns: Do they use it as designed, or develop workarounds?
- Trust signals: Do they rely on system recommendations, or override consistently?
- Behavioral change: Does their overall workflow change as intended?
These patterns emerge over time. Single-day testing won't reveal whether practitioners trust a recommendation system. Extended testing reveals whether trust develops, deteriorates, or never forms.
What to Observe Beyond System Function
System metrics tell part of the story. Observation tells the rest.
Watch for:
- Moments of hesitation—where practitioners pause before acting
- Workarounds—actions taken outside the system to accomplish tasks
- Verbal commentary—what practitioners say while working
- Help-seeking—when they ask colleagues for guidance
- Abandonment—when they leave the system to finish work elsewhere
These observations surface friction that metrics miss.
Combining Quantitative and Qualitative
Neither metrics nor observation alone provides complete understanding.
Metrics reveal what happened: time dropped from X to Y, error rate changed from A to B. They don't explain why, or whether the change will persist, or what problems lurk beneath surface improvement.
Observation reveals context: practitioners hesitate at step 3 because the language is confusing, or they override frequently because system recommendations don't match reality. But observation is limited by sample size and observer bias.
Effective testing combines both:
- Quantitative metrics for what changed
- Qualitative observation for why and how
- Practitioner interviews for perception and experience
- Behavioral analysis for patterns over time
Pilot Group Selection
Size: Small Enough to Support, Large Enough to Learn
Pilot groups face a tradeoff:
- Too small: Results may not generalize; individual variation dominates
- Too large: Support burden overwhelms; feedback is difficult to process
A reasonable pilot size depends on context. For R-01, a pilot of 6-10 representatives might be appropriate—enough to see patterns, small enough to provide intensive support and gather detailed feedback.
The right size allows:
- Direct relationship with each pilot participant
- Rapid response to issues that emerge
- Detailed feedback collection
- Reasonable statistical validity for key metrics
Composition: Mix of Enthusiasts and Skeptics
Pilots populated only by enthusiasts will succeed; pilots populated only by skeptics will fail. Neither result is informative.
Effective pilot composition includes:
- Early adopters who will explore and provide feedback willingly
- Mainstream users who represent typical behavior
- Skeptics who will stress-test the system and surface weaknesses
The mix creates realistic conditions. Early adopters show what's possible. Skeptics reveal what's broken. Mainstream users indicate whether the design works for normal people doing normal work.
Duration: Long Enough to See Patterns
Short pilots reveal whether the system functions. Extended pilots reveal whether it works.
The difference: functioning is about technology; working is about workflow. A system might function correctly while the workflow remains inefficient because practitioners haven't adapted, trust hasn't developed, or edge cases haven't emerged.
Minimum pilot duration should allow:
- Initial learning curve to pass (often 1-2 weeks)
- Representative volume of work (enough transactions to measure)
- Pattern stabilization (behavior settles into routine)
- Edge case emergence (unusual situations surface)
For R-01, a reasonable pilot duration might be 4-6 weeks—enough time for representatives to move past novelty, develop routine usage patterns, and encounter various return scenarios.
Geographic and Functional Considerations
If the production deployment will span locations or functions, the pilot should include variation:
- Different locations may have different work patterns
- Different shifts may have different volumes
- Different practitioners may have different experience levels
A pilot that succeeds in one context and fails in another provides valuable information—but only if both contexts are tested.
Module 5A: REALIZE — Theory
O — Observe
Iteration Methodology
Testing generates data. Iteration converts that data into improvement. This section covers how to interpret feedback, decide what to do next, and maintain progress through the learning cycle.
The Build-Measure-Learn Cycle
Build: Implement the Next Increment
Building in iteration differs from building initially. The initial build implements the prototype scope. Iteration builds implement specific changes responding to specific findings.
An iteration build should:
- Address one finding at a time (avoid combining changes)
- Have clear scope (what's being changed and why)
- Be timeboxed (hours or days, not weeks)
- Be testable (the change can be observed and measured)
For R-01, an iteration build might be: "Policy matching accuracy was 78%; adding product category as a matching factor should improve accuracy." That's a specific change, testable, with clear rationale.
Measure: Collect Data on What Happened
After implementing a change, measure its effect. Did the change produce the intended improvement? Did it create unintended consequences?
Measurement in iteration should be:
- Focused: Measure the specific thing that was changed
- Quick: Get results in days, not weeks
- Comparative: Compare to pre-change baseline
For the R-01 example: After adding product category matching, measure policy matching accuracy. Did it improve from 78%? Did it affect anything else negatively?
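A minimal sketch of that measure-and-learn step, using invented outcome data and the 78% baseline from the example above: each entry records whether a representative confirmed the system's recommendation after the change.

```python
# Sketch of one measure step: compare matching accuracy after a change
# against the pre-change baseline. The outcome data here is invented.

BASELINE_ACCURACY = 0.78  # accuracy before adding product category matching

# Did the representative confirm the system's recommendation? (invented sample)
post_change_outcomes = [True, True, False, True, True, True, True, False,
                        True, True, True, True, False, True, True, True]

accuracy = sum(post_change_outcomes) / len(post_change_outcomes)
delta = accuracy - BASELINE_ACCURACY

print(f"Post-change accuracy: {accuracy:.0%} "
      f"(baseline {BASELINE_ACCURACY:.0%}, change {delta:+.0%})")

# Learn: decide the next action based on the result.
if delta > 0.02:
    print("Improvement confirmed - keep the change, move to the next finding.")
elif delta < -0.02:
    print("Regression - revert and try a different matching factor.")
else:
    print("No clear effect - investigate before building further.")
```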
Learn: Interpret Data and Decide Next Action
Learning converts measurement into decision:
- If the change worked, incorporate it and move to the next issue
- If the change didn't work, understand why and try a different approach
- If the change revealed new issues, add them to the iteration backlog
Learning requires intellectual honesty. A change that was supposed to help but didn't help is useful information—if acknowledged. Teams that explain away negative results don't learn from them.
Cycle Speed Matters
The learning rate is proportional to cycle speed. Faster cycles mean more learning in less time.
Consider two teams:
- Team A completes one build-measure-learn cycle per month
- Team B completes one cycle per week
In three months, Team A has completed 3 cycles. Team B has completed 12 cycles. Team B has four times the learning, which translates to better outcomes.
Cycle speed depends on:
- Build complexity (simpler changes build faster)
- Measurement latency (quick metrics enable quick cycles)
- Decision process (clear authority enables quick decisions)
- Technical capability (fast deployment enables fast testing)
Reading Prototype Feedback
What Metrics Tell You
Metrics provide objective measurement of specific outcomes. They tell you what changed, by how much, with what variation.
For R-01, metrics might show:
- Average policy lookup time: 3.2 minutes (down from 14.2)
- Policy matching accuracy: 83% (users confirm 83% of recommendations)
- Error rate: 2.1% (down from 4.3%)
- Escalation rate: 8% (down from 12%)
These numbers indicate progress toward goals. They don't explain why progress occurred or didn't occur.
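As a sketch of how such numbers might be pulled together, the snippet below aggregates a handful of invented transaction records; the field names (lookup_minutes, error, escalated) are hypothetical, and a real pilot would draw hundreds of records from the prototype's logging system rather than the four shown here.

```python
# Sketch of computing the quantitative pilot metrics from a transaction log.
# Records and field names are hypothetical; four entries shown for brevity.

transactions = [
    {"lookup_minutes": 2.8, "error": False, "escalated": False},
    {"lookup_minutes": 4.1, "error": False, "escalated": True},
    {"lookup_minutes": 3.0, "error": True,  "escalated": False},
    {"lookup_minutes": 2.5, "error": False, "escalated": False},
]

n = len(transactions)
avg_lookup = sum(t["lookup_minutes"] for t in transactions) / n
error_rate = sum(t["error"] for t in transactions) / n
escalation_rate = sum(t["escalated"] for t in transactions) / n

# Each figure is reported against the Module 3 baseline for R-01
# (14.2 minutes, 4.3% errors, 12% escalations).
print(f"Average lookup time: {avg_lookup:.1f} min")
print(f"Error rate: {error_rate:.1%}")
print(f"Escalation rate: {escalation_rate:.1%}")
```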
What Practitioner Behavior Tells You
Behavior reveals what metrics can't capture:
- Are practitioners using the system enthusiastically, reluctantly, or minimally?
- Where do they hesitate or struggle?
- What workarounds have they developed?
- How has their overall work pattern changed?
Behavioral observation adds context to metrics. A time improvement might be driven by the system working well—or by practitioners giving up on difficult cases and processing only easy ones. Metrics alone can't distinguish these scenarios.
What Silence Tells You
Absence of feedback is data. When practitioners stop commenting on the system, it may mean:
- The system works so well they don't notice it (good)
- They've stopped using it (bad)
- They've adapted in ways that avoid friction (needs investigation)
Silence requires investigation. Don't assume silence means satisfaction.
Distinguishing Signal from Noise
Not all feedback matters equally:
- Single-user complaints may reflect individual preference, not design flaw
- Rare edge cases may not justify design changes
- Early confusion may resolve with experience
Signal indicators:
- Multiple practitioners report similar issues
- Issues persist over time
- Issues affect core workflow, not peripheral features
- Practitioners develop consistent workarounds
Noise indicators:
- Isolated complaints from single users
- Issues that fade as practitioners gain experience
- Preference differences that don't affect outcomes
- Requests for features that weren't part of scope
Module 5A: REALIZE — Theory
O — Observe
From Pilot to Production
The pilot validated the prototype. Metrics improved. Practitioners provided positive feedback. Iteration addressed the rough edges. The system works.
Now what?
The transition from pilot to production is where many projects stall. The pilot becomes permanent—serving a small group forever while the broader organization waits indefinitely. Or the deployment happens without adequate preparation, and production reveals problems the pilot never surfaced.
This section covers how to graduate from validated pilot to successful production deployment.
Defining Pilot Success
Quantitative Thresholds
Before pilot begins, success criteria should be defined. These criteria provide objective targets:
For R-01:
- Time per Bible-dependent return: <5 minutes (baseline: 14.2 minutes)
- Incorrect policy application: <2% (baseline: 4.3%)
- Supervisor escalation rate: <5% (baseline: 12%)
- System usage rate: >80% (pilot group)
- Practitioner satisfaction: >4.0/5
Success means meeting these thresholds consistently—not once, but repeatedly over the pilot duration.
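One way to keep this check honest is to encode the thresholds before the pilot starts and evaluate results against them mechanically. The sketch below does that for the R-01 criteria; the usage and satisfaction actuals are assumed for illustration, while the other actuals mirror the pilot figures cited earlier in this module.

```python
# Sketch of checking pilot results against pre-defined success thresholds.
# Usage and satisfaction actuals are assumed; others mirror R-01 figures.

# (metric, actual, target, True if lower is better)
criteria = [
    ("Time per Bible-dependent return (min)", 3.2,  5.0,  True),
    ("Incorrect policy application (%)",      2.1,  2.0,  True),
    ("Supervisor escalation rate (%)",        8.0,  5.0,  True),
    ("System usage rate (%)",                86.0, 80.0,  False),
    ("Practitioner satisfaction (1-5)",       4.2,  4.0,  False),
]

for name, actual, target, lower_is_better in criteria:
    met = actual <= target if lower_is_better else actual >= target
    status = "met" if met else "NOT met"
    print(f"{name}: {actual} vs target {target} -> {status}")
```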
Qualitative Indicators
Numbers alone don't define success. Qualitative factors matter:
- Do practitioners prefer the new workflow to the old?
- Has behavior genuinely changed, or is compliance superficial?
- Are workarounds emerging that indicate unresolved friction?
- Would practitioners advocate for the system to their colleagues?
A pilot that meets quantitative targets while practitioners quietly hate the system isn't a success. It's a ticking time bomb that will fail at scale.
Comparison to Module 3 Projections
Module 3's ROI model made projections about expected value. Pilot results should be compared to those projections:
For R-01:
- Projected time savings: 9.2 minutes/return
- Actual time savings: 11.0 minutes/return (exceeded projection)
- Projected error reduction: 2.3 percentage points
- Actual error reduction: 2.2 percentage points (met projection)
- Projected escalation reduction: 7 percentage points
- Actual escalation reduction: 4 percentage points (partially met)
This comparison validates the business case. Results that exceed projection strengthen the case for production. Results that fall short require explanation and possibly revised projections.
What "Good Enough" Looks Like
Perfection isn't the standard. "Good enough" means:
- Core value proposition demonstrated
- Critical success metrics met
- Remaining issues are minor, rare, or have clear remediation paths
- Production deployment won't create significant new problems
- The organization will be better off with the system than without it
The alternative—waiting for perfection—means waiting forever. At some point, the system is ready. Defining that point in advance prevents endless refinement.
The Pilot Trap
Pilots That Never End
A pilot should have a defined end date. When pilots continue indefinitely, several dynamics are typically at play:
Fear of Scale: "It works for 10 users, but what about 100?" Concerns about scale prevent commitment to deployment.
Perfectionism: "Just a few more tweaks" becomes permanent state. Each improvement reveals another opportunity.
Ownership Ambiguity: No one has authority to declare the pilot successful and proceed.
Risk Aversion: Production deployment feels risky. Pilot feels safe. Safety wins.
Lost Momentum: Original urgency faded. No one is pushing for completion.
"Just a Few More Tweaks" as Avoidance
There's always something else to improve. The policy matching could be 2% more accurate. The interface could be slightly smoother. The documentation could be more complete.
These improvements are genuine. They're also endless. If the standard is "nothing left to improve," deployment never happens.
The discipline: Is the system better than what it replaces? If yes, deploy it. Continue improving after deployment, not instead of deployment.
Loss of Urgency After Initial Success
Early pilots generate excitement. The first positive results create energy. Champions celebrate progress.
As pilots extend, urgency fades. Initial excitement becomes routine. Champions move to other priorities. Stakeholders who were eager become indifferent.
By the time deployment is "ready," no one cares anymore. The project that could have been a success story becomes a footnote.
How Pilots Become Permanent Exceptions
Some organizations have multiple permanent pilots—systems that serve small groups indefinitely because deployment never happened.
These pilots create problems:
- Resource drain: Small groups get support that broader deployment would amortize
- Inequity: Some practitioners have better tools than others for no good reason
- Technical debt: Pilots built for small scale accumulate workarounds as they persist
- Organizational confusion: Which system is official? Which is temporary?
A pilot is a test, not a destination. If it passes the test, deploy it. If it fails, kill it. Either way, it shouldn't persist.
Module 5B: REALIZE — Practice
R — Reveal
Introduction
Module 5A established the principles of rapid implementation. This practice module provides the methodology: how to move from validated blueprint to working prototype to production deployment, creating value at each step.
Why This Module Exists
The gap between design and deployment is where organizations lose momentum.
Module 4 produced a validated Workflow Blueprint—a specification of how work should flow, what technology should do, and how humans and AI should collaborate. That blueprint represents significant investment: assessment, calculation, design, validation.
But a blueprint is a plan, not a result. The plan must become reality. Module 5 provides the discipline to make that happen without falling into the traps that stalled Cascade Legal Partners for eighteen months.
The deliverable: A Working Prototype with measured before/after results—evidence that the design works, ready for production deployment.
Learning Objectives
By completing Module 5B, you will be able to:
- Scope a minimum viable prototype that tests core assumptions without building everything at once
- Select an implementation approach (build, buy, or configure) based on requirements and constraints
- Construct or configure the prototype within timeline discipline, avoiding scope creep
- Design and execute pilot testing with appropriate group composition, duration, and measurement
- Measure results against Module 3 baselines using consistent methodology across all three ROI lenses
- Iterate based on evidence using the build-measure-learn cycle to address issues systematically
- Prepare for production deployment with appropriate readiness verification and handoff documentation
The Practitioner's Challenge
Three tensions define implementation:
Speed vs. Completeness
The faster you ship, the sooner you learn. But incomplete systems frustrate users and generate invalid feedback. Finding the minimum that enables meaningful testing—not less, not more—requires discipline.
Quality vs. Iteration
Production quality standards evolved for good reason. But applying them to prototypes delays learning. Building for iteration means accepting imperfection now to enable improvement later.
Confidence vs. Evidence
The design feels right. Stakeholders are enthusiastic. Practitioners validated the blueprint. But confidence isn't evidence. Only testing reveals whether the design actually works. The temptation to declare victory early—before data confirms success—must be resisted.
Module 5B: REALIZE — Practice
O — Observe
Prototype Scoping Methodology
The blueprint specifies the complete solution. The prototype tests the core assumptions. This section covers how to translate comprehensive design into focused prototype scope.
From Blueprint to Prototype Scope
The Blueprint Specifies the Complete Future State
Module 4's blueprint documents everything needed for full implementation:
- All workflow steps and decision points
- All human-AI collaboration specifications
- All integration requirements
- All adoption design elements
This completeness is necessary for production. It's not necessary—and often counterproductive—for initial prototype.
The Prototype Tests Core Assumptions
Every design embeds assumptions:
- The technology can do what we specified
- Practitioners will use it as designed
- The workflow will reduce friction as projected
- Integration will work reliably
Some assumptions are more critical than others. The business case depends on certain assumptions being true. If they're wrong, everything else is irrelevant.
The prototype tests these critical assumptions first. Non-critical assumptions can wait.
Identifying Essential First-Test Components
To identify what must be in the prototype, ask:
- What assumption does the business case depend on most?
- If this assumption is wrong, does the opportunity still exist?
- What's the smallest thing we can build that tests this assumption?
For R-01, the critical assumption is: automated policy lookup will reduce representative time from 14.2 minutes to under 5 minutes.
Everything that tests this assumption is essential. Everything that doesn't is deferrable.
The MVP Question
"What Is the Smallest Thing We Can Build That Tests Our Core Assumption?"
This question forces ruthless prioritization. Not "what would be nice to have." Not "what stakeholders expect." Not "what the blueprint specifies." Just: what tests the core assumption?
For R-01, the answer might be:
- Policy Engine that matches return attributes to policies
- Display of matched policy in representative's CRM view
- Ability for representative to act on the displayed information
That's it. Not billing integration. Not documentation automation. Not exception handling workflow. Just: can automated policy lookup reduce the time representatives spend finding policies?
Distinguishing "Nice to Have" from "Must Have for Testing"
| Feature | Must Have (MVP) | Nice to Have | Rationale |
|---|---|---|---|
| Policy matching engine | ✓ | | Tests core assumption |
| Policy display in CRM | ✓ | | Tests core assumption |
| Override mechanism | ✓ | | Required for fair test |
| Similar case display | | ✓ | Valuable but not essential for time test |
| Automatic documentation | | ✓ | Efficiency gain, not core test |
| Billing integration | | ✓ | Downstream value, not core test |
| Exception routing workflow | | ✓ | Handles 15% of cases, not typical flow |
| Manager dashboard | | ✓ | Observer feature, not practitioner test |
The must-haves test whether automated policy lookup works. The nice-to-haves make it better but aren't needed to answer the essential question.
Module 5B: REALIZE — Practice
T — Test
Measuring Implementation Quality
Module 5 built and tested the prototype. This section establishes how to measure whether the work is good—not just whether it produces results, but whether it's done well.
Validating the Prototype
Before pilot begins, the prototype itself needs validation. Four questions:
Does it implement the blueprint specification?
The blueprint from Module 4 specified what the system should do. Validation confirms the prototype does it:
| Blueprint Requirement | Prototype Status |
|---|---|
| Accept return attributes | ✓ Implemented |
| Match to policy rules | ✓ Implemented |
| Return policy summary with confidence | ✓ Implemented |
| Display in CRM interface | ✓ Implemented |
| Capture override actions | ✓ Implemented |
Any gaps between blueprint and prototype should be intentional (MVP scope) or flagged for remediation.
Does it function reliably?
Reliability means consistent behavior:
- Same inputs produce same outputs
- No unexplained failures
- Error handling prevents crashes
- Integration with other systems is stable
For R-01: 100 test transactions with zero failures required before pilot.
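That gate is easy to automate. The sketch below runs a batch of synthetic test transactions through a stand-in process_return() call and blocks the pilot if anything fails; both the test cases and the function are hypothetical placeholders for the real prototype.

```python
# Sketch of the pre-pilot reliability gate: run a batch of test transactions
# and require zero failures before the pilot begins.

import random

def process_return(attributes: dict) -> bool:
    """Stand-in for the prototype call: return True if the transaction was
    handled without an error or an empty result (placeholder check only)."""
    return attributes.get("product_category") is not None

random.seed(7)  # reproducible synthetic test cases
test_cases = [
    {"product_category": random.choice(["apparel", "footwear", "camping"]),
     "days_since_purchase": random.randint(1, 120),
     "has_receipt": random.choice([True, False])}
    for _ in range(100)
]

failures = [case for case in test_cases if not process_return(case)]
print(f"{len(test_cases)} test transactions, {len(failures)} failures")
if failures:
    print("Reliability gate NOT passed - fix failures before pilot.")
else:
    print("Reliability gate passed - ready to schedule the pilot.")
```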
Is it usable by practitioners?
Usability means practitioners can complete their work:
- Interface is comprehensible without documentation
- Common tasks are efficient
- Uncommon tasks are achievable
- Error recovery is possible
For R-01: Three representatives complete five returns each without assistance.
Is it ready for pilot?
Pilot readiness means the system can support real work:
- Data is loaded (policy database complete)
- Training is available (quick reference ready)
- Support is prepared (help desk briefed)
- Feedback collection is ready (logging active)
Pilot readiness is not production readiness. Lower standards apply—the goal is learning, not perfection.
Prototype Quality Metrics
Functional Completeness (vs. MVP Scope)
| MVP Feature | Implemented | Tested | Working |
|---|---|---|---|
| Policy matching | ✓ | ✓ | ✓ |
| CRM display | ✓ | ✓ | ✓ |
| Override mechanism | ✓ | ✓ | ✓ |
| Performance (<2 sec) | ✓ | ✓ | ✓ |
Functional completeness = Features implemented and working / Features in MVP scope
Target: 100% before pilot begins.
Technical Stability
| Stability Metric | Target | Actual |
|---|---|---|
| Failed transactions | 0 in 100 tests | 0 |
| System errors | 0 critical in testing | 0 |
| Response time variance | <500ms | 340ms |
| Recovery from errors | Graceful degradation | ✓ |
Stability ensures the pilot tests the design, not the bugs.
Usability Assessment
| Usability Factor | Method | Result |
|---|---|---|
| Task completion | 3 reps × 5 returns | 15/15 |
| Time to learn | First successful return | <10 min |
| Errors made | User errors during test | 2 (both recovered) |
| Satisfaction | Post-test rating | 4.2/5 |
Usability ensures practitioners can actually use what was built.
Integration Reliability
| Integration | Test Method | Result |
|---|---|---|
| CRM → Policy Engine | 100 transactions | 100% success |
| Order Management data | 50 order lookups | 100% success |
| Logging system | Action capture | 100% captured |
Integration reliability ensures the prototype works in its ecosystem.
Module 5B: REALIZE — Practice
S — Share
Consolidation Exercises
Learning solidifies through application and teaching. These exercises help integrate Module 5 concepts into your practice.
Reflection Prompts
Complete these individually before group discussion.
1. A Project That Stalled in Pilot Phase
Think of a project—yours or one you observed—that succeeded in pilot but never reached full deployment.
- What kept it in pilot?
- Who benefited from the status quo?
- What would have been required to push it to production?
- What was lost by the delay?
Write 2-3 paragraphs describing the project, what happened, and what you learned.
2. The Tension Between Speed and Quality
Consider a time when you had to choose between shipping quickly and shipping perfectly.
- Which did you choose? Why?
- What were the consequences?
- In retrospect, would you make the same choice?
- How do you typically resolve this tension?
Describe the situation and your reasoning in 1-2 paragraphs.
3. How Your Organization Handles Implementation Failure
When implementations fail or produce disappointing results:
- Are problems surfaced or hidden?
- Is failure treated as learning or blame?
- What happens to the team that tried?
- What happens to the next similar initiative?
Describe your organization's failure culture honestly. What does it enable? What does it prevent?
4. What "Good Enough" Means in Your Context
Different organizations have different thresholds for acceptable quality:
- What's the minimum acceptable quality in your environment?
- Is this threshold realistic? Too high? Too low?
- Who sets this threshold? Is it explicit or implicit?
- How does this threshold affect your ability to iterate?
Define "good enough" for your context and assess whether it helps or hinders progress.
5. Your Personal Tendency: Perfectionism or Rushing
Be honest about your default:
- Do you tend toward perfectionism (delaying until it's "right")?
- Or toward rushing (shipping before it's ready)?
- How does this tendency affect your work?
- What would balance look like for you?
Describe your tendency and how you manage it.
Peer Exercise: Prototype Review
Format: Pairs, 45 minutes total
Setup (5 minutes)
- Pair with someone from a different organization or function
- Each person brings their prototype plan or pilot results from their capstone opportunity
Round 1 (15 minutes): Present and Review
- Partner A presents their prototype scope (5 minutes)
- Partner B reviews using the checklist below (10 minutes)
Prototype Review Checklist:
Scope Questions:
- Is the MVP focused on testing the core assumption?
- Are deferred features genuinely deferrable?
- Is the scope achievable in the proposed timeline?
- Are dependencies identified and addressed?
Methodology Questions:
- Is the pilot group appropriately composed?
- Are success metrics aligned with the business case?
- Is measurement methodology clear and repeatable?
- Is the iteration approach defined?
Risk Questions:
- Are technical risks identified with mitigations?
- Are adoption risks addressed in the design?
- Is there a path from pilot to production?
- What could cause this to stall?
Round 2 (15 minutes): Reverse Roles
- Partner B presents their prototype scope (5 minutes)
- Partner A reviews using the checklist (10 minutes)
Debrief (10 minutes)
- What feedback was most valuable?
- What would you change based on the review?
- What did you notice about your partner's approach?
- What patterns did you see across both plans?