Module 5

REALIZE — From Design to Reality

Building, testing, and proving value quickly


Module 4 taught you to design the right solution. This module teaches you to ship it before perfection kills it.

The Perfect System That Never Shipped

Nathan Okafor did everything right. As Director of Practice Technology at Cascade Legal Partners, he spent months on assessment, documented the friction in client intake across five disconnected systems, and built an airtight business case: 4.3 hours average intake time, 23% prospect abandonment, $1.8 million in annual losses from delayed billing and lost conversions. The executive committee approved full funding. Implementation began with adequate budget, visible sponsorship from the managing partner, and a September go-live target.

By November, eighteen months after approval, the system existed only in a test environment that three people used. The project had consumed $400,000 and had yet to process a single real client. Here's what happened: the scope expanded from six features to twenty-three. Each addition was individually justified. Automated conflict check requests? Three weeks, clear win. Conflict response tracking? Three weeks, clear win. Billing system integration? Four weeks, clear win. Every expansion addressed a real friction point, was requested by someone with legitimate authority, and pushed the timeline further out while core functionality remained untested. The original six-week implementation plan became an eighteen-month construction project. The team kept building. They stopped learning.

Meanwhile, the people they were building for moved on. Rachel Torres, the senior intake coordinator who had spent hours in design sessions and advocated for the project with skeptical colleagues, stopped checking in around month eight. She had clients waiting. She built her own workarounds: spreadsheets, email folders, a color-coded calendar system that made sense only to her. Inefficient by design standards, but functional. When Nathan finally reached out about pilot testing, Rachel hesitated. "I've built my own workarounds at this point. The new system would have to be significantly better than what I've cobbled together, or the transition cost isn't worth it." The project's strongest champion had become its most reluctant tester. Managing Partner Elena Reyes followed a similar arc. Eighteen months of progress reports with no visible results exhausted her attention. By the time the system was "ready," she had moved on to other priorities. The executive sponsor wasn't lost to conflict. She was lost to time.

The clarity came from Marcus Webb, a third-year associate with no stake in the project's history. "What's the one thing that proves it works?" he asked. The room went quiet. Rachel answered from the back: "Intake time. That's what started this. If the new system cuts that in half, everything else follows." Marcus followed up: "So what's the smallest version that proves intake time goes down? That's the pilot. Everything else is Phase 2." Nathan checked the project plan. Form routing, the feature that addressed the original problem, had been complete for seven months. It had been sitting in test while the team built around it. The lead developer confirmed: two weeks to deploy, maybe less. It was done. They just never turned it on.

Two weeks later, Rachel's team started using form routing. Intake time dropped from 4.3 hours to 2.1 hours. The system that saved Cascade Legal Partners $1.1 million annually started with a two-week deployment that did one thing well. Everything else came later, justified by results.

The Anchor Principle

Organizations fund projects based on projected value. They continue funding based on demonstrated value. The gap between projection and demonstration is where projects die.

Ship the smallest thing that proves value. Then expand. Nathan's eighteen-month journey could have been a six-week sprint if he had understood that progress is measured in value delivered, not capability accumulated. A system in test is a system at rest.

Three Concepts That Matter

The scope creep trap. Each addition to Nathan's project was individually reasonable. The aggregate effect was fatal. Scope expands through a series of small, justified decisions that feel like progress. The conflicts team needed integration. The billing team needed data flow. Each request came from someone with legitimate authority. The discipline is categorization: Phase 1 tests the core assumption; everything else is Phase 2. "Not no, but not yet."

Champion erosion. Champions have a shelf life. Rachel Torres went from enthusiastic advocate to reluctant participant in eight months. Elena Reyes went from executive sponsor to disengaged bystander in eighteen. Delay doesn't just cost time; it costs you the people who would have carried the project forward. Every month without visible results erodes the political capital and personal investment that made the project possible in the first place. You cannot bank enthusiasm. You spend it or you lose it.

Interactive Exercise

Champion Erosion Clock

A slider tracks months without results, from month 0 (full enthusiasm) to month 18 (window closed), and shows each stakeholder's engagement decaying over time:

  • Rachel, practitioner champion — starting engagement 100%: "This is exactly what we needed."
  • Elena, executive sponsor — starting engagement 100%: "Full support. Keep me posted on progress."
  • The Skeptic, senior staff — starting engagement 30%: "Show me it works. I have seen this before."

"Ready for testing" vs. "ready for production." Nathan's team kept finding things that didn't work yet, so they kept delaying the pilot. They believed the pilot was supposed to test whether the system worked. They had it backwards. The pilot was supposed to reveal what didn't work. That was the point. A pilot that tests a complete system is a soft launch. A soft launch requires a complete system, which they were never going to have. The prototype's job is to learn, not to impress.

The Deliverable

Module 5 produces a Working Prototype with measured before/after results.

This is evidence, not a plan. The business case projected value; the prototype proves it. The blueprint specified the design; the prototype validates it. Before/after measurement using Module 3's baselines converts "we think this will work" into "here's what happened when we tried it."

Speed matters because stakeholder patience is finite, champions erode, and competitors don't wait. A working system that does one thing well creates more organizational energy than a promised system that does many things eventually. The form routing deployment in month eighteen created immediate enthusiasm. That enthusiasm fueled everything that followed. The momentum came from proof, not promises.

Build to learn. Ship to prove. Iterate to improve.

Interactive Exercise

Build vs. Ship

A timeline runs from project start to month 18 ("still building"), tracking capability built against value delivered. Capability climbs month after month; value delivered stays at zero the entire time.

Interactive Timeline

Scope Creep Timeline

Nathan Okafor’s team at Cascade Legal Partners set out to build a case management system with six features. Eighteen months later, they had twenty-three, and the original problem was still unsolved.

Step through the timeline to watch scope creep happen in real time. Pay attention to the project health gauges — they tell the story the feature list doesn’t.

Module 5A: REALIZE — Theory

R — Reveal

Case Study: The Perfect System That Never Shipped

The implementation at Cascade Legal Partners should have been a success story.

Nathan Okafor had done everything by the book. As Director of Practice Technology, he had spent six months on assessment, observing how attorneys and paralegals actually conducted client intake, documenting the friction in the current process, cataloging the shadow systems that had accumulated over years of inadequate tooling. His Opportunity Portfolio identified the central problem: intake coordination required manual handoffs across five different systems, creating delays that cost the firm an estimated $1.8 million annually in delayed billing and lost client conversions.

The business case was airtight. Nathan had measured baselines with rigor: 4.3 hours average intake time, 23% of prospects abandoning during the process, $340 average administrative cost per new client. His value model projected $1.1 million in annual savings with a 62% reduction in intake time, plus capacity recovery that would allow the intake team to handle 40% more volume without additional headcount.

The workflow design had been exemplary. Nathan's team had mapped the current state in granular detail, identified friction points through practitioner observation, and designed a future state that preserved attorney judgment while automating information flow. The blueprint had been validated with attorneys, paralegals, and intake coordinators who would actually use the system. They had concerns. Everyone has concerns. They also saw the potential.

The executive committee approved full funding in February. Implementation began in March with adequate budget, visible sponsorship from the managing partner, and a target go-live of September.

By November, eighteen months after approval, the system existed only in a test environment that three people used. The September deadline had been pushed to December, then March, then "when it's ready." The project had consumed $400,000, more than the original budget, and had yet to process a single real client.

The intake process still ran on the same five disconnected systems. The shadow workarounds persisted. And Nathan's most enthusiastic early supporters had stopped attending project meetings.


What Went Wrong

The system that Nathan's team built was impressive. It did everything the blueprint specified, and considerably more.

In the months between design approval and September's original target, the scope had expanded in ways that seemed reasonable at each decision point.

The original design called for automated intake form routing. During development, someone realized that if they were routing forms, they could also generate conflict check requests automatically. Adding that feature took three weeks but eliminated a manual step. It seemed like a clear win.

Then the conflicts team asked: if the system was generating conflict requests, could it also track conflict responses and flag overdue checks? Another three weeks. Another clear win.

The billing team noticed the project and requested integration with their time-entry system, so intake data could pre-populate client matter records. Four weeks. Another clear win.

Each addition made sense in isolation. Each addressed a real friction point. Each was justified by someone with legitimate authority to make requests. And each pushed the timeline further out while the core functionality remained untested.

By September, the original six-week implementation plan had expanded to cover twenty-three distinct feature sets. The system could do remarkable things, things the original blueprint never contemplated. What it couldn't do was ship.


The Testing Trap

Nathan had planned for pilot testing. The project timeline included a four-week pilot with a small group of users before full rollout.

But the pilot never happened as designed. Every time the team approached pilot readiness, someone identified another gap.

"We can't pilot without the conflict integration. Attorneys won't trust the system if it doesn't handle conflicts."

"We can't pilot without the billing connection. The intake team will have to double-enter everything."

"We can't pilot without the client portal. That's what prospects will actually see."

Each objection was valid. Each pushed the pilot date further out. And each revealed a fundamental confusion about what the pilot was for.

Nathan's team believed the pilot was supposed to test whether the system worked. They kept finding things that didn't work yet, so they kept delaying the pilot.

What they didn't understand: the pilot was supposed to reveal what didn't work. That was the point. A pilot that tests a complete system is a soft launch. And a soft launch requires a complete system, which they were never going to have.

The team had confused "ready for testing" with "ready for production." They kept waiting for perfection before subjecting the system to reality.


The Patience Problem

Nine months into development, Managing Partner Elena Reyes asked Nathan for a status update.

"We're making excellent progress," he told her. "The system architecture is sophisticated, the integrations are complex, and we're working through the edge cases. We want to make sure we get this right."

Elena nodded. She trusted Nathan. But she also had partners asking why the firm had spent $300,000 on technology that no one was using. She had an intake team wondering if the promised improvements would ever arrive. She had client acquisition metrics that hadn't improved despite the investment.

"When will we see results?" she asked.

"The pilot is targeted for March," Nathan said. "Full rollout by June."

By March, Elena had moved on to other priorities. She had hired a new operations director whose mandate included "getting technology projects under control." The intake improvement budget was frozen pending review. When Nathan scheduled a meeting to discuss pilot launch, Elena's assistant responded that the managing partner was focused on other initiatives but wished the project well.

The executive sponsor hadn't been lost to conflict or opposition. She had been lost to time. Eighteen months of progress reports with no visible results had exhausted her political capital and attention. By the time the system was "ready," the organization had stopped caring.


The Hidden Costs

While Nathan's team built in isolation, the practitioners they were supposed to serve developed their own solutions.

Rachel Torres, the senior intake coordinator, had been one of Nathan's early champions. She had spent hours in design sessions, contributed expertise to the workflow mapping, and advocated for the project with skeptical colleagues. In the early months, she checked in regularly, eager to see progress.

By month eight, Rachel had stopped asking. She had work to do. Clients were waiting. The current system was terrible, but it was the system she had.

When Nathan finally reached out to schedule pilot testing, Rachel hesitated. "I've built my own workarounds at this point," she said. "The new system would have to be significantly better than what I've cobbled together, or the transition cost isn't worth it."

Her workarounds were inefficient by design standards: spreadsheets and email folders and a color-coded calendar system that made sense only to her. But they worked. She had adapted to the friction rather than waiting for the friction to be solved.

Rachel wasn't resisting change. She was surviving. And survival had made her less available to test something that might or might not eventually help.

The champions hadn't turned hostile. They had simply moved on.


The Moment of Clarity

The intervention came from an unlikely source.

Marcus Webb was a third-year associate who had joined the firm after the project began. He had no investment in the system's success or failure, no stake in the decisions that had brought it here. He had simply been assigned to help with testing and noticed something that insiders couldn't see.

"What problem are we testing for?" Marcus asked during a project review meeting.

"What do you mean?" Nathan replied.

"I've been using the test system for a week. It does a lot of things. But what's the one thing that proves it works? If we deployed this tomorrow and I could show you one number that proved value, what would that number be?"

The room was quiet. Nathan realized he didn't have a clear answer. The system did many things. He couldn't point to the one thing that mattered most.

"Intake time," Rachel said from the back of the room. "That's what started this. 4.3 hours average. If the new system cuts that in half, everything else follows. Better conversion, lower cost, happier clients. But we've been so focused on features that we forgot about the original problem."

Marcus nodded. "So what's the smallest version of this system that proves intake time goes down? That's the pilot. Everything else is Phase 2."

Nathan started to object. There were dependencies, integrations, features that users expected. But he stopped himself.

Twenty-three feature sets. Eighteen months. Four hundred thousand dollars. And the original problem, 4.3 hours average intake time, remained unsolved.

"What would that minimal version look like?" he asked.

"Form routing," Rachel said. "That's where the delay starts. If forms move automatically to the right person, intake time drops. The conflict integration is nice. The billing connection is nice. The client portal is nice. But form routing is the problem we set out to solve."

Nathan looked at the project plan. Form routing had been complete for seven months. It had been sitting in test while the team built features around it.

"How long to deploy just the form routing to your team?" he asked.

"Two weeks," said the lead developer. "Maybe less. It's done. We just never turned it on."


The One Visible Win

Nathan made the call that afternoon.

The project would split into two phases. Phase 1 was form routing, just form routing, deployed to Rachel's intake team within two weeks. No conflict integration. No billing connection. No client portal. Just the original problem, solved.

Phase 2 would include everything else. But Phase 2 would wait until Phase 1 proved value.

The pushback was immediate. The conflicts team had been promised integration. The billing team had been promised data flow. Other stakeholders had been waiting eighteen months for features that were now being deferred.

"We've already built it," the billing manager pointed out. "Why not include it?"

"Because including it means not shipping," Nathan said. "And not shipping means we keep running on the old system while the new system sits in test. We've proven we can build complex software. We haven't proven we can improve intake time. That's what has to happen first."

Two weeks later, Rachel's team started using the form routing system.

The results were immediate and measurable. Intake time dropped from 4.3 hours to 2.1 hours. The system did one small thing, but it removed the actual bottleneck: forms that previously sat in email queues now moved automatically to the right person. The bottleneck was simple, and so was the fix.

Rachel sent Nathan a message after the first week: "This is what we needed eighteen months ago. More is coming, right?"

More was coming. But now "more" would be added to a working system, not a theoretical one. Each new feature would prove value before the next was added. The team would ship, measure, learn, and iterate.

When Nathan presented the Phase 1 results to Elena Reyes, she had a single question: "Why did this take so long?"

Nathan didn't have a good answer. But he had a better approach now.

"It won't happen again," he said. "From now on, we ship small and prove value before we build big."

The system that saved Cascade Legal Partners $1.1 million annually started with a two-week deployment that did one thing well. Everything else came later, justified by results.


The Lesson

Nathan's team had confused building with progress.

They had spent eighteen months constructing an impressive system that solved many problems, tested few assumptions, and delivered no results. Every decision to add scope, every delay waiting for completeness, every extension of the timeline had felt like progress. The system grew more capable each week.

Value comes from outcomes delivered, not capability accumulated. A system in test is still a system at rest.

The pilot that finally shipped tested one assumption: that automated form routing would reduce intake time. It did. That single validated assumption earned the right to continue. Everything that followed was built on proof, not projection.

The goal is a working system that proves value quickly enough to earn the right to continue. One visible win buys time, builds trust, and creates the foundation for everything that comes next.

Nathan's eighteen-month journey could have been a six-week sprint, if he had understood from the beginning that progress is measured in value delivered.


End of Case Study


Module 5A: REALIZE — Theory

O — Observe

Core Principles of Rapid Implementation

Module 5's anchor principle: One visible win earns the right to continue.

The business case secured approval. The workflow design earned validation. But approval and validation don't create value. Building creates value, and building requires a different mindset than planning.

The Cascade Legal Partners case illustrates the trap: eighteen months of building, zero months of learning. The team confused construction with progress, capability with value, completeness with readiness. They built an impressive system that solved many problems while proving nothing.

Module 5 provides the discipline of implementation: how to move from validated design to working prototype to production deployment, creating value at each step rather than waiting until everything is complete.


The Prototype Mindset

A Prototype Is a Learning Vehicle

A prototype is a tool for testing assumptions, a vehicle for learning whether the design actually works when it meets reality.

This distinction matters because it changes what "good" looks like. A prototype that reveals the design is wrong has succeeded. A prototype that hides problems until production has failed. The goal is to learn something true.

Nathan's team at Cascade built impressive software. They didn't learn whether automated form routing would reduce intake time until month eighteen, when the answer could have been known in month two.

Validated Learning Over Comprehensive Functionality

Every design embeds assumptions: practitioners will use the system this way; the technology will perform at this speed; the workflow will reduce friction at this point. These assumptions can be stated with confidence during design. They can only be validated through building and testing.

The prototype's purpose is to validate the assumptions that matter most. The ones the business case depends on. The ones that will determine success or failure.

For R-01 (Returns Bible), the critical assumption is that automated policy lookup will reduce representative time from 14.2 minutes to under 5 minutes. A prototype that tests this assumption, even a rough one, creates more value than a polished system that tests everything except this.

Speed Beats Completeness

When testing assumptions, speed matters more than completeness. A quick test that reveals a wrong assumption saves months of building on a flawed foundation. A slow test that confirms a right assumption arrives too late to matter.

This is counterintuitive for teams trained in quality: "We should do it right the first time." But "right" in prototype means "fast enough to learn while we still have time to adjust."

The Cascade team spent seven months with working form routing in test. They delayed learning because they wanted to learn everything at once. The result: they learned nothing until it was almost too late.

Permission to Build Something Imperfect

Prototyping requires organizational permission to build imperfect things. Teams trained on production quality standards struggle with this. They know how to build things right; they don't know how to build things fast and iterate toward right.

This permission must be explicit. Without it, teams will default to quality standards that make prototyping impossible. They will add features to avoid shipping something incomplete. They will delay testing to avoid showing something flawed.

"Perfect is the enemy of good" is a cliché. In prototyping, it's a survival rule.


The One Visible Win Principle

Early Value Earns Continuation

Organizations fund projects based on projected value. They continue funding based on demonstrated value. The gap between projection and demonstration is where projects die.

Nathan had executive support in February. By November, that support had evaporated. Time, not conflict, was the cause. Eighteen months of progress reports with no visible results exhausted stakeholder patience. When results finally arrived, the stakeholders had moved on.

A visible win early in implementation changes this dynamic. It converts projection into evidence. It gives stakeholders something to point to when questions arise. It builds momentum that carries the project through inevitable setbacks.

Stakeholder Patience Is Finite

Organizations have limited attention. Executives sponsor many initiatives. Every project competes for mindshare with every other project.

A project that takes months to show results must compete for attention the entire time. It must justify its continued existence against alternatives that might deliver faster. It must survive leadership changes, budget reviews, and shifting priorities, all before proving it deserves survival.

The one visible win shortens the window of vulnerability. It moves the project from "promising but unproven" to "proven and expanding." That transition happens not when the system is complete, but when it delivers measurable value.

Small Success Builds Momentum

A working system that does one thing well creates more organizational energy than a promised system that does many things eventually.

Rachel Torres stopped advocating for the Cascade project around month eight. By the time form routing shipped, she had built her own workarounds and lost interest. The project's strongest champion became a skeptic. Exhaustion, not opposition, drove the shift.

The form routing deployment in month eighteen created immediate enthusiasm. "This is what we needed." That enthusiasm fueled Phase 2 engagement. The momentum came not from promises, but from proof.

What Counts as a Visible Win

A visible win must be:

  • Measurable: Not "things feel better" but "intake time dropped from 4.3 hours to 2.1 hours"
  • Attributable: Clearly connected to the new system, not to other changes
  • Meaningful: Addressing a problem practitioners actually care about
  • Communicable: Easy to explain to stakeholders who aren't deeply involved

For R-01, a visible win might be: representatives can now answer policy questions in 3 minutes instead of 14 minutes. Measurable, attributable, meaningful, communicable.
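As a hedged illustration of what "measurable" means in practice, the before/after comparison can be reduced to a few lines. The timing samples below are hypothetical, not taken from the module:

```python
# Summarizing a before/after pilot measurement for an R-01-style win.
# All sample values are illustrative (handle times in minutes).
from statistics import mean

baseline = [14.8, 13.5, 14.2, 15.1, 13.4]  # pre-pilot policy-question times
pilot    = [3.1, 2.8, 3.4, 2.9, 3.2]       # post-pilot policy-question times

before, after = mean(baseline), mean(pilot)
reduction = (before - after) / before * 100

# One communicable sentence for stakeholders, backed by the measurement
print(f"Before: {before:.1f} min, after: {after:.1f} min "
      f"({reduction:.0f}% reduction)")
```

The output is the stakeholder-ready sentence itself: a single number, attributable to the pilot, that anyone can repeat.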


Iteration Over Perfection

First Version Will Be Wrong

No design survives contact with reality unchanged. Users will behave differently than expected. Technology will perform differently than specified. Edge cases will emerge that no one anticipated.

This is normal. The first version will need adjustment. The question is how quickly adjustments can be made.

Teams that expect perfection on first release treat every problem as evidence of inadequate planning. They respond to problems by retreating to more planning. Teams that expect iteration treat every problem as information. They respond to problems by adjusting and retesting.

Problems in Prototype Are Learning

The Cascade team found problems during testing and delayed launch. They treated problems as evidence the system wasn't ready.

The correct interpretation: problems discovered in testing are problems discovered cheaply. Problems that emerge in production are problems discovered expensively. The prototype's job is to find problems, as many as possible, as quickly as possible, while they can still be addressed without damaging live operations.

A prototype that runs for weeks without revealing problems isn't well-built. It's under-tested.

Build-Measure-Learn Cycles

Each iteration follows a cycle:

  1. Build: Implement the next increment
  2. Measure: Collect data on what happened
  3. Learn: Interpret data and decide next action

The speed of this cycle determines learning velocity. A team that completes one cycle per month learns twelve things per year. A team that completes one cycle per week learns fifty things per year.
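The velocity arithmetic can be made explicit. This is a trivial sketch; the 365-day year and whole-cycle rounding are assumptions:

```python
# Learning velocity: how many complete build-measure-learn cycles
# fit into a year at a given cycle length (rounded down to whole cycles).
def cycles_per_year(cycle_length_days: float) -> int:
    return int(365 // cycle_length_days)

print(cycles_per_year(30))  # monthly cycles: 12 lessons a year
print(cycles_per_year(7))   # weekly cycles: ~50 lessons a year
```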

Cascade's team completed something like one-third of a cycle in eighteen months. They built extensively, measured minimally, and learned almost nothing.

The Cost of Being Wrong

Being wrong early is cheap. The form routing assumption could have been tested in week three with a small group of users. If wrong, the team would have learned it with minimal investment. If right, they would have had sixteen months to build on a proven foundation.

Being wrong late is expensive. Cascade spent $400,000 building features around a core assumption that remained untested. If form routing hadn't worked, most of that investment would have been wasted.

The prototype de-risks implementation by being wrong early, often, and cheaply.


Fast Failure as Strategy

Finding What Doesn't Work Is Valuable

Negative results are results. An assumption that proves wrong is an assumption you no longer need to build around. A feature that practitioners reject is a feature you don't need to maintain.

Teams avoid testing because they fear failure. But skipping tests doesn't prevent failure. It just delays discovery.

Fail Fast, Fail Cheap, Fail Forward

  • Fail fast: Test assumptions as early as possible
  • Fail cheap: Test with minimal investment
  • Fail forward: Each failure teaches something that improves the next attempt

The Cascade team eventually failed forward. Their Phase 1 launch taught them how to implement effectively. But they paid for eighteen months of learning-avoidance first.

Creating Conditions for Productive Failure

Productive failure requires:

  • Psychological safety: People can report problems without blame
  • Quick feedback loops: Problems surface rapidly, not months later
  • Iteration capability: The system can be changed based on what's learned
  • Clear success criteria: Teams know what they're testing for

Without these conditions, teams hide problems rather than surfacing them. Problems that can't be discussed can't be solved.


From Pilot to Production

Pilots That Stay Pilots Forever

A pilot is a test run, a limited deployment to validate assumptions before full rollout. By definition, a pilot has an end date.

But pilots frequently become permanent. "Just a few more tweaks" becomes an indefinite state. The pilot serves a small group forever while the broader organization waits indefinitely.

This happens when teams lack clear graduation criteria. Without defined thresholds, there's always another reason to delay. Another edge case. Another feature request. Another optimization opportunity.

Define Graduation Criteria Before Starting

Before pilot begins, define what success looks like:

  • What metrics must reach what thresholds?
  • What practitioner feedback constitutes validation?
  • What timeline is acceptable?
  • Who decides when criteria are met?

Without these criteria, the pilot can never end because success is undefined.
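One way to make this concrete is to write the graduation criteria down as data before the pilot starts, so graduation becomes a check rather than a debate. The metric names and thresholds below are hypothetical:

```python
# Graduation criteria defined up front; "pilot done" is a mechanical check.
# Lower-is-better metrics are assumed for this sketch.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Criterion:
    name: str
    target: float                   # threshold the pilot must reach
    actual: Optional[float] = None  # measured during the pilot

    def met(self) -> bool:
        return self.actual is not None and self.actual <= self.target

criteria = [
    Criterion("average intake hours", target=2.5, actual=2.1),
    Criterion("prospect abandonment rate", target=0.15, actual=0.11),
]

print("graduate" if all(c.met() for c in criteria) else "keep piloting")
```

An unmeasured criterion counts as unmet, which keeps "we haven't checked yet" from quietly passing as success.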

The Pilot Is Not the Destination

The pilot exists to earn the right to production. It's a means, not an end.

Teams that forget this optimize for pilot success rather than production readiness. They build solutions that work for ten users but won't scale to one hundred. They provide support levels that can't be sustained at full deployment. They create a permanent pilot that serves a small group while the original problem persists for everyone else.

Building Toward Scale from Day One

Even in prototype, consider scale:

  • Will this architecture support full deployment?
  • Can this support model be sustained?
  • Does this training approach work for everyone, not just early adopters?

The goal is to avoid building in ways that make production impossible.


Summary: The Module 5 Mindset

From → To

  • Build everything, then test → Test one thing, then build more
  • Wait until ready → Ship when valuable
  • Problems indicate failure → Problems indicate learning
  • Perfect first release → Iterative improvement
  • Pilot as destination → Pilot as gate to production

The discipline of Module 5 is progress over perfection: earning the right to continue through demonstrated value rather than promised capability.

Nathan's team at Cascade had everything they needed: good assessment, good business case, good design, adequate resources. What they lacked was the discipline to ship small, prove value, and build on success.

One visible win in month two would have justified eighteen months of development. Instead, eighteen months of development struggled to justify itself.

Build to learn. Ship to prove. Iterate to improve. That's the Module 5 mindset.



Module 5A: REALIZE — Theory

O — Observe

Prototype Construction

The blueprint specifies what to build. This section addresses how to build it: the methodology of translating design into working prototype while maintaining the discipline of speed over completeness.


Minimum Viable Prototype

What "Minimum" Means

Minimum is not "as little as possible." It's "the smallest scope that tests the core assumption."

The core assumption is the one the business case depends on. For R-01, the core assumption is that automated policy lookup reduces representative time. A minimum viable prototype tests this assumption. It skips every other assumption, every other feature, every edge case.

To identify minimum scope, ask: "What is the one thing that must prove true for this opportunity to deliver value?" Everything that tests this assumption is in scope. Everything else is out of scope for the first prototype.

This is harder than it sounds. Teams identify many things that seem essential:

  • "We can't test without X because users expect it."
  • "We can't deploy without Y because it's part of the workflow."
  • "We need Z or the data won't be accurate."

Each may be true for production. None is necessarily true for prototype. The prototype's job is to learn, not to impress.

What "Viable" Means

Viable means functional enough to generate real feedback. A prototype that doesn't work isn't viable. A prototype that works but can't be used by real people on real tasks isn't viable.

The threshold is usability, not polish. Can practitioners complete actual work using this prototype? Will the experience generate meaningful feedback about whether the design works?

For R-01, a viable prototype would:

  • Accept return attributes from representatives
  • Match attributes to policy rules
  • Display relevant policy information
  • Allow representatives to make decisions based on displayed information

It would not need:

  • Perfect policy matching accuracy (learning will improve this)
  • Integration with every downstream system
  • Polished user interface
  • Complete exception handling

The Discipline of Cutting Scope

Scope cutting requires discipline because every omitted feature has an advocate. The conflicts team wants integration. The billing team wants data flow. The training team wants onboarding support.

These requests are legitimate. They will eventually be addressed. But addressing them now delays learning about the core assumption.

The discipline: "Not no, but not yet." Every feature request gets categorized:

  • Phase 1 (MVP): Tests core assumption
  • Phase 2: Enhances validated solution
  • Future: Valuable but not urgent

This categorization must be visible and respected. Scope creep begins when categories blur.

Features to Include vs. Defer vs. Never Build

CategoryCriteriaExample (R-01)
IncludeTests core assumptionPolicy lookup and display
IncludeRequired for testing to functionBasic CRM integration
DeferValuable but not required for testBilling system integration
DeferEdge case handlingComplex exception workflows
NeverRequested but unnecessaryIndividual override tracking

"Never build" requires courage. Some requested features add complexity without value, or they conflict with design principles. Identifying these early prevents scope creep later.


Build vs. Buy vs. Configure

When to Build Custom

Build custom when:

  • Requirements are unique to your organization
  • No existing tool addresses the core workflow
  • Integration requirements make external tools impractical
  • Long-term ownership and flexibility matter

Building provides maximum control but maximum cost. Custom solutions require development resources, ongoing maintenance, and organizational capability to support.

For R-01: Building custom might mean developing a policy engine specifically for Lakewood Medical Supply's returns policies. This provides exact fit but requires sustained investment.

When to Purchase Existing Tools

Buy when:

  • Standard solutions address 80%+ of requirements
  • Time-to-value matters more than perfect fit
  • Vendor ecosystem provides ongoing innovation
  • Internal capability to build and maintain is limited

Purchasing provides faster deployment but less flexibility. The organization adapts to the tool rather than the tool adapting to the organization.

For R-01: Purchasing might mean acquiring a customer service knowledge base tool with policy matching capabilities. Faster deployment, but may require workflow adaptation.

When to Configure Existing Platforms

Configure when:

  • Platforms already in use have relevant capabilities
  • Configuration provides adequate functionality
  • Integration is simplified by staying within platform
  • Total cost of ownership favors leverage over purchase

Configuration provides the fastest path when platforms are capable. Many organizations have tools with untapped features that address current needs.

For R-01: Configuration might mean extending the existing CRM to display policy information through custom fields and automation rules. Fastest path if the CRM platform supports it.

Decision Framework

FactorBuildBuyConfigure
Time to prototypeSlowestMediumFastest
Fit to requirementsExactApproximateVariable
Ongoing costHighestMediumLowest
FlexibilityHighestLimitedLimited
Internal capability requiredHighestLowMedium

The right choice depends on context. A team with strong development capability might build. A team with limited resources might configure. Neither is universally correct.

The R-01 Example

R-01 could be implemented through any path:

Option A: Configure existing CRM

  • Add policy database as custom object
  • Create automation rules to match return attributes to policies
  • Display policy information in customer service interface
  • Timeline: 3-4 weeks to prototype

Option B: Purchase knowledge management tool

  • Acquire tool designed for policy/knowledge management
  • Integrate with existing CRM through API
  • Configure matching rules within new tool
  • Timeline: 6-8 weeks to prototype

Option C: Build custom integration layer

  • Develop policy engine with custom matching logic
  • Build integration layer connecting Order Management, CRM, and policy database
  • Create custom interface for policy display
  • Timeline: 10-12 weeks to prototype

For MVP purposes, Option A is likely preferred. It's fastest to prototype and tests the core assumption. If prototype validates the assumption, later phases might evolve toward Option C for greater capability.


Integration Strategy

Connecting to Existing Systems

Prototypes rarely exist in isolation. They must connect to existing systems for data, for workflow, for context.

Integration approach significantly affects timeline and complexity:

API-First Integration

  • Clean separation between systems
  • Well-defined interfaces
  • Changes in one system don't break others
  • Requires API availability and documentation

Manual Bridge

  • Human intermediary handles data transfer
  • Faster to implement for prototype
  • Doesn't scale to production
  • Useful for testing assumptions before investing in integration

Data Export/Import

  • Batch transfer of data between systems
  • Simpler than real-time integration
  • May be sufficient for prototype testing
  • Production may require more sophisticated approach

Handling Integration Constraints

Integration often reveals constraints that aren't visible during design:

  • APIs that don't exist or don't expose needed data
  • Security policies that prevent direct connection
  • Performance limitations that affect user experience
  • Data format mismatches that require transformation

For prototype, the response to constraints should prioritize speed:

  • Can we work around this constraint for testing purposes?
  • Can we simulate the integration to test the workflow?
  • Can we use manual processes temporarily to validate the design?

The goal is testing the core assumption, not solving every integration challenge.

When Integration Complexity Should Reduce Scope

Sometimes integration complexity exceeds prototype value. A planned integration that would take eight weeks might be better replaced by a manual workaround that takes three days.

The question: "Does this integration test our core assumption, or is it infrastructure for later phases?"

If it's infrastructure for later phases, defer it. The prototype should answer the essential question with minimum investment.


Technology Selection Process

Evaluating Against Blueprint Requirements

The Module 4 blueprint specifies requirements in tool-agnostic terms. Technology selection evaluates available options against these requirements.

Evaluation criteria derived from blueprint:

  • Does it meet functional requirements?
  • Does it integrate with specified systems?
  • Does it meet performance requirements?
  • Does it respect specified constraints?

Secondary criteria for prototype:

  • How quickly can we deploy?
  • How easily can we iterate?
  • What learning curve does the team face?
  • What risks does this choice introduce?

Avoiding Vendor-Driven Design

Technology vendors have capabilities they want to demonstrate. Sales processes emphasize what tools can do, not what you need done.

The danger: selecting a tool and then redesigning the workflow to fit the tool's strengths. This inverts the correct sequence (design workflow, then select tool).

Protection: evaluate against blueprint requirements, not vendor demonstrations. Ask "Does this tool do what our blueprint specifies?" not "What can this tool do?"

Proof-of-Concept Before Commitment

Major technology investments should be preceded by proof-of-concept: a limited test that validates the tool can actually deliver what's needed.

The proof-of-concept tests:

  • Can the tool handle your specific data and workflows?
  • Does performance meet requirements under realistic conditions?
  • Can your team configure and operate it effectively?
  • Do hidden constraints or limitations emerge?

This test should happen before contract signing, not after. Vendors are motivated to support proof-of-concept because it advances the sale. Use this motivation.

The "Good Enough" Threshold

No tool is perfect. Selection requires identifying what matters most and accepting limitations in what matters less.

For prototype, "good enough" means:

  • Tests the core assumption
  • Can be deployed within timeline
  • Supports iteration based on learning
  • Doesn't introduce risks that could sink the project

Production may require higher standards. Prototype requires faster decisions.

Module 5A: REALIZE — Theory

O — Observe

T — Testing Frameworks

Building the prototype is half the work. Testing it effectively, gathering the data that validates or refutes assumptions, is the other half. This section covers how to test prototypes in ways that generate actionable learning.


T — Testing Human-AI Workflows

Different from Testing Pure Software

Software testing asks: "Does the system function as specified?" Human-AI workflow testing asks: "Does the workflow produce the intended outcomes when humans and systems work together?"

The distinction matters because the system can function perfectly while the workflow fails. The technology may perform as designed, but:

  • Humans may not use it as intended
  • The interaction may create friction the design didn't anticipate
  • Trust may not develop as assumed
  • Behavior may not change as predicted

Testing human-AI workflows requires observing the entire interaction, not just the system's behavior.

The Human Element

Human behavior in testing includes:

  • Adoption patterns: Do practitioners use the system when they could?
  • Usage patterns: Do they use it as designed, or develop workarounds?
  • Trust signals: Do they rely on system recommendations, or override consistently?
  • Behavioral change: Does their overall workflow change as intended?

These patterns emerge over time. Single-day testing won't reveal whether practitioners trust a recommendation system. Extended testing reveals whether trust develops, deteriorates, or never forms.

What to Observe Beyond System Function

System metrics tell part of the story. Observation tells the rest.

Watch for:

  • Moments of hesitation, where practitioners pause before acting
  • Workarounds, actions taken outside the system to accomplish tasks
  • Verbal commentary, what practitioners say while working
  • Help-seeking, when they ask colleagues for guidance
  • Abandonment, when they leave the system to finish work elsewhere

These observations surface friction that metrics miss.

Combining Quantitative and Qualitative

Neither metrics nor observation alone provides complete understanding.

Metrics reveal what happened: time dropped from X to Y, error rate changed from A to B. They don't explain why, or whether the change will persist, or what problems lurk beneath surface improvement.

Observation reveals context: practitioners hesitate at step 3 because the language is confusing, or they override frequently because system recommendations don't match reality. But observation is limited by sample size and observer bias.

Effective testing combines both:

  • Quantitative metrics for what changed
  • Qualitative observation for why and how
  • Practitioner interviews for perception and experience
  • Behavioral analysis for patterns over time

Pilot Group Selection

Size: Small Enough to Support, Large Enough to Learn

Pilot groups face a tradeoff:

  • Too small: Results may not generalize; individual variation dominates
  • Too large: Support burden overwhelms; feedback is difficult to process

A reasonable pilot size depends on context. For R-01, a pilot of 6-10 representatives might be appropriate: enough to see patterns, small enough to provide intensive support and gather detailed feedback.

The right size allows:

  • Direct relationship with each pilot participant
  • Rapid response to issues that emerge
  • Detailed feedback collection
  • Reasonable statistical validity for key metrics

Composition: Mix of Enthusiasts and Skeptics

Pilots populated only by enthusiasts will succeed; pilots populated only by skeptics will fail. Neither result is informative.

Effective pilot composition includes:

  • Early adopters who will explore and provide feedback willingly
  • Mainstream users who represent typical behavior
  • Skeptics who will stress-test the system and surface weaknesses

The mix creates realistic conditions. Early adopters show what's possible. Skeptics reveal what's broken. Mainstream users indicate whether the design works for normal people doing normal work.

Duration: Long Enough to See Patterns

Short pilots reveal whether the system functions. Extended pilots reveal whether it works.

The difference: functioning is about technology; working is about workflow. A system might function correctly while the workflow remains inefficient because practitioners haven't adapted, trust hasn't developed, or edge cases haven't emerged.

Minimum pilot duration should allow:

  • Initial learning curve to pass (often 1-2 weeks)
  • Representative volume of work (enough transactions to measure)
  • Pattern stabilization (behavior settles into routine)
  • Edge case emergence (unusual situations surface)

For R-01, a reasonable pilot duration might be 4-6 weeks. Enough time for representatives to move past novelty, develop routine usage patterns, and encounter various return scenarios.

Geographic and Functional Considerations

If the production deployment will span locations or functions, the pilot should include variation:

  • Different locations may have different work patterns
  • Different shifts may have different volumes
  • Different practitioners may have different experience levels

A pilot that succeeds in one context and fails in another provides valuable information, but only if both contexts are tested.


Measurement Against Baseline

Using Module 3 Baselines

Module 3 established baseline metrics through rigorous measurement. Module 5 testing uses the same metrics for comparison.

For R-01, baseline metrics included:

  • Average time for Bible-dependent returns: 14.2 minutes
  • Incorrect policy application rate: 4.3%
  • Supervisor escalation rate: 12%
  • Patricia-specific queries: 15+/day

Pilot measurement must use the same definitions, same methodology, and same rigor. If the baseline measured task time from return initiation to resolution, pilot measurement must use the same boundaries.

Same Methodology, Same Rigor

Methodological consistency enables comparison. If baseline measurement used time-motion observation of 50 transactions, pilot measurement should use comparable sampling.

Inconsistent methodology makes comparison unreliable. A pilot that measures differently than baseline will produce results that can't be interpreted. Was the change real, or an artifact of measurement?

Before/After Measurement Design

The simplest comparison: measure the pilot group before prototype deployment and after. The difference indicates change.

This approach has limitations:

  • Other factors may have changed between measurements
  • The "before" measurement may already reflect Hawthorne effects (behavior change from being observed)
  • Individual variation may dominate small samples

More rigorous designs use control groups or time-series analysis, but these require larger samples and longer durations. For most prototypes, before/after measurement of the pilot group provides adequate evidence.

Controlling for Variables

Factors other than the prototype can affect results:

  • Volume changes: Busy periods differ from slow periods
  • Seasonal effects: Some work varies by time of year
  • Learning effects: Performance improves as practitioners gain experience
  • Staff changes: Different people may perform differently

Controlling for these variables is challenging in real-world pilots. At minimum:

  • Note any unusual conditions during pilot
  • Compare similar time periods (e.g., same day of week)
  • Consider whether observed changes could have other explanations
  • Be conservative in attributing results to the prototype

The Three Lenses in Testing

Module 3's three ROI lenses (Time, Throughput, and Focus) provide structure for testing.

Time: Is It Actually Faster?

The Time lens measures whether the prototype reduces time spent on work.

For R-01:

  • Baseline: 14.2 minutes average for Bible-dependent returns
  • Target: <5 minutes
  • Measurement: Time-motion observation of pilot transactions

Results might show:

  • Average time reduced to 4.8 minutes (target met)
  • Standard deviation remains high (some transactions still take long)
  • Time improvement varies by case complexity

Throughput: Is Quality/Volume Actually Improved?

The Throughput lens measures whether the prototype improves work quality or capacity.

For R-01:

  • Baseline: 4.3% incorrect policy application
  • Target: <2%
  • Measurement: QA audit of pilot decisions

Results might show:

  • Error rate dropped to 1.8% (target met)
  • Most errors now occur in specific case types
  • Practitioners feel more confident in decisions

Focus: Is Cognitive Load Actually Reduced?

The Focus lens measures whether the prototype reduces cognitive burden and risk.

For R-01:

  • Baseline: 12% supervisor escalation rate, 15+/day Patricia queries
  • Target: <5% escalation, <3/day Patricia queries
  • Measurement: System tracking of escalations, observation of Patricia queries

Results might show:

  • Escalation rate dropped to 7% (partial improvement)
  • Patricia queries dropped to 4/day (partial improvement)
  • Representatives report feeling more self-sufficient

Each Lens May Show Different Results

A prototype might improve Time while Throughput worsens, or improve Focus while Time increases. Different lenses can reveal different stories.

For R-01, a possible mixed result:

  • Time improved significantly (wins on speed)
  • Throughput improved moderately (better accuracy)
  • Focus improved partially (still some escalation)

Mixed results require interpretation. Is the improvement enough? Which areas need iteration? Does the overall pattern justify production deployment?

Module 5A: REALIZE — Theory

O — Observe

Iteration Methodology

Testing generates data. Iteration converts that data into improvement. This section covers how to interpret feedback, decide what to do next, and maintain progress through the learning cycle.


The Build-Measure-Learn Cycle

Build: Implement the Next Increment

Building in iteration differs from building initially. The initial build implements the prototype scope. Iteration builds implement specific changes responding to specific findings.

An iteration build should:

  • Address one finding at a time (avoid combining changes)
  • Have clear scope (what's being changed and why)
  • Be timeboxed (hours or days, not weeks)
  • Be testable (the change can be observed and measured)

For R-01, an iteration build might be: "Policy matching accuracy was 78%; adding product category as a matching factor should improve accuracy." That's a specific change, testable, with clear rationale.

Measure: Collect Data on What Happened

After implementing a change, measure its effect. Did the change produce the intended improvement? Did it create unintended consequences?

Measurement in iteration should be:

  • Focused: Measure the specific thing that was changed
  • Quick: Get results in days, not weeks
  • Comparative: Compare to pre-change baseline

For the R-01 example: After adding product category matching, measure policy matching accuracy. Did it improve from 78%? Did it affect anything else negatively?

Learn: Interpret Data and Decide Next Action

Learning converts measurement into decision:

  • If the change worked, incorporate it and move to the next issue
  • If the change didn't work, understand why and try a different approach
  • If the change revealed new issues, add them to the iteration backlog

Learning requires intellectual honesty. A change that was supposed to help but didn't is useful information, if acknowledged. Teams that explain away negative results don't learn from them.

Cycle Speed Matters

The learning rate is proportional to cycle speed. Faster cycles mean more learning in less time.

Consider two teams:

  • Team A completes one build-measure-learn cycle per month
  • Team B completes one cycle per week

In three months, Team A has completed 3 cycles. Team B has completed 12 cycles. Team B has four times the learning, which translates to better outcomes.

Cycle speed depends on:

  • Build complexity (simpler changes build faster)
  • Measurement latency (quick metrics enable quick cycles)
  • Decision process (clear authority enables quick decisions)
  • Technical capability (fast deployment enables fast testing)

Reading Prototype Feedback

What Metrics Tell You

Metrics provide objective measurement of specific outcomes. They tell you what changed, by how much, with what variation.

For R-01, metrics might show:

  • Average policy lookup time: 3.2 minutes (down from 14.2)
  • Policy matching accuracy: 83% (users confirm 83% of recommendations)
  • Error rate: 2.1% (down from 4.3%)
  • Escalation rate: 8% (down from 12%)

These numbers indicate progress toward goals. They don't explain why progress occurred or didn't occur.

What Practitioner Behavior Tells You

Behavior reveals what metrics can't capture:

  • Are practitioners using the system enthusiastically, reluctantly, or minimally?
  • Where do they hesitate or struggle?
  • What workarounds have they developed?
  • How has their overall work pattern changed?

Behavioral observation adds context to metrics. A time improvement might be driven by the system working well, or by practitioners giving up on difficult cases and processing only easy ones. Metrics alone can't distinguish these scenarios.

What Silence Tells You

Absence of feedback is data. When practitioners stop commenting on the system, it may mean:

  • The system works so well they don't notice it (good)
  • They've stopped using it (bad)
  • They've adapted in ways that avoid friction (needs investigation)

Silence requires investigation. Don't assume silence means satisfaction.

Distinguishing Signal from Noise

Not all feedback matters equally:

  • Single-user complaints may reflect individual preference, not design flaw
  • Rare edge cases may not justify design changes
  • Early confusion may resolve with experience

Signal indicators:

  • Multiple practitioners report similar issues
  • Issues persist over time
  • Issues affect core workflow, not peripheral features
  • Practitioners develop consistent workarounds

Noise indicators:

  • Isolated complaints from single users
  • Issues that fade as practitioners gain experience
  • Preference differences that don't affect outcomes
  • Requests for features that weren't part of scope

The Iteration Decision Framework

Continue: Results Positive, Expand Scope

When to continue:

  • Core assumptions validated by data
  • Metrics meet or exceed targets
  • Practitioners are satisfied and effective
  • No major issues remain unresolved

Continue means "proceed to next phase," which might be broader pilot, additional features, or production deployment.

Adjust: Results Mixed, Modify and Retest

When to adjust:

  • Some metrics meet targets, others don't
  • Practitioners report fixable friction
  • Issues are implementation problems, not design problems
  • The core approach is working, with specific gaps

Adjustment should be targeted. Identify specific issues, implement specific fixes, test specific improvements. Avoid broad redesign in response to specific problems.

Pivot: Core Assumption Wrong, Redesign Approach

When to pivot:

  • Core assumption disproved by testing
  • Practitioners fundamentally reject the workflow
  • Issues trace to design principles, not implementation details
  • Fixing individual problems won't address root cause

Pivot is serious. It means the design was wrong, not merely incomplete. Pivot should return to Module 4 principles rather than tweaking the prototype.

Pivot is also rare. Most pilots reveal adjustment needs, not fundamental design failures. If assessment (Module 2), calculation (Module 3), and design (Module 4) were done well, pivot is unlikely.

Stop: Opportunity Isn't Viable

When to stop:

  • Core assumption disproved and alternative approaches unlikely to succeed
  • Value proposition no longer holds after accounting for reality
  • Organizational conditions have changed, making the opportunity obsolete
  • Continued investment isn't justified by potential return

Stop is painful but sometimes correct. The discipline of Module 5 is learning what works, which includes learning when something should be abandoned.

Stop should be documented: What was learned? Why did this fail? What would need to be true for a future attempt to succeed?


Scope Management During Iteration

Resisting "While We're Fixing That, Let's Also..."

Iteration is vulnerable to scope creep. Each fix creates temptation to add more:

  • "While we're updating the policy matching, let's also add..."
  • "Since we're touching that code, we should..."
  • "Users are asking for X anyway, might as well..."

These additions derail iteration focus. They turn targeted fixes into expanded scope. They slow cycle speed and blur measurement.

The discipline: each iteration has one focus. Additional requests go to the backlog, not into the current cycle.

Each Iteration Should Have One Focus

Single-focus iteration enables:

  • Clear measurement (did this specific change help?)
  • Fast cycles (one change builds faster than many)
  • Meaningful learning (attribution is clear)
  • Manageable complexity (fewer things can go wrong)

When iteration scope expands, benefits erode. Multiple simultaneous changes make it impossible to know which change caused which effect.

Deferring Good Ideas That Aren't Urgent

Good ideas arrive constantly during iteration. Some come from practitioners, some from stakeholders, some from the team. Many are genuinely valuable.

The backlog captures these ideas for later evaluation. Deferral is prioritization, not rejection.

Questions for backlog triage:

  • Does this address a current iteration goal?
  • Is this urgent (blocking progress) or important (valuable when ready)?
  • Can this wait for a future phase without significant cost?

Most good ideas can wait. The ones that can't should dominate current iteration focus.

The Discipline of Incremental Improvement

Progress happens through many small improvements, not one large transformation.

Each iteration:

  • Addresses one issue
  • Produces measurable improvement
  • Creates foundation for next iteration

Accumulated iterations produce substantial progress. A team that makes ten small improvements over five weeks may achieve more than a team that attempts one large improvement over the same period.

Module 5A: REALIZE — Theory

O — Observe

From Pilot to Production

The pilot validated the prototype. Metrics improved. Practitioners provided positive feedback. Iteration addressed the rough edges. The system works.

Now what?

The transition from pilot to production is where many projects stall. The pilot becomes permanent, serving a small group forever while the broader organization waits indefinitely. Or the deployment happens without adequate preparation, and production reveals problems the pilot never surfaced.

This section covers how to graduate from validated pilot to successful production deployment.


Defining Pilot Success

Quantitative Thresholds

Before pilot begins, success criteria should be defined. These criteria provide objective targets:

For R-01:

  • Time per Bible-dependent return: <5 minutes (baseline: 14.2 minutes)
  • Incorrect policy application: <2% (baseline: 4.3%)
  • Supervisor escalation rate: <5% (baseline: 12%)
  • System usage rate: >80% (pilot group)
  • Practitioner satisfaction: >4.0/5

Success means meeting these thresholds consistently, repeatedly over the pilot duration.

Qualitative Indicators

Numbers alone don't define success. Qualitative factors matter:

  • Do practitioners prefer the new workflow to the old?
  • Has behavior genuinely changed, or is compliance superficial?
  • Are workarounds emerging that indicate unresolved friction?
  • Would practitioners advocate for the system to their colleagues?

A pilot that meets quantitative targets while practitioners quietly hate the system is a ticking time bomb that will fail at scale.

Comparison to Module 3 Projections

Module 3's ROI model made projections about expected value. Pilot results should be compared to those projections:

For R-01:

  • Projected time savings: 9.2 minutes/return
  • Actual time savings: 10.1 minutes/return (exceeded projection)
  • Projected error reduction: 2.3 percentage points
  • Actual error reduction: 2.2 percentage points (met projection)
  • Projected escalation reduction: 7 percentage points
  • Actual escalation reduction: 4 percentage points (partially met)

This comparison validates the business case. Results that exceed projection strengthen the case for production. Results that fall short require explanation and possibly revised projections.
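The projection-versus-actual comparison lends itself to a simple script. The figures below come from the text; the 10% tolerance band used to separate "met" from "partially met" is an assumption for illustration.

```python
# Module 3 projections vs. pilot actuals for R-01 (figures from the text).
comparisons = {
    # metric: (projected, actual)
    "time_savings_min":         (9.2, 10.1),
    "error_reduction_pts":      (2.3, 2.2),
    "escalation_reduction_pts": (7.0, 4.0),
}

def status(projected: float, actual: float, tolerance: float = 0.10) -> str:
    """Label a pilot result relative to its projection.

    The tolerance band (default 10%) is a judgment call: results within it
    count as "met" even if slightly under projection.
    """
    if actual >= projected:
        return "exceeded"
    if actual >= projected * (1 - tolerance):
        return "met"
    return "partially met"

report = {metric: status(p, a) for metric, (p, a) in comparisons.items()}
```

With these inputs the labels match the text: time savings exceeded projection, error reduction met it, and escalation reduction only partially met it.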

What "Good Enough" Looks Like

Perfection isn't the standard. "Good enough" means:

  • Core value proposition demonstrated
  • Critical success metrics met
  • Remaining issues are minor, rare, or have clear remediation paths
  • Production deployment won't create significant new problems
  • The organization will be better off with the system than without it

Waiting for perfection means waiting forever. At some point, the system is ready. Defining that point in advance prevents endless refinement.


The Pilot Trap

Pilots That Never End

A pilot should have a defined end date. When pilots continue indefinitely, several dynamics are typically at play:

Fear of Scale: "It works for 10 users, but what about 100?" Concerns about scale prevent commitment to deployment.

Perfectionism: "Just a few more tweaks" becomes a permanent state. Each improvement reveals another opportunity.

Ownership Ambiguity: No one has authority to declare the pilot successful and proceed.

Risk Aversion: Production deployment feels risky. Pilot feels safe. Safety wins.

Lost Momentum: Original urgency faded. No one is pushing for completion.

"Just a Few More Tweaks" as Avoidance

There's always something else to improve. The policy matching could be 2% more accurate. The interface could be slightly smoother. The documentation could be more complete.

These improvements are genuine. They're also endless. If the standard is "nothing left to improve," deployment never happens.

The discipline: Is the system better than what it replaces? If yes, deploy it. Continue improving after deployment, not instead of deployment.

Loss of Urgency After Initial Success

Early pilots generate excitement. The first positive results create energy. Champions celebrate progress.

As pilots extend, urgency fades. Initial excitement becomes routine. Champions move to other priorities. Stakeholders who were eager become indifferent.

By the time deployment is "ready," no one cares anymore. The project that could have been a success story becomes a footnote.

How Pilots Become Permanent Exceptions

Some organizations have multiple permanent pilots, systems that serve small groups indefinitely because deployment never happened.

These pilots create problems:

  • Resource drain: Small groups get support that broader deployment would amortize
  • Inequity: Some practitioners have better tools than others for no good reason
  • Technical debt: Pilots built for small scale accumulate workarounds as they persist
  • Organizational confusion: Which system is official? Which is temporary?

A pilot is a test, not a destination. If it passes the test, deploy it. If it fails, kill it. Either way, it shouldn't persist.


Scaling Considerations

What Worked for 10 May Not Work for 100

Pilot conditions differ from production conditions:

Support intensity: Pilot users get intensive support. Production users get standard support.

User selection: Pilot users are often early adopters. Production includes skeptics and reluctant users.

Volume: Pilot handles limited transactions. Production handles full volume.

Edge cases: Pilot encounters some variation. Production encounters all variation.

Scaling requires anticipating these differences. Which assumptions that held in pilot may not hold in production?

Infrastructure for Production

Technical infrastructure that supported pilot may need enhancement:

  • Performance: Will the system handle peak loads?
  • Reliability: What happens when components fail?
  • Recovery: How quickly can the system be restored after problems?
  • Monitoring: How will ongoing performance be tracked?

These requirements exist during pilot but become critical at scale. A pilot can tolerate occasional problems; production cannot.

Training and Support at Scale

Pilot training was intensive and personal. Production training must be scalable:

  • Can new users be onboarded without one-on-one attention?
  • Do training materials exist that work without facilitators?
  • Is support infrastructure ready for volume?
  • Are escalation paths defined?

Change Management for Broader Rollout

Pilot users volunteered or were selected. Production users will have the system imposed on them. This changes the dynamic.

Change management for production:

  • Communication: Why is this happening? What's in it for practitioners?
  • Timeline: When will changes affect each group?
  • Support: Where can practitioners get help?
  • Feedback: How can practitioners report problems?

Practitioners who feel informed and supported adopt more readily than practitioners who feel surprised and abandoned.


Production Readiness

Technical Checklist

Before production deployment, verify:

| Category | Item | Status |
| --- | --- | --- |
| Stability | No critical bugs in last 2 weeks | |
| Performance | Response time meets requirements under load | |
| Security | Security review completed, vulnerabilities addressed | |
| Backup | Data backup and recovery tested | |
| Monitoring | Performance and error monitoring in place | |
| Integration | All integrations functioning reliably | |

Operational Checklist

| Category | Item | Status |
| --- | --- | --- |
| Support | Help desk trained on new system | |
| Documentation | User guides and troubleshooting docs available | |
| Escalation | Technical escalation path defined | |
| Maintenance | Maintenance schedule and procedures documented | |
| Ownership | System owner assigned | |

Organizational Checklist

| Category | Item | Status |
| --- | --- | --- |
| Training | Training materials ready for all user groups | |
| Communication | Deployment communication plan executed | |
| Leadership | Executive sponsor confirmed and engaged | |
| Feedback | Feedback collection mechanism in place | |
| Success metrics | Ongoing measurement plan defined | |

Documentation for Handoff

Production deployment transfers responsibility from project team to operations. Documentation enables this handoff:

  • System documentation: What it does, how it works, how to maintain it
  • Operational procedures: Daily, weekly, monthly tasks
  • Troubleshooting guides: Common problems and solutions
  • Contact information: Who to escalate to for what issues

Documentation created during development is often insufficient for operations. Handoff documentation should be created with operational users in mind.


The Deployment Decision

Who Decides

The deployment decision should have clear ownership. Typically:

  • Project sponsor approves based on results
  • Technical lead certifies readiness
  • Operations lead confirms support capability
  • Business owner validates expected value

If approval authority is ambiguous, deployment stalls in committee.

Building the Case for Deployment

The deployment recommendation summarizes:

  • Pilot results vs. success criteria
  • Comparison to Module 3 projections
  • Remaining risks and mitigations
  • Recommended deployment approach
  • Timeline and resource requirements

This is a decision document, not a status report. It should enable a decision, not defer one.

Handling Stakeholder Concerns

Stakeholders may have concerns about deployment:

"What if it breaks?" Show reliability data from pilot. Document rollback procedures.

"Are practitioners ready?" Show adoption data and feedback. Describe training plan.

"What about the edge cases we haven't tested?" Acknowledge remaining uncertainty. Show how edge cases will be monitored and addressed.

"Is the timing right?" Discuss organizational readiness. Note that delay has costs too.

Concerns should be addressed directly, not dismissed. Unaddressed concerns become deployment blockers.

Timing and Sequencing

Deployment timing matters:

  • Avoid major business cycles (end of quarter, holidays)
  • Consider training logistics (when can users be trained?)
  • Account for support availability (who handles problems?)
  • Coordinate with other initiatives (avoid change saturation)

Sequencing options:

  • Big bang: Everyone at once. Faster but higher risk.
  • Phased: Groups deploy sequentially. Slower but lower risk.
  • Parallel: New and old systems run simultaneously. Safe but expensive.

The right approach depends on organizational tolerance for risk and operational complexity.

Module 5B: REALIZE — Practice

R — Reveal

Introduction

Module 5A established the principles of rapid implementation. This practice module provides the methodology: how to move from validated blueprint to working prototype to production deployment, creating value at each step.


Why This Module Exists

The gap between design and deployment is where organizations lose momentum.

Module 4 produced a validated Workflow Blueprint: a specification of how work should flow, what technology should do, and how humans and AI should collaborate. That blueprint represents significant investment: assessment, calculation, design, validation.

But a blueprint is a plan, not a result. The plan must become reality. Module 5 provides the discipline to make that happen without falling into the traps that stalled Cascade Legal Partners for eighteen months.

The deliverable: A Working Prototype with measured before/after results, evidence that the design works, ready for production deployment.


Learning Objectives

By completing Module 5B, you will be able to:

  1. Scope a minimum viable prototype that tests core assumptions without building everything at once

  2. Select an implementation approach (build, buy, or configure) based on requirements and constraints

  3. Construct or configure the prototype within timeline discipline, avoiding scope creep

  4. Design and execute pilot testing with appropriate group composition, duration, and measurement

  5. Measure results against Module 3 baselines using consistent methodology across all three ROI lenses

  6. Iterate based on evidence using the build-measure-learn cycle to address issues systematically

  7. Prepare for production deployment with appropriate readiness verification and handoff documentation


The Practitioner's Challenge

Three tensions define implementation:

Speed vs. Completeness

The faster you ship, the sooner you learn. But incomplete systems frustrate users and generate invalid feedback. Finding the minimum that enables meaningful testing requires discipline.

Quality vs. Iteration

Production quality standards evolved for good reason. But applying them to prototypes delays learning. Building for iteration means accepting imperfection now to enable improvement later.

Confidence vs. Evidence

The design feels right. Stakeholders are enthusiastic. Practitioners validated the blueprint. But confidence isn't evidence. Only testing reveals whether the design actually works. The temptation to declare victory early, before data confirms success, must be resisted.


Field Note

A technology director at a regional retailer described the moment his team's implementation approach changed:

"We had been building for four months. The system was sophisticated. It did everything we'd designed and more. But we hadn't tested anything with actual users. Every time we got close to pilot, someone found another gap. 'We can't test without X.' 'Y needs to be finished first.' Always reasonable, always delaying.

"Then a competitor launched something similar. Less sophisticated than what we were building, honestly pretty basic. But they were in market, learning from real customers, iterating based on real feedback. We were still planning our pilot.

"That's when we realized: their bad version that shipped beat our good version that didn't. They were learning while we were building. We stripped back to essentials and deployed in three weeks. It wasn't pretty, but it worked. And we learned more in those three weeks than we had in four months of building.

"Now we have a rule: if you can't describe what you'll learn from shipping, you're not ready to ship. But if you can describe what you'll learn, you're already late."


What You're Receiving

Module 5 receives the following from prior modules:

From Module 4: Validated Workflow Blueprint

The blueprint specifies:

  • Current-state workflow with documented friction
  • Future-state design with human-AI collaboration
  • Technology requirements (tool-agnostic)
  • Adoption design elements
  • Success metrics aligned with ROI model

For R-01, the blueprint documents:

  • Current state: 8 steps, 14-28 minutes, high friction at policy search and interpretation
  • Future state: 5-6 steps, 9-14 minutes, Preparation pattern with Policy Engine
  • Integration requirements: Order Management, CRM, Policy Engine
  • Success targets: <5 min time, <2% error, <5% escalation, >80% adoption

From Module 3: Baseline Metrics

The ROI model established baselines:

  • Time per Bible-dependent return: 14.2 minutes
  • Incorrect policy application: 4.3%
  • Supervisor escalation rate: 12%
  • Patricia-specific queries: 15+/day

These baselines become the comparison point for pilot measurement.

From Module 3: Success Criteria

The business case defined success:

  • Annual value: $97,516
  • Implementation cost: $35,000
  • Payback period: 4.2 months
  • ROI: 736%

Pilot results must validate (or invalidate) these projections.


Module Structure

Module 5B proceeds through six stages:

1. Prototype Scoping

Translating the complete blueprint into minimum viable scope. What must be tested first? What can wait?

2. Implementation Approach

Selecting build, buy, or configure. Evaluating options against R-01 requirements. Documenting the decision.

3. Testing and Measurement

Designing the pilot. Selecting participants. Defining measurement methodology. Executing the test.

4. Iteration Cycles

Interpreting results. Deciding next actions. Implementing improvements. Retesting.

5. Production Preparation

Verifying readiness. Building the deployment case. Preparing handoff documentation.

6. Transition to Module 6

Connecting proven prototype to sustainability planning. What carries forward.


The R-01 Implementation

Throughout Module 5B, we continue the R-01 example from previous modules:

  • Module 2 identified R-01 (Returns Bible Not in System) as a high-priority opportunity
  • Module 3 quantified the value: $97,516 annual savings
  • Module 4 designed the solution: Preparation pattern with automated policy lookup

Module 5 builds it:

  • Scoping the minimum prototype that tests policy lookup improvement
  • Selecting implementation approach (configure CRM vs. build custom)
  • Testing with representative pilot group
  • Measuring against the 14.2-minute baseline
  • Iterating based on what testing reveals
  • Preparing for deployment to all customer service representatives

By the end of Module 5, R-01 will be a working system with demonstrated results. A reality, no longer a design document.



Module 5B: REALIZE — Practice

O — Observe

Prototype Scoping Methodology

The blueprint specifies the complete solution. The prototype tests the core assumptions. This section covers how to translate comprehensive design into focused prototype scope.


From Blueprint to Prototype Scope

The Blueprint Specifies the Complete Future State

Module 4's blueprint documents everything needed for full implementation:

  • All workflow steps and decision points
  • All human-AI collaboration specifications
  • All integration requirements
  • All adoption design elements

This completeness is necessary for production. It is often counterproductive for an initial prototype.

The Prototype Tests Core Assumptions

Every design embeds assumptions:

  • The technology can do what we specified
  • Practitioners will use it as designed
  • The workflow will reduce friction as projected
  • Integration will work reliably

Some assumptions are more critical than others. The business case depends on certain assumptions being true. If they're wrong, everything else is irrelevant.

The prototype tests these critical assumptions first. Non-critical assumptions can wait.

Identifying Essential First-Test Components

To identify what must be in the prototype, ask:

  • What assumption does the business case depend on most?
  • If this assumption is wrong, does the opportunity still exist?
  • What's the smallest thing we can build that tests this assumption?

For R-01, the critical assumption is: automated policy lookup will reduce representative time from 14.2 minutes to under 5 minutes.

Everything that tests this assumption is essential. Everything that doesn't is deferrable.


The MVP Question

"What Is the Smallest Thing We Can Build That Tests Our Core Assumption?"

This question forces ruthless prioritization. Not "what would be nice to have." Not "what stakeholders expect." Not "what the blueprint specifies." Just: what tests the core assumption?

For R-01, the answer might be:

  • Policy Engine that matches return attributes to policies
  • Display of matched policy in representative's CRM view
  • Ability for representative to act on the displayed information

That's it. Not billing integration. Not documentation automation. Not exception handling workflow. Just: can automated policy lookup reduce the time representatives spend finding policies?
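To make the MVP concrete, here is a minimal sketch of what the policy lookup could look like. The rule format, attribute names, and confidence heuristic are all hypothetical; the real engine would draw rules from Patricia's extracted policy knowledge.

```python
# Hypothetical policy rules: each maps return attributes to a policy outcome.
POLICY_RULES = [
    {"policy": "Full refund", "category": "apparel",
     "max_days": 30, "condition": "unworn"},
    {"policy": "Store credit", "category": "apparel",
     "max_days": 60, "condition": "any"},
    {"policy": "Manufacturer warranty", "category": "electronics",
     "max_days": 365, "condition": "defective"},
]

def match_policy(return_attrs: dict):
    """Return (policy, confidence) for a return, or (None, "none")."""
    candidates = [
        rule for rule in POLICY_RULES
        if rule["category"] == return_attrs["category"]
        and return_attrs["days_since_purchase"] <= rule["max_days"]
        and rule["condition"] in ("any", return_attrs["condition"])
    ]
    if not candidates:
        return None, "none"  # representative falls back to manual lookup
    # Prefer the more specific rule (non-"any" condition); multiple
    # applicable rules lower confidence from high to medium.
    candidates.sort(key=lambda rule: rule["condition"] == "any")
    confidence = "high" if len(candidates) == 1 else "medium"
    return candidates[0]["policy"], confidence

policy, confidence = match_policy(
    {"category": "apparel", "days_since_purchase": 12, "condition": "unworn"}
)
```

Even this toy version shows the shape of the core assumption: attributes in, matched policy and confidence out, with a graceful fallback when nothing matches.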

Distinguishing "Nice to Have" from "Must Have for Testing"

| Feature | Must Have (MVP) | Nice to Have | Rationale |
| --- | --- | --- | --- |
| Policy matching engine | ✓ | | Tests core assumption |
| Policy display in CRM | ✓ | | Tests core assumption |
| Override mechanism | ✓ | | Required for fair test |
| Similar case display | | ✓ | Valuable but not essential for time test |
| Automatic documentation | | ✓ | Efficiency gain, not core test |
| Billing integration | | ✓ | Downstream value, not core test |
| Exception routing workflow | | ✓ | Handles 15% of cases, not typical flow |
| Manager dashboard | | ✓ | Observer feature, not practitioner test |

The must-haves test whether automated policy lookup works. The nice-to-haves make it better but aren't needed to answer the essential question.


Scope Categories

MoSCoW Prioritization for Prototype

| Category | Definition | R-01 Example |
| --- | --- | --- |
| Must Have | Required to test core value proposition; prototype fails without it | Policy matching, CRM display, override capability |
| Should Have | Improves test validity but not essential; include if time permits | Similar case references, confidence indicators |
| Could Have | Valuable but can wait; include in later iterations | Exception handling workflow, training mode |
| Won't Have (this version) | Explicitly deferred; not part of prototype scope | Billing integration, manager reporting, mobile access |

The discipline is in Won't Have. Every stakeholder has features they consider essential. MVP discipline requires explicit deferral with clear rationale.

Scope Documentation

Document scope decisions formally:

R-01 PROTOTYPE SCOPE DOCUMENT

MVP Scope (Must Have):
1. Policy Engine integration
   - Receive return attributes from CRM
   - Match to applicable policy rules
   - Return policy summary and confidence level

2. CRM Display
   - Show policy information in existing representative view
   - No navigation to separate application
   - Display appears when return details entered

3. Override Mechanism
   - One-click "doesn't apply" option
   - No explanation required
   - Action logged for learning

Deferred to Phase 2 (Should/Could Have):
- Similar case display
- Exception handling workflow
- Automatic documentation
- Confidence threshold alerts

Out of Scope (Won't Have):
- Billing system integration
- Manager dashboard and reporting
- Mobile access
- Multi-language support

Rationale: MVP tests whether automated policy lookup reduces time. All deferred features are valuable but not required to validate core assumption.

The R-01 Prototype Scope

Full Scope from Blueprint (Review)

Module 4's blueprint specified:

Future-State Workflow:

  1. Gather return info → Policy Engine identifies policies
  2. Policy review → System surfaces summary and similar cases
  3. Exception handling → System flags unusual cases
  4. Customer communication → Policy summary available
  5. Return processing → Decision logged automatically
  6. Documentation → Derived from workflow

Technology Requirements:

  • Policy Engine integration
  • CRM integration (read/write)
  • Similar case matching
  • Automatic documentation
  • Performance: <2 second response

Adoption Design:

  • Optional acknowledgment for experienced reps
  • One-click override
  • Training integration

MVP Scope for First Prototype

For initial prototype, scope reduces to:

| Blueprint Element | MVP Status | Rationale |
| --- | --- | --- |
| Policy Engine integration | Include | Core assumption |
| CRM display | Include | Core assumption |
| Override mechanism | Include | Fair test requires escape |
| Similar case matching | Defer | Valuable but not core test |
| Automatic documentation | Defer | Efficiency, not core value |
| Exception workflow | Defer | 15% of cases, test typical first |
| Performance (<2 sec) | Include | Poor performance invalidates test |
| Training mode | Defer | Not needed for pilot with support |

What's Deferred and Why

Similar case matching: Helps representatives make decisions but isn't required to test whether automated policy lookup reduces time. If core assumption validates, add this in Phase 2.

Automatic documentation: Saves time at the end of the workflow but doesn't affect the policy lookup test. The time savings from documentation automation can be measured separately.

Exception workflow: Handles the 15% of cases that are unusual. Testing the 85% typical flow first provides cleaner signal. Exception handling adds complexity that obscures core learning.

Manager reporting: Observer feature, not practitioner feature. Violates the Module 4 principle of designing for practitioners first.

Timeline Implications

| Scope | Estimated Timeline | Risk Level |
| --- | --- | --- |
| Full blueprint | 10-12 weeks | Higher (more complexity) |
| MVP + Phase 2 features | 6-8 weeks | Medium |
| MVP only | 3-4 weeks | Lower (focused scope) |

MVP timeline enables testing the core assumption in one month rather than three. If the assumption validates, additional features follow. If it doesn't, less has been wasted.


Scope Documentation

Feature List with Categorization

Create a formal scope document for stakeholder alignment:

| Feature | Category | Acceptance Criteria | Dependencies |
| --- | --- | --- | --- |
| Policy matching | Must Have | Matches return attributes to policy with >75% accuracy | Policy database loaded |
| CRM display | Must Have | Policy appears within 2 seconds of return entry | CRM API access |
| Override button | Must Have | Single click dismisses recommendation | None |
| Confidence indicator | Should Have | Shows high/medium/low based on match quality | Policy matching complete |
| Similar cases | Could Have | Shows 2-3 prior cases with similar attributes | Case history database |
| Auto-documentation | Won't Have (v1) | Records decision without manual entry | Defer to Phase 2 |

Acceptance Criteria for "Done"

Define what "done" means for MVP:

  • Policy matching: Successfully matches 50+ test cases with >75% accuracy
  • CRM display: Policy information renders within 2 seconds, consistently
  • Override: Button functions, action is logged
  • Integration: No errors in 100 consecutive transactions
  • User test: 3 representatives can complete workflow without assistance

These criteria define when the prototype is ready for pilot. Testable, not perfect.
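The accuracy criterion is the easiest to automate as a gate. This sketch assumes a `match_policy(attrs)` function and a list of `(attributes, expected_policy)` test cases; both are hypothetical stand-ins for the real matcher and test set.

```python
def accuracy(match_policy, test_cases) -> float:
    """Fraction of test cases where the matched policy equals the expected one."""
    hits = sum(
        1 for attrs, expected in test_cases
        if match_policy(attrs) == expected
    )
    return hits / len(test_cases)

def mvp_matching_done(match_policy, test_cases) -> bool:
    """The "done" gate from the acceptance criteria: 50+ cases, >75% accuracy."""
    return len(test_cases) >= 50 and accuracy(match_policy, test_cases) > 0.75

# Illustrative only: a stub matcher that always answers "Full refund",
# evaluated against 50 synthetic cases (40 of which expect that answer).
cases = [({"id": i}, "Full refund") for i in range(40)] + \
        [({"id": i}, "Store credit") for i in range(40, 50)]
stub = lambda attrs: "Full refund"
```

Running this gate on every build keeps "done" objective: the prototype graduates to pilot when the gate passes, not when it feels finished.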

Dependencies and Prerequisites

| Dependency | Owner | Status | Risk |
| --- | --- | --- | --- |
| Policy database content | Patricia (SME) | In progress | Medium - requires knowledge extraction |
| CRM API access | IT department | Approved | Low |
| Test environment | Development team | Available | Low |
| Pilot group availability | Operations manager | Confirmed | Low |

Dependencies that aren't resolved block prototype progress. Identify them early.

Risks of Scope Decisions

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| MVP too limited to generate valid feedback | Medium | High | Include override and confidence to ensure usability |
| Deferred features create stakeholder frustration | Medium | Medium | Clear communication about Phase 2 timeline |
| Policy matching accuracy insufficient | Medium | High | Plan calibration iteration before pilot |
| Integration more complex than estimated | Low | High | Start integration work immediately |

Common Scoping Mistakes

Including Everything from Blueprint

The blueprint specifies production requirements. Including all of them in prototype creates the Cascade problem: building everything, testing nothing.

Correction: Ruthlessly apply the MVP question. What tests the core assumption? Everything else waits.

Underestimating Integration Complexity

Integration between systems always takes longer than expected. APIs don't work as documented. Data formats don't match. Security requirements add steps.

Correction: Start integration work early. Test integration independently before building features that depend on it. Reduce scope rather than extend timeline when integration proves difficult.

Forgetting Training and Support Needs

A prototype that practitioners can't use generates no useful feedback. Pilot users need orientation, support access, and feedback channels.

Correction: Include pilot support in scope. Enough for pilot participants to use the system effectively, though less than production-grade training.

Scope Creep During Build

"While we're building policy matching, we should also add..." Each addition seems reasonable. Accumulated additions delay testing indefinitely.

Correction: Formal scope change process. Any addition to MVP scope requires explicit approval with impact assessment. Good ideas go to Phase 2 backlog, not current sprint.


Scope Sign-Off

Before proceeding to implementation, confirm scope with stakeholders:

Scope Agreement Checklist

  • MVP scope is documented and understood
  • Deferred features are explicitly listed with rationale
  • Stakeholders with deferred features have acknowledged timing
  • Acceptance criteria are defined for MVP features
  • Dependencies are identified with owners and status
  • Timeline is realistic for MVP scope
  • Scope change process is agreed

This agreement prevents mid-build disputes about what was promised. When someone asks "Aren't you including X?", the documented scope provides the answer.



Module 5B: REALIZE — Practice

O — Operate

Step 1: Select Implementation Approach

The prototype scope is defined. Now: how to build it? This section covers the build vs. buy vs. configure decision and applies it to R-01.


The Build vs. Buy vs. Configure Decision

Framework Review

Module 5A introduced three implementation paths:

| Approach | When to Use | Tradeoffs |
| --- | --- | --- |
| Build | Requirements are unique; no existing tool fits; long-term flexibility matters | Maximum control, maximum cost, longest timeline |
| Buy | Standard solutions address most requirements; time-to-value is priority | Faster deployment, less flexibility, ongoing license cost |
| Configure | Existing platforms have relevant capabilities; integration is simplified | Fastest path, limited by platform capabilities |

The right choice depends on:

  • Requirements specificity (how unique are your needs?)
  • Timeline pressure (how fast must you test?)
  • Internal capability (can you build and maintain?)
  • Budget constraints (what's affordable?)
  • Long-term ownership (who maintains this over years?)

Applying to R-01

R-01's requirements from the blueprint:

Functional:

  • Accept return attributes
  • Match to policy rules
  • Return policy summary with confidence
  • Display in CRM interface
  • Capture override actions

Technical:

  • <2 second response time
  • Integration with existing CRM and Order Management
  • Support for 50+ concurrent users

Constraints:

  • No changes to Order Management data structures
  • No additional login for representatives
  • No mandatory data entry beyond current workflow

Each path has distinct tradeoffs.


R-01 Implementation Options Analysis

Option A: Configure Existing CRM

What would need to happen:

  • Create custom policy database within CRM
  • Build automation rules to match return attributes to policies
  • Create custom UI component for policy display
  • Configure logging for override actions

Pros:

  • Fastest timeline (3-4 weeks to prototype)
  • No new system to integrate
  • Representatives stay in familiar interface
  • Lower cost (internal effort, no new licenses)
  • IT team has CRM configuration expertise

Cons:

  • Policy matching logic limited by CRM capabilities
  • Scaling may hit platform limits
  • Some features may require workarounds
  • Dependent on CRM vendor roadmap

Timeline estimate: 3-4 weeks to MVP
Resource estimate: 1 CRM administrator, 0.5 developer
Cost estimate: $8,000-12,000 (internal labor)


Option B: Purchase Returns Management Tool

What would need to happen:

  • Evaluate and select vendor
  • Negotiate contract and licensing
  • Configure tool for Lakewood policies
  • Build integration with existing CRM
  • Train administrators on new platform

Pros:

  • Purpose-built for returns/policy management
  • Vendor handles updates and improvements
  • May include features beyond current scope
  • Potentially better policy matching capabilities

Cons:

  • Longer timeline (vendor selection, contract, configuration)
  • Integration complexity (new system to connect)
  • Ongoing license costs
  • Vendor dependency for customization
  • Representatives may need to switch between systems

Timeline estimate: 8-12 weeks to MVP
Resource estimate: 0.5 developer for integration, vendor support
Cost estimate: $15,000-25,000 (licenses) + $10,000-15,000 (integration)


Option C: Build Custom Integration Layer

What would need to happen:

  • Design policy matching engine architecture
  • Develop custom matching algorithms
  • Build integration layer for CRM and Order Management
  • Create custom UI components
  • Implement logging and analytics

Pros:

  • Exact fit to requirements
  • Maximum flexibility for future enhancement
  • Full ownership and control
  • No vendor dependencies

Cons:

  • Longest timeline
  • Highest cost
  • Requires ongoing development resources
  • Risk of scope creep during custom development
  • Technical debt accumulation

Timeline estimate: 10-14 weeks to MVP
Resource estimate: 2 developers, 1 architect
Cost estimate: $35,000-50,000 (development)


Selected Option: Configure Existing CRM (Option A)

Rationale:

  1. Timeline alignment: MVP in 3-4 weeks tests core assumption quickly. Longer paths delay learning without proportional benefit for prototype phase.

  2. Risk reduction: CRM configuration is reversible. If prototype fails, minimal investment lost. Custom build or vendor commitment creates sunk costs.

  3. Capability match: CRM's automation capabilities can handle policy matching at prototype scale. Production may require enhancement, but prototype doesn't need production capacity.

  4. Integration simplicity: No new system means no new integration. Representatives stay in familiar interface, reducing adoption friction.

  5. Team capability: IT team has CRM expertise. No new skills required for prototype.

What This Means for Prototype Construction:

  • Week 1: Policy database design and initial data entry
  • Week 2: Automation rules for policy matching
  • Week 3: UI component development and integration testing
  • Week 4: Pilot preparation and initial testing

Production Considerations:

CRM configuration may be insufficient for full production. If prototype validates the core assumption, production options include:

  • Enhanced CRM configuration with additional optimization
  • Migration to purchased tool (now justified by proven value)
  • Custom development (now scoped by real requirements)

The prototype decision doesn't lock in the production decision. Learning from prototype informs better production choice.


Vendor/Platform Evaluation

When Purchasing (Option B), Evaluate Against Blueprint

If Option B were selected, evaluation would follow this process:

Step 1: Create evaluation criteria from blueprint

| Criterion | Weight | Source |
| --- | --- | --- |
| Policy matching accuracy | High | Blueprint functional requirement |
| CRM integration capability | High | Blueprint integration requirement |
| Response time <2 seconds | Medium | Blueprint performance requirement |
| Override logging | Medium | Blueprint collaboration specification |
| Reporting capabilities | Low | Nice-to-have, not MVP |
| Mobile access | Low | Not in current scope |

Step 2: Evaluate vendors against criteria

| Vendor | Matching | Integration | Performance | Override | Score |
| --- | --- | --- | --- | --- | --- |
| Vendor A | 4/5 | 3/5 | 5/5 | 4/5 | 3.9 |
| Vendor B | 5/5 | 4/5 | 4/5 | 3/5 | 4.1 |
| Vendor C | 3/5 | 5/5 | 4/5 | 5/5 | 4.1 |
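The weighted scoring above can be sketched in a few lines. The weight values here are assumptions (High counted at 1.5, Medium at 1.0), chosen only to illustrate the mechanics; a real evaluation would set weights explicitly with stakeholders.

```python
# Illustrative weighted vendor scoring. The numeric weights are assumptions,
# not values taken from the evaluation above.
CRITERIA_WEIGHTS = {
    "matching": 1.5,      # High
    "integration": 1.5,   # High
    "performance": 1.0,   # Medium
    "override": 1.0,      # Medium
}

def weighted_score(ratings: dict) -> float:
    """Weighted average of 1-5 criterion ratings, normalized by total weight."""
    total_weight = sum(CRITERIA_WEIGHTS.values())
    weighted = sum(CRITERIA_WEIGHTS[c] * r for c, r in ratings.items())
    return round(weighted / total_weight, 1)

vendor_a = {"matching": 4, "integration": 3, "performance": 5, "override": 4}
print(weighted_score(vendor_a))  # 3.9
```

With these assumed weights, Vendor A's ratings reproduce the 3.9 in the table; the point is that the score is a transparent function of ratings and weights, not a judgment call made after the fact.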

Step 3: Proof-of-concept with top candidates

Before contract signing:

  • Test with actual policy data
  • Verify integration with actual CRM
  • Measure actual response times
  • Confirm customization capabilities

Proof-of-Concept Requirements

| Test | Success Criteria | Duration |
| --- | --- | --- |
| Policy matching | >75% accuracy on 50 test cases | 3 days |
| Integration | Successful round-trip data flow | 2 days |
| Performance | <2 second response under load | 1 day |
| Customization | Override logging configurable | 1 day |

Proof-of-concept should cost little or nothing; vendors are motivated to support it.


Resource Requirements

For Option A (Selected): CRM Configuration

| Resource | Allocation | Weeks | Total |
| --- | --- | --- | --- |
| CRM Administrator | 100% | 3 | 120 hours |
| Developer (integration) | 50% | 2 | 40 hours |
| Business Analyst | 25% | 4 | 40 hours |
| Patricia (SME) | 10% | 4 | 16 hours |
| Project Lead | 25% | 4 | 40 hours |

Total effort: ~250 hours
Total cost: ~$12,000 (assuming blended rate of $50/hour)

Timeline with Milestones

| Week | Milestone | Deliverables |
| --- | --- | --- |
| 1 | Policy database ready | Data structure defined, initial policies loaded |
| 2 | Matching logic complete | Automation rules configured and tested |
| 3 | UI integration complete | Policy display functional in CRM |
| 4 | Pilot ready | Testing complete, pilot group briefed |

Budget Alignment

Module 3 approved $35,000 for R-01 implementation. Option A prototype consumes ~$12,000, leaving $23,000 for:

  • Pilot support and iteration
  • Production enhancement
  • Contingency

This allocation provides runway for learning and adjustment.


Risk Assessment

Technical Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| CRM automation insufficient for policy complexity | Medium | High | Test complex policies early; have Option B ready |
| Performance degrades under load | Low | Medium | Monitor during pilot; optimize before scale |
| Integration breaks with CRM updates | Low | Medium | Test in sandbox after updates; maintain documentation |

Timeline Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Policy data extraction takes longer than expected | Medium | Medium | Start immediately; Patricia availability confirmed |
| Testing reveals unexpected issues | Medium | Low | Built buffer into Week 4; iteration expected |
| Stakeholder adds scope mid-build | Medium | Medium | Scope agreement signed; change process defined |

Adoption Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Representatives resist new workflow | Low | High | Design validated in Module 4; pilot includes skeptics |
| Policy matching accuracy too low | Medium | High | Calibration sprint before pilot; override available |
| Training insufficient for pilot | Low | Medium | Intensive support during pilot; feedback loops |

Module 5B: REALIZE — Practice

O — Operate

Step 2: Testing and Measurement

The prototype is built. Before pilot launch, design the test: who participates, how long it runs, what gets measured, and how data is collected.


Pilot Design

Pilot Group Selection for R-01

The pilot group must be large enough to generate meaningful data but small enough to support intensively.

Recommended pilot size: 8 representatives

Selection criteria:

  • Mix of tenure levels (2 new, 4 experienced, 2 veteran)
  • Mix of attitudes (include 1-2 skeptics identified during Module 4 validation)
  • Representatives who handle returns regularly (minimum 5 Bible-dependent returns per day)
  • Geographic/shift distribution if applicable

R-01 Pilot Group:

| Representative | Tenure | Attitude | Returns Volume | Notes |
| --- | --- | --- | --- | --- |
| Maria T. | 8 years | Champion | High | Module 4 validation participant |
| DeShawn W. | 2 years | Supportive | High | Eager to try new tools |
| Jennifer R. | 4 months | Neutral | Medium | New perspective, learning Bible |
| Alex P. | 5 years | Skeptic | High | Questioned design in validation |
| Keisha M. | 12 years | Neutral | High | Veteran knowledge, Patricia's backup |
| Carlos S. | 1 year | Supportive | Medium | Quick learner |
| Patricia L. | 22 years | Supportive | High | Bible expert, essential validator |
| Ryan K. | 3 years | Skeptic | Medium | Raised concerns about accuracy |

This mix ensures:

  • Champions who will explore and advocate
  • Skeptics who will stress-test and surface problems
  • New users who reveal whether the system is intuitive
  • Veterans who reveal whether it handles complex cases

Duration and Timeline

Pilot duration: 4 weeks

| Week | Phase | Focus |
| --- | --- | --- |
| 1 | Learning | Representatives orient to system; support intensive |
| 2 | Stabilization | Usage patterns establish; early issues addressed |
| 3 | Measurement | Primary data collection; behavior stabilizes |
| 4 | Validation | Confirm patterns; prepare iteration decisions |

Four weeks allows:

  • Learning curve effects to pass
  • Sufficient transaction volume for statistical validity
  • Pattern observation over multiple workweeks
  • Time for edge cases to emerge

Success Criteria

Quantitative thresholds (from Module 3/4):

| Metric | Baseline | Target | Measurement Method |
| --- | --- | --- | --- |
| Time per Bible-dependent return | 14.2 min | <5 min | Time-motion observation |
| Policy matching accuracy | N/A | >80% confirmed | Override rate tracking |
| Incorrect policy application | 4.3% | <2% | QA audit sample |
| Supervisor escalation rate | 12% | <5% | System logging |
| System usage rate | N/A | >80% | System logging |
| Representative satisfaction | 3.2/5 | >4.0/5 | Survey |

Qualitative indicators:

  • Representatives prefer new workflow to old
  • Workarounds are minimal or absent
  • Patricia queries decrease significantly
  • Pilot participants would recommend to colleagues

Control Considerations

Before/after design: Measure each representative's performance before pilot (during Week 0 baseline) and during pilot weeks 3-4.

Controlling for variables:

  • Compare similar time periods (avoid end-of-month, holidays)
  • Note any unusual volume or complexity during pilot
  • Track whether returns mix was typical
  • Document any system issues or outages

Measurement Plan

Metrics Aligned with Module 3 Baseline

Module 3 established baselines. Module 5 measures against them using identical methodology.

| Metric | Module 3 Method | Module 5 Method | Comparability Check |
| --- | --- | --- | --- |
| Task time | Time-motion observation, n=50 | Time-motion observation, n=50+ | Same observer training, same definition of start/end |
| Error rate | QA audit of 100 decisions | QA audit of 100 decisions | Same auditor, same criteria |
| Escalation | Manual count from logs | System tracking | Verify definition matches |

Collection methodology:

Time Metrics:

  • Observer records start time when representative opens return case
  • Observer records end time when representative completes policy-based decision
  • Exclude customer communication and processing time (consistent with baseline)
  • Sample minimum 50 transactions across pilot period
  • Distribute samples across all pilot representatives

Quality Metrics:

  • QA team audits random sample of 100 return decisions
  • Audit criteria: Was correct policy applied? Was decision appropriate for case?
  • Auditor blind to whether decision made with or without system
  • Compare pilot error rate to baseline error rate

Behavioral Metrics:

  • System logs every interaction: policy displayed, override clicked, time on screen
  • Calculate usage rate: returns processed with system / total Bible-dependent returns
  • Track override rate: overrides / total recommendations
  • Note patterns: Who uses most? Who overrides most? What cases get overridden?
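The usage-rate and override-rate calculations above reduce to simple ratios over the interaction log. A minimal sketch, assuming an illustrative record structure (the field names are not the actual CRM schema):

```python
# Sketch of the behavioral-metric calculations. The ReturnCase fields are
# assumptions for illustration, not the real system's log format.
from dataclasses import dataclass

@dataclass
class ReturnCase:
    rep: str
    bible_dependent: bool   # return required a policy ("Bible") lookup
    system_used: bool       # policy display occurred and the rep took action
    overridden: bool        # rep overrode the recommendation

def usage_rate(cases: list) -> float:
    """Returns processed with the system / total Bible-dependent returns."""
    dependent = [c for c in cases if c.bible_dependent]
    used = [c for c in dependent if c.system_used]
    return len(used) / len(dependent) if dependent else 0.0

def override_rate(cases: list) -> float:
    """Overrides / total recommendations shown."""
    recommended = [c for c in cases if c.system_used]
    overrides = [c for c in recommended if c.overridden]
    return len(overrides) / len(recommended) if recommended else 0.0
```

Grouping the same ratios by representative or case type surfaces the patterns the last bullet asks about (who uses most, who overrides most).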

Collection Schedule:

| Week | Data Collected | Responsible |
| --- | --- | --- |
| 0 (pre-pilot) | Baseline confirmation measurements | Business Analyst |
| 1 | Usage logging, support issues, early feedback | Project Lead |
| 2 | Continued logging, first observation session | Business Analyst |
| 3 | Primary time-motion observation, QA audit begins | Business Analyst + QA |
| 4 | Complete observation, complete audit, surveys | Full team |

The R-01 Measurement Framework

Time Metrics

Target: Policy lookup time < 5 minutes (vs. 14.2 minute baseline)

Measurement:

  • Time from return case open to policy decision made
  • Excludes customer communication and return processing
  • Measured via observation (primary) and system timestamps (secondary)

Collection:

  • 50+ observed transactions during weeks 3-4
  • Stratified by representative and case complexity
  • Standard deviation calculated to understand variation

Throughput Metrics

Target: Error rate < 2% (vs. 4.3% baseline)

Measurement:

  • QA audit of 100 return decisions during pilot
  • Same auditor, same criteria as baseline audit
  • Error = incorrect policy applied or inappropriate decision

Collection:

  • Random sample from all pilot representatives
  • Include simple and complex cases
  • Audit within 48 hours of decision for context availability

Focus Metrics

Target: Escalation rate < 5% (vs. 12% baseline)

Measurement:

  • Percentage of returns requiring supervisor involvement
  • Supervisor involvement = case transferred or supervisor consulted

Collection:

  • System logging of escalation events
  • Verify with supervisor records
  • Track by representative and case type

Adoption Metrics

Target: System usage > 80%

Measurement:

  • Returns processed using system / total Bible-dependent returns
  • Usage = policy display occurred and representative took action

Collection:

  • System logging (automatic)
  • Verify representatives aren't bypassing system
  • Note reasons for non-use if identified

Satisfaction Metrics

Target: Satisfaction > 4.0/5

Measurement:

  • Survey administered at end of week 4
  • 5-point scale on: ease of use, accuracy, speed, preference vs. old process
  • Open-ended feedback questions

Collection:

  • All pilot representatives complete survey
  • Anonymous for honest feedback
  • Administered by neutral party, not project team

Qualitative Data Collection

Observation Protocol

During time-motion observation, note:

  • Points of hesitation (where do representatives pause?)
  • Verbal reactions (comments, sighs, frustration, satisfaction)
  • Workarounds (actions outside the system)
  • Questions to colleagues (seeking help or confirmation)
  • Override patterns (when and why they override)

Use structured observation form:

OBSERVATION RECORD

Observer: ________________  Date: ________  Time: ________
Representative: ____________  Tenure: ________

Transaction #: ________
Return Type: ________________
Case Complexity: Simple / Medium / Complex

Start Time: ________  End Time: ________  Total: ________

System Used: Yes / No
If No, reason: ________________________________

Policy Recommended: ________________________________
Action Taken: Accepted / Overridden / N/A
If Overridden, reason observed: ________________________________

Friction Points Observed:
________________________________
________________________________

Representative Comments (verbatim):
________________________________
________________________________

Observer Notes:
________________________________
________________________________

Interview Questions

Conduct 15-minute interviews with each pilot representative at end of week 2 and week 4.

Week 2 (early feedback):

  • "How would you describe your experience with the new system so far?"
  • "What's working well?"
  • "What's frustrating or confusing?"
  • "Have you found situations where the system doesn't help?"
  • "What would make it more useful?"

Week 4 (final feedback):

  • "How has your experience changed since we last spoke?"
  • "Would you want to continue using this system? Why or why not?"
  • "How does this compare to the old way of doing things?"
  • "What advice would you give a colleague starting to use this?"
  • "What should we change before rolling out to everyone?"

Capturing Workarounds

Workarounds indicate unmet needs. Track them systematically:

| Workaround Observed | Who Used It | Frequency | What Need It Addresses |
| --- | --- | --- | --- |
| [Description] | [Reps] | [Often/Sometimes/Once] | [Underlying need] |

Multiple representatives using the same workaround signals a design gap.

Weekly Feedback Sessions

Hold 30-minute group session at end of each pilot week:

  • What went well this week?
  • What problems did you encounter?
  • What questions do you have?
  • What should we focus on improving?

Document themes, not individual complaints. Look for patterns.


Analysis Framework

Comparing Pilot Results to Baseline

Create comparison table:

| Metric | Baseline | Pilot Result | Change | Target Met? |
| --- | --- | --- | --- | --- |
| Task time | 14.2 min | [result] | [%] | Yes/No |
| Error rate | 4.3% | [result] | [pp] | Yes/No |
| Escalation rate | 12% | [result] | [pp] | Yes/No |
| System usage | N/A | [result] | N/A | Yes/No |
| Satisfaction | 3.2/5 | [result] | [points] | Yes/No |

Statistical Considerations

Sample size: 50+ observations provides reasonable confidence for time metrics. Smaller samples increase uncertainty.

Significance: For prototype testing, practical significance matters more than statistical significance. A 50% time reduction is meaningful even without p-values.

Variation: Report mean and standard deviation. High variation may indicate inconsistent experience.

Interpreting Mixed Results

Results rarely show universal improvement. Interpretation requires judgment:

Scenario: Time improved, but error rate increased

  • Possible cause: Representatives moving too fast, skipping verification
  • Response: Adjust workflow to include confirmation step

Scenario: Metrics improved, but satisfaction low

  • Possible cause: System works but feels burdensome
  • Response: Investigate friction points through interviews

Scenario: Most metrics improved, but one segment struggled

  • Possible cause: Complex cases not well-handled
  • Response: Analyze which cases fail, enhance for those

Documenting Findings

Create structured pilot report:

R-01 PILOT RESULTS REPORT

Executive Summary:
[2-3 sentences on overall outcome]

Quantitative Results:
[Table comparing baseline to pilot]

Qualitative Findings:
[Key themes from observation and interviews]

What Worked:
[List with evidence]

What Needs Improvement:
[List with specific issues]

Recommended Next Steps:
[Continue/Adjust/Pivot/Stop with rationale]

Appendices:
- Raw data
- Observation records
- Interview transcripts
- Survey results

Module 5B: REALIZE — Practice

O — Operate

Step 3: Iteration Cycles

The pilot generated data. Representatives used the system. Metrics were collected. Now: what does the data mean, and what should happen next?


Interpreting R-01 Pilot Results

Review the Results

From the pilot measurement (file 04):

| Metric | Baseline | Target | Pilot Result | Assessment |
| --- | --- | --- | --- | --- |
| Task time | 14.2 min | <5 min | 4.3 min | ✓ Target met |
| Error rate | 4.3% | <2% | 2.1% | ~ Close to target |
| Escalation rate | 12% | <5% | 7% | ✗ Target missed |
| System usage | N/A | >80% | 87% | ✓ Target met |
| Satisfaction | 3.2/5 | >4.0/5 | 4.2/5 | ✓ Target met |

What's Working Well

  • Time reduction exceeded target: 4.3 minutes vs. 5 minute target
  • Representatives adopted the system: 87% usage rate
  • Satisfaction improved: 4.2/5 vs. 3.2/5 baseline
  • Patricia queries dropped dramatically (15+/day to 3/day)
  • New representative Jennifer R. became productive quickly

What Needs Adjustment

  • Escalation rate (7%) still above target (5%)

    • Root cause: Complex cases where policy matching uncertain
    • Specific issue: Multi-condition returns where multiple policies apply
  • Error rate (2.1%) slightly above target (2%)

    • Root cause: Specific policy categories with calibration issues
    • Specific issue: Warranty vs. satisfaction guarantees confused

Data vs. Practitioner Feedback

What data tells us: Time improved dramatically. Accuracy improved moderately. Escalations reduced but not enough.

What practitioners tell us: "The system is right most of the time, but when it's wrong, I don't know how to tell." Representatives trust the system for simple cases but want more confidence information for complex ones.

The gap: Representatives can't tell how confident the system is in a given match. When they suspect a match is uncertain, they escalate rather than risk an error. A confidence indicator (a Should Have feature) would address this.


The Iteration Decision

Applying the Framework

| Option | Criteria | R-01 Assessment |
| --- | --- | --- |
| Continue | Results positive, expand scope | 4 of 5 metrics met or nearly met; not ready to expand yet |
| Adjust | Results mixed, modify and retest | Core value proven; specific gaps identified; clear fix path |
| Pivot | Core assumption wrong | Core assumption validated (time reduction works) |
| Stop | Opportunity not viable | Value demonstrated; stopping would waste proven progress |

R-01 Decision: ADJUST

Rationale:

The core assumption, that automated policy lookup reduces representative time, is validated. Time improved from 14.2 minutes to 4.3 minutes. This is the foundation of the business case.

However, two metrics need improvement:

  • Escalation rate needs 2 percentage points reduction
  • Error rate needs 0.1 percentage point reduction

Both issues have identified root causes with clear remediation paths. Iteration will address them without rebuilding the core system.


Planning the Iteration

Iteration 1 Scope

Specific changes to make:

  1. Add confidence indicator

    • Display High/Medium/Low confidence for each policy match
    • Logic: High = single policy match, clear criteria; Medium = single match, some criteria ambiguous; Low = multiple policies apply, criteria unclear
    • Implementation: New UI element in CRM display; logic extension in matching engine
  2. Calibrate problem categories

    • Warranty vs. satisfaction guarantee: Add product category weighting
    • Multi-condition returns: Display all applicable policies rather than best match
    • Implementation: Policy database update; matching logic refinement
  3. Revise escalation guidance

    • Add "Review recommended" flag for Low confidence matches
    • Change escalation prompt from "Transfer to supervisor" to "Consider policy X before escalating"
    • Implementation: CRM display modification
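The confidence rule in change 1 can be expressed as a small classification function. A sketch, assuming a match result is just a list of candidate policies plus an ambiguity flag (the real matching engine's data structures are not specified here):

```python
# Sketch of the High/Medium/Low confidence logic described above.
# The match-result shape (list of policies + ambiguity flag) is an assumption.
def confidence(matches: list, ambiguous_criteria: bool) -> str:
    """Classify a policy-match result.

    HIGH   = single policy match, clear criteria
    MEDIUM = single match, some criteria ambiguous
    LOW    = multiple policies apply (or no clear match)
    """
    if len(matches) != 1:
        return "LOW"
    return "MEDIUM" if ambiguous_criteria else "HIGH"

print(confidence(["30-day return"], ambiguous_criteria=False))      # HIGH
print(confidence(["warranty", "satisfaction guarantee"], False))    # LOW
```

Keeping the rule this simple is deliberate: representatives can be told exactly what each level means, which is what builds the "I can tell when to trust it" confidence the iteration targets.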

What stays the same:

  • Core matching engine architecture
  • CRM integration approach
  • Override mechanism
  • Logging and tracking

Timeline: One Week

| Day | Activity |
| --- | --- |
| 1-2 | Implement confidence indicator |
| 3 | Calibrate problem categories |
| 4 | Implement escalation guidance changes |
| 5 | Internal testing and pilot preparation |

Success Criteria for Iteration:

  • Escalation rate: <5% (target)
  • Error rate: <2% (target)
  • Representative confidence: "I can tell when to trust it"
  • No regression in metrics already meeting target

The R-01 First Iteration

Changes Implemented

Confidence Indicator:

Before: Policy display showed recommendation only

POLICY MATCH: 30-day return - full refund, original payment method
[Apply] [Override]

After: Policy display shows confidence level

POLICY MATCH: 30-day return - full refund, original payment method
Confidence: HIGH
[Apply] [Override]

-- or --

POLICY MATCH: Extended warranty claim OR satisfaction guarantee
Confidence: LOW - Multiple policies may apply
Review both policies before deciding
[View All] [Apply First] [Override]

Calibration Changes:

The warranty vs. satisfaction guarantee confusion stemmed from overlapping product categories. Calibration added:

  • Product purchase date weighting (warranties apply to newer products)
  • Customer history flag (satisfaction guarantees for repeat customers)
  • Price threshold (high-value items get more careful matching)
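These three calibration factors amount to small score adjustments on candidate policies. A sketch of the idea, where the field names, thresholds, and weight values are all assumptions for illustration, not the real matching engine:

```python
# Illustrative calibration adjustment. Thresholds (365 days, $500) and
# weights (0.2, 0.1) are invented; a real system would tune these.
from datetime import date

def calibration_boost(policy: str, purchase_date: date, repeat_customer: bool,
                      price: float, today: date) -> float:
    """Adjust a candidate policy's match score using the three factors above."""
    boost = 0.0
    age_days = (today - purchase_date).days
    if policy == "warranty" and age_days <= 365:
        boost += 0.2          # warranties weighted toward newer products
    if policy == "satisfaction_guarantee" and repeat_customer:
        boost += 0.2          # guarantees weighted toward repeat customers
    if price >= 500:
        boost -= 0.1          # high-value items get more cautious matching
    return boost
```

The adjustment is additive and inspectable, so Patricia can review why a given policy won a match during the monthly calibration review.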

Escalation Guidance:

Low confidence matches now display: "This case may need additional review. Before escalating, check if [specific policy element] applies."

This gives representatives a path to resolution without defaulting to escalation.

Pilot Impact (Week 2 of Iteration)

| Metric | Pilot 1 | Iteration Target | Iteration Result |
| --- | --- | --- | --- |
| Escalation rate | 7% | <5% | 4.8% |
| Error rate | 2.1% | <2% | 1.7% |
| System usage | 87% | >80% | 91% |
| Satisfaction | 4.2/5 | >4.0/5 | 4.4/5 |

Second Pilot Cycle

Abbreviated Second Cycle

With iteration 1 successful, a brief validation cycle confirmed:

  • All five metrics now meet targets
  • Representatives report improved confidence ("I know when to check")
  • Escalations that still occur are appropriate (genuinely complex cases)
  • Alex P. (identified skeptic) now advocates for the system

Results Trajectory

| Metric | Baseline | Pilot 1 | Iteration 1 | Trend |
| --- | --- | --- | --- | --- |
| Task time | 14.2 min | 4.3 min | 4.1 min | Stable |
| Error rate | 4.3% | 2.1% | 1.7% | Improving |
| Escalation | 12% | 7% | 4.8% | Improving |
| Usage | N/A | 87% | 91% | Improving |
| Satisfaction | 3.2/5 | 4.2/5 | 4.4/5 | Improving |

The learning loop is working. Each cycle produces measurable improvement.


Knowing When to Stop Iterating

Graduation Criteria Review

| Criterion | Status |
| --- | --- |
| All quantitative targets met | ✓ |
| Qualitative indicators positive | ✓ |
| Practitioners would recommend to colleagues | ✓ |
| Critical issues resolved | ✓ |
| Remaining issues are minor/rare | ✓ |

Diminishing Returns Signal

Further iteration might improve metrics marginally:

  • Error rate could go from 1.7% to 1.5%
  • Satisfaction could go from 4.4 to 4.5

But these gains require disproportionate effort. The core value is proven. Additional refinement can happen after production deployment.

"Good Enough" Determination

R-01 is good enough for production because:

  • Core business case validated (time reduction: 70%)
  • All success metrics achieved
  • Practitioner adoption strong (91%)
  • Remaining friction is edge-case, not systemic
  • Iteration log shows diminishing issues per cycle

Preparing for Production

The system is ready to move beyond pilot. This means:

  • Broader rollout to all customer service representatives
  • Scaling support and monitoring
  • Transitioning from project to operations

Module 5B: REALIZE — Practice

O — Operate

Step 4: Production Preparation

The pilot succeeded. Iteration addressed the gaps. Metrics meet targets. Practitioners support the system. The question now: is R-01 ready for production?


Production Readiness Assessment

Technical Readiness Checklist

| Item | Requirement | R-01 Status |
| --- | --- | --- |
| Stability | No critical bugs in last 2 weeks | ✓ Zero critical issues in iteration cycle |
| Performance | Response time <2 seconds under expected load | ✓ Averaging 1.4 seconds |
| Security | Security review completed, vulnerabilities addressed | ✓ CRM security applies; no new vulnerabilities introduced |
| Backup | Data backup and recovery tested | ✓ Policy database backed up nightly with CRM |
| Monitoring | Performance and error monitoring in place | ✓ CRM monitoring extended to new components |
| Integration | All integrations functioning reliably | ✓ Order Management and CRM integration stable |
| Scalability | Can handle full user population | ⚠ Testing with 50 concurrent users passed; production may have 80+ |

Technical assessment: Ready with monitoring. Scalability is a manageable risk: the CRM handles current transaction volume, and the new components add minimal load.


Operational Readiness Checklist

| Item | Requirement | R-01 Status |
| --- | --- | --- |
| Help desk | Support staff trained on new system | ✓ Help desk completed training; handled pilot issues |
| Documentation | User guides and troubleshooting docs available | ✓ Quick reference guide and FAQ created |
| Escalation | Technical escalation path defined | ✓ IT support → CRM administrator → Development |
| Maintenance | Maintenance schedule and procedures documented | ✓ Weekly policy sync, monthly calibration review |
| Ownership | System owner assigned | ✓ Customer Service Manager owns; IT supports |

Operational assessment: Ready. Pilot provided operational learning; documentation tested with real issues.


Organizational Readiness Checklist

| Item | Requirement | R-01 Status |
| --- | --- | --- |
| Training | Training materials ready for all user groups | ✓ 15-minute self-paced module created |
| Communication | Deployment communication plan executed | ⚠ Plan drafted; execution begins next week |
| Leadership | Executive sponsor confirmed and engaged | ✓ Director of Customer Service committed |
| Feedback | Feedback collection mechanism in place | ✓ Feedback button in CRM; weekly review process |
| Success metrics | Ongoing measurement plan defined | ✓ Dashboard created; monthly reporting schedule |

Organizational assessment: Ready pending communication execution.


The Deployment Case

Summarizing Pilot Results

R-01 pilot demonstrated:

| Metric | Baseline | Target | Final Result | Change |
| --- | --- | --- | --- | --- |
| Task time | 14.2 min | <5 min | 4.1 min | -71% |
| Error rate | 4.3% | <2% | 1.7% | -2.6pp |
| Escalation rate | 12% | <5% | 4.8% | -7.2pp |
| System usage | N/A | >80% | 91% | N/A |
| Satisfaction | 3.2/5 | >4.0/5 | 4.4/5 | +1.2 |

All targets achieved. All trends improving.

Comparison to Module 3 Projections

| Projection | Module 3 Estimate | Pilot Actual | Variance |
| --- | --- | --- | --- |
| Time savings | 9.2 min/return | 10.1 min/return | +10% (better) |
| Annual labor savings | $76,176 | Est. $83,793* | +10% |
| Error reduction value | $15,480 | Est. $17,028* | +10% |
| Focus improvement value | $8,260 | Est. $9,086* | +10% |
| Total annual value | $97,516 | Est. $109,907 | +10% |

*Extrapolated from pilot; production results will confirm.

The business case is validated and exceeded.
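The "+10%" extrapolation scales each Module 3 value estimate by the ratio of measured to projected time savings, rounded to one decimal. A sketch of that arithmetic (the rounding and whole-dollar truncation are assumptions about how the table's figures were produced, not the finance team's actual model):

```python
# How the "+10%" extrapolation works: scale Module 3 estimates by the
# measured-vs-projected time-savings ratio. Rounding choices are assumptions.
PROJECTED_SAVINGS_MIN = 9.2     # Module 3 estimate, minutes/return
MEASURED_SAVINGS_MIN = 10.1     # pilot actual, minutes/return

scale = round(MEASURED_SAVINGS_MIN / PROJECTED_SAVINGS_MIN, 1)  # ~1.1 (+10%)

module3_estimates = {
    "labor_savings": 76_176,
    "error_reduction": 15_480,
    "focus_improvement": 8_260,
}
# Truncate to whole dollars for reporting.
extrapolated = {k: int(v * scale) for k, v in module3_estimates.items()}
print(extrapolated)
```

Making the extrapolation mechanical keeps the deployment case honest: if production time savings come in lower, the same formula revises the value estimate downward.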

Addressing Stakeholder Concerns

Concern: "What if it breaks?"

  • Response: CRM configuration means existing CRM reliability applies. Rollback procedure documented. Help desk trained. Monitoring in place.

Concern: "Are representatives ready?"

  • Response: 91% adoption in pilot. Training materials tested. Champions identified among pilot group to support peers.

Concern: "What about cases the pilot didn't cover?"

  • Response: Pilot included mix of case types and representative tenure. Edge cases will emerge; override and escalation paths handle them. Calibration process allows ongoing improvement.

Concern: "Can we really save this much time?"

  • Response: Pilot measured same way as baseline. Time reduction verified by observation and system data. Conservative extrapolation used.

Recommendation: Proceed with Production Deployment

Evidence supports deployment. Continued delay risks:

  • Pilot group creating two-tier service quality
  • Losing momentum and stakeholder attention
  • Patricia remaining as single point of failure

R-01 Production Deployment Plan

Rollout Sequence: Phased

Rather than full deployment to all 22 representatives simultaneously, roll out in two waves:

| Wave | Representatives | Timeline | Rationale |
| --- | --- | --- | --- |
| 1 | 10 representatives (including 8 pilot) | Week 1-2 | Leverage pilot experience; champions support new users |
| 2 | Remaining 12 representatives | Week 3-4 | Learn from Wave 1; full deployment |

Why phased: A phased rollout limits risk and provides a learning opportunity. Pilot representatives can support their peers, and issues surface at smaller scale.

Timeline and Milestones

| Week | Milestone | Activities |
| --- | --- | --- |
| Week 1 | Wave 1 preparation | Communication; training scheduling; system verification |
| Week 2 | Wave 1 live | 10 representatives using system; intensive support |
| Week 3 | Wave 2 preparation | Wave 1 lessons incorporated; remaining training completed |
| Week 4 | Wave 2 live | All 22 representatives using system; standard support |
| Week 5+ | Stabilization | Monitoring; calibration adjustments; feedback review |

Training and Communication Plan

Communication sequence:

  1. Leadership announcement (Director): rationale and commitment
  2. Department meeting: demonstration and Q&A
  3. Individual scheduling: training slot assignment
  4. Go-live notification: system availability confirmation

Training approach:

  • 15-minute self-paced module (mandatory)
  • 30-minute live Q&A session (optional but recommended)
  • Quick reference card at each workstation
  • Champion buddy assignment (pilot participant paired with new user)

Support and Monitoring Plan

First 30 days (intensive support):

  • Help desk priority queue for system issues
  • Daily check-in from project team
  • Weekly calibration review
  • Real-time usage monitoring

Ongoing support:

  • Standard help desk procedures
  • Monthly calibration review
  • Quarterly performance review
  • Annual system assessment

Handoff Documentation

What Operations Needs to Run the System

| Document | Contents | Audience |
| --- | --- | --- |
| System Overview | Architecture, integrations, data flows | IT support |
| Maintenance Procedures | Weekly sync, monthly calibration, backup verification | CRM administrator |
| Performance Thresholds | Response time targets, error rate thresholds | Monitoring team |
| Escalation Matrix | Who to contact for what type of issue | All support staff |

What Support Needs to Troubleshoot

| Document | Contents | Audience |
| --- | --- | --- |
| Troubleshooting Guide | Common issues and resolutions | Help desk |
| Known Limitations | Cases the system handles poorly | Help desk, supervisors |
| Override Protocol | When and how to override recommendations | Representatives |
| Feedback Process | How to report issues and suggestions | All users |

What Training Needs to Onboard

| Document | Contents | Audience |
| --- | --- | --- |
| User Guide | How to use the system day-to-day | Representatives |
| Quick Reference Card | Key actions on one page | Representatives |
| Training Module | Self-paced onboarding content | New representatives |
| FAQ | Common questions with answers | All users |

What Leadership Needs to Track Success

| Document | Contents | Audience |
| --- | --- | --- |
| Success Dashboard | Key metrics, trends, alerts | Customer Service leadership |
| Monthly Report Template | Standardized performance summary | Department leadership |
| Business Case Validation | Actual vs. projected value | Executive sponsor |
| Sustainability Plan | Long-term ownership and monitoring | Operations leadership |

Success Metrics for Production

Metrics Continuing from Pilot

| Metric | Target | Collection Method | Frequency |
| --- | --- | --- | --- |
| Task time | <5 min | System timestamps + observation | Monthly sample |
| Error rate | <2% | QA audit | Monthly |
| Escalation rate | <5% | System logging | Weekly |
| System usage | >80% | System logging | Weekly |
| Satisfaction | >4.0/5 | Survey | Quarterly |

Additional Metrics for Scale

| Metric | Target | Collection Method | Frequency |
| --- | --- | --- | --- |
| System availability | >99.5% | System monitoring | Continuous |
| Help desk volume | <5 tickets/week | Ticket tracking | Weekly |
| Training completion | 100% | Training system | Until complete |
| Override rate | Trend monitoring | System logging | Weekly |

Reporting Schedule

| Report | Audience | Frequency |
| --- | --- | --- |
| Operational dashboard | Operations team | Real-time |
| Performance summary | Customer Service Manager | Weekly |
| Executive summary | Director, Sponsor | Monthly |
| Business case validation | Leadership team | Quarterly |

Escalation Triggers

| Condition | Action |
| --- | --- |
| System availability <99% | Immediate IT escalation |
| Error rate >3% for 2 consecutive weeks | Calibration review |
| Satisfaction drops below 3.5/5 | User feedback review |
| Help desk volume >10 tickets/week | Root cause analysis |
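These triggers are mechanical rules over the weekly metrics, so they can be automated in the monitoring layer. A sketch, where the metric names and the weekly snapshot structure are assumptions for illustration:

```python
# Sketch of the escalation-trigger checks. Snapshot field names are assumed;
# each dict holds one week's metrics, weekly[-1] being the latest week.
def triggered_actions(weekly: list) -> list:
    """Return the actions triggered by recent weekly metric snapshots."""
    actions = []
    latest = weekly[-1]
    if latest["availability"] < 0.99:
        actions.append("Immediate IT escalation")
    # Calibration review requires two consecutive weeks above 3% error rate.
    if len(weekly) >= 2 and all(w["error_rate"] > 0.03 for w in weekly[-2:]):
        actions.append("Calibration review")
    if latest["satisfaction"] < 3.5:
        actions.append("User feedback review")
    if latest["help_desk_tickets"] > 10:
        actions.append("Root cause analysis")
    return actions
```

Encoding the two-week condition explicitly prevents a single noisy week from triggering a calibration review, which matches the table's intent.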

Connection to Module 6

Production Deployment Is Not the End

Deployment delivers the capability. Sustainability preserves it.

Without intentional sustainability design:

  • Staff turnover erodes expertise
  • System updates break integrations
  • Calibration drifts as business changes
  • Monitoring attention fades
  • Value erodes gradually

Module 6 addresses these risks.

Handoff Artifacts for Module 6

R-01 delivers to Module 6:

  • Baseline metrics (from Module 3)
  • Pilot results and iteration log
  • Production deployment results (after stabilization)
  • Known risks and monitoring requirements
  • Ownership assignments and escalation paths

These artifacts become inputs for sustainability planning.


Module 5B: REALIZE — Practice

Transition to Module 6: NURTURE

What Module 5 Accomplished

Module 5 converted design into reality. The Workflow Blueprint from Module 4 became a working system with measured results.

The Journey:

  1. Built working prototype from blueprint

    • Scoped minimum viable prototype focusing on core assumption
    • Selected implementation approach (Configure CRM)
    • Constructed prototype within timeline discipline
  2. Tested with practitioners in real conditions

    • Designed pilot with representative group composition
    • Measured against Module 3 baselines
    • Collected quantitative and qualitative data
  3. Measured results against baseline

    • Time improvement: 14.2 min → 4.1 min (71% reduction)
    • Error improvement: 4.3% → 1.7% (2.6 percentage points)
    • Escalation improvement: 12% → 4.8% (7.2 percentage points)
    • Adoption: 91% system usage
    • Satisfaction: 4.4/5
  4. Iterated based on evidence

    • Interpreted pilot results systematically
    • Applied iteration decision framework (Adjust)
    • Implemented targeted improvements
    • Validated improvements in second cycle
  5. Prepared for production deployment

    • Verified technical, operational, and organizational readiness
    • Built deployment case with validated results
    • Created rollout plan and handoff documentation

R-01 Results Summary

Final Pilot Metrics vs. Baseline

| Metric | Baseline | Target | Final Result | Change |
| --- | --- | --- | --- | --- |
| Task time | 14.2 min | <5 min | 4.1 min | -71% |
| Error rate | 4.3% | <2% | 1.7% | -2.6pp |
| Escalation rate | 12% | <5% | 4.8% | -7.2pp |
| System usage | N/A | >80% | 91% | N/A |
| Satisfaction | 3.2/5 | >4.0/5 | 4.4/5 | +1.2 |

All targets achieved. Core assumption validated.

Final Metrics vs. Module 3 Projections

| Element | Module 3 Projection | Module 5 Result | Variance |
| --- | --- | --- | --- |
| Time savings | 9.2 min/return | 10.1 min/return | +10% |
| Estimated annual value | $97,516 | $109,907 (projected) | +10% |
| Implementation cost | $35,000 | ~$12,000 (prototype) | -66% |
| Payback period | 4.2 months | ~1.3 months (projected) | -69% |

Results exceeded projections. Business case strengthened.
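The payback comparison follows from simple arithmetic. A sketch using the pilot figures from the table above; the function name is illustrative:

```python
def payback_months(implementation_cost, annual_value):
    """Months until cumulative annual value covers implementation cost."""
    return implementation_cost / annual_value * 12

# Prototype-phase figures from the variance table above
pilot = payback_months(12_000, 109_907)
print(f"Pilot payback: {pilot:.1f} months")  # ~1.3 months
```

The prototype payback is shorter for two compounding reasons: the cost base fell (prototype vs. full build) and the measured annual value rose above projection.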

Key Learnings from Iteration

  1. Confidence indicators matter more than accuracy alone. Representatives needed to know when to trust recommendations.

  2. Calibration requires ongoing attention. Policy categories drift; regular review catches problems early.

  3. Champions accelerate adoption. Skeptics who converted became the strongest advocates.

  4. Simple changes have outsized impact. The confidence indicator and escalation guidance took days to implement but moved escalation rate by 2+ percentage points.

Practitioner Feedback Summary

What worked:

  • "I don't have to interrupt Patricia anymore for routine questions."
  • "The confidence level tells me when to double-check."
  • "New staff can handle returns that used to require veteran knowledge."

What could be better:

  • "Some policies still need clearer language."
  • "Would be nice to see similar cases for complex situations." (Deferred feature)

Net assessment: Practitioners strongly prefer the new system.


T — Test

Measuring Implementation Quality

Module 5 built and tested the prototype. This section establishes how to measure whether the work is good: whether it produces results and whether it's done well.


Validating the Prototype

Before pilot begins, the prototype itself needs validation. Four questions:

Does it implement the blueprint specification?

The blueprint from Module 4 specified what the system should do. Validation confirms the prototype does it:

| Blueprint Requirement | Prototype Status |
|---|---|
| Accept return attributes | ✓ Implemented |
| Match to policy rules | ✓ Implemented |
| Return policy summary with confidence | ✓ Implemented |
| Display in CRM interface | ✓ Implemented |
| Capture override actions | ✓ Implemented |

Any gaps between blueprint and prototype should be intentional (MVP scope) or flagged for remediation.

Does it function reliably?

Reliability means consistent behavior:

  • Same inputs produce same outputs
  • No unexplained failures
  • Error handling prevents crashes
  • Integration with other systems is stable

For R-01: 100 test transactions with zero failures required before pilot.
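The 100-transaction gate can be automated. A minimal sketch, assuming a deterministic rule-based matcher; `match_policy` and its rule set are hypothetical stand-ins for the real CRM policy engine:

```python
import random

# Hypothetical stand-in for the real policy engine; deterministic by design.
def match_policy(return_attrs: dict) -> str:
    rules = {("electronics", True): "30-day refund",
             ("electronics", False): "store credit",
             ("apparel", True): "60-day refund"}
    return rules.get((return_attrs["category"], return_attrs["has_receipt"]),
                     "escalate")

def reliability_check(n=100, seed=42):
    """Run n randomized transactions twice each; count mismatches and crashes."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        attrs = {"category": rng.choice(["electronics", "apparel", "toys"]),
                 "has_receipt": rng.choice([True, False])}
        try:
            first, second = match_policy(attrs), match_policy(attrs)
            if first != second:       # same inputs must produce same outputs
                failures += 1
        except Exception:             # no unexplained failures allowed
            failures += 1
    return failures

assert reliability_check() == 0, "prototype not ready for pilot"
```

Seeding the random generator makes the check reproducible: a failure can be rerun and debugged with identical inputs.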

Is it usable by practitioners?

Usability means practitioners can complete their work:

  • Interface is comprehensible without documentation
  • Common tasks are efficient
  • Uncommon tasks are achievable
  • Error recovery is possible

For R-01: Three representatives complete five returns each without assistance.

Is it ready for pilot?

Pilot readiness means the system can support real work:

  • Data is loaded (policy database complete)
  • Training is available (quick reference ready)
  • Support is prepared (help desk briefed)
  • Feedback collection is ready (logging active)

Pilot readiness is not production readiness. Lower standards apply. The goal is learning.


Prototype Quality Metrics

Functional Completeness (vs. MVP Scope)

| MVP Feature | Implemented | Tested | Working |
|---|---|---|---|
| Policy matching | ✓ | ✓ | ✓ |
| CRM display | ✓ | ✓ | ✓ |
| Override mechanism | ✓ | ✓ | ✓ |
| Performance (<2 sec) | ✓ | ✓ | ✓ |

Functional completeness = Features implemented and working / Features in MVP scope

Target: 100% before pilot begins.
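The completeness ratio is straightforward to compute. A small sketch; the feature names and status structure are illustrative, not the actual tracking system:

```python
def functional_completeness(features: dict) -> float:
    """Share of MVP features that are implemented, tested, and working."""
    done = sum(1 for status in features.values()
               if status["implemented"] and status["tested"] and status["working"])
    return done / len(features)

# Illustrative R-01 MVP scope
mvp = {
    "policy_matching": {"implemented": True, "tested": True, "working": True},
    "crm_display":     {"implemented": True, "tested": True, "working": True},
    "override":        {"implemented": True, "tested": True, "working": True},
    "perf_under_2s":   {"implemented": True, "tested": True, "working": True},
}
assert functional_completeness(mvp) == 1.0  # 100% gate before pilot
```

A feature counts only when all three flags are true; "implemented but untested" does not move the ratio.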

Technical Stability

| Stability Metric | Target | Actual |
|---|---|---|
| Failed transactions | 0 in 100 tests | 0 |
| System errors | 0 critical in testing | 0 |
| Response time variance | <500ms | 340ms |
| Recovery from errors | Graceful degradation | Confirmed |

Stability ensures the pilot tests the design, not the bugs.

Usability Assessment

| Usability Factor | Method | Result |
|---|---|---|
| Task completion | 3 reps × 5 returns | 15/15 |
| Time to learn | First successful return | <10 min |
| Errors made | User errors during test | 2 (both recovered) |
| Satisfaction | Post-test rating | 4.2/5 |

Usability ensures practitioners can actually use what was built.

Integration Reliability

| Integration | Test Method | Result |
|---|---|---|
| CRM → Policy Engine | 100 transactions | 100% success |
| Order Management data | 50 order lookups | 100% success |
| Logging system | Action capture | 100% captured |

Integration reliability ensures the prototype works in its ecosystem.


Leading Indicators (During Pilot)

Leading indicators predict ultimate outcomes. Watch them early to catch problems.

Early Adoption Signals

| Signal | What It Means | R-01 Target |
|---|---|---|
| Day 1 usage | Initial willingness | >60% of pilot group |
| Week 1 trend | Increasing or decreasing? | Stable or increasing |
| Voluntary use | Using when not required | >40% |
| Override rate | Trust in recommendations | <30% |

Low early adoption may indicate training gaps, usability problems, or resistance. Address before measurement period.
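These thresholds are simple enough to encode as an automated check that runs during the first week of pilot. A sketch using the R-01 targets from the table above; the function and its inputs are hypothetical:

```python
def adoption_flags(day1_usage, week1_trend, voluntary_use, override_rate):
    """Compare early adoption signals to the R-01 thresholds above.

    Rates are fractions (0.65 = 65%); week1_trend is the change in
    usage rate over week 1 (negative = declining).
    """
    flags = []
    if day1_usage <= 0.60:
        flags.append("low initial willingness")
    if week1_trend < 0:
        flags.append("usage declining in week 1")
    if voluntary_use <= 0.40:
        flags.append("low voluntary use")
    if override_rate >= 0.30:
        flags.append("high override rate (low trust)")
    return flags

# A pilot at target on every signal raises no flags
assert adoption_flags(0.65, 0.02, 0.45, 0.22) == []
```

Returning a list of named flags, rather than a single pass/fail, points the investigation at the specific signal that missed its threshold.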

Error/Issue Frequency

| Metric | Early Warning | R-01 Target |
|---|---|---|
| System errors | >5/day | <2/day |
| User-reported issues | >3/day | <1/day |
| Help desk contacts | Rising trend | Stable or declining |
| Workaround emergence | Any pattern | Zero patterns |

Rising issues require investigation. Stable or declining issues allow measurement to proceed.

Practitioner Engagement

| Signal | Positive Indicator | Negative Indicator |
|---|---|---|
| Questions asked | "How do I..." | "Why do I have to..." |
| Suggestions offered | "What if we could..." | Silence |
| Peer discussion | Sharing tips | Sharing complaints |
| Override patterns | Specific cases | Everything |

Engagement reveals whether practitioners are investing in the system or enduring it.

Iteration Velocity

| Metric | Healthy | Unhealthy |
|---|---|---|
| Issues identified | Clear, specific | Vague, broad |
| Fix turnaround | Days | Weeks |
| Improvement validated | Measurable | Assumed |
| New issues emerging | Decreasing | Increasing |

Healthy iteration shows progress. Unhealthy iteration shows churn.


Lagging Indicators (After Pilot)

Lagging indicators confirm outcomes. They're the evidence for deployment decisions.

Time Improvement vs. Baseline

| Metric | Baseline | Target | Result | Assessment |
|---|---|---|---|---|
| Average task time | 14.2 min | <5 min | 4.1 min | ✓ 71% improvement |
| Time variance | High | Reduced | Reduced | ✓ More consistent |
| Peak time cases | 28 min | <10 min | 8 min | ✓ Complex cases improved |

Time improvement validates the core value proposition.

Quality Improvement vs. Baseline

| Metric | Baseline | Target | Result | Assessment |
|---|---|---|---|---|
| Error rate | 4.3% | <2% | 1.7% | ✓ 60% reduction |
| Error severity | Mix | Reduced severe | Reduced | ✓ Remaining errors minor |
| Rework required | 8% | <4% | 3.2% | ✓ Less rework |

Quality improvement validates accuracy claims.

Focus Improvement vs. Baseline

| Metric | Baseline | Target | Result | Assessment |
|---|---|---|---|---|
| Escalation rate | 12% | <5% | 4.8% | ✓ 60% reduction |
| SME queries | 15+/day | <5/day | 3/day | ✓ Patricia freed |
| Context switches | High | Reduced | Reduced | ✓ Less interruption |

Focus improvement validates cognitive load reduction.

ROI Realization vs. Projection

| Element | Projected | Actual | Variance |
|---|---|---|---|
| Time savings value | $76,176/yr | $83,793/yr | +10% |
| Quality savings value | $15,480/yr | $17,028/yr | +10% |
| Focus savings value | $8,260/yr | $9,086/yr | +10% |
| Total annual value | $97,516/yr | $109,907/yr | +13% |

ROI realization validates the business case.


Red Flags

Red flags signal problems that may not be obvious in metrics.

Adoption Doesn't Improve Over Time

Week 1 usage was 65%. Week 4 usage is still 65%. Representatives haven't increased adoption despite familiarity.

What it means: The system isn't earning trust. Practitioners use it when required but don't prefer it.

Investigation: Why aren't practitioners choosing to use it? Usability? Accuracy? Workflow friction?

Same Issues Recur Across Iterations

Iteration 1 addressed policy matching accuracy. Iteration 2 addressed policy matching accuracy. Iteration 3 addressed policy matching accuracy.

What it means: The fix isn't working. Either the diagnosis is wrong or the solution is inadequate.

Investigation: Is this a design problem? An implementation problem? A scope problem?

Practitioners Develop Workarounds for the New System

Representatives are using the system but have developed their own verification steps: checking the Bible anyway, asking Patricia to confirm.

What it means: The system is part of the workflow but hasn't replaced the old process. It's added work, not reduced it.

Investigation: What creates the need for verification? Trust? Accuracy? Specific case types?

Results Plateau Below Targets

Time improved from 14.2 minutes to 7 minutes. Three iterations later, it's still 7 minutes. Progress has stopped short of the 5-minute target.

What it means: The current approach has limits. More iteration won't reach the target.

Investigation: What's creating the floor? Is the target realistic? Does the approach need to change?
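A plateau is detectable from iteration history rather than intuition. A minimal sketch, assuming a lower-is-better metric and a 5% "no meaningful change" tolerance (both assumptions for illustration, not thresholds from the pilot):

```python
def plateaued(iteration_results, target, tolerance=0.05):
    """True if the last two iterations differ by less than `tolerance`
    (relative) while the latest result still misses the target.
    Assumes lower values are better (e.g., minutes per task)."""
    if len(iteration_results) < 2:
        return False
    prev, last = iteration_results[-2], iteration_results[-1]
    stalled = abs(prev - last) / prev < tolerance
    return stalled and last > target

# Time per task across iterations: 14.2 -> 7.2 -> 7.0 -> 7.0, target < 5 min
assert plateaued([14.2, 7.2, 7.0, 7.0], target=5.0) is True
```

Flagging the plateau early converts "keep iterating" from a default into a deliberate choice: change the approach, revisit the target, or stop.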


S — Share

Consolidation Exercises

Learning solidifies through application and teaching. These exercises help integrate Module 5 concepts into your practice.


Module 5 Key Takeaways

These principles should guide your implementation practice:

1. Progress Over Perfection

A shipped prototype beats a perfect plan. The goal is learning. Every day spent polishing instead of testing is a day of learning lost.

2. One Visible Win Earns the Right to Continue

Skeptics don't convert through arguments. They convert through evidence. A small success demonstrated is worth more than a large success promised.

3. Prototype Is for Learning, Not Production

Prototypes aren't scaled-down production systems. They're learning instruments. Build them for speed and flexibility, not durability and performance.

4. Iteration Based on Evidence, Not Opinion

When pilots generate data, use it. Don't iterate based on hunches or preferences. Don't dismiss data because it's inconvenient. Let evidence drive decisions.

5. Pilot Is a Means, Not an End

Pilots exist to validate solutions for broader deployment. A pilot that never ends is an exception that consumes resources while denying benefits to everyone else.