REALIZE — From Design to Reality
Building, testing, and proving value quickly
Module 4 taught you to design the right solution. This module teaches you to ship it before perfection kills it.
The Perfect System That Never Shipped
Nathan Okafor did everything right. As Director of Practice Technology at Cascade Legal Partners, he spent months on assessment, documented the friction in client intake across five disconnected systems, and built an airtight business case: 4.3 hours average intake time, 23% prospect abandonment, $1.8 million in annual losses from delayed billing and lost conversions. The executive committee approved full funding. Implementation began with adequate budget, visible sponsorship from the managing partner, and a September go-live target.
By November, eighteen months after approval, the system existed only in a test environment that three people used. The project had consumed $400,000 and had yet to process a single real client. Here's what happened: the scope expanded from six features to twenty-three. Each addition was individually justified. Automated conflict check requests? Three weeks, clear win. Conflict response tracking? Three weeks, clear win. Billing system integration? Four weeks, clear win. Every expansion addressed a real friction point, was requested by someone with legitimate authority, and pushed the timeline further out while core functionality remained untested. The original six-week implementation plan became an eighteen-month construction project. The team kept building. They stopped learning.
Meanwhile, the people they were building for moved on. Rachel Torres, the senior intake coordinator who had spent hours in design sessions and advocated for the project with skeptical colleagues, stopped checking in around month eight. She had clients waiting. She built her own workarounds: spreadsheets, email folders, a color-coded calendar system that made sense only to her. Inefficient by design standards, but functional. When Nathan finally reached out about pilot testing, Rachel hesitated. "I've built my own workarounds at this point. The new system would have to be significantly better than what I've cobbled together, or the transition cost isn't worth it." The project's strongest champion had become its most reluctant tester. Managing Partner Elena Reyes followed a similar arc. Eighteen months of progress reports with no visible results exhausted her attention. By the time the system was "ready," she had moved on to other priorities. The executive sponsor wasn't lost to conflict. She was lost to time.
The clarity came from Marcus Webb, a third-year associate with no stake in the project's history. "What's the one thing that proves it works?" he asked. The room went quiet. Rachel answered from the back: "Intake time. That's what started this. If the new system cuts that in half, everything else follows." Marcus followed up: "So what's the smallest version that proves intake time goes down? That's the pilot. Everything else is Phase 2." Nathan checked the project plan. Form routing, the feature that addressed the original problem, had been complete for seven months. It had been sitting in test while the team built around it. The lead developer confirmed: two weeks to deploy, maybe less. It was done. They just never turned it on.
Two weeks later, Rachel's team started using form routing. Intake time dropped from 4.3 hours to 2.1 hours. The system that saved Cascade Legal Partners $1.1 million annually started with a two-week deployment that did one thing well. Everything else came later, justified by results.
The Anchor Principle
Organizations fund projects based on projected value. They continue funding based on demonstrated value. The gap between projection and demonstration is where projects die.
Ship the smallest thing that proves value. Then expand. Nathan's eighteen-month journey could have been a six-week sprint if he had understood that progress is measured in value delivered, not capability accumulated. A system in test is a system at rest.
Three Concepts That Matter
The scope creep trap. Each addition to Nathan's project was individually reasonable. The aggregate effect was fatal. Scope expands through a series of small, justified decisions that feel like progress. The conflicts team needed integration. The billing team needed data flow. Each request came from someone with legitimate authority. The discipline is categorization: Phase 1 tests the core assumption; everything else is Phase 2. "Not no, but not yet."
Champion erosion. Champions have a shelf life. Rachel Torres went from enthusiastic advocate to reluctant participant in eight months. Elena Reyes went from executive sponsor to disengaged bystander in eighteen. Delay doesn't just cost time; it costs you the people who would have carried the project forward. Every month without visible results erodes the political capital and personal investment that made the project possible in the first place. You cannot bank enthusiasm. You spend it or you lose it.
Interactive Exercise
Champion Erosion Clock
Months without results: 0
- Rachel (Practitioner Champion): "This is exactly what we needed."
- Elena (Executive Sponsor): "Full support. Keep me posted on progress."
- The Skeptic (Senior Staff): "Show me it works. I have seen this before."
"Ready for testing" vs. "ready for production." Nathan's team kept finding things that didn't work yet, so they kept delaying the pilot. They believed the pilot was supposed to test whether the system worked. They had it backwards. The pilot was supposed to reveal what didn't work. That was the point. A pilot that tests a complete system is a soft launch. A soft launch requires a complete system, which they were never going to have. The prototype's job is to learn, not to impress.
The Deliverable
Module 5 produces a Working Prototype with measured before/after results.
This is evidence, not a plan. The business case projected value; the prototype proves it. The blueprint specified the design; the prototype validates it. Before/after measurement using Module 3's baselines converts "we think this will work" into "here's what happened when we tried it."
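The before/after comparison itself is simple arithmetic, and it is worth making explicit. A minimal sketch, with the Cascade numbers plugged in (the function and variable names here are illustrative, not part of any prescribed toolkit):

```python
def percent_reduction(baseline, measured):
    """Fractional reduction from baseline (e.g. 4.3 -> 2.1 hours)."""
    return (baseline - measured) / baseline

baseline_hours = 4.3   # Module 3 baseline: average intake time
pilot_hours = 2.1      # measured after the two-week form-routing pilot
reduction = percent_reduction(baseline_hours, pilot_hours)
print(f"Intake time down {reduction:.0%} (projected: 62%)")
# prints: Intake time down 51% (projected: 62%)
```

A measured 51% against a projected 62% is exactly the kind of evidence that converts "we think" into "here's what happened": close enough to validate the model, honest enough to recalibrate it.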
Speed matters because stakeholder patience is finite, champions erode, and competitors don't wait. A working system that does one thing well creates more organizational energy than a promised system that does many things eventually. The form routing deployment in month eighteen created immediate enthusiasm. That enthusiasm fueled everything that followed. The momentum came from proof, not promises.
Build to learn. Ship to prove. Iterate to improve.
Interactive Exercise
Build vs. Ship
Interactive Timeline
Scope Creep Timeline
Nathan Okafor’s team at Cascade Legal Partners set out to build a client intake system with six features. Eighteen months later, they had twenty-three, and the original problem was still unsolved.
Step through the timeline to watch scope creep happen in real time. Pay attention to the project health gauges — they tell the story the feature list doesn’t.
Module 5A: REALIZE — Theory
R — Reveal
Case Study: The Perfect System That Never Shipped
The implementation at Cascade Legal Partners should have been a success story.
Nathan Okafor had done everything by the book. As Director of Practice Technology, he had spent six months on assessment, observing how attorneys and paralegals actually conducted client intake, documenting the friction in the current process, cataloging the shadow systems that had accumulated over years of inadequate tooling. His Opportunity Portfolio identified the central problem: intake coordination required manual handoffs across five different systems, creating delays that cost the firm an estimated $1.8 million annually in delayed billing and lost client conversions.
The business case was airtight. Nathan had measured baselines with rigor: 4.3 hours average intake time, 23% of prospects abandoning during the process, $340 average administrative cost per new client. His value model projected $1.1 million in annual savings with a 62% reduction in intake time, plus capacity recovery that would allow the intake team to handle 40% more volume without additional headcount.
The workflow design had been exemplary. Nathan's team had mapped the current state in granular detail, identified friction points through practitioner observation, and designed a future state that preserved attorney judgment while automating information flow. The blueprint had been validated with attorneys, paralegals, and intake coordinators who would actually use the system. They had concerns. Everyone has concerns. They also saw the potential.
The executive committee approved full funding in February. Implementation began in March with adequate budget, visible sponsorship from the managing partner, and a target go-live of September.
By November, eighteen months after approval, the system existed only in a test environment that three people used. The September deadline had been pushed to December, then March, then "when it's ready." The project had consumed $400,000, more than the original budget, and had yet to process a single real client.
The intake process still ran on the same five disconnected systems. The shadow workarounds persisted. And Nathan's most enthusiastic early supporters had stopped attending project meetings.
What Went Wrong
The system that Nathan's team built was impressive. It did everything the blueprint specified, and considerably more.
In the months between design approval and September's original target, the scope had expanded in ways that seemed reasonable at each decision point.
The original design called for automated intake form routing. During development, someone realized that if they were routing forms, they could also generate conflict check requests automatically. Adding that feature took three weeks but eliminated a manual step. It seemed like a clear win.
Then the conflicts team asked: if the system was generating conflict requests, could it also track conflict responses and flag overdue checks? Another three weeks. Another clear win.
The billing team noticed the project and requested integration with their time-entry system, so intake data could pre-populate client matter records. Four weeks. Another clear win.
Each addition made sense in isolation. Each addressed a real friction point. Each was justified by someone with legitimate authority to make requests. And each pushed the timeline further out while the core functionality remained untested.
By September, the original six-week implementation plan had expanded to cover twenty-three distinct feature sets. The system could do remarkable things, things the original blueprint never contemplated. What it couldn't do was ship.
The Testing Trap
Nathan had planned for pilot testing. The project timeline included a four-week pilot with a small group of users before full rollout.
But the pilot never happened as designed. Every time the team approached pilot readiness, someone identified another gap.
"We can't pilot without the conflict integration. Attorneys won't trust the system if it doesn't handle conflicts."
"We can't pilot without the billing connection. The intake team will have to double-enter everything."
"We can't pilot without the client portal. That's what prospects will actually see."
Each objection was valid. Each pushed the pilot date further out. And each revealed a fundamental confusion about what the pilot was for.
Nathan's team believed the pilot was supposed to test whether the system worked. They kept finding things that didn't work yet, so they kept delaying the pilot.
What they didn't understand: the pilot was supposed to reveal what didn't work. That was the point. A pilot that tests a complete system is a soft launch. And a soft launch requires a complete system, which they were never going to have.
The team had confused "ready for testing" with "ready for production." They kept waiting for perfection before subjecting the system to reality.
The Patience Problem
Nine months into development, Managing Partner Elena Reyes asked Nathan for a status update.
"We're making excellent progress," he told her. "The system architecture is sophisticated, the integrations are complex, and we're working through the edge cases. We want to make sure we get this right."
Elena nodded. She trusted Nathan. But she also had partners asking why the firm had spent $300,000 on technology that no one was using. She had an intake team wondering if the promised improvements would ever arrive. She had client acquisition metrics that hadn't improved despite the investment.
"When will we see results?" she asked.
"The pilot is targeted for March," Nathan said. "Full rollout by June."
By March, Elena had moved on to other priorities. She had hired a new operations director whose mandate included "getting technology projects under control." The intake improvement budget was frozen pending review. When Nathan scheduled a meeting to discuss pilot launch, Elena's assistant responded that the managing partner was focused on other initiatives but wished the project well.
The executive sponsor hadn't been lost to conflict or opposition. She had been lost to time. Eighteen months of progress reports with no visible results had exhausted her political capital and attention. By the time the system was "ready," the organization had stopped caring.
The Hidden Costs
While Nathan's team built in isolation, the practitioners they were supposed to serve developed their own solutions.
Rachel Torres, the senior intake coordinator, had been one of Nathan's early champions. She had spent hours in design sessions, contributed expertise to the workflow mapping, and advocated for the project with skeptical colleagues. In the early months, she checked in regularly, eager to see progress.
By month eight, Rachel had stopped asking. She had work to do. Clients were waiting. The current system was terrible, but it was the system she had.
When Nathan finally reached out to schedule pilot testing, Rachel hesitated. "I've built my own workarounds at this point," she said. "The new system would have to be significantly better than what I've cobbled together, or the transition cost isn't worth it."
Her workarounds were inefficient by design standards: spreadsheets and email folders and a color-coded calendar system that made sense only to her. But they worked. She had adapted to the friction rather than waiting for the friction to be solved.
Rachel wasn't resisting change. She was surviving. And survival had made her less available to test something that might or might not eventually help.
The champions hadn't turned hostile. They had simply moved on.
The Moment of Clarity
The intervention came from an unlikely source.
Marcus Webb was a third-year associate who had joined the firm after the project began. He had no investment in the system's success or failure, no stake in the decisions that had brought it here. He had simply been assigned to help with testing and noticed something that insiders couldn't see.
"What problem are we testing for?" Marcus asked during a project review meeting.
"What do you mean?" Nathan replied.
"I've been using the test system for a week. It does a lot of things. But what's the one thing that proves it works? If we deployed this tomorrow and I could show you one number that proved value, what would that number be?"
The room was quiet. Nathan realized he didn't have a clear answer. The system did many things. He couldn't point to the one thing that mattered most.
"Intake time," Rachel said from the back of the room. "That's what started this. 4.3 hours average. If the new system cuts that in half, everything else follows. Better conversion, lower cost, happier clients. But we've been so focused on features that we forgot about the original problem."
Marcus nodded. "So what's the smallest version of this system that proves intake time goes down? That's the pilot. Everything else is Phase 2."
Nathan started to object. There were dependencies, integrations, features that users expected. But he stopped himself.
Twenty-three feature sets. Eighteen months. Four hundred thousand dollars. And the original problem, 4.3 hours average intake time, remained unsolved.
"What would that minimal version look like?" he asked.
"Form routing," Rachel said. "That's where the delay starts. If forms move automatically to the right person, intake time drops. The conflict integration is nice. The billing connection is nice. The client portal is nice. But form routing is the problem we set out to solve."
Nathan looked at the project plan. Form routing had been complete for seven months. It had been sitting in test while the team built features around it.
"How long to deploy just the form routing to your team?" he asked.
"Two weeks," said the lead developer. "Maybe less. It's done. We just never turned it on."
The One Visible Win
Nathan made the call that afternoon.
The project would split into two phases. Phase 1 was form routing, just form routing, deployed to Rachel's intake team within two weeks. No conflict integration. No billing connection. No client portal. Just the original problem, solved.
Phase 2 would include everything else. But Phase 2 would wait until Phase 1 proved value.
The pushback was immediate. The conflicts team had been promised integration. The billing team had been promised data flow. Other stakeholders had been waiting eighteen months for features that were now being deferred.
"We've already built it," the billing manager pointed out. "Why not include it?"
"Because including it means not shipping," Nathan said. "And not shipping means we keep running on the old system while the new system sits in test. We've proven we can build complex software. We haven't proven we can improve intake time. That's what has to happen first."
Two weeks later, Rachel's team started using the form routing system.
The results were immediate and measurable. Intake time dropped from 4.3 hours to 2.1 hours. The system was simple, but it worked: forms that previously sat in email queues now moved automatically to the right person. The bottleneck had been simple, and so was the solution.
Rachel sent Nathan a message after the first week: "This is what we needed eighteen months ago. More is coming, right?"
More was coming. But now "more" would be added to a working system, not a theoretical one. Each new feature would prove value before the next was added. The team would ship, measure, learn, and iterate.
When Nathan presented the Phase 1 results to Elena Reyes, she had a single question: "Why did this take so long?"
Nathan didn't have a good answer. But he had a better approach now.
"It won't happen again," he said. "From now on, we ship small and prove value before we build big."
The system that saved Cascade Legal Partners $1.1 million annually started with a two-week deployment that did one thing well. Everything else came later, justified by results.
The Lesson
Nathan's team had confused building with progress.
They had spent eighteen months constructing an impressive system that solved many problems, tested few assumptions, and delivered no results. Every decision to add scope, every delay waiting for completeness, every extension of the timeline had felt like progress. The system grew more capable each week.
Value comes from outcomes delivered, not capability accumulated. A system in test is still a system at rest.
The pilot that finally shipped tested one assumption: that automated form routing would reduce intake time. It did. That single validated assumption earned the right to continue. Everything that followed was built on proof, not projection.
The goal is a working system that proves value quickly enough to earn the right to continue. One visible win buys time, builds trust, and creates the foundation for everything that comes next.
Nathan's eighteen-month journey could have been a six-week sprint, if he had understood from the beginning that progress is measured in value delivered.
End of Case Study
Module 5A: REALIZE — Theory
O — Observe
Core Principles of Rapid Implementation
Module 5's anchor principle: One visible win earns the right to continue.
The business case secured approval. The workflow design earned validation. But approval and validation don't create value. Shipping working software does, and shipping requires a different mindset than planning.
The Cascade Legal Partners case illustrates the trap: eighteen months of building, zero months of learning. The team confused construction with progress, capability with value, completeness with readiness. They built an impressive system that solved many problems while proving nothing.
Module 5 provides the discipline of implementation: how to move from validated design to working prototype to production deployment, creating value at each step rather than waiting until everything is complete.
The Prototype Mindset
A Prototype Is a Learning Vehicle
A prototype is a tool for testing assumptions, a vehicle for learning whether the design actually works when it meets reality.
This distinction matters because it changes what "good" looks like. A prototype that reveals the design is wrong has succeeded. A prototype that hides problems until production has failed. The goal is to learn something true.
Nathan's team at Cascade built impressive software. They didn't learn whether automated form routing would reduce intake time until month eighteen, when the answer could have been known in month two.
Validated Learning Over Comprehensive Functionality
Every design embeds assumptions: practitioners will use the system this way; the technology will perform at this speed; the workflow will reduce friction at this point. These assumptions can be stated with confidence during design. They can only be validated through building and testing.
The prototype's purpose is to validate the assumptions that matter most. The ones the business case depends on. The ones that will determine success or failure.
For R-01 (Returns Bible), the critical assumption is that automated policy lookup will reduce representative time from 14.2 minutes to under 5 minutes. A prototype that tests this assumption, even a rough one, creates more value than a polished system that tests everything except this.
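What would a rough test of that assumption look like? Even a first-week sketch can exercise the core lookup. Everything below (the rule table, attribute names, and policy answers) is invented for illustration and is not taken from the actual Returns Bible; the point is how little machinery the test requires:

```python
# Hypothetical rule table: first matching condition wins. Real rules
# would be loaded from the returns documentation, not hard-coded.
POLICY_RULES = [
    (lambda r: r["days_since_purchase"] <= 30 and not r["opened"],
     "Full refund"),
    (lambda r: r["days_since_purchase"] <= 30 and r["opened"],
     "Refund minus restocking fee"),
    (lambda r: r["days_since_purchase"] <= 90,
     "Store credit only"),
]

def lookup_policy(return_attrs):
    """Return the first matching policy answer, else escalate."""
    for matches, answer in POLICY_RULES:
        if matches(return_attrs):
            return answer
    return "Escalate to supervisor"

print(lookup_policy({"days_since_purchase": 12, "opened": False}))
# prints: Full refund
```

A representative timing themselves against this crude lookup, versus the 14.2-minute manual process, is the whole experiment. No portal, no integrations, no polish.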
Speed Beats Completeness
When testing assumptions, speed matters more than completeness. A quick test that reveals a wrong assumption saves months of building on a flawed foundation. A slow test that confirms a right assumption arrives too late to matter.
This is counterintuitive for teams trained in quality: "We should do it right the first time." But "right" in prototype means "fast enough to learn while we still have time to adjust."
The Cascade team spent seven months with working form routing in test. They delayed learning because they wanted to learn everything at once. The result: they learned nothing until it was almost too late.
Permission to Build Something Imperfect
Prototyping requires organizational permission to build imperfect things. Teams trained on production quality standards struggle with this. They know how to build things right; they don't know how to build things fast and iterate toward right.
This permission must be explicit. Without it, teams will default to quality standards that make prototyping impossible. They will add features to avoid shipping something incomplete. They will delay testing to avoid showing something flawed.
"Perfect is the enemy of good" is a cliché. In prototyping, it's a survival rule.
The One Visible Win Principle
Early Value Earns Continuation
Organizations fund projects based on projected value. They continue funding based on demonstrated value. The gap between projection and demonstration is where projects die.
Nathan had executive support in February. By November, that support had evaporated. Time, not conflict, was the cause. Eighteen months of progress reports with no visible results exhausted stakeholder patience. When results finally arrived, the stakeholders had moved on.
A visible win early in implementation changes this dynamic. It converts projection into evidence. It gives stakeholders something to point to when questions arise. It builds momentum that carries the project through inevitable setbacks.
Stakeholder Patience Is Finite
Organizations have limited attention. Executives sponsor many initiatives. Every project competes for mindshare with every other project.
A project that takes months to show results must compete for attention the entire time. It must justify its continued existence against alternatives that might deliver faster. It must survive leadership changes, budget reviews, and shifting priorities, all before proving it deserves survival.
The one visible win shortens the window of vulnerability. It moves the project from "promising but unproven" to "proven and expanding." That transition happens not when the system is complete, but when it delivers measurable value.
Small Success Builds Momentum
A working system that does one thing well creates more organizational energy than a promised system that does many things eventually.
Rachel Torres stopped advocating for the Cascade project around month eight. By the time form routing shipped, she had built her own workarounds and lost interest. The project's strongest champion became a skeptic. Exhaustion, not opposition, drove the shift.
The form routing deployment in month eighteen created immediate enthusiasm. "This is what we needed." That enthusiasm fueled Phase 2 engagement. The momentum came not from promises, but from proof.
What Counts as a Visible Win
A visible win must be:
- Measurable: Not "things feel better" but "intake time dropped from 4.3 hours to 2.1 hours"
- Attributable: Clearly connected to the new system, not to other changes
- Meaningful: Addressing a problem practitioners actually care about
- Communicable: Easy to explain to stakeholders who aren't deeply involved
For R-01, a visible win might be: representatives can now answer policy questions in 3 minutes instead of 14 minutes. Measurable, attributable, meaningful, communicable.
Iteration Over Perfection
First Version Will Be Wrong
No design survives contact with reality unchanged. Users will behave differently than expected. Technology will perform differently than specified. Edge cases will emerge that no one anticipated.
This is normal. The first version will need adjustment. The question is how quickly adjustments can be made.
Teams that expect perfection on first release treat every problem as evidence of inadequate planning. They respond to problems by retreating to more planning. Teams that expect iteration treat every problem as information. They respond to problems by adjusting and retesting.
Problems in Prototype Are Learning
The Cascade team found problems during testing and delayed launch. They treated problems as evidence the system wasn't ready.
The correct interpretation: problems discovered in testing are problems discovered cheaply. Problems that emerge in production are problems discovered expensively. The prototype's job is to find problems, as many as possible, as quickly as possible, while they can still be addressed without damaging live operations.
A prototype that runs for weeks without revealing problems isn't well-built. It's under-tested.
Build-Measure-Learn Cycles
Each iteration follows a cycle:
- Build: Implement the next increment
- Measure: Collect data on what happened
- Learn: Interpret data and decide next action
The speed of this cycle determines learning velocity. A team that completes one cycle per month learns twelve things per year. A team that completes one cycle per week learns fifty things per year.
Cascade's team completed something like one-third of a cycle in eighteen months. They built extensively, measured minimally, and learned almost nothing.
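The cycle's stopping rule is worth making concrete: ship the smallest increment, measure the one metric that matters, and stop adding scope the moment value is proven. This toy loop (all names and the simulated numbers are illustrative, not a prescribed implementation) captures that discipline:

```python
def run_cycles(increments, measure, target):
    """Ship increments one at a time; stop once the metric hits target."""
    for build in increments:
        build()                # Build: deploy one small increment
        value = measure()      # Measure: observe the real metric
        if value <= target:    # Learn: value proven, stop adding scope
            return value
    return measure()

# Simulated intake process: each increment shaves hours off.
state = {"hours": 4.3}
increments = [
    lambda: state.update(hours=2.1),  # form routing
    lambda: state.update(hours=1.8),  # never runs: target already met
]
print(run_cycles(increments, lambda: state["hours"], target=2.5))
# prints: 2.1
```

Note that the second increment never executes. That is the point: once form routing hit the target, everything else became Phase 2.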
The Cost of Being Wrong
Being wrong early is cheap. The form routing assumption could have been tested in week three with a small group of users. If wrong, the team would have learned it with minimal investment. If right, they would have had sixteen months to build on a proven foundation.
Being wrong late is expensive. Cascade spent $400,000 building features around a core assumption that remained untested. If form routing hadn't worked, most of that investment would have been wasted.
The prototype de-risks implementation by being wrong early, often, and cheaply.
Fast Failure as Strategy
Finding What Doesn't Work Is Valuable
Negative results are results. An assumption that proves wrong is an assumption you no longer need to build around. A feature that practitioners reject is a feature you don't need to maintain.
Teams avoid testing because they fear failure. But skipping tests doesn't prevent failure. It just delays discovery.
Fail Fast, Fail Cheap, Fail Forward
- Fail fast: Test assumptions as early as possible
- Fail cheap: Test with minimal investment
- Fail forward: Each failure teaches something that improves the next attempt
The Cascade team eventually failed forward. Their Phase 1 launch taught them how to implement effectively. But they paid for eighteen months of learning-avoidance first.
Creating Conditions for Productive Failure
Productive failure requires:
- Psychological safety: People can report problems without blame
- Quick feedback loops: Problems surface rapidly, not months later
- Iteration capability: The system can be changed based on what's learned
- Clear success criteria: Teams know what they're testing for
Without these conditions, teams hide problems rather than surfacing them. Problems that can't be discussed can't be solved.
From Pilot to Production
Pilots That Stay Pilots Forever
A pilot is a test run, a limited deployment to validate assumptions before full rollout. By definition, a pilot has an end date.
But pilots frequently become permanent. "Just a few more tweaks" becomes an indefinite state. The pilot serves a small group indefinitely while the broader organization keeps waiting.
This happens when teams lack clear graduation criteria. Without defined thresholds, there's always another reason to delay. Another edge case. Another feature request. Another optimization opportunity.
Define Graduation Criteria Before Starting
Before pilot begins, define what success looks like:
- What metrics must reach what thresholds?
- What practitioner feedback constitutes validation?
- What timeline is acceptable?
- Who decides when criteria are met?
Without these criteria, the pilot can never end because success is undefined.
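Writing the criteria down as an explicit, checkable list removes that ambiguity. A minimal sketch, with invented metric names and thresholds (any real pilot would substitute its own Module 3 baselines):

```python
# Hypothetical graduation criteria, agreed before the pilot starts.
CRITERIA = {
    "avg_intake_hours": lambda v: v <= 2.5,        # metric threshold
    "practitioner_approval": lambda v: v >= 0.80,  # share who would keep it
    "weeks_elapsed": lambda v: v <= 8,             # acceptable timeline
}

def pilot_graduates(metrics):
    """True only when every agreed criterion is met."""
    return all(check(metrics[name]) for name, check in CRITERIA.items())

print(pilot_graduates({"avg_intake_hours": 2.1,
                       "practitioner_approval": 0.85,
                       "weeks_elapsed": 6}))
# prints: True
```

The value of the sketch is not the code; it is that the thresholds are named, numeric, and agreed in advance, so "when does the pilot end?" has a mechanical answer.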
The Pilot Is Not the Destination
The pilot exists to earn the right to production. It's a means, not an end.
Teams that forget this optimize for pilot success rather than production readiness. They build solutions that work for ten users but won't scale to one hundred. They provide support levels that can't be sustained at full deployment. They create a permanent pilot that serves a small group while the original problem persists for everyone else.
Building Toward Scale from Day One
Even in prototype, consider scale:
- Will this architecture support full deployment?
- Can this support model be sustained?
- Does this training approach work for everyone, not just early adopters?
The goal is to avoid building in ways that make production impossible.
Summary: The Module 5 Mindset
| From | To |
|---|---|
| Build everything, then test | Test one thing, then build more |
| Wait until ready | Ship when valuable |
| Problems indicate failure | Problems indicate learning |
| Perfect first release | Iterative improvement |
| Pilot as destination | Pilot as gate to production |
The discipline of Module 5 is progress over perfection: earning the right to continue through demonstrated value rather than promised capability.
Nathan's team at Cascade had everything they needed: good assessment, good business case, good design, adequate resources. What they lacked was the discipline to ship small, prove value, and build on success.
One visible win in month two would have justified eighteen months of development. Instead, eighteen months of development struggled to justify itself.
Build to learn. Ship to prove. Iterate to improve. That's the Module 5 mindset.
Module 5A: REALIZE — Theory
O — Observe
Prototype Construction
The blueprint specifies what to build. This section addresses how to build it: the methodology of translating design into working prototype while maintaining the discipline of speed over completeness.
Minimum Viable Prototype
What "Minimum" Means
Minimum is not "as little as possible." It's "the smallest scope that tests the core assumption."
The core assumption is the one the business case depends on. For R-01, the core assumption is that automated policy lookup reduces representative time. A minimum viable prototype tests this assumption. It skips every other assumption, every other feature, every edge case.
To identify minimum scope, ask: "What is the one thing that must prove true for this opportunity to deliver value?" Everything that tests this assumption is in scope. Everything else is out of scope for the first prototype.
This is harder than it sounds. Teams identify many things that seem essential:
- "We can't test without X because users expect it."
- "We can't deploy without Y because it's part of the workflow."
- "We need Z or the data won't be accurate."
Each may be true for production. None is necessarily true for prototype. The prototype's job is to learn, not to impress.
What "Viable" Means
Viable means functional enough to generate real feedback. A prototype that doesn't work isn't viable. A prototype that works but can't be used by real people on real tasks isn't viable.
The threshold is usability, not polish. Can practitioners complete actual work using this prototype? Will the experience generate meaningful feedback about whether the design works?
For R-01, a viable prototype would:
- Accept return attributes from representatives
- Match attributes to policy rules
- Display relevant policy information
- Allow representatives to make decisions based on displayed information
It would not need:
- Perfect policy matching accuracy (learning will improve this)
- Integration with every downstream system
- Polished user interface
- Complete exception handling
The Discipline of Cutting Scope
Scope cutting requires discipline because every omitted feature has an advocate. The conflicts team wants integration. The billing team wants data flow. The training team wants onboarding support.
These requests are legitimate. They will eventually be addressed. But addressing them now delays learning about the core assumption.
The discipline: "Not no, but not yet." Every feature request gets categorized:
- Phase 1 (MVP): Tests core assumption
- Phase 2: Enhances validated solution
- Future: Valuable but not urgent
This categorization must be visible and respected. Scope creep begins when categories blur.
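One way to keep the categories visible is to hold the triage itself as a shared artifact. A minimal sketch, with invented R-01-style feature names drawn from the examples in this section:

```python
from enum import Enum

class Phase(Enum):
    MVP = "Phase 1: tests core assumption"
    ENHANCE = "Phase 2: enhances validated solution"
    FUTURE = "Future: valuable but not urgent"

# Illustrative triage of feature requests; the assignments are examples,
# not recommendations for any real backlog.
BACKLOG = {
    "policy lookup and display": Phase.MVP,
    "billing system integration": Phase.ENHANCE,
    "complex exception workflows": Phase.FUTURE,
}

def current_scope(backlog: dict) -> list[str]:
    """Only MVP items are in scope now; everything else is 'not yet'."""
    return [name for name, phase in backlog.items() if phase is Phase.MVP]
```

Keeping the triage in one place makes scope creep visible: any feature being built that isn't in `current_scope()` is, by definition, a blurred category.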
Features to Include vs. Defer vs. Never Build
| Category | Criteria | Example (R-01) |
|---|---|---|
| Include | Tests core assumption | Policy lookup and display |
| Include | Required for testing to function | Basic CRM integration |
| Defer | Valuable but not required for test | Billing system integration |
| Defer | Edge case handling | Complex exception workflows |
| Never | Requested but unnecessary | Individual override tracking |
"Never build" requires courage. Some requested features add complexity without value, or they conflict with design principles. Identifying these early prevents scope creep later.
Build vs. Buy vs. Configure
When to Build Custom
Build custom when:
- Requirements are unique to your organization
- No existing tool addresses the core workflow
- Integration requirements make external tools impractical
- Long-term ownership and flexibility matter
Building provides maximum control but maximum cost. Custom solutions require development resources, ongoing maintenance, and organizational capability to support.
For R-01: Building custom might mean developing a policy engine specifically for Lakewood Medical Supply's returns policies. This provides exact fit but requires sustained investment.
When to Purchase Existing Tools
Buy when:
- Standard solutions address 80%+ of requirements
- Time-to-value matters more than perfect fit
- Vendor ecosystem provides ongoing innovation
- Internal capability to build and maintain is limited
Purchasing provides faster deployment but less flexibility. The organization adapts to the tool rather than the tool adapting to the organization.
For R-01: Purchasing might mean acquiring a customer service knowledge base tool with policy matching capabilities. Faster deployment, but may require workflow adaptation.
When to Configure Existing Platforms
Configure when:
- Platforms already in use have relevant capabilities
- Configuration provides adequate functionality
- Integration is simplified by staying within platform
- Total cost of ownership favors leverage over purchase
Configuration provides the fastest path when platforms are capable. Many organizations have tools with untapped features that address current needs.
For R-01: Configuration might mean extending the existing CRM to display policy information through custom fields and automation rules. Fastest path if the CRM platform supports it.
Decision Framework
| Factor | Build | Buy | Configure |
|---|---|---|---|
| Time to prototype | Slowest | Medium | Fastest |
| Fit to requirements | Exact | Approximate | Variable |
| Ongoing cost | Highest | Medium | Lowest |
| Flexibility | Highest | Limited | Limited |
| Internal capability required | Highest | Low | Medium |
The right choice depends on context. A team with strong development capability might build. A team with limited resources might configure. Neither is universally correct.
The R-01 Example
R-01 could be implemented through any path:
Option A: Configure existing CRM
- Add policy database as custom object
- Create automation rules to match return attributes to policies
- Display policy information in customer service interface
- Timeline: 3-4 weeks to prototype
Option B: Purchase knowledge management tool
- Acquire tool designed for policy/knowledge management
- Integrate with existing CRM through API
- Configure matching rules within new tool
- Timeline: 6-8 weeks to prototype
Option C: Build custom integration layer
- Develop policy engine with custom matching logic
- Build integration layer connecting Order Management, CRM, and policy database
- Create custom interface for policy display
- Timeline: 10-12 weeks to prototype
For MVP purposes, Option A is likely preferred. It's fastest to prototype and tests the core assumption. If prototype validates the assumption, later phases might evolve toward Option C for greater capability.
Integration Strategy
Connecting to Existing Systems
Prototypes rarely exist in isolation. They must connect to existing systems for data, for workflow, for context.
Integration approach significantly affects timeline and complexity:
API-First Integration
- Clean separation between systems
- Well-defined interfaces
- Changes in one system don't break others
- Requires API availability and documentation
Manual Bridge
- Human intermediary handles data transfer
- Faster to implement for prototype
- Doesn't scale to production
- Useful for testing assumptions before investing in integration
Data Export/Import
- Batch transfer of data between systems
- Simpler than real-time integration
- May be sufficient for prototype testing
- Production may require more sophisticated approach
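For prototype testing, a batch bridge can be as small as a CSV round-trip with a field-name mapping. A hedged sketch — the field names ("OrderNumber", "order_id", "sku") are invented for illustration, not a real schema:

```python
import csv
import io

# Map source-system field names to the target system's names. These names
# are illustrative assumptions, not an actual integration contract.
FIELD_MAP = {"OrderNumber": "order_id", "ProductCode": "sku"}

def export_to_csv(records: list[dict]) -> str:
    """Batch-export records from the source system as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def import_from_csv(text: str) -> list[dict]:
    """Load exported rows, renaming fields for the target system."""
    rows = csv.DictReader(io.StringIO(text))
    return [{FIELD_MAP.get(k, k): v for k, v in row.items()} for row in rows]
```

A nightly batch like this is often sufficient to test the workflow assumption; if the pilot validates it, real-time integration becomes a deliberate later investment rather than an upfront blocker.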
Handling Integration Constraints
Integration often reveals constraints that aren't visible during design:
- APIs that don't exist or don't expose needed data
- Security policies that prevent direct connection
- Performance limitations that affect user experience
- Data format mismatches that require transformation
For prototype, the response to constraints should prioritize speed:
- Can we work around this constraint for testing purposes?
- Can we simulate the integration to test the workflow?
- Can we use manual processes temporarily to validate the design?
The goal is testing the core assumption, not solving every integration challenge.
When Integration Complexity Should Reduce Scope
Sometimes integration complexity exceeds prototype value. A planned integration that would take eight weeks might be better replaced by a manual workaround that takes three days.
The question: "Does this integration test our core assumption, or is it infrastructure for later phases?"
If it's infrastructure for later phases, defer it. The prototype should answer the essential question with minimum investment.
Technology Selection Process
Evaluating Against Blueprint Requirements
The Module 4 blueprint specifies requirements in tool-agnostic terms. Technology selection evaluates available options against these requirements.
Evaluation criteria derived from blueprint:
- Does it meet functional requirements?
- Does it integrate with specified systems?
- Does it meet performance requirements?
- Does it respect specified constraints?
Secondary criteria for prototype:
- How quickly can we deploy?
- How easily can we iterate?
- What learning curve does the team face?
- What risks does this choice introduce?
Avoiding Vendor-Driven Design
Technology vendors have capabilities they want to demonstrate. Sales processes emphasize what tools can do, not what you need done.
The danger: selecting a tool and then redesigning the workflow to fit the tool's strengths. This inverts the correct sequence (design workflow, then select tool).
Protection: evaluate against blueprint requirements, not vendor demonstrations. Ask "Does this tool do what our blueprint specifies?" not "What can this tool do?"
Proof-of-Concept Before Commitment
Major technology investments should be preceded by proof-of-concept: a limited test that validates the tool can actually deliver what's needed.
The proof-of-concept tests:
- Can the tool handle your specific data and workflows?
- Does performance meet requirements under realistic conditions?
- Can your team configure and operate it effectively?
- Do hidden constraints or limitations emerge?
This test should happen before contract signing, not after. Vendors are motivated to support proof-of-concept because it advances the sale. Use this motivation.
The "Good Enough" Threshold
No tool is perfect. Selection requires identifying what matters most and accepting limitations in what matters less.
For prototype, "good enough" means:
- Tests the core assumption
- Can be deployed within timeline
- Supports iteration based on learning
- Doesn't introduce risks that could sink the project
Production may require higher standards. Prototype requires faster decisions.
Module 5A: REALIZE — Theory
O — Observe
T — Testing Frameworks
Building the prototype is half the work. Testing it effectively, gathering the data that validates or refutes assumptions, is the other half. This section covers how to test prototypes in ways that generate actionable learning.
T — Testing Human-AI Workflows
Different from Testing Pure Software
Software testing asks: "Does the system function as specified?" Human-AI workflow testing asks: "Does the workflow produce the intended outcomes when humans and systems work together?"
The distinction matters because the system can function perfectly while the workflow fails. The technology may perform as designed, but:
- Humans may not use it as intended
- The interaction may create friction the design didn't anticipate
- Trust may not develop as assumed
- Behavior may not change as predicted
Testing human-AI workflows requires observing the entire interaction, not just the system's behavior.
The Human Element
Human behavior in testing includes:
- Adoption patterns: Do practitioners use the system when they could?
- Usage patterns: Do they use it as designed, or develop workarounds?
- Trust signals: Do they rely on system recommendations, or override consistently?
- Behavioral change: Does their overall workflow change as intended?
These patterns emerge over time. Single-day testing won't reveal whether practitioners trust a recommendation system. Extended testing reveals whether trust develops, deteriorates, or never forms.
What to Observe Beyond System Function
System metrics tell part of the story. Observation tells the rest.
Watch for:
- Moments of hesitation, where practitioners pause before acting
- Workarounds, actions taken outside the system to accomplish tasks
- Verbal commentary, what practitioners say while working
- Help-seeking, when they ask colleagues for guidance
- Abandonment, when they leave the system to finish work elsewhere
These observations surface friction that metrics miss.
Combining Quantitative and Qualitative
Neither metrics nor observation alone provides complete understanding.
Metrics reveal what happened: time dropped from X to Y, error rate changed from A to B. They don't explain why, or whether the change will persist, or what problems lurk beneath surface improvement.
Observation reveals context: practitioners hesitate at step 3 because the language is confusing, or they override frequently because system recommendations don't match reality. But observation is limited by sample size and observer bias.
Effective testing combines both:
- Quantitative metrics for what changed
- Qualitative observation for why and how
- Practitioner interviews for perception and experience
- Behavioral analysis for patterns over time
Pilot Group Selection
Size: Small Enough to Support, Large Enough to Learn
Pilot groups face a tradeoff:
- Too small: Results may not generalize; individual variation dominates
- Too large: Support burden overwhelms; feedback is difficult to process
A reasonable pilot size depends on context. For R-01, a pilot of 6-10 representatives might be appropriate: enough to see patterns, small enough to provide intensive support and gather detailed feedback.
The right size allows:
- Direct relationship with each pilot participant
- Rapid response to issues that emerge
- Detailed feedback collection
- Reasonable statistical validity for key metrics
Composition: Mix of Enthusiasts and Skeptics
Pilots populated only by enthusiasts will succeed; pilots populated only by skeptics will fail. Neither result is informative.
Effective pilot composition includes:
- Early adopters who will explore and provide feedback willingly
- Mainstream users who represent typical behavior
- Skeptics who will stress-test the system and surface weaknesses
The mix creates realistic conditions. Early adopters show what's possible. Skeptics reveal what's broken. Mainstream users indicate whether the design works for normal people doing normal work.
Duration: Long Enough to See Patterns
Short pilots reveal whether the system functions. Extended pilots reveal whether it works.
The difference: functioning is about technology; working is about workflow. A system might function correctly while the workflow remains inefficient because practitioners haven't adapted, trust hasn't developed, or edge cases haven't emerged.
Minimum pilot duration should allow:
- Initial learning curve to pass (often 1-2 weeks)
- Representative volume of work (enough transactions to measure)
- Pattern stabilization (behavior settles into routine)
- Edge case emergence (unusual situations surface)
For R-01, a reasonable pilot duration might be 4-6 weeks. Enough time for representatives to move past novelty, develop routine usage patterns, and encounter various return scenarios.
Geographic and Functional Considerations
If the production deployment will span locations or functions, the pilot should include variation:
- Different locations may have different work patterns
- Different shifts may have different volumes
- Different practitioners may have different experience levels
A pilot that succeeds in one context and fails in another provides valuable information, but only if both contexts are tested.
Measurement Against Baseline
Using Module 3 Baselines
Module 3 established baseline metrics through rigorous measurement. Module 5 testing uses the same metrics for comparison.
For R-01, baseline metrics included:
- Average time for Bible-dependent returns: 14.2 minutes
- Incorrect policy application rate: 4.3%
- Supervisor escalation rate: 12%
- Patricia-specific queries: 15+/day
Pilot measurement must use the same definitions, same methodology, and same rigor. If the baseline measured task time from return initiation to resolution, pilot measurement must use the same boundaries.
Same Methodology, Same Rigor
Methodological consistency enables comparison. If baseline measurement used time-motion observation of 50 transactions, pilot measurement should use comparable sampling.
Inconsistent methodology makes comparison unreliable. A pilot that measures differently than baseline will produce results that can't be interpreted. Was the change real, or an artifact of measurement?
Before/After Measurement Design
The simplest comparison: measure the pilot group before prototype deployment and after. The difference indicates change.
This approach has limitations:
- Other factors may have changed between measurements
- The "before" measurement may already reflect Hawthorne effects (behavior change from being observed)
- Individual variation may dominate small samples
More rigorous designs use control groups or time-series analysis, but these require larger samples and longer durations. For most prototypes, before/after measurement of the pilot group provides adequate evidence.
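The before/after comparison itself is simple arithmetic; the judgment lies in the caveats. A minimal sketch with illustrative transaction times:

```python
import statistics

def before_after_delta(before: list[float], after: list[float]) -> dict:
    """Summarize a simple before/after comparison for one pilot metric.

    Illustrative only: a real reading should also note unusual conditions
    during the pilot, possible Hawthorne effects, and whether the sample
    is large enough to trust.
    """
    return {
        "before_mean": statistics.mean(before),
        "after_mean": statistics.mean(after),
        "change": statistics.mean(after) - statistics.mean(before),
        # High spread after deployment is a prompt to look closer, not a verdict.
        "after_stdev": statistics.stdev(after),
    }
```

The `change` figure is the headline; the standard deviation is the reminder that a good average can hide a subset of transactions that got worse.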
Controlling for Variables
Factors other than the prototype can affect results:
- Volume changes: Busy periods differ from slow periods
- Seasonal effects: Some work varies by time of year
- Learning effects: Performance improves as practitioners gain experience
- Staff changes: Different people may perform differently
Controlling for these variables is challenging in real-world pilots. At minimum:
- Note any unusual conditions during pilot
- Compare similar time periods (e.g., same day of week)
- Consider whether observed changes could have other explanations
- Be conservative in attributing results to the prototype
The Three Lenses in Testing
Module 3's three ROI lenses (Time, Throughput, and Focus) provide structure for testing.
Time: Is It Actually Faster?
The Time lens measures whether the prototype reduces time spent on work.
For R-01:
- Baseline: 14.2 minutes average for Bible-dependent returns
- Target: <5 minutes
- Measurement: Time-motion observation of pilot transactions
Results might show:
- Average time reduced to 4.8 minutes (target met)
- Standard deviation remains high (some transactions still take long)
- Time improvement varies by case complexity
Throughput: Is Quality/Volume Actually Improved?
The Throughput lens measures whether the prototype improves work quality or capacity.
For R-01:
- Baseline: 4.3% incorrect policy application
- Target: <2%
- Measurement: QA audit of pilot decisions
Results might show:
- Error rate dropped to 1.8% (target met)
- Most errors now occur in specific case types
- Practitioners feel more confident in decisions
Focus: Is Cognitive Load Actually Reduced?
The Focus lens measures whether the prototype reduces cognitive burden and risk.
For R-01:
- Baseline: 12% supervisor escalation rate, 15+/day Patricia queries
- Target: <5% escalation, <3/day Patricia queries
- Measurement: System tracking of escalations, observation of Patricia queries
Results might show:
- Escalation rate dropped to 7% (partial improvement)
- Patricia queries dropped to 4/day (partial improvement)
- Representatives report feeling more self-sufficient
Each Lens May Show Different Results
A prototype might improve Time while Throughput worsens, or improve Focus while Time increases. Different lenses can reveal different stories.
For R-01, a possible mixed result:
- Time improved significantly (wins on speed)
- Throughput improved moderately (better accuracy)
- Focus improved partially (still some escalation)
Mixed results require interpretation. Is the improvement enough? Which areas need iteration? Does the overall pattern justify production deployment?
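The three-lens reading can be structured so that "met," "partial," and "regressed" are explicit verdicts rather than impressions. A sketch using the illustrative R-01 figures from this section (all three metrics are lower-is-better):

```python
# Baseline, target, and pilot figures mirror the illustrative R-01 numbers
# used throughout this section.
LENSES = {
    "time_minutes":   {"baseline": 14.2, "target": 5.0, "pilot": 4.8},
    "error_rate_pct": {"baseline": 4.3,  "target": 2.0, "pilot": 1.8},
    "escalation_pct": {"baseline": 12.0, "target": 5.0, "pilot": 7.0},
}

def lens_verdicts(lenses: dict) -> dict:
    """Label each lens: 'met' if the pilot hit the target, 'partial' if it
    improved on baseline but missed the target, 'regressed' otherwise."""
    verdicts = {}
    for name, m in lenses.items():
        if m["pilot"] <= m["target"]:
            verdicts[name] = "met"
        elif m["pilot"] < m["baseline"]:
            verdicts[name] = "partial"
        else:
            verdicts[name] = "regressed"
    return verdicts
```

A mixed verdict like `{"time_minutes": "met", "escalation_pct": "partial"}` is exactly the shape of result that forces the interpretation questions above.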
Module 5A: REALIZE — Theory
O — Observe
Iteration Methodology
Testing generates data. Iteration converts that data into improvement. This section covers how to interpret feedback, decide what to do next, and maintain progress through the learning cycle.
The Build-Measure-Learn Cycle
Build: Implement the Next Increment
Building in iteration differs from building initially. The initial build implements the prototype scope. Iteration builds implement specific changes responding to specific findings.
An iteration build should:
- Address one finding at a time (avoid combining changes)
- Have clear scope (what's being changed and why)
- Be timeboxed (hours or days, not weeks)
- Be testable (the change can be observed and measured)
For R-01, an iteration build might be: "Policy matching accuracy was 78%; adding product category as a matching factor should improve accuracy." That's a specific change, testable, with clear rationale.
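The shape of that iteration can be sketched in code. Everything here is hypothetical — the rules, attribute names, and policies are invented to show what "adding product category as a matching factor" might look like as a single, testable change:

```python
# Hypothetical policy rules. A category of None means the rule applies
# regardless of product category.
RULES = [
    {"reason": "defective", "category": "electronics", "policy": "30-day replacement"},
    {"reason": "defective", "category": None,          "policy": "standard refund"},
    {"reason": "unwanted",  "category": None,          "policy": "restocking fee"},
]

def match_policy(attrs: dict, use_category: bool) -> str:
    """Return the first matching policy. The iteration under test is the
    use_category flag: does matching on category improve accuracy?"""
    for rule in RULES:
        if rule["reason"] != attrs["reason"]:
            continue
        if use_category and rule["category"] not in (None, attrs.get("category")):
            continue
        if not use_category and rule["category"] is not None:
            continue  # pre-iteration behavior: ignore category-specific rules
        return rule["policy"]
    return "escalate to supervisor"
```

Because the change is isolated behind one flag, the measurement step can compare matching accuracy with and without it and attribute the difference cleanly.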
Measure: Collect Data on What Happened
After implementing a change, measure its effect. Did the change produce the intended improvement? Did it create unintended consequences?
Measurement in iteration should be:
- Focused: Measure the specific thing that was changed
- Quick: Get results in days, not weeks
- Comparative: Compare to pre-change baseline
For the R-01 example: After adding product category matching, measure policy matching accuracy. Did it improve from 78%? Did it affect anything else negatively?
Learn: Interpret Data and Decide Next Action
Learning converts measurement into decision:
- If the change worked, incorporate it and move to the next issue
- If the change didn't work, understand why and try a different approach
- If the change revealed new issues, add them to the iteration backlog
Learning requires intellectual honesty. A change that was supposed to help but didn't is useful information, if acknowledged. Teams that explain away negative results don't learn from them.
Cycle Speed Matters
The learning rate is proportional to cycle speed. Faster cycles mean more learning in less time.
Consider two teams:
- Team A completes one build-measure-learn cycle per month
- Team B completes one cycle per week
In three months, Team A has completed 3 cycles; Team B has completed 12. Team B has four times as many opportunities to learn, and that compounds into better outcomes.
Cycle speed depends on:
- Build complexity (simpler changes build faster)
- Measurement latency (quick metrics enable quick cycles)
- Decision process (clear authority enables quick decisions)
- Technical capability (fast deployment enables fast testing)
Reading Prototype Feedback
What Metrics Tell You
Metrics provide objective measurement of specific outcomes. They tell you what changed, by how much, with what variation.
For R-01, metrics might show:
- Average policy lookup time: 3.2 minutes (down from 14.2)
- Policy matching accuracy: 83% (users confirm 83% of recommendations)
- Error rate: 2.1% (down from 4.3%)
- Escalation rate: 8% (down from 12%)
These numbers indicate progress toward goals. They don't explain why progress occurred or didn't occur.
What Practitioner Behavior Tells You
Behavior reveals what metrics can't capture:
- Are practitioners using the system enthusiastically, reluctantly, or minimally?
- Where do they hesitate or struggle?
- What workarounds have they developed?
- How has their overall work pattern changed?
Behavioral observation adds context to metrics. A time improvement might be driven by the system working well, or by practitioners giving up on difficult cases and processing only easy ones. Metrics alone can't distinguish these scenarios.
What Silence Tells You
Absence of feedback is data. When practitioners stop commenting on the system, it may mean:
- The system works so well they don't notice it (good)
- They've stopped using it (bad)
- They've adapted in ways that avoid friction (needs investigation)
Silence requires investigation. Don't assume silence means satisfaction.
Distinguishing Signal from Noise
Not all feedback matters equally:
- Single-user complaints may reflect individual preference, not design flaw
- Rare edge cases may not justify design changes
- Early confusion may resolve with experience
Signal indicators:
- Multiple practitioners report similar issues
- Issues persist over time
- Issues affect core workflow, not peripheral features
- Practitioners develop consistent workarounds
Noise indicators:
- Isolated complaints from single users
- Issues that fade as practitioners gain experience
- Preference differences that don't affect outcomes
- Requests for features that weren't part of scope
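The signal indicators above can be approximated with a simple triage heuristic: flag issues that recur across multiple practitioners and persist across multiple weeks. A sketch with illustrative thresholds — the cutoffs of three reporters and two weeks are judgment calls, not fixed rules:

```python
def signal_issues(reports: list[dict], min_reporters: int = 3,
                  min_weeks: int = 2) -> list[str]:
    """Flag issues reported by several practitioners across several weeks.

    Heuristic sketch only: it captures the 'multiple people, persisting
    over time' indicators, not the 'affects core workflow' judgment.
    Each report is a dict like {"issue": ..., "who": ..., "week": ...}.
    """
    reporters: dict[str, set] = {}
    weeks: dict[str, set] = {}
    for r in reports:
        reporters.setdefault(r["issue"], set()).add(r["who"])
        weeks.setdefault(r["issue"], set()).add(r["week"])
    return sorted(
        issue for issue in reporters
        if len(reporters[issue]) >= min_reporters
        and len(weeks[issue]) >= min_weeks
    )
```

Anything the heuristic filters out isn't discarded; it stays in the raw feedback log, where a human can still promote it if the "core workflow" test says it matters.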
The Iteration Decision Framework
Continue: Results Positive, Expand Scope
When to continue:
- Core assumptions validated by data
- Metrics meet or exceed targets
- Practitioners are satisfied and effective
- No major issues remain unresolved
Continue means "proceed to next phase," which might be broader pilot, additional features, or production deployment.
Adjust: Results Mixed, Modify and Retest
When to adjust:
- Some metrics meet targets, others don't
- Practitioners report fixable friction
- Issues are implementation problems, not design problems
- The core approach is working, with specific gaps
Adjustment should be targeted. Identify specific issues, implement specific fixes, test specific improvements. Avoid broad redesign in response to specific problems.
Pivot: Core Assumption Wrong, Redesign Approach
When to pivot:
- Core assumption disproved by testing
- Practitioners fundamentally reject the workflow
- Issues trace to design principles, not implementation details
- Fixing individual problems won't address root cause
Pivot is serious. It means the design was wrong, not merely incomplete. Pivot should return to Module 4 principles rather than tweaking the prototype.
Pivot is also rare. Most pilots reveal adjustment needs, not fundamental design failures. If assessment (Module 2), calculation (Module 3), and design (Module 4) were done well, pivot is unlikely.
Stop: Opportunity Isn't Viable
When to stop:
- Core assumption disproved and alternative approaches unlikely to succeed
- Value proposition no longer holds after accounting for reality
- Organizational conditions have changed, making the opportunity obsolete
- Continued investment isn't justified by potential return
Stop is painful but sometimes correct. The discipline of Module 5 is learning what works, which includes learning when something should be abandoned.
Stop should be documented: What was learned? Why did this fail? What would need to be true for a future attempt to succeed?
Scope Management During Iteration
Resisting "While We're Fixing That, Let's Also..."
Iteration is vulnerable to scope creep. Each fix creates temptation to add more:
- "While we're updating the policy matching, let's also add..."
- "Since we're touching that code, we should..."
- "Users are asking for X anyway, might as well..."
These additions derail iteration focus. They turn targeted fixes into expanded scope. They slow cycle speed and blur measurement.
The discipline: each iteration has one focus. Additional requests go to the backlog, not into the current cycle.
Each Iteration Should Have One Focus
Single-focus iteration enables:
- Clear measurement (did this specific change help?)
- Fast cycles (one change builds faster than many)
- Meaningful learning (attribution is clear)
- Manageable complexity (fewer things can go wrong)
When iteration scope expands, benefits erode. Multiple simultaneous changes make it impossible to know which change caused which effect.
Deferring Good Ideas That Aren't Urgent
Good ideas arrive constantly during iteration. Some come from practitioners, some from stakeholders, some from the team. Many are genuinely valuable.
The backlog captures these ideas for later evaluation. Deferral is prioritization, not rejection.
Questions for backlog triage:
- Does this address a current iteration goal?
- Is this urgent (blocking progress) or important (valuable when ready)?
- Can this wait for a future phase without significant cost?
Most good ideas can wait. The ones that can't should dominate current iteration focus.
The Discipline of Incremental Improvement
Progress happens through many small improvements, not one large transformation.
Each iteration:
- Addresses one issue
- Produces measurable improvement
- Creates foundation for next iteration
Accumulated iterations produce substantial progress. A team that makes ten small improvements over five weeks may achieve more than a team that attempts one large improvement over the same period.
Module 5A: REALIZE — Theory
O — Observe
From Pilot to Production
The pilot validated the prototype. Metrics improved. Practitioners provided positive feedback. Iteration addressed the rough edges. The system works.
Now what?
The transition from pilot to production is where many projects stall. The pilot becomes permanent, serving a small group forever while the broader organization waits indefinitely. Or the deployment happens without adequate preparation, and production reveals problems the pilot never surfaced.
This section covers how to graduate from validated pilot to successful production deployment.
Defining Pilot Success
Quantitative Thresholds
Before the pilot begins, success criteria should be defined. These criteria provide objective targets:
For R-01:
- Time per Bible-dependent return: <5 minutes (baseline: 14.2 minutes)
- Incorrect policy application: <2% (baseline: 4.3%)
- Supervisor escalation rate: <5% (baseline: 12%)
- System usage rate: >80% (pilot group)
- Practitioner satisfaction: >4.0/5
Success means meeting these thresholds consistently over the pilot duration, not in a single snapshot.
Qualitative Indicators
Numbers alone don't define success. Qualitative factors matter:
- Do practitioners prefer the new workflow to the old?
- Has behavior genuinely changed, or is compliance superficial?
- Are workarounds emerging that indicate unresolved friction?
- Would practitioners advocate for the system to their colleagues?
A pilot that meets quantitative targets while practitioners quietly hate the system is a ticking time bomb that will fail at scale.
Comparison to Module 3 Projections
Module 3's ROI model made projections about expected value. Pilot results should be compared to those projections:
For R-01:
- Projected time savings: 9.2 minutes/return
- Actual time savings: 10.1 minutes/return (exceeded projection)
- Projected error reduction: 2.3 percentage points
- Actual error reduction: 2.2 percentage points (met projection)
- Projected escalation reduction: 7 percentage points
- Actual escalation reduction: 4 percentage points (partially met)
This comparison validates the business case. Results that exceed projection strengthen the case for production. Results that fall short require explanation and possibly revised projections.
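The projection-versus-actual comparison is a small arithmetic exercise worth making explicit. A sketch using the illustrative R-01 figures above:

```python
# Projected values come from the Module 3 ROI model; actuals from pilot
# measurement. Figures mirror the illustrative R-01 numbers in this section.
COMPARISONS = {
    "time_savings_min":         {"projected": 9.2, "actual": 10.1},
    "error_reduction_pts":      {"projected": 2.3, "actual": 2.2},
    "escalation_reduction_pts": {"projected": 7.0, "actual": 4.0},
}

def attainment(projected: float, actual: float) -> float:
    """Actual result as a fraction of projection (1.0 = exactly as projected)."""
    return actual / projected

for name, c in COMPARISONS.items():
    print(f"{name}: {attainment(c['projected'], c['actual']):.0%} of projection")
```

An attainment above 1.0 strengthens the production case; one well below it (here, escalation at roughly 57% of projection) is the figure that demands explanation before the business case is re-presented.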
What "Good Enough" Looks Like
Perfection isn't the standard. "Good enough" means:
- Core value proposition demonstrated
- Critical success metrics met
- Remaining issues are minor, rare, or have clear remediation paths
- Production deployment won't create significant new problems
- The organization will be better off with the system than without it
Waiting for perfection means waiting forever. At some point, the system is ready. Defining that point in advance prevents endless refinement.
The Pilot Trap
Pilots That Never End
A pilot should have a defined end date. When pilots continue indefinitely, several dynamics are typically at play:
Fear of Scale: "It works for 10 users, but what about 100?" Concerns about scale prevent commitment to deployment.
Perfectionism: "Just a few more tweaks" becomes a permanent state. Each improvement reveals another opportunity.
Ownership Ambiguity: No one has authority to declare the pilot successful and proceed.
Risk Aversion: Production deployment feels risky. Pilot feels safe. Safety wins.
Lost Momentum: Original urgency faded. No one is pushing for completion.
"Just a Few More Tweaks" as Avoidance
There's always something else to improve. The policy matching could be 2% more accurate. The interface could be slightly smoother. The documentation could be more complete.
These improvements are genuine. They're also endless. If the standard is "nothing left to improve," deployment never happens.
The discipline: Is the system better than what it replaces? If yes, deploy it. Continue improving after deployment, not instead of deployment.
Loss of Urgency After Initial Success
Early pilots generate excitement. The first positive results create energy. Champions celebrate progress.
As pilots extend, urgency fades. Initial excitement becomes routine. Champions move to other priorities. Stakeholders who were eager become indifferent.
By the time deployment is "ready," no one cares anymore. The project that could have been a success story becomes a footnote.
How Pilots Become Permanent Exceptions
Some organizations have multiple permanent pilots: systems that serve small groups indefinitely because deployment never happened.
These pilots create problems:
- Resource drain: Small groups get support that broader deployment would amortize
- Inequity: Some practitioners have better tools than others for no good reason
- Technical debt: Pilots built for small scale accumulate workarounds as they persist
- Organizational confusion: Which system is official? Which is temporary?
A pilot is a test, not a destination. If it passes the test, deploy it. If it fails, kill it. Either way, it shouldn't persist.
Scaling Considerations
What Worked for 10 May Not Work for 100
Pilot conditions differ from production conditions:
Support intensity: Pilot users get intensive support. Production users get standard support.
User selection: Pilot users are often early adopters. Production includes skeptics and reluctant users.
Volume: Pilot handles limited transactions. Production handles full volume.
Edge cases: Pilot encounters some variation. Production encounters all variation.
Scaling requires anticipating these differences: which assumptions that held in pilot may not hold in production?
Infrastructure for Production
Technical infrastructure that supported pilot may need enhancement:
- Performance: Will the system handle peak loads?
- Reliability: What happens when components fail?
- Recovery: How quickly can the system be restored after problems?
- Monitoring: How will ongoing performance be tracked?
These requirements exist during pilot but become critical at scale. A pilot can tolerate occasional problems; production cannot.
Training and Support at Scale
Pilot training was intensive and personal. Production training must be scalable:
- Can new users be onboarded without one-on-one attention?
- Do training materials exist that work without facilitators?
- Is support infrastructure ready for volume?
- Are escalation paths defined?
Change Management for Broader Rollout
Pilot users volunteered or were selected. Production users will have the system imposed on them. This changes the dynamic.
Change management for production:
- Communication: Why is this happening? What's in it for practitioners?
- Timeline: When will changes affect each group?
- Support: Where can practitioners get help?
- Feedback: How can practitioners report problems?
Practitioners who feel informed and supported adopt more readily than practitioners who feel surprised and abandoned.
Production Readiness
Technical Checklist
Before production deployment, verify:
| Category | Item | Status |
|---|---|---|
| Stability | No critical bugs in last 2 weeks | ☐ |
| Performance | Response time meets requirements under load | ☐ |
| Security | Security review completed, vulnerabilities addressed | ☐ |
| Backup | Data backup and recovery tested | ☐ |
| Monitoring | Performance and error monitoring in place | ☐ |
| Integration | All integrations functioning reliably | ☐ |
Operational Checklist
| Category | Item | Status |
|---|---|---|
| Support | Help desk trained on new system | ☐ |
| Documentation | User guides and troubleshooting docs available | ☐ |
| Escalation | Technical escalation path defined | ☐ |
| Maintenance | Maintenance schedule and procedures documented | ☐ |
| Ownership | System owner assigned | ☐ |
Organizational Checklist
| Category | Item | Status |
|---|---|---|
| Training | Training materials ready for all user groups | ☐ |
| Communication | Deployment communication plan executed | ☐ |
| Leadership | Executive sponsor confirmed and engaged | ☐ |
| Feedback | Feedback collection mechanism in place | ☐ |
| Success metrics | Ongoing measurement plan defined | ☐ |
Documentation for Handoff
Production deployment transfers responsibility from project team to operations. Documentation enables this handoff:
- System documentation: What it does, how it works, how to maintain it
- Operational procedures: Daily, weekly, monthly tasks
- Troubleshooting guides: Common problems and solutions
- Contact information: Who to escalate to for what issues
Documentation created during development is often insufficient for operations. Handoff documentation should be created with operational users in mind.
The Deployment Decision
Who Decides
The deployment decision should have clear ownership. Typically:
- Project sponsor approves based on results
- Technical lead certifies readiness
- Operations lead confirms support capability
- Business owner validates expected value
If approval authority is ambiguous, deployment stalls in committee.
Building the Case for Deployment
The deployment recommendation summarizes:
- Pilot results vs. success criteria
- Comparison to Module 3 projections
- Remaining risks and mitigations
- Recommended deployment approach
- Timeline and resource requirements
This is a decision document, not a status report. It should enable a decision, not defer one.
Handling Stakeholder Concerns
Stakeholders may have concerns about deployment:
"What if it breaks?" Show reliability data from pilot. Document rollback procedures.
"Are practitioners ready?" Show adoption data and feedback. Describe training plan.
"What about the edge cases we haven't tested?" Acknowledge remaining uncertainty. Show how edge cases will be monitored and addressed.
"Is the timing right?" Discuss organizational readiness. Note that delay has costs too.
Concerns should be addressed directly, not dismissed. Unaddressed concerns become deployment blockers.
Timing and Sequencing
Deployment timing matters:
- Avoid major business cycles (end of quarter, holidays)
- Consider training logistics (when can users be trained?)
- Account for support availability (who handles problems?)
- Coordinate with other initiatives (avoid change saturation)
Sequencing options:
- Big bang: Everyone at once. Faster but higher risk.
- Phased: Groups deploy sequentially. Slower but lower risk.
- Parallel: New and old systems run simultaneously. Safe but expensive.
The right approach depends on organizational tolerance for risk and operational complexity.
Module 5B: REALIZE — Practice
R — Reveal
Introduction
Module 5A established the principles of rapid implementation. This practice module provides the methodology: how to move from validated blueprint to working prototype to production deployment, creating value at each step.
Why This Module Exists
The gap between design and deployment is where organizations lose momentum.
Module 4 produced a validated Workflow Blueprint: a specification of how work should flow, what technology should do, and how humans and AI should collaborate. That blueprint represents significant investment: assessment, calculation, design, validation.
But a blueprint is a plan, not a result. The plan must become reality. Module 5 provides the discipline to make that happen without falling into the traps that stalled Cascade Legal Partners for eighteen months.
The deliverable: A Working Prototype with measured before/after results, evidence that the design works, ready for production deployment.
Learning Objectives
By completing Module 5B, you will be able to:
- Scope a minimum viable prototype that tests core assumptions without building everything at once
- Select an implementation approach (build, buy, or configure) based on requirements and constraints
- Construct or configure the prototype within timeline discipline, avoiding scope creep
- Design and execute pilot testing with appropriate group composition, duration, and measurement
- Measure results against Module 3 baselines using consistent methodology across all three ROI lenses
- Iterate based on evidence using the build-measure-learn cycle to address issues systematically
- Prepare for production deployment with appropriate readiness verification and handoff documentation
The Practitioner's Challenge
Three tensions define implementation:
Speed vs. Completeness
The faster you ship, the sooner you learn. But incomplete systems frustrate users and generate invalid feedback. Finding the minimum that enables meaningful testing requires discipline.
Quality vs. Iteration
Production quality standards evolved for good reason. But applying them to prototypes delays learning. Building for iteration means accepting imperfection now to enable improvement later.
Confidence vs. Evidence
The design feels right. Stakeholders are enthusiastic. Practitioners validated the blueprint. But confidence isn't evidence. Only testing reveals whether the design actually works. The temptation to declare victory early, before data confirms success, must be resisted.
Field Note
A technology director at a regional retailer described the moment his team's implementation approach changed:
"We had been building for four months. The system was sophisticated. It did everything we'd designed and more. But we hadn't tested anything with actual users. Every time we got close to pilot, someone found another gap. 'We can't test without X.' 'Y needs to be finished first.' Always reasonable, always delaying.
"Then a competitor launched something similar. Less sophisticated than what we were building, honestly pretty basic. But they were in market, learning from real customers, iterating based on real feedback. We were still planning our pilot.
"That's when we realized: their bad version that shipped beat our good version that didn't. They were learning while we were building. We stripped back to essentials and deployed in three weeks. It wasn't pretty, but it worked. And we learned more in those three weeks than we had in four months of building.
"Now we have a rule: if you can't describe what you'll learn from shipping, you're not ready to ship. But if you can describe what you'll learn, you're already late."
What You're Receiving
Module 5 receives the following from prior modules:
From Module 4: Validated Workflow Blueprint
The blueprint specifies:
- Current-state workflow with documented friction
- Future-state design with human-AI collaboration
- Technology requirements (tool-agnostic)
- Adoption design elements
- Success metrics aligned with ROI model
For R-01, the blueprint documents:
- Current state: 8 steps, 14-28 minutes, high friction at policy search and interpretation
- Future state: 5-6 steps, 9-14 minutes, Preparation pattern with Policy Engine
- Integration requirements: Order Management, CRM, Policy Engine
- Success targets: <5 min time, <2% error, <5% escalation, >80% adoption
From Module 3: Baseline Metrics
The ROI model established baselines:
- Time per Bible-dependent return: 14.2 minutes
- Incorrect policy application: 4.3%
- Supervisor escalation rate: 12%
- Patricia-specific queries: 15+/day
These baselines become the comparison point for pilot measurement.
From Module 3: Success Criteria
The business case defined success:
- Annual value: $97,516
- Implementation cost: $35,000
- Payback period: 4.2 months
- ROI: 736%
Pilot results must validate (or invalidate) these projections.
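The business-case figures are straightforward arithmetic on the Module 3 inputs. A sketch, assuming the 736% ROI reflects a three-year value horizon (an inference from the published numbers, not something the model states explicitly):

```python
# Sketch of the R-01 business-case arithmetic.
# The 3-year horizon is an assumption that reproduces the published 736% ROI.

annual_value = 97_516        # projected annual savings ($)
implementation_cost = 35_000  # approved implementation budget ($)
horizon_years = 3             # assumed horizon for the ROI figure

payback_months = implementation_cost / annual_value * 12
roi_pct = (annual_value * horizon_years - implementation_cost) / implementation_cost * 100

print(f"Payback: {payback_months:.1f} months")            # ~4.3, close to the model's 4.2
print(f"ROI over {horizon_years} years: {roi_pct:.0f}%")  # 736%
```

The small gap between the computed ~4.3-month payback and the model's 4.2 likely reflects rounding in the underlying inputs.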
Module Structure
Module 5B proceeds through six stages:
1. Prototype Scoping
Translating the complete blueprint into minimum viable scope. What must be tested first? What can wait?
2. Implementation Approach
Selecting build, buy, or configure. Evaluating options against R-01 requirements. Documenting the decision.
3. Testing and Measurement
Designing the pilot. Selecting participants. Defining measurement methodology. Executing the test.
4. Iteration Cycles
Interpreting results. Deciding next actions. Implementing improvements. Retesting.
5. Production Preparation
Verifying readiness. Building the deployment case. Preparing handoff documentation.
6. Transition to Module 6
Connecting proven prototype to sustainability planning. What carries forward.
The R-01 Implementation
Throughout Module 5B, we continue the R-01 example from previous modules:
- Module 2 identified R-01 (Returns Bible Not in System) as a high-priority opportunity
- Module 3 quantified the value: $97,516 annual savings
- Module 4 designed the solution: Preparation pattern with automated policy lookup
Module 5 builds it:
- Scoping the minimum prototype that tests policy lookup improvement
- Selecting implementation approach (configure CRM vs. build custom)
- Testing with representative pilot group
- Measuring against the 14.2-minute baseline
- Iterating based on what testing reveals
- Preparing for deployment to all customer service representatives
By the end of Module 5, R-01 will be a working system with demonstrated results. A reality, no longer a design document.
Module 5B: REALIZE — Practice
O — Observe
Prototype Scoping Methodology
The blueprint specifies the complete solution. The prototype tests the core assumptions. This section covers how to translate comprehensive design into focused prototype scope.
From Blueprint to Prototype Scope
The Blueprint Specifies the Complete Future State
Module 4's blueprint documents everything needed for full implementation:
- All workflow steps and decision points
- All human-AI collaboration specifications
- All integration requirements
- All adoption design elements
This completeness is necessary for production. It is often counterproductive for an initial prototype.
The Prototype Tests Core Assumptions
Every design embeds assumptions:
- The technology can do what we specified
- Practitioners will use it as designed
- The workflow will reduce friction as projected
- Integration will work reliably
Some assumptions are more critical than others. The business case depends on certain assumptions being true. If they're wrong, everything else is irrelevant.
The prototype tests these critical assumptions first. Non-critical assumptions can wait.
Identifying Essential First-Test Components
To identify what must be in the prototype, ask:
- What assumption does the business case depend on most?
- If this assumption is wrong, does the opportunity still exist?
- What's the smallest thing we can build that tests this assumption?
For R-01, the critical assumption is: automated policy lookup will reduce representative time from 14.2 minutes to under 5 minutes.
Everything that tests this assumption is essential. Everything that doesn't is deferrable.
The MVP Question
"What Is the Smallest Thing We Can Build That Tests Our Core Assumption?"
This question forces ruthless prioritization. Not "what would be nice to have." Not "what stakeholders expect." Not "what the blueprint specifies." Just: what tests the core assumption?
For R-01, the answer might be:
- Policy Engine that matches return attributes to policies
- Display of matched policy in representative's CRM view
- Ability for representative to act on the displayed information
That's it. Not billing integration. Not documentation automation. Not exception handling workflow. Just: can automated policy lookup reduce the time representatives spend finding policies?
Distinguishing "Nice to Have" from "Must Have for Testing"
| Feature | Must Have (MVP) | Nice to Have | Rationale |
|---|---|---|---|
| Policy matching engine | ✓ | | Tests core assumption |
| Policy display in CRM | ✓ | | Tests core assumption |
| Override mechanism | ✓ | | Required for fair test |
| Similar case display | | ✓ | Valuable but not essential for time test |
| Automatic documentation | | ✓ | Efficiency gain, not core test |
| Billing integration | | ✓ | Downstream value, not core test |
| Exception routing workflow | | ✓ | Handles 15% of cases, not typical flow |
| Manager dashboard | | ✓ | Observer feature, not practitioner test |
The must-haves test whether automated policy lookup works. The nice-to-haves make it better but aren't needed to answer the essential question.
Scope Categories
MoSCoW Prioritization for Prototype
| Category | Definition | R-01 Example |
|---|---|---|
| Must Have | Required to test core value proposition; prototype fails without it | Policy matching, CRM display, override capability |
| Should Have | Improves test validity but not essential; include if time permits | Similar case references, confidence indicators |
| Could Have | Valuable but can wait; include in later iterations | Exception handling workflow, training mode |
| Won't Have (this version) | Explicitly deferred; not part of prototype scope | Billing integration, manager reporting, mobile access |
The discipline is in Won't Have. Every stakeholder has features they consider essential. MVP discipline requires explicit deferral with clear rationale.
Scope Documentation
Document scope decisions formally:
R-01 PROTOTYPE SCOPE DOCUMENT
MVP Scope (Must Have):
1. Policy Engine integration
- Receive return attributes from CRM
- Match to applicable policy rules
- Return policy summary and confidence level
2. CRM Display
- Show policy information in existing representative view
- No navigation to separate application
- Display appears when return details entered
3. Override Mechanism
- One-click "doesn't apply" option
- No explanation required
- Action logged for learning
Deferred to Phase 2 (Should/Could Have):
- Similar case display
- Exception handling workflow
- Automatic documentation
- Confidence threshold alerts
Out of Scope (Won't Have):
- Billing system integration
- Manager dashboard and reporting
- Mobile access
- Multi-language support
Rationale: MVP tests whether automated policy lookup reduces time. All deferred features are valuable but not required to validate core assumption.
The R-01 Prototype Scope
Full Scope from Blueprint (Review)
Module 4's blueprint specified:
Future-State Workflow:
- Gather return info → Policy Engine identifies policies
- Policy review → System surfaces summary and similar cases
- Exception handling → System flags unusual cases
- Customer communication → Policy summary available
- Return processing → Decision logged automatically
- Documentation → Derived from workflow
Technology Requirements:
- Policy Engine integration
- CRM integration (read/write)
- Similar case matching
- Automatic documentation
- Performance: <2 second response
Adoption Design:
- Optional acknowledgment for experienced reps
- One-click override
- Training integration
MVP Scope for First Prototype
For initial prototype, scope reduces to:
| Blueprint Element | MVP Status | Rationale |
|---|---|---|
| Policy Engine integration | Include | Core assumption |
| CRM display | Include | Core assumption |
| Override mechanism | Include | Fair test requires escape |
| Similar case matching | Defer | Valuable but not core test |
| Automatic documentation | Defer | Efficiency, not core value |
| Exception workflow | Defer | 15% of cases, test typical first |
| Performance (<2 sec) | Include | Poor performance invalidates test |
| Training mode | Defer | Not needed for pilot with support |
What's Deferred and Why
Similar case matching: Helps representatives make decisions but isn't required to test whether automated policy lookup reduces time. If core assumption validates, add this in Phase 2.
Automatic documentation: Saves time at the end of the workflow but doesn't affect the policy lookup test. The time savings from documentation automation can be measured separately.
Exception workflow: Handles the 15% of cases that are unusual. Testing the 85% typical flow first provides cleaner signal. Exception handling adds complexity that obscures core learning.
Manager reporting: Observer feature, not practitioner feature. Violates the Module 4 principle of designing for practitioners first.
Timeline Implications
| Scope | Estimated Timeline | Risk Level |
|---|---|---|
| Full blueprint | 10-12 weeks | Higher (more complexity) |
| MVP + Phase 2 features | 6-8 weeks | Medium |
| MVP only | 3-4 weeks | Lower (focused scope) |
MVP timeline enables testing the core assumption in one month rather than three. If the assumption validates, additional features follow. If it doesn't, less has been wasted.
Scope Documentation
Feature List with Categorization
Create a formal scope document for stakeholder alignment:
| Feature | Category | Acceptance Criteria | Dependencies |
|---|---|---|---|
| Policy matching | Must Have | Matches return attributes to policy with >75% accuracy | Policy database loaded |
| CRM display | Must Have | Policy appears within 2 seconds of return entry | CRM API access |
| Override button | Must Have | Single click dismisses recommendation | None |
| Confidence indicator | Should Have | Shows high/medium/low based on match quality | Policy matching complete |
| Similar cases | Could Have | Shows 2-3 prior cases with similar attributes | Case history database |
| Auto-documentation | Won't Have (v1) | Records decision without manual entry | Defer to Phase 2 |
Acceptance Criteria for "Done"
Define what "done" means for MVP:
- Policy matching: Successfully matches 50+ test cases with >75% accuracy
- CRM display: Policy information renders within 2 seconds, consistently
- Override: Button functions, action is logged
- Integration: No errors in 100 consecutive transactions
- User test: 3 representatives can complete workflow without assistance
These criteria define when the prototype is ready for pilot. Testable, not perfect.
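The "done" criteria above can be expressed as a simple gate. A hypothetical sketch: the thresholds come from the acceptance criteria, but the function names and data structures are invented for illustration:

```python
# Hypothetical "done" checks for the R-01 MVP. Thresholds are from the
# acceptance criteria above; everything else is illustrative.

def matching_done(test_results: list[bool]) -> bool:
    """Policy matching is done: 50+ test cases with >75% accuracy."""
    if len(test_results) < 50:
        return False  # sample too small to trust the accuracy figure
    accuracy = sum(test_results) / len(test_results)
    return accuracy > 0.75

def display_done(render_times_sec: list[float]) -> bool:
    """CRM display is done: policy renders within 2 seconds, consistently."""
    return bool(render_times_sec) and max(render_times_sec) <= 2.0

# 42 correct matches out of 52 cases is ~81% accuracy on a sufficient sample.
print(matching_done([True] * 42 + [False] * 10))
```

Note that both checks are binary by design: the point of acceptance criteria is a yes/no readiness answer, not a quality score.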
Dependencies and Prerequisites
| Dependency | Owner | Status | Risk |
|---|---|---|---|
| Policy database content | Patricia (SME) | In progress | Medium - requires knowledge extraction |
| CRM API access | IT department | Approved | Low |
| Test environment | Development team | Available | Low |
| Pilot group availability | Operations manager | Confirmed | Low |
Dependencies that aren't resolved block prototype progress. Identify them early.
Risks of Scope Decisions
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| MVP too limited to generate valid feedback | Medium | High | Include override and confidence to ensure usability |
| Deferred features create stakeholder frustration | Medium | Medium | Clear communication about Phase 2 timeline |
| Policy matching accuracy insufficient | Medium | High | Plan calibration iteration before pilot |
| Integration more complex than estimated | Low | High | Start integration work immediately |
Common Scoping Mistakes
Including Everything from Blueprint
The blueprint specifies production requirements. Including all of them in prototype creates the Cascade problem: building everything, testing nothing.
Correction: Ruthlessly apply the MVP question. What tests the core assumption? Everything else waits.
Underestimating Integration Complexity
Integration between systems always takes longer than expected. APIs don't work as documented. Data formats don't match. Security requirements add steps.
Correction: Start integration work early. Test integration independently before building features that depend on it. Reduce scope rather than extend timeline when integration proves difficult.
Forgetting Training and Support Needs
A prototype that practitioners can't use generates no useful feedback. Pilot users need orientation, support access, and feedback channels.
Correction: Include pilot support in scope: enough for pilot participants to use the system effectively, even if it falls short of production-grade training.
Scope Creep During Build
"While we're building policy matching, we should also add..." Each addition seems reasonable. Accumulated additions delay testing indefinitely.
Correction: Formal scope change process. Any addition to MVP scope requires explicit approval with impact assessment. Good ideas go to Phase 2 backlog, not current sprint.
Scope Sign-Off
Before proceeding to implementation, confirm scope with stakeholders:
Scope Agreement Checklist
- MVP scope is documented and understood
- Deferred features are explicitly listed with rationale
- Stakeholders with deferred features have acknowledged timing
- Acceptance criteria are defined for MVP features
- Dependencies are identified with owners and status
- Timeline is realistic for MVP scope
- Scope change process is agreed
This agreement prevents mid-build disputes about what was promised. When someone asks "Aren't you including X?", the documented scope provides the answer.
Module 5B: REALIZE — Practice
O — Operate
Step 1: Select Implementation Approach
The prototype scope is defined. Now: how to build it? This section covers the build vs. buy vs. configure decision and applies it to R-01.
The Build vs. Buy vs. Configure Decision
Framework Review
Module 5A introduced three implementation paths:
| Approach | When to Use | Tradeoffs |
|---|---|---|
| Build | Requirements are unique; no existing tool fits; long-term flexibility matters | Maximum control, maximum cost, longest timeline |
| Buy | Standard solutions address most requirements; time-to-value is priority | Faster deployment, less flexibility, ongoing license cost |
| Configure | Existing platforms have relevant capabilities; integration is simplified | Fastest path, limited by platform capabilities |
The right choice depends on:
- Requirements specificity (how unique are your needs?)
- Timeline pressure (how fast must you test?)
- Internal capability (can you build and maintain?)
- Budget constraints (what's affordable?)
- Long-term ownership (who maintains this over years?)
Applying to R-01
R-01's requirements from the blueprint:
Functional:
- Accept return attributes
- Match to policy rules
- Return policy summary with confidence
- Display in CRM interface
- Capture override actions
Technical:
- <2 second response time
- Integration with existing CRM and Order Management
- Support for 50+ concurrent users
Constraints:
- No changes to Order Management data structures
- No additional login for representatives
- No mandatory data entry beyond current workflow
Each path has distinct tradeoffs.
R-01 Implementation Options Analysis
Option A: Configure Existing CRM
What would need to happen:
- Create custom policy database within CRM
- Build automation rules to match return attributes to policies
- Create custom UI component for policy display
- Configure logging for override actions
Pros:
- Fastest timeline (3-4 weeks to prototype)
- No new system to integrate
- Representatives stay in familiar interface
- Lower cost (internal effort, no new licenses)
- IT team has CRM configuration expertise
Cons:
- Policy matching logic limited by CRM capabilities
- Scaling may hit platform limits
- Some features may require workarounds
- Dependent on CRM vendor roadmap
Timeline estimate: 3-4 weeks to MVP
Resource estimate: 1 CRM administrator, 0.5 developer
Cost estimate: $8,000-12,000 (internal labor)
Option B: Purchase Returns Management Tool
What would need to happen:
- Evaluate and select vendor
- Negotiate contract and licensing
- Configure tool for Lakewood policies
- Build integration with existing CRM
- Train administrators on new platform
Pros:
- Purpose-built for returns/policy management
- Vendor handles updates and improvements
- May include features beyond current scope
- Potentially better policy matching capabilities
Cons:
- Longer timeline (vendor selection, contract, configuration)
- Integration complexity (new system to connect)
- Ongoing license costs
- Vendor dependency for customization
- Representatives may need to switch between systems
Timeline estimate: 8-12 weeks to MVP
Resource estimate: 0.5 developer for integration, vendor support
Cost estimate: $15,000-25,000 (licenses) + $10,000-15,000 (integration)
Option C: Build Custom Integration Layer
What would need to happen:
- Design policy matching engine architecture
- Develop custom matching algorithms
- Build integration layer for CRM and Order Management
- Create custom UI components
- Implement logging and analytics
Pros:
- Exact fit to requirements
- Maximum flexibility for future enhancement
- Full ownership and control
- No vendor dependencies
Cons:
- Longest timeline
- Highest cost
- Requires ongoing development resources
- Risk of scope creep during custom development
- Technical debt accumulation
Timeline estimate: 10-14 weeks to MVP
Resource estimate: 2 developers, 1 architect
Cost estimate: $35,000-50,000 (development)
R-01 Recommended Approach
Selected Option: Configure Existing CRM (Option A)
Rationale:
1. Timeline alignment: MVP in 3-4 weeks tests core assumption quickly. Longer paths delay learning without proportional benefit for prototype phase.
2. Risk reduction: CRM configuration is reversible. If prototype fails, minimal investment lost. Custom build or vendor commitment creates sunk costs.
3. Capability match: CRM's automation capabilities can handle policy matching at prototype scale. Production may require enhancement, but prototype doesn't need production capacity.
4. Integration simplicity: No new system means no new integration. Representatives stay in familiar interface, reducing adoption friction.
5. Team capability: IT team has CRM expertise. No new skills required for prototype.
What This Means for Prototype Construction:
- Week 1: Policy database design and initial data entry
- Week 2: Automation rules for policy matching
- Week 3: UI component development and integration testing
- Week 4: Pilot preparation and initial testing
Production Considerations:
CRM configuration may be insufficient for full production. If prototype validates the core assumption, production options include:
- Enhanced CRM configuration with additional optimization
- Migration to purchased tool (now justified by proven value)
- Custom development (now scoped by real requirements)
The prototype decision doesn't lock in the production decision. Learning from prototype informs better production choice.
Vendor/Platform Evaluation
When Purchasing (Option B), Evaluate Against Blueprint
If Option B were selected, evaluation would follow this process:
Step 1: Create evaluation criteria from blueprint
| Criterion | Weight | Source |
|---|---|---|
| Policy matching accuracy | High | Blueprint functional requirement |
| CRM integration capability | High | Blueprint integration requirement |
| Response time <2 seconds | Medium | Blueprint performance requirement |
| Override logging | Medium | Blueprint collaboration specification |
| Reporting capabilities | Low | Nice-to-have, not MVP |
| Mobile access | Low | Not in current scope |
Step 2: Evaluate vendors against criteria
| Vendor | Matching | Integration | Performance | Override | Score |
|---|---|---|---|---|---|
| Vendor A | 4/5 | 3/5 | 5/5 | 4/5 | 3.9 |
| Vendor B | 5/5 | 4/5 | 4/5 | 3/5 | 4.1 |
| Vendor C | 3/5 | 5/5 | 4/5 | 5/5 | 4.1 |
Step 3: Proof-of-concept with top candidates
Before contract signing:
- Test with actual policy data
- Verify integration with actual CRM
- Measure actual response times
- Confirm customization capabilities
Proof-of-Concept Requirements
| Test | Success Criteria | Duration |
|---|---|---|
| Policy matching | >75% accuracy on 50 test cases | 3 days |
| Integration | Successful round-trip data flow | 2 days |
| Performance | <2 second response under load | 1 day |
| Customization | Override logging configurable | 1 day |
Proof-of-concept should cost little or nothing; vendors are typically motivated to support it.
Resource Requirements
For Option A (Selected): CRM Configuration
| Resource | Allocation | Weeks | Total |
|---|---|---|---|
| CRM Administrator | 100% | 3 | 120 hours |
| Developer (integration) | 50% | 2 | 40 hours |
| Business Analyst | 25% | 4 | 40 hours |
| Patricia (SME) | 10% | 4 | 16 hours |
| Project Lead | 25% | 4 | 40 hours |
Total effort: ~250 hours
Total cost: ~$12,000 (assuming blended rate of $50/hour)
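As a check on the table, the roll-up can be sketched as below, assuming 40-hour work weeks. The exact figures come to 256 hours and $12,800, which the plan rounds to ~250 hours and ~$12,000.

```python
# Effort and cost roll-up for the Option A resource plan.
# Hours = allocation fraction x 40-hour week x number of weeks.
RESOURCES = [
    # (resource, allocation, weeks)
    ("CRM Administrator", 1.00, 3),
    ("Developer (integration)", 0.50, 2),
    ("Business Analyst", 0.25, 4),
    ("Patricia (SME)", 0.10, 4),
    ("Project Lead", 0.25, 4),
]
BLENDED_RATE = 50  # dollars/hour, blended across roles

total_hours = sum(alloc * 40 * weeks for _, alloc, weeks in RESOURCES)
total_cost = total_hours * BLENDED_RATE
print(f"{total_hours:.0f} hours, ${total_cost:,.0f}")  # 256 hours, $12,800
```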
Timeline with Milestones
| Week | Milestone | Deliverables |
|---|---|---|
| 1 | Policy database ready | Data structure defined, initial policies loaded |
| 2 | Matching logic complete | Automation rules configured and tested |
| 3 | UI integration complete | Policy display functional in CRM |
| 4 | Pilot ready | Testing complete, pilot group briefed |
Budget Alignment
Module 3 approved $35,000 for R-01 implementation. Option A prototype consumes ~$12,000, leaving $23,000 for:
- Pilot support and iteration
- Production enhancement
- Contingency
This allocation provides runway for learning and adjustment.
Risk Assessment
Technical Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| CRM automation insufficient for policy complexity | Medium | High | Test complex policies early; have Option B ready |
| Performance degrades under load | Low | Medium | Monitor during pilot; optimize before scale |
| Integration breaks with CRM updates | Low | Medium | Test in sandbox after updates; maintain documentation |
Timeline Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Policy data extraction takes longer than expected | Medium | Medium | Start immediately; Patricia availability confirmed |
| Testing reveals unexpected issues | Medium | Low | Built buffer into Week 4; iteration expected |
| Stakeholder adds scope mid-build | Medium | Medium | Scope agreement signed; change process defined |
Adoption Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Representatives resist new workflow | Low | High | Design validated in Module 4; pilot includes skeptics |
| Policy matching accuracy too low | Medium | High | Calibration sprint before pilot; override available |
| Training insufficient for pilot | Low | Medium | Intensive support during pilot; feedback loops |
Module 5B: REALIZE — Practice
O — Operate
Step 2: Testing and Measurement
The prototype is built. Before pilot launch, design the test: who participates, how long it runs, what gets measured, and how data is collected.
Pilot Design
Pilot Group Selection for R-01
The pilot group must be large enough to generate meaningful data but small enough to support intensively.
Recommended pilot size: 8 representatives
Selection criteria:
- Mix of tenure levels (2 new, 4 experienced, 2 veteran)
- Mix of attitudes (include 1-2 skeptics identified during Module 4 validation)
- Representatives who handle returns regularly (minimum 5 Bible-dependent returns per day)
- Geographic/shift distribution if applicable
R-01 Pilot Group:
| Representative | Tenure | Attitude | Returns Volume | Notes |
|---|---|---|---|---|
| Maria T. | 8 years | Champion | High | Module 4 validation participant |
| DeShawn W. | 2 years | Supportive | High | Eager to try new tools |
| Jennifer R. | 4 months | Neutral | Medium | New perspective, learning Bible |
| Alex P. | 5 years | Skeptic | High | Questioned design in validation |
| Keisha M. | 12 years | Neutral | High | Veteran knowledge, Patricia's backup |
| Carlos S. | 1 year | Supportive | Medium | Quick learner |
| Patricia L. | 22 years | Supportive | High | Bible expert, essential validator |
| Ryan K. | 3 years | Skeptic | Medium | Raised concerns about accuracy |
This mix ensures:
- Champions who will explore and advocate
- Skeptics who will stress-test and surface problems
- New users who reveal whether the system is intuitive
- Veterans who reveal whether it handles complex cases
Duration and Timeline
Pilot duration: 4 weeks
| Week | Phase | Focus |
|---|---|---|
| 1 | Learning | Representatives orient to system; support intensive |
| 2 | Stabilization | Usage patterns establish; early issues addressed |
| 3 | Measurement | Primary data collection; behavior stabilizes |
| 4 | Validation | Confirm patterns; prepare iteration decisions |
Four weeks allows:
- Learning curve effects to pass
- Sufficient transaction volume for statistical validity
- Pattern observation over multiple workweeks
- Time for edge cases to emerge
Success Criteria
Quantitative thresholds (from Module 3/4):
| Metric | Baseline | Target | Measurement Method |
|---|---|---|---|
| Time per Bible-dependent return | 14.2 min | <5 min | Time-motion observation |
| Policy matching accuracy | N/A | >80% confirmed | Override rate tracking |
| Incorrect policy application | 4.3% | <2% | QA audit sample |
| Supervisor escalation rate | 12% | <5% | System logging |
| System usage rate | N/A | >80% | System logging |
| Representative satisfaction | 3.2/5 | >4.0/5 | Survey |
Qualitative indicators:
- Representatives prefer new workflow to old
- Workarounds are minimal or absent
- Patricia queries decrease significantly
- Pilot participants would recommend to colleagues
Control Considerations
Before/after design: Measure each representative's performance before pilot (during Week 0 baseline) and during pilot weeks 3-4.
Controlling for variables:
- Compare similar time periods (avoid end-of-month, holidays)
- Note any unusual volume or complexity during pilot
- Track whether returns mix was typical
- Document any system issues or outages
Measurement Plan
Metrics Aligned with Module 3 Baseline
Module 3 established baselines. Module 5 measures against them using identical methodology.
| Metric | Module 3 Method | Module 5 Method | Comparability Check |
|---|---|---|---|
| Task time | Time-motion observation, n=50 | Time-motion observation, n=50+ | Same observer training, same definition of start/end |
| Error rate | QA audit of 100 decisions | QA audit of 100 decisions | Same auditor, same criteria |
| Escalation | Manual count from logs | System tracking | Verify definition matches |
Collection methodology:
Time Metrics:
- Observer records start time when representative opens return case
- Observer records end time when representative completes policy-based decision
- Exclude customer communication and processing time (consistent with baseline)
- Sample minimum 50 transactions across pilot period
- Distribute samples across all pilot representatives
Quality Metrics:
- QA team audits random sample of 100 return decisions
- Audit criteria: Was correct policy applied? Was decision appropriate for case?
- Auditor blind to whether decision made with or without system
- Compare pilot error rate to baseline error rate
Behavioral Metrics:
- System logs every interaction: policy displayed, override clicked, time on screen
- Calculate usage rate: returns processed with system / total Bible-dependent returns
- Track override rate: overrides / total recommendations
- Note patterns: Who uses most? Who overrides most? What cases get overridden?
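The two rates above are simple ratios over the system logs. A sketch, with hypothetical counts for one pilot week:

```python
# Behavioral-metric rates from system logs; the counts below are hypothetical.
def usage_rate(processed_with_system, total_bible_dependent):
    """Share of Bible-dependent returns processed through the system."""
    return processed_with_system / total_bible_dependent

def override_rate(overrides, total_recommendations):
    """Share of policy recommendations the representative overrode."""
    return overrides / total_recommendations

print(f"usage:    {usage_rate(261, 300):.0%}")    # usage:    87%
print(f"override: {override_rate(21, 261):.1%}")  # override: 8.0%
```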
Collection Schedule:
| Week | Data Collected | Responsible |
|---|---|---|
| 0 (pre-pilot) | Baseline confirmation measurements | Business Analyst |
| 1 | Usage logging, support issues, early feedback | Project Lead |
| 2 | Continued logging, first observation session | Business Analyst |
| 3 | Primary time-motion observation, QA audit begins | Business Analyst + QA |
| 4 | Complete observation, complete audit, surveys | Full team |
The R-01 Measurement Framework
Time Metrics
Target: Policy lookup time < 5 minutes (vs. the 14.2-minute baseline)
Measurement:
- Time from return case open to policy decision made
- Excludes customer communication and return processing
- Measured via observation (primary) and system timestamps (secondary)
Collection:
- 50+ observed transactions during weeks 3-4
- Stratified by representative and case complexity
- Standard deviation calculated to understand variation
Throughput Metrics
Target: Error rate < 2% (vs. 4.3% baseline)
Measurement:
- QA audit of 100 return decisions during pilot
- Same auditor, same criteria as baseline audit
- Error = incorrect policy applied or inappropriate decision
Collection:
- Random sample from all pilot representatives
- Include simple and complex cases
- Audit within 48 hours of decision for context availability
Focus Metrics
Target: Escalation rate < 5% (vs. 12% baseline)
Measurement:
- Percentage of returns requiring supervisor involvement
- Supervisor involvement = case transferred or supervisor consulted
Collection:
- System logging of escalation events
- Verify with supervisor records
- Track by representative and case type
Adoption Metrics
Target: System usage > 80%
Measurement:
- Returns processed using system / total Bible-dependent returns
- Usage = policy display occurred and representative took action
Collection:
- System logging (automatic)
- Verify representatives aren't bypassing system
- Note reasons for non-use if identified
Satisfaction Metrics
Target: Satisfaction > 4.0/5
Measurement:
- Survey administered at end of week 4
- 5-point scale on: ease of use, accuracy, speed, preference vs. old process
- Open-ended feedback questions
Collection:
- All pilot representatives complete survey
- Anonymous for honest feedback
- Administered by neutral party, not project team
Qualitative Data Collection
Observation Protocol
During time-motion observation, note:
- Points of hesitation (where do representatives pause?)
- Verbal reactions (comments, sighs, frustration, satisfaction)
- Workarounds (actions outside the system)
- Questions to colleagues (seeking help or confirmation)
- Override patterns (when and why they override)
Use structured observation form:
OBSERVATION RECORD
Observer: ________________ Date: ________ Time: ________
Representative: ____________ Tenure: ________
Transaction #: ________
Return Type: ________________
Case Complexity: Simple / Medium / Complex
Start Time: ________ End Time: ________ Total: ________
System Used: Yes / No
If No, reason: ________________________________
Policy Recommended: ________________________________
Action Taken: Accepted / Overridden / N/A
If Overridden, reason observed: ________________________________
Friction Points Observed:
________________________________
________________________________
Representative Comments (verbatim):
________________________________
________________________________
Observer Notes:
________________________________
________________________________
Interview Questions
Conduct 15-minute interviews with each pilot representative at end of week 2 and week 4.
Week 2 (early feedback):
- "How would you describe your experience with the new system so far?"
- "What's working well?"
- "What's frustrating or confusing?"
- "Have you found situations where the system doesn't help?"
- "What would make it more useful?"
Week 4 (final feedback):
- "How has your experience changed since we last spoke?"
- "Would you want to continue using this system? Why or why not?"
- "How does this compare to the old way of doing things?"
- "What advice would you give a colleague starting to use this?"
- "What should we change before rolling out to everyone?"
Capturing Workarounds
Workarounds indicate unmet needs. Track them systematically:
| Workaround Observed | Who Used It | Frequency | What Need It Addresses |
|---|---|---|---|
| [Description] | [Reps] | [Often/Sometimes/Once] | [Underlying need] |
Multiple representatives using the same workaround signals a design gap.
Weekly Feedback Sessions
Hold 30-minute group session at end of each pilot week:
- What went well this week?
- What problems did you encounter?
- What questions do you have?
- What should we focus on improving?
Document themes, not individual complaints. Look for patterns.
Analysis Framework
Comparing Pilot Results to Baseline
Create comparison table:
| Metric | Baseline | Pilot Result | Change | Target Met? |
|---|---|---|---|---|
| Task time | 14.2 min | [result] | [%] | Yes/No |
| Error rate | 4.3% | [result] | [pp] | Yes/No |
| Escalation rate | 12% | [result] | [pp] | Yes/No |
| System usage | N/A | [result] | N/A | Yes/No |
| Satisfaction | 3.2/5 | [result] | [points] | Yes/No |
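The comparison table can be generated mechanically. This sketch assumes "lower is better" for the first three metrics; the pilot values passed in are placeholders, not results.

```python
# Baseline-vs-pilot comparison with target checks (sketch).
METRICS = {
    # name: (baseline, target, lower_is_better)
    "task_time_min":  (14.2, 5.0, True),
    "error_rate_pct": (4.3, 2.0, True),
    "escalation_pct": (12.0, 5.0, True),
    "usage_pct":      (None, 80.0, False),  # no baseline existed
    "satisfaction":   (3.2, 4.0, False),
}

def assess(name, pilot):
    """Return (change vs. baseline, target met?) for one metric."""
    baseline, target, lower_is_better = METRICS[name]
    change = None if baseline is None else round(pilot - baseline, 2)
    met = pilot < target if lower_is_better else pilot > target
    return change, met

# Placeholder pilot values for illustration:
print(assess("task_time_min", 4.3))  # (-9.9, True)
print(assess("usage_pct", 87.0))     # (None, True)
```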
Statistical Considerations
Sample size: 50+ observations provides reasonable confidence for time metrics. Smaller samples increase uncertainty.
Significance: For prototype testing, practical significance matters more than statistical significance. A 50% time reduction is meaningful even without p-values.
Variation: Report mean and standard deviation. High variation may indicate inconsistent experience.
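Both statistics are available in Python's standard library; the sample times below are invented for illustration.

```python
import statistics

# Hypothetical observed policy-lookup times in minutes (invented sample).
times = [3.1, 4.4, 5.0, 3.8, 4.6, 6.2, 3.5, 4.9, 4.0, 3.9]

mean = statistics.mean(times)
spread = statistics.stdev(times)  # sample standard deviation (n-1)

print(f"mean={mean:.2f} min, stdev={spread:.2f} min")  # mean=4.34 min, stdev=0.89 min
```

A standard deviation that is large relative to the mean (say, stdev near half the mean) would signal the inconsistent experience the text warns about.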
Interpreting Mixed Results
Results rarely show universal improvement. Interpretation requires judgment:
Scenario: Time improved, but error rate increased
- Possible cause: Representatives moving too fast, skipping verification
- Response: Adjust workflow to include confirmation step
Scenario: Metrics improved, but satisfaction low
- Possible cause: System works but feels burdensome
- Response: Investigate friction points through interviews
Scenario: Most metrics improved, but one segment struggled
- Possible cause: Complex cases not well-handled
- Response: Analyze which cases fail, enhance for those
Documenting Findings
Create structured pilot report:
R-01 PILOT RESULTS REPORT
Executive Summary:
[2-3 sentences on overall outcome]
Quantitative Results:
[Table comparing baseline to pilot]
Qualitative Findings:
[Key themes from observation and interviews]
What Worked:
[List with evidence]
What Needs Improvement:
[List with specific issues]
Recommended Next Steps:
[Continue/Adjust/Pivot/Stop with rationale]
Appendices:
- Raw data
- Observation records
- Interview transcripts
- Survey results
Module 5B: REALIZE — Practice
O — Operate
Step 3: Iteration Cycles
The pilot generated data. Representatives used the system. Metrics were collected. Now: what does the data mean, and what should happen next?
Interpreting R-01 Pilot Results
Review the Results
From the pilot measurement (file 04):
| Metric | Baseline | Target | Pilot Result | Assessment |
|---|---|---|---|---|
| Task time | 14.2 min | <5 min | 4.3 min | ✓ Target met |
| Error rate | 4.3% | <2% | 2.1% | ~ Close to target |
| Escalation rate | 12% | <5% | 7% | × Target missed |
| System usage | N/A | >80% | 87% | ✓ Target met |
| Satisfaction | 3.2/5 | >4.0/5 | 4.2/5 | ✓ Target met |
What's Working Well
- Time reduction exceeded target: 4.3 minutes vs. 5-minute target
- Representatives adopted the system: 87% usage rate
- Satisfaction improved: 4.2/5 vs. 3.2/5 baseline
- Patricia queries dropped dramatically (15+/day to 3/day)
- New representative Jennifer R. became productive quickly
What Needs Adjustment
- Escalation rate (7%) still above target (5%)
  - Root cause: Complex cases where policy matching uncertain
  - Specific issue: Multi-condition returns where multiple policies apply
- Error rate (2.1%) slightly above target (2%)
  - Root cause: Specific policy categories with calibration issues
  - Specific issue: Warranty vs. satisfaction guarantees confused
Data vs. Practitioner Feedback
What data tells us: Time improved dramatically. Accuracy improved moderately. Escalations reduced but not enough.
What practitioners tell us: "The system is right most of the time, but when it's wrong, I don't know how to tell." Representatives trust the system for simple cases but want more confidence information for complex ones.
The gap: Representatives can't tell when the system itself is uncertain. When matching confidence is low, they escalate rather than risk an error. A confidence indicator (a Should Have feature) would address this.
The Iteration Decision
Applying the Framework
| Option | Criteria | R-01 Assessment |
|---|---|---|
| Continue | Results positive, expand scope | 4 of 5 metrics met or nearly met; not ready to expand yet |
| Adjust | Results mixed, modify and retest | Core value proven; specific gaps identified; clear fix path |
| Pivot | Core assumption wrong | Core assumption validated (time reduction works) |
| Stop | Opportunity not viable | Value demonstrated; stopping would waste proven progress |
R-01 Decision: ADJUST
Rationale:
The core assumption, that automated policy lookup reduces representative time, is validated. Time improved from 14.2 minutes to 4.3 minutes. This is the foundation of the business case.
However, two metrics need improvement:
- Escalation rate needs 2 percentage points reduction
- Error rate needs 0.1 percentage point reduction
Both issues have identified root causes with clear remediation paths. Iteration will address them without rebuilding the core system.
Planning the Iteration
Iteration 1 Scope
Specific changes to make:
1. Add confidence indicator
   - Display High/Medium/Low confidence for each policy match
   - Logic: High = single policy match, clear criteria; Medium = single match, some criteria ambiguous; Low = multiple policies apply, criteria unclear
   - Implementation: New UI element in CRM display; logic extension in matching engine
2. Calibrate problem categories
   - Warranty vs. satisfaction guarantee: Add product category weighting
   - Multi-condition returns: Display all applicable policies rather than best match
   - Implementation: Policy database update; matching logic refinement
3. Revise escalation guidance
   - Add "Review recommended" flag for Low confidence matches
   - Change escalation prompt from "Transfer to supervisor" to "Consider policy X before escalating"
   - Implementation: CRM display modification
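The confidence-indicator logic described above can be sketched as a small function. Treating a zero-match case as LOW is an added assumption, and `criteria_clear` is a hypothetical flag standing in for however criterion ambiguity is detected in the matching engine.

```python
# Sketch of the confidence-indicator logic: High = single policy match with
# clear criteria; Medium = single match, criteria ambiguous; Low = multiple
# policies apply. Treating a zero-match case as LOW is an added assumption.
def match_confidence(matched_policies, criteria_clear):
    if len(matched_policies) != 1:
        return "LOW"
    return "HIGH" if criteria_clear else "MEDIUM"

print(match_confidence(["30-day return"], True))  # HIGH
print(match_confidence(["extended warranty", "satisfaction guarantee"], False))  # LOW
```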
What stays the same:
- Core matching engine architecture
- CRM integration approach
- Override mechanism
- Logging and tracking
Timeline: One Week
| Day | Activity |
|---|---|
| 1-2 | Implement confidence indicator |
| 3 | Calibrate problem categories |
| 4 | Implement escalation guidance changes |
| 5 | Internal testing and pilot preparation |
Success Criteria for Iteration:
- Escalation rate: <5% (target)
- Error rate: <2% (target)
- Representative confidence: "I can tell when to trust it"
- No regression in metrics already meeting target
The R-01 First Iteration
Changes Implemented
Confidence Indicator:
Before: Policy display showed recommendation only
POLICY MATCH: 30-day return - full refund, original payment method
[Apply] [Override]
After: Policy display shows confidence level
POLICY MATCH: 30-day return - full refund, original payment method
Confidence: HIGH
[Apply] [Override]
-- or --
POLICY MATCH: Extended warranty claim OR satisfaction guarantee
Confidence: LOW - Multiple policies may apply
Review both policies before deciding
[View All] [Apply First] [Override]
Calibration Changes:
The warranty vs. satisfaction guarantee confusion stemmed from overlapping product categories. Calibration added:
- Product purchase date weighting (warranties apply to newer products)
- Customer history flag (satisfaction guarantees for repeat customers)
- Price threshold (high-value items get more careful matching)
Escalation Guidance:
Low confidence matches now display: "This case may need additional review. Before escalating, check if [specific policy element] applies."
This gives representatives a path to resolution without defaulting to escalation.
Pilot Impact (Week 2 of Iteration)
| Metric | Pilot 1 | Iteration Target | Iteration Result |
|---|---|---|---|
| Escalation rate | 7% | <5% | 4.8% |
| Error rate | 2.1% | <2% | 1.7% |
| System usage | 87% | >80% | 91% |
| Satisfaction | 4.2/5 | >4.0/5 | 4.4/5 |
Second Pilot Cycle
Abbreviated Second Cycle
With iteration 1 successful, a brief validation cycle confirmed:
- All five metrics now meet targets
- Representatives report improved confidence ("I know when to check")
- Escalations that still occur are appropriate (genuinely complex cases)
- Alex P. (identified skeptic) now advocates for the system
Results Trajectory
| Metric | Baseline | Pilot 1 | Iteration 1 | Trend |
|---|---|---|---|---|
| Task time | 14.2 min | 4.3 min | 4.1 min | Stable |
| Error rate | 4.3% | 2.1% | 1.7% | Improving |
| Escalation | 12% | 7% | 4.8% | Improving |
| Usage | N/A | 87% | 91% | Improving |
| Satisfaction | 3.2/5 | 4.2/5 | 4.4/5 | Improving |
The learning loop is working. Each cycle produces measurable improvement.
Knowing When to Stop Iterating
Graduation Criteria Review
| Criterion | Status |
|---|---|
| All quantitative targets met | ✓ |
| Qualitative indicators positive | ✓ |
| Practitioners would recommend to colleagues | ✓ |
| Critical issues resolved | ✓ |
| Remaining issues are minor/rare | ✓ |
Diminishing Returns Signal
Further iteration might improve metrics marginally:
- Error rate could go from 1.7% to 1.5%
- Satisfaction could go from 4.4 to 4.5
But these gains require disproportionate effort. The core value is proven. Additional refinement can happen after production deployment.
"Good Enough" Determination
R-01 is good enough for production because:
- Core business case validated (time reduction: 71%)
- All success metrics achieved
- Practitioner adoption strong (91%)
- Remaining friction is edge-case, not systemic
- Iteration log shows diminishing issues per cycle
Preparing for Production
The system is ready to move beyond pilot. This means:
- Broader rollout to all customer service representatives
- Scaling support and monitoring
- Transitioning from project to operations
Module 5B: REALIZE — Practice
O — Operate
Step 4: Production Preparation
The pilot succeeded. Iteration addressed the gaps. Metrics meet targets. Practitioners support the system. The question now: is R-01 ready for production?
Production Readiness Assessment
Technical Readiness Checklist
| Item | Requirement | R-01 Status |
|---|---|---|
| Stability | No critical bugs in last 2 weeks | ✓ Zero critical issues in iteration cycle |
| Performance | Response time <2 seconds under expected load | ✓ Averaging 1.4 seconds |
| Security | Security review completed, vulnerabilities addressed | ✓ CRM security applies; no new vulnerabilities introduced |
| Backup | Data backup and recovery tested | ✓ Policy database backed up nightly with CRM |
| Monitoring | Performance and error monitoring in place | ✓ CRM monitoring extended to new components |
| Integration | All integrations functioning reliably | ✓ Order Management and CRM integration stable |
| Scalability | Can handle full user population | ⚠ Testing with 50 concurrent users passed; production may have 80+ |
Technical assessment: Ready with monitoring. Scalability is a manageable risk: the CRM handles current transaction volume, and the new components add minimal load.
Operational Readiness Checklist
| Item | Requirement | R-01 Status |
|---|---|---|
| Help desk | Support staff trained on new system | ✓ Help desk completed training; handled pilot issues |
| Documentation | User guides and troubleshooting docs available | ✓ Quick reference guide and FAQ created |
| Escalation | Technical escalation path defined | ✓ IT support → CRM administrator → Development |
| Maintenance | Maintenance schedule and procedures documented | ✓ Weekly policy sync, monthly calibration review |
| Ownership | System owner assigned | ✓ Customer Service Manager owns; IT supports |
Operational assessment: Ready. Pilot provided operational learning; documentation tested with real issues.
Organizational Readiness Checklist
| Item | Requirement | R-01 Status |
|---|---|---|
| Training | Training materials ready for all user groups | ✓ 15-minute self-paced module created |
| Communication | Deployment communication plan executed | ⚠ Plan drafted; execution begins next week |
| Leadership | Executive sponsor confirmed and engaged | ✓ Director of Customer Service committed |
| Feedback | Feedback collection mechanism in place | ✓ Feedback button in CRM; weekly review process |
| Success metrics | Ongoing measurement plan defined | ✓ Dashboard created; monthly reporting schedule |
Organizational assessment: Ready pending communication execution.
The Deployment Case
Summarizing Pilot Results
R-01 pilot demonstrated:
| Metric | Baseline | Target | Final Result | Change |
|---|---|---|---|---|
| Task time | 14.2 min | <5 min | 4.1 min | -71% |
| Error rate | 4.3% | <2% | 1.7% | -2.6pp |
| Escalation rate | 12% | <5% | 4.8% | -7.2pp |
| System usage | N/A | >80% | 91% | N/A |
| Satisfaction | 3.2/5 | >4.0/5 | 4.4/5 | +1.2 |
All targets achieved. All trends improving.
Comparison to Module 3 Projections
| Projection | Module 3 Estimate | Pilot Actual | Variance |
|---|---|---|---|
| Time savings | 9.2 min/return | 10.1 min/return | +10% (better) |
| Annual labor savings | $76,176 | Est. $83,793* | +10% |
| Error reduction value | $15,480 | Est. $17,028* | +10% |
| Focus improvement value | $8,260 | Est. $9,086* | +10% |
| Total annual value | $97,516 | Est. $109,907* | +13% |
*Extrapolated from pilot; production results will confirm.
The business case is validated and exceeded.
Addressing Stakeholder Concerns
Concern: "What if it breaks?"
- Response: CRM configuration means existing CRM reliability applies. Rollback procedure documented. Help desk trained. Monitoring in place.
Concern: "Are representatives ready?"
- Response: 91% adoption in pilot. Training materials tested. Champions identified among pilot group to support peers.
Concern: "What about cases the pilot didn't cover?"
- Response: Pilot included mix of case types and representative tenure. Edge cases will emerge; override and escalation paths handle them. Calibration process allows ongoing improvement.
Concern: "Can we really save this much time?"
- Response: Pilot measured same way as baseline. Time reduction verified by observation and system data. Conservative extrapolation used.
Recommendation: Proceed with Production Deployment
Evidence supports deployment. Continued delay risks:
- Pilot group creating two-tier service quality
- Losing momentum and stakeholder attention
- Patricia remaining as single point of failure
R-01 Production Deployment Plan
Rollout Sequence: Phased
Rather than full deployment to all 22 representatives simultaneously, roll out in two waves:
| Wave | Representatives | Timeline | Rationale |
|---|---|---|---|
| 1 | 10 representatives (including 8 pilot) | Week 1-2 | Leverage pilot experience; champions support new users |
| 2 | Remaining 12 representatives | Week 3-4 | Learn from Wave 1; full deployment |
Why phased: Phased rollout limits risk and provides learning opportunity. Pilot representatives can support peers. Issues surface at smaller scale.
Timeline and Milestones
| Week | Milestone | Activities |
|---|---|---|
| Week 1 | Wave 1 preparation | Communication; training scheduling; system verification |
| Week 2 | Wave 1 live | 10 representatives using system; intensive support |
| Week 3 | Wave 2 preparation | Wave 1 lessons incorporated; remaining training completed |
| Week 4 | Wave 2 live | All 22 representatives using system; standard support |
| Week 5+ | Stabilization | Monitoring; calibration adjustments; feedback review |
Training and Communication Plan
Communication sequence:
- Leadership announcement (Director): rationale and commitment
- Department meeting: demonstration and Q&A
- Individual scheduling: training slot assignment
- Go-live notification: system availability confirmation
Training approach:
- 15-minute self-paced module (mandatory)
- 30-minute live Q&A session (optional but recommended)
- Quick reference card at each workstation
- Champion buddy assignment (pilot participant paired with new user)
Support and Monitoring Plan
First 30 days (intensive support):
- Help desk priority queue for system issues
- Daily check-in from project team
- Weekly calibration review
- Real-time usage monitoring
Ongoing support:
- Standard help desk procedures
- Monthly calibration review
- Quarterly performance review
- Annual system assessment
Handoff Documentation
What Operations Needs to Run the System
| Document | Contents | Audience |
|---|---|---|
| System Overview | Architecture, integrations, data flows | IT support |
| Maintenance Procedures | Weekly sync, monthly calibration, backup verification | CRM administrator |
| Performance Thresholds | Response time targets, error rate thresholds | Monitoring team |
| Escalation Matrix | Who to contact for what type of issue | All support staff |
What Support Needs to Troubleshoot
| Document | Contents | Audience |
|---|---|---|
| Troubleshooting Guide | Common issues and resolutions | Help desk |
| Known Limitations | Cases the system handles poorly | Help desk, supervisors |
| Override Protocol | When and how to override recommendations | Representatives |
| Feedback Process | How to report issues and suggestions | All users |
What Training Needs to Onboard
| Document | Contents | Audience |
|---|---|---|
| User Guide | How to use the system day-to-day | Representatives |
| Quick Reference Card | Key actions on one page | Representatives |
| Training Module | Self-paced onboarding content | New representatives |
| FAQ | Common questions with answers | All users |
What Leadership Needs to Track Success
| Document | Contents | Audience |
|---|---|---|
| Success Dashboard | Key metrics, trends, alerts | Customer Service leadership |
| Monthly Report Template | Standardized performance summary | Department leadership |
| Business Case Validation | Actual vs. projected value | Executive sponsor |
| Sustainability Plan | Long-term ownership and monitoring | Operations leadership |
Success Metrics for Production
Metrics Continuing from Pilot
| Metric | Target | Collection Method | Frequency |
|---|---|---|---|
| Task time | <5 min | System timestamps + observation | Monthly sample |
| Error rate | <2% | QA audit | Monthly |
| Escalation rate | <5% | System logging | Weekly |
| System usage | >80% | System logging | Weekly |
| Satisfaction | >4.0/5 | Survey | Quarterly |
Additional Metrics for Scale
| Metric | Target | Collection Method | Frequency |
|---|---|---|---|
| System availability | >99.5% | System monitoring | Continuous |
| Help desk volume | <5 tickets/week | Ticket tracking | Weekly |
| Training completion | 100% | Training system | Until complete |
| Override rate | Trend monitoring | System logging | Weekly |
Reporting Schedule
| Report | Audience | Frequency |
|---|---|---|
| Operational dashboard | Operations team | Real-time |
| Performance summary | Customer Service Manager | Weekly |
| Executive summary | Director, Sponsor | Monthly |
| Business case validation | Leadership team | Quarterly |
Escalation Triggers
| Condition | Action |
|---|---|
| System availability <99% | Immediate IT escalation |
| Error rate >3% for 2 consecutive weeks | Calibration review |
| Satisfaction drops below 3.5/5 | User feedback review |
| Help desk volume >10 tickets/week | Root cause analysis |
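The trigger table can be expressed as straightforward threshold checks; the metric values fed in below are hypothetical.

```python
# Escalation-trigger checks mirroring the table above (sketch).
def triggered_actions(availability_pct, weekly_error_rates_pct, satisfaction, tickets_per_week):
    """Return the actions warranted by the current production metrics."""
    actions = []
    if availability_pct < 99.0:
        actions.append("Immediate IT escalation")
    # Error-rate trigger requires >3% in every week passed in
    # (pass the last two weekly readings for "2 consecutive weeks").
    if weekly_error_rates_pct and all(e > 3.0 for e in weekly_error_rates_pct):
        actions.append("Calibration review")
    if satisfaction < 3.5:
        actions.append("User feedback review")
    if tickets_per_week > 10:
        actions.append("Root cause analysis")
    return actions

print(triggered_actions(99.7, [3.2, 2.4], 4.3, 4))   # []  (one bad error week is not enough)
print(triggered_actions(98.5, [3.2, 3.4], 3.4, 12))  # all four actions
```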
Connection to Module 6
Production Deployment Is Not the End
Deployment delivers the capability. Sustainability preserves it.
Without intentional sustainability design:
- Staff turnover erodes expertise
- System updates break integrations
- Calibration drifts as business changes
- Monitoring attention fades
- Value erodes gradually
Module 6 addresses these risks.
Handoff Artifacts for Module 6
R-01 delivers to Module 6:
- Baseline metrics (from Module 3)
- Pilot results and iteration log
- Production deployment results (after stabilization)
- Known risks and monitoring requirements
- Ownership assignments and escalation paths
These artifacts become inputs for sustainability planning.
Module 5B: REALIZE — Practice
Transition to Module 6: NURTURE
What Module 5 Accomplished
Module 5 converted design into reality. The Workflow Blueprint from Module 4 became a working system with measured results.
The Journey:
1. Built working prototype from blueprint
   - Scoped minimum viable prototype focusing on core assumption
   - Selected implementation approach (Configure CRM)
   - Constructed prototype within timeline discipline
2. Tested with practitioners in real conditions
   - Designed pilot with representative group composition
   - Measured against Module 3 baselines
   - Collected quantitative and qualitative data
3. Measured results against baseline
   - Time improvement: 14.2 min → 4.1 min (71% reduction)
   - Error improvement: 4.3% → 1.7% (2.6 percentage points)
   - Escalation improvement: 12% → 4.8% (7.2 percentage points)
   - Adoption: 91% system usage
   - Satisfaction: 4.4/5
4. Iterated based on evidence
   - Interpreted pilot results systematically
   - Applied iteration decision framework (Adjust)
   - Implemented targeted improvements
   - Validated improvements in second cycle
5. Prepared for production deployment
   - Verified technical, operational, and organizational readiness
   - Built deployment case with validated results
   - Created rollout plan and handoff documentation
R-01 Results Summary
Final Pilot Metrics vs. Baseline
| Metric | Baseline | Target | Final Result | Change |
|---|---|---|---|---|
| Task time | 14.2 min | <5 min | 4.1 min | -71% |
| Error rate | 4.3% | <2% | 1.7% | -2.6pp |
| Escalation rate | 12% | <5% | 4.8% | -7.2pp |
| System usage | N/A | >80% | 91% | N/A |
| Satisfaction | 3.2/5 | >4.0/5 | 4.4/5 | +1.2 |
All targets achieved. Core assumption validated.
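The "all targets achieved" claim is mechanical to verify. A hedged sketch, using the values from the table above; the structure and function name are illustrative, and the key detail is making the comparison direction (lower vs. higher is better) explicit per metric:

```python
# Hypothetical sketch: checking final pilot metrics against targets.
# Values come from the Final Pilot Metrics table; names are illustrative.

METRICS = {
    # name: (result, target, lower_is_better)
    "task_time_min":  (4.1, 5.0, True),
    "error_rate_pct": (1.7, 2.0, True),
    "escalation_pct": (4.8, 5.0, True),
    "usage_pct":      (91.0, 80.0, False),
    "satisfaction":   (4.4, 4.0, False),
}

def target_met(result: float, target: float, lower_is_better: bool) -> bool:
    """A '<5 min' target means result < target; a '>80%' target means result > target."""
    return result < target if lower_is_better else result > target

all_met = all(target_met(*spec) for spec in METRICS.values())
print(all_met)  # -> True
```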
Final Metrics vs. Module 3 Projections
| Element | Module 3 Projection | Module 5 Result | Variance |
|---|---|---|---|
| Time savings | 9.2 min/return | 10.1 min/return | +10% |
| Estimated annual value | $97,516 | $109,907 (projected) | +10% |
| Implementation cost | $35,000 | ~$12,000 (prototype) | -66% |
| Payback period | 4.2 months | ~1.3 months (projected) | -69% |
Results exceeded projections. Business case strengthened.
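The payback figure in the table follows from a simple run-rate calculation: implementation cost divided by monthly value. A minimal sketch with the Module 5 figures from the table; the function name is hypothetical:

```python
# Hypothetical sketch of the payback arithmetic behind the table above.

def payback_months(cost: float, annual_value: float) -> float:
    """Months to recoup cost at a constant monthly run-rate of value."""
    return cost / (annual_value / 12)

# Prototype cost vs. projected annual value (Module 5 result)
m = payback_months(12_000, 109_907)
print(round(m, 1))  # -> 1.3
```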
Key Learnings from Iteration
1. Confidence indicators matter more than accuracy alone. Representatives needed to know when to trust recommendations.
2. Calibration requires ongoing attention. Policy categories drift; regular review catches problems early.
3. Champions accelerate adoption. Skeptics who converted became the strongest advocates.
4. Simple changes have outsized impact. The confidence indicator and escalation guidance took days to implement but moved escalation rate by 2+ percentage points.
Practitioner Feedback Summary
What worked:
- "I don't have to interrupt Patricia anymore for routine questions."
- "The confidence level tells me when to double-check."
- "New staff can handle returns that used to require veteran knowledge."
What could be better:
- "Some policies still need clearer language."
- "Would be nice to see similar cases for complex situations." (Deferred feature)
Net assessment: Practitioners strongly prefer the new system.
T — Test
Measuring Implementation Quality
Module 5 built and tested the prototype. This section establishes how to measure whether the work is good: whether it produces results, and whether it is built well.
Validating the Prototype
Before pilot begins, the prototype itself needs validation. Four questions:
Does it implement the blueprint specification?
The blueprint from Module 4 specified what the system should do. Validation confirms the prototype does it:
| Blueprint Requirement | Prototype Status |
|---|---|
| Accept return attributes | ✓ Implemented |
| Match to policy rules | ✓ Implemented |
| Return policy summary with confidence | ✓ Implemented |
| Display in CRM interface | ✓ Implemented |
| Capture override actions | ✓ Implemented |
Any gaps between blueprint and prototype should be intentional (MVP scope) or flagged for remediation.
Does it function reliably?
Reliability means consistent behavior:
- Same inputs produce same outputs
- No unexplained failures
- Error handling prevents crashes
- Integration with other systems is stable
For R-01: 100 test transactions with zero failures required before pilot.
Is it usable by practitioners?
Usability means practitioners can complete their work:
- Interface is comprehensible without documentation
- Common tasks are efficient
- Uncommon tasks are achievable
- Error recovery is possible
For R-01: Three representatives complete five returns each without assistance.
Is it ready for pilot?
Pilot readiness means the system can support real work:
- Data is loaded (policy database complete)
- Training is available (quick reference ready)
- Support is prepared (help desk briefed)
- Feedback collection is ready (logging active)
Pilot readiness is not production readiness. Lower standards apply. The goal is learning.
Prototype Quality Metrics
Functional Completeness (vs. MVP Scope)
| MVP Feature | Implemented | Tested | Working |
|---|---|---|---|
| Policy matching | ✓ | ✓ | ✓ |
| CRM display | ✓ | ✓ | ✓ |
| Override mechanism | ✓ | ✓ | ✓ |
| Performance (<2 sec) | ✓ | ✓ | ✓ |
Functional completeness = Features implemented and working / Features in MVP scope
Target: 100% before pilot begins.
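The completeness ratio defined above can be computed directly from a feature checklist. A minimal sketch, assuming the R-01 MVP features from the table; the data structure is illustrative:

```python
# Hypothetical sketch of the functional-completeness ratio:
# features implemented AND working, divided by features in MVP scope.

mvp_features = {
    "policy_matching": {"implemented": True, "working": True},
    "crm_display":     {"implemented": True, "working": True},
    "override":        {"implemented": True, "working": True},
    "perf_under_2s":   {"implemented": True, "working": True},
}

done = sum(f["implemented"] and f["working"] for f in mvp_features.values())
completeness = done / len(mvp_features)
print(f"{completeness:.0%}")  # -> 100%
```

Counting only features that are both implemented and working keeps a half-built feature from inflating the score.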
Technical Stability
| Stability Metric | Target | Actual |
|---|---|---|
| Failed transactions | 0 in 100 tests | 0 |
| System errors | 0 critical in testing | 0 |
| Response time variance | <500ms | 340ms |
| Recovery from errors | Graceful degradation | ✓ |
Stability ensures the pilot tests the design, not the bugs.
Usability Assessment
| Usability Factor | Method | Result |
|---|---|---|
| Task completion | 3 reps × 5 returns | 15/15 |
| Time to learn | First successful return | <10 min |
| Errors made | User errors during test | 2 (both recovered) |
| Satisfaction | Post-test rating | 4.2/5 |
Usability ensures practitioners can actually use what was built.
Integration Reliability
| Integration | Test Method | Result |
|---|---|---|
| CRM → Policy Engine | 100 transactions | 100% success |
| Order Management data | 50 order lookups | 100% success |
| Logging system | Action capture | 100% captured |
Integration reliability ensures the prototype works in its ecosystem.
Leading Indicators (During Pilot)
Leading indicators predict ultimate outcomes. Watch them early to catch problems.
Early Adoption Signals
| Signal | What It Means | R-01 Target |
|---|---|---|
| Day 1 usage | Initial willingness | >60% of pilot group |
| Week 1 trend | Increasing or decreasing? | Stable or increasing |
| Voluntary use | Using when not required | >40% |
| Override rate | Trust in recommendations | <30% |
Low early adoption may indicate training gaps, usability problems, or resistance. Address before measurement period.
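A daily check against the R-01 targets in the table above can surface these problems before the measurement period begins. A hedged sketch; the function and its warning messages are hypothetical, and the thresholds come from the table:

```python
# Hypothetical sketch: flagging early-adoption warnings against the
# R-01 targets (day 1 usage >60%, voluntary use >40%, override <30%).

def adoption_warnings(day1_usage: float, week1_trend: float,
                      voluntary_use: float, override_rate: float) -> list[str]:
    warnings = []
    if day1_usage <= 0.60:
        warnings.append("Day 1 usage at or below 60% target")
    if week1_trend < 0:
        warnings.append("Week 1 usage trending down")
    if voluntary_use <= 0.40:
        warnings.append("Voluntary use at or below 40% target")
    if override_rate >= 0.30:
        warnings.append("Override rate at or above 30% ceiling")
    return warnings

# Example: healthy pilot except for a high override rate
print(adoption_warnings(0.72, +0.05, 0.48, 0.35))
# -> ['Override rate at or above 30% ceiling']
```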
Error/Issue Frequency
| Metric | Early Warning | R-01 Target |
|---|---|---|
| System errors | >5/day | <2/day |
| User-reported issues | >3/day | <1/day |
| Help desk contacts | Rising trend | Stable or declining |
| Workaround emergence | Any pattern | Zero patterns |
Rising issues require investigation. Stable or declining issues allow measurement to proceed.
Practitioner Engagement
| Signal | Positive Indicator | Negative Indicator |
|---|---|---|
| Questions asked | "How do I..." | "Why do I have to..." |
| Suggestions offered | "What if we could..." | Silence |
| Peer discussion | Sharing tips | Sharing complaints |
| Override patterns | Specific cases | Everything |
Engagement reveals whether practitioners are investing in the system or enduring it.
Iteration Velocity
| Metric | Healthy | Unhealthy |
|---|---|---|
| Issues identified | Clear, specific | Vague, broad |
| Fix turnaround | Days | Weeks |
| Improvement validated | Measurable | Assumed |
| New issues emerging | Decreasing | Increasing |
Healthy iteration shows progress. Unhealthy iteration shows churn.
Lagging Indicators (After Pilot)
Lagging indicators confirm outcomes. They're the evidence for deployment decisions.
Time Improvement vs. Baseline
| Metric | Baseline | Target | Result | Assessment |
|---|---|---|---|---|
| Average task time | 14.2 min | <5 min | 4.1 min | ✓ 71% improvement |
| Time variance | High | Reduced | Reduced | ✓ More consistent |
| Peak time cases | 28 min | <10 min | 8 min | ✓ Complex cases improved |
Time improvement validates the core value proposition.
Quality Improvement vs. Baseline
| Metric | Baseline | Target | Result | Assessment |
|---|---|---|---|---|
| Error rate | 4.3% | <2% | 1.7% | ✓ 60% reduction |
| Error severity | Mix | Reduced severe | Reduced | ✓ Remaining errors minor |
| Rework required | 8% | <4% | 3.2% | ✓ Less rework |
Quality improvement validates accuracy claims.
Focus Improvement vs. Baseline
| Metric | Baseline | Target | Result | Assessment |
|---|---|---|---|---|
| Escalation rate | 12% | <5% | 4.8% | ✓ 60% reduction |
| SME queries | 15+/day | <5/day | 3/day | ✓ Patricia freed |
| Context switches | High | Reduced | Reduced | ✓ Less interruption |
Focus improvement validates cognitive load reduction.
ROI Realization vs. Projection
| Element | Projected | Actual | Variance |
|---|---|---|---|
| Time savings value | $76,176/yr | $83,793/yr | +10% |
| Quality savings value | $15,480/yr | $17,028/yr | +10% |
| Focus savings value | $8,260/yr | $9,086/yr | +10% |
| Total annual value | $97,516/yr | $109,907/yr | +10% |
ROI realization validates the business case.
Red Flags
Red flags signal problems that may not be obvious in metrics.
Adoption Doesn't Improve Over Time
Week 1 usage was 65%. Week 4 usage is still 65%. Representatives haven't increased adoption despite familiarity.
What it means: The system isn't earning trust. Practitioners use it when required but don't prefer it.
Investigation: Why aren't practitioners choosing to use it? Usability? Accuracy? Workflow friction?
Same Issues Recur Across Iterations
Iteration 1 addressed policy matching accuracy. Iteration 2 addressed policy matching accuracy. Iteration 3 addressed policy matching accuracy.
What it means: The fix isn't working. Either the diagnosis is wrong or the solution is inadequate.
Investigation: Is this a design problem? An implementation problem? A scope problem?
Practitioners Develop Workarounds for the New System
Representatives are using the system but have developed their own verification steps: checking the Bible anyway, asking Patricia to confirm.
What it means: The system is part of the workflow but hasn't replaced the old process. It's added work, not reduced it.
Investigation: What creates the need for verification? Trust? Accuracy? Specific case types?
Results Plateau Below Targets
Time improved from 14.2 minutes to 7 minutes. Three iterations later, it's still 7 minutes. Progress has stopped short of the 5-minute target.
What it means: The current approach has limits. More iteration won't reach the target.
Investigation: What's creating the floor? Is the target realistic? Does the approach need to change?
S — Share
Consolidation Exercises
Learning solidifies through application and teaching. These exercises help integrate Module 5 concepts into your practice.
Module 5 Key Takeaways
These principles should guide your implementation practice:
1. Progress Over Perfection
A shipped prototype beats a perfect plan. The goal is learning. Every day spent polishing instead of testing is a day of learning lost.
2. One Visible Win Earns the Right to Continue
Skeptics don't convert through arguments. They convert through evidence. A small success demonstrated is worth more than a large success promised.
3. Prototype Is for Learning, Not Production
Prototypes aren't scaled-down production systems. They're learning instruments. Build them for speed and flexibility, not durability and performance.
4. Iteration Based on Evidence, Not Opinion
When pilots generate data, use it. Don't iterate based on hunches or preferences. Don't dismiss data because it's inconvenient. Let evidence drive decisions.
5. Pilot Is a Means, Not an End
Pilots exist to validate solutions for broader deployment. A pilot that never ends is an exception that consumes resources while denying benefits to everyone else.