# Module 07: Documents, Data, and Artifacts
## Exercise: The Data Detective

Upload `quarterly-sales-messy.csv` to Claude. The file is intentionally messy: blank units and revenue fields, an order with no rep assigned, an unknown customer, a discounted price, and a pipeline deal mixed in with closed revenue. Real data looks like this.

---

## The Prompt

```
I'm uploading our Q4 2025 sales data. This is raw data pulled from our CRM and it has issues. I need you to:

1. DATA QUALITY AUDIT: Identify every data quality issue in this file. Missing values, inconsistencies, anomalies, anything that looks wrong.

2. CLEAN SUMMARY: Despite the data issues, give me the cleanest possible summary of Q4 performance:
   - Total revenue by month (Oct, Nov, Dec)
   - Revenue by product tier (S-100, S-300, S-500)
   - Revenue by region
   - Top 5 customers by total spend
   - Revenue by sales rep (flag the unassigned deals)

3. THREE TRENDS: What are the three most important trends or insights a sales leader should know about? Support each with specific numbers from the data.

Format the output as an executive briefing I can send to our VP of Sales.
```

---

## What Participants Should Find

**Data Quality Issues:**
- Row 10 (Gulf Industries): units_sold is blank, total_revenue is blank
- Row 19 (web order Oct 19): no sales rep assigned
- Row 23 (Southern Industrial): unit_price is $1,440 instead of $1,450 (a $10 discount, roughly 0.7%)
- Row 48 (web order Dec 30): customer name "Unknown Customer" and type "Unknown"
- Row 15 (Gulf Industries Nov 15): deal_stage is "Pipeline", not "Closed Won", which raises the question of whether it belongs in the revenue totals at all
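
Facilitators who want to double-check Claude's audit against the file can script the same checks. The following is a minimal sketch, not a reference implementation. The column names `units_sold`, `total_revenue`, `unit_price`, and `deal_stage` come from the list above; `customer_name`, `product`, and `sales_rep` are assumed names, and the sample rows and list price are made up for illustration (they are not the contents of `quarterly-sales-messy.csv`).

```python
import csv
import io

# Made-up sample rows for illustration only; NOT the real exercise file.
# customer_name, product, and sales_rep are assumed column names.
SAMPLE_CSV = """customer_name,product,units_sold,unit_price,total_revenue,sales_rep,deal_stage
Example Co A,S-100,2,1000,2000,Rep One,Closed Won
Example Co B,S-100,,1000,,Rep One,Closed Won
Example Co C,S-100,1,990,990,,Closed Won
Unknown Customer,S-100,1,1000,1000,Rep Two,Pipeline
"""


def cell(row, column):
    """Return a cell as stripped text, treating a missing cell as empty."""
    return (row.get(column) or "").strip()


def audit_rows(rows, list_prices):
    """Return (row_number, issue) pairs for the checks in the answer key."""
    issues = []
    # Assumes the header is row 1, so the first data row is row 2.
    for number, row in enumerate(rows, start=2):
        for column in ("units_sold", "total_revenue", "sales_rep"):
            if cell(row, column) == "":
                issues.append((number, f"{column} is blank"))
        if cell(row, "customer_name").lower().startswith("unknown"):
            issues.append((number, "customer is unknown"))
        if cell(row, "deal_stage") != "Closed Won":
            issues.append((number, "deal_stage is not Closed Won"))
        # Compare against a list price only when one is known for the product.
        expected = list_prices.get(cell(row, "product"))
        price = cell(row, "unit_price")
        if expected is not None and price != "" and float(price) != expected:
            issues.append((number, f"unit_price {price} differs from list price {expected}"))
    return issues


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# The list price here is a hypothetical value chosen for the sample rows.
issues = audit_rows(rows, list_prices={"S-100": 1000.0})
for number, issue in issues:
    print(f"Row {number}: {issue}")
```

To run it on the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and adjust the column names and list prices to match what the CSV actually contains.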

**Key Metrics (approximate):**
- Total Q4 revenue: ~$850K (exact depends on how blanks are handled)
- S-100 dominates volume, while S-500 shows a growth trajectory
- Northeast (Sarah Chen) is the top-performing region
- Pacific Components and Apex Manufacturing are the two largest accounts
- December shows an acceleration pattern (year-end buying)
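
The total revenue figure moves depending on two choices: what to do with blank `total_revenue` cells and whether the "Pipeline" deal counts. A rough sketch of how those choices change the number is shown here, assuming one possible policy (recompute a blank total from `units_sold` times `unit_price` when both are present, otherwise skip the row and count it). The rows are invented for illustration and are not taken from the exercise file.

```python
def to_number(value):
    """Parse a numeric cell such as "$1,450"; return None for a blank cell."""
    text = (value or "").replace("$", "").replace(",", "").strip()
    return float(text) if text else None


def total_revenue(rows, closed_won_only=True):
    """Return (total, skipped_row_count) under the chosen handling policy."""
    total = 0.0
    skipped = 0
    for row in rows:
        stage = (row.get("deal_stage") or "").strip()
        if closed_won_only and stage != "Closed Won":
            continue
        revenue = to_number(row.get("total_revenue"))
        if revenue is None:
            # Assumed policy: rebuild the total when units and price exist.
            units = to_number(row.get("units_sold"))
            price = to_number(row.get("unit_price"))
            if units is None or price is None:
                skipped += 1  # nothing to rebuild from, so report it instead
                continue
            revenue = units * price
        total += revenue
    return total, skipped


# Invented rows, not the real data.
rows = [
    {"units_sold": "2", "unit_price": "1000", "total_revenue": "2000", "deal_stage": "Closed Won"},
    {"units_sold": "", "unit_price": "1000", "total_revenue": "", "deal_stage": "Closed Won"},
    {"units_sold": "3", "unit_price": "1000", "total_revenue": "", "deal_stage": "Closed Won"},
    {"units_sold": "5", "unit_price": "1000", "total_revenue": "5000", "deal_stage": "Pipeline"},
]

strict = total_revenue(rows, closed_won_only=True)
inclusive = total_revenue(rows, closed_won_only=False)
print("Closed Won only:", strict)
print("All stages:", inclusive)
```

A good briefing from Claude should state which policy it applied, so participants can compare totals on equal terms.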

**Interesting Trends:**
- Upsell pattern: Several customers bought S-100 first, then S-300 or S-500 in subsequent orders (Tri-State, Bayou, Great Lakes)
- Heartland Precision: Lost a deal on price in October, then came back and ordered later that same month and kept ordering through Q4. The competitive loss didn't stick.
- Web orders without rep assignment represent leakage. Two deals worth ~$14,600 had no rep. Someone is doing SEO well but nobody's following up.
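
The upsell pattern can also be checked mechanically. The sketch below flags customers whose later orders include a higher tier than their first order. It assumes the tiers rank S-100 < S-300 < S-500 and that each order can be reduced to a (customer, date, product) tuple; the customer names and dates are placeholders, not rows from the file.

```python
from collections import defaultdict
from datetime import date

# Assumed ranking of the three product tiers named in the prompt.
TIER_RANK = {"S-100": 1, "S-300": 2, "S-500": 3}


def find_upsells(orders):
    """Return customers who later bought a higher tier than their first order.

    orders: iterable of (customer, order_date, product) tuples.
    """
    by_customer = defaultdict(list)
    for customer, order_date, product in orders:
        by_customer[customer].append((order_date, TIER_RANK[product]))

    upsold = []
    for customer, history in by_customer.items():
        history.sort()  # chronological; ties fall back to the lower tier first
        first_rank = history[0][1]
        if any(rank > first_rank for _, rank in history[1:]):
            upsold.append(customer)
    return sorted(upsold)


# Placeholder orders for illustration only.
orders = [
    ("Example Co A", date(2025, 10, 3), "S-100"),
    ("Example Co A", date(2025, 11, 12), "S-300"),
    ("Example Co B", date(2025, 10, 8), "S-500"),
    ("Example Co B", date(2025, 12, 1), "S-100"),
    ("Example Co C", date(2025, 11, 20), "S-100"),
]

upsold = find_upsells(orders)
print(upsold)
```

Comparing against the first order only keeps the rule simple; a stricter variant could require each successive order to move up a tier.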

---

## Facilitator Notes

This exercise teaches three things at once:
1. Claude can spot data quality issues that humans skim past.
2. Claude can analyze messy data without requiring you to clean it first (though it should flag the mess).
3. The quality of the analysis depends entirely on the specificity of the prompt. "Analyze this data" produces generic output. The three-part prompt above produces an executive briefing.

Let participants try their own analysis prompts first. Then show the structured version. Compare outputs.
