Scenario 3

High-Value Customer Segmentation

A seven-step workflow that turns "which customers should Marketing target?" into a reusable activity layer plus a segment model, materialized into your dev schema. Two-model design on purpose: the activity layer is testable and reusable so downstream work does not re-derive the same aggregates.

Watch the campaign-audience workflow play through end-to-end: seven steps that build a reusable activity layer plus the segment model, in roughly two minutes.

Why

What you're solving

Marketing asks: which customers should we target? You'll build a reusable activity layer plus a segment model on top of it, instead of writing a one-off query that nobody can re-use.

The two-model design is the point. Churn models, RFM models, executive dashboards - all of them can sit on the activity layer instead of re-deriving the same aggregates in five different places. That's the difference between an analytics-engineering practice and a pile of one-off queries. By the end you'll have a campaign audience that lives as a queryable, versioned table - not a screenshot of a query somebody ran once.

Stakeholder ask. Marketing wants a targeted high-value audience based on recent purchase behavior by store.

Segment Definitions

Three canonical segments

Customers can belong to multiple segments. The model tags or unions rows so a single customer can appear with multiple segment_name values - that's the business definition, not a bug.

VIP

High frequency, high value

avg_transaction_value > $100 AND transaction_count >= 3

The customer who shows up regularly and spends meaningfully every time.

Big spender

At least one large transaction

max_transaction_value > $300

A single transaction over $300 puts them in this segment - independent of frequency.

Category-loyal

Deep in one category

category_transaction_count >= 10 for any single category

The customer who keeps coming back for the same category - perfect for targeted merch campaigns.
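Taken together, the three definitions amount to a union of three filters over the activity layer. A minimal sketch, assuming hypothetical models named customer_store_activity (customer x store grain) and customer_store_category_activity (customer x store x category grain) with the column names used above - your lab's actual names will come from discovery:

```sql
-- Hypothetical sketch only: model and column names are assumptions.
with vip as (
    select customer_id, store_name, 'vip' as segment_name
    from {{ ref('customer_store_activity') }}
    where avg_transaction_value > 100 and transaction_count >= 3
),

big_spender as (
    select customer_id, store_name, 'big_spender' as segment_name
    from {{ ref('customer_store_activity') }}
    where max_transaction_value > 300
),

category_loyal as (
    select distinct customer_id, store_name, 'category_loyal' as segment_name
    from {{ ref('customer_store_category_activity') }}
    where category_transaction_count >= 10
)

-- union all, not union: a customer who qualifies for several segments
-- keeps one row per segment_name - the business definition, not a bug.
select * from vip
union all
select * from big_spender
union all
select * from category_loyal
```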

The Prompt Flow

What you'll type, in order

Copy each prompt as written (recommended) or rephrase in your own words. Run the steps in order; do not skip ahead.

  1. Discovery

    Six entity types - wider than the other scenarios - which is exactly why discovery has to happen before any SQL.

    Find the models related to customers, stores, orders, order lines, products, and categories.

    Two domains are commonly missed: a dedicated categories model (sometimes category lives only as a column on products) and a separate order lines model (sometimes folded into orders). Name either before moving on.
    Exercises: status, search
  2. Schema Understanding

    Check three things when the response returns: where category actually lives (product level, order-line level, or order header), the grain of orders versus order lines, and whether store is on the order, on the customer, or on both.

    Show the grain and joins for those models.
    Exercises: describe, lineage
  3. Data Inspection

    Two silent bugs to watch for: a stale max(order_date) that quietly shrinks the 180-day window, and a category field with nulls, whitespace, or mixed case that breaks the category-loyal logic later.

    Check recent order dates and category values needed for a 180-day segmentation model.
    If the max order date is more than a few days old, decide whether to anchor the window on current_date or on max(order_date). If categories are dirty, decide on exclude-versus-coalesce now.
    Exercises: warehouse, dbt_show on rolling-window anchor and distinct categories
  4. Activity Layer Model

    The reusable intermediate model - not the segment model yet. Per-customer x store aggregates over the trailing 180 days: transaction_count, avg_transaction_value, max_transaction_value, and category_transaction_count at customer x store x category grain.

    Create a 180-day customer activity model by store.
    Building this as its own model is the design choice that pays back for years. Verify the 180-day window is applied consistently and that the grain is customer x store (with customer x store x category for the category counts).
    Exercises: file edits, model creation
  5. Segment Model

    Built on top of the activity layer. Expected columns: customer identifier, store_name, segment_name, transaction_count, avg_transaction_value, max_transaction_value, category, category_transaction_count. Verify the multi-segment logic and confirm the thresholds match the definitions above.

    Create a segment model for VIPs, big spenders, and category-loyal customers, built on top of the activity model.
    Exercises: file edits, model creation on top of activity layer
  6. Safe Preview

    The SQL compiles, sample rows render, nothing lands in the warehouse yet. The "exclude no segment" filter doubles as a data-quality check: if a large share of customers fall out, the thresholds or the date window need a second look before Marketing ever sees the list.

    Compile and preview the segment model. Exclude customers with no segment.
    Confirm multi-segment customers appear on multiple rows. Flag any segment that comes back suspiciously empty (zero VIPs is almost always a threshold or join issue). Eyeball whether the audience size is plausible for a Marketing campaign.
    Exercises: dbt_compile, dbt_show
  7. Materialize

    dbt Wizard runs the build against your dev schema (dev_lab_user_N). The instructor drops dev schemas after the lab via a cleanup script, so this build is safe and reversible.

    Materialize the segment model into my dev schema. Skip the verification pass - the preview already confirmed the output.
    The "skip the verification pass" instruction is deliberate: the Step 6 preview already validated the output, and re-running a full verification pass would burn roughly 10% of a 20-minute lab on duplicate work.
    Exercises: dbt_run against dev schema
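The Step 3 inspection boils down to two small warehouse queries. A sketch under assumptions - the orders and order_lines relation names and the Postgres-style date arithmetic are placeholders for whatever your warehouse actually uses:

```sql
-- Freshness: a stale max(order_date) silently shrinks the 180-day window.
select max(order_date)                as latest_order,
       current_date - max(order_date) as days_stale
from orders;

-- Category hygiene: nulls, stray whitespace, or mixed case here
-- will break the category-loyal counts later.
select category, count(*) as n
from order_lines
group by category
order by n desc;
```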
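For orientation, the Step 4 activity layer might compile to roughly the following. A sketch only, with assumed relation and column names (stg_orders, order_total), anchored on current_date - swap in max(order_date) if Step 3 showed stale data:

```sql
-- Hypothetical activity layer; relation and column names are assumptions.
with recent_orders as (
    select customer_id, store_name, order_id, order_total
    from {{ ref('stg_orders') }}
    where order_date >= current_date - interval '180 days'
)

-- Grain: one row per customer x store over the trailing 180 days.
select customer_id,
       store_name,
       count(distinct order_id) as transaction_count,
       avg(order_total)         as avg_transaction_value,
       max(order_total)         as max_transaction_value
from recent_orders
group by customer_id, store_name
```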
What you'll have at the end

Final artifact

Two models in your dev schema. A reusable activity layer (customer x store, 180-day aggregates) plus a segment model on top tagging VIPs, big spenders, and category-loyal customers. The campaign audience now lives as a queryable, versioned table - the foundation that future churn, RFM, and dashboard work can sit on without re-deriving the same aggregates.
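Because the audience is now a table, pulling a campaign list is a plain query. For example, assuming the segment model landed as high_value_segments in your dev schema and 'Downtown' is a store in your data (both names are placeholders):

```sql
-- VIPs for one store; multi-segment customers appear once per segment_name.
select customer_id, store_name, segment_name, avg_transaction_value
from dev_lab_user_n.high_value_segments
where segment_name = 'vip'
  and store_name = 'Downtown'
order by avg_transaction_value desc;
```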
References + Next

Keep going

GitHub

Open SKILL.md

The canonical scenario-3 skill definition, prompts, and instructor notes.

Reference

Printable prompt sheet

One-page lab handout with every prompt in this scenario, formatted for print.

Next

Scenario 4

Broken model from a source column rename. Choose-your-path: products, orders, or customers.