Scenario 2

Extending an Existing Model with a New Source

A six-step workflow for one of the most common real-world analytics-engineering tasks: an existing intermediate model is missing a domain, and a Fivetran-synced source already in the warehouse can fill the gap. Wire it in without breaking the downstream consumers.

Path A - extend int_customer_order_summary with retail.RET_TICKETS for the VP of Customer Success.

Path B - extend int_orders_enriched with retail.RET_TICKETS at the order grain for the Director of Operations. Same source as Path A, different model and grain on purpose.

Path C - extend int_product_sales_summary with retail.RET_PRODUCT_REVIEWS for the Head of Merchandising.

Why

What you're solving

A stakeholder has a follow-up question that the current model can't answer. The data they need is already arriving in Snowflake via Fivetran. It just isn't connected to the model they're staring at.

The job is to wire the new source into the existing intermediate model without breaking the downstream consumers that already depend on it. You'll find the unused source, validate the join's grain and coverage before writing any SQL, modify the existing model with an alias-safe left-joined extension, and confirm the entire downstream lineage still compiles. All in minutes.

Choose Your Path

Three extensions, same workflow

Pick one path before Step 1. The prompt flow is identical across paths - only the placeholders change. The room can run all three simultaneously.

Path A

Customer 360 + Support Tickets

Stakeholder: VP of Customer Success

Ask: When we segment customers for renewal outreach, can we deprioritize customers with open complaints?

Target model: int_customer_order_summary
New source: retail.RET_TICKETS
Entity: customer
New columns: open_tickets_count, last_ticket_status, last_ticket_opened_at

Path B

Orders + Support Tickets

Stakeholder: Director of Operations

Ask: For every order in the enriched view, can we see how many support tickets were opened against it so we can flag problematic orders for postmortem?

Target model: int_orders_enriched
New source: retail.RET_TICKETS (at order grain)
Entity: order
New columns: ticket_count, has_open_ticket_flag, last_ticket_status

Path C

Product Performance + Reviews

Stakeholder: Head of Merchandising

Ask: In product performance, can we surface average customer rating and review volume so we can see which top sellers are also well-rated?

Target model: int_product_sales_summary
New source: retail.RET_PRODUCT_REVIEWS
Entity: product
New columns: avg_rating, review_count, low_rating_count (1-2 stars)

Note: Path B uses the same source as Path A (RET_TICKETS) but at the order grain. That's the point - the source is "new to this model," not new to the project.
The Prompt Flow

What you'll type, in order

The prompts below use placeholders. Substitute the values for the path you picked - the reference card sits underneath.

  1. Locate the target model

    Start from the model we're being asked to extend, not from a blank file. Knowing the downstream consumers up front is what separates a safe edit from a Slack thread about a broken dashboard.

    Find [TARGET_MODEL] in this project. Show me what it currently produces, its grain, and which models depend on it downstream.

    Then ask what unused sources are available.

    Find every source in this project related to [ENTITY] that [TARGET_MODEL] does NOT currently reference. I want to know what data is sitting in our warehouse that we're not using yet.
    Exercises: search, describe, lineage
  2. Discover the unused source

    dbt Wizard surfaces a Fivetran-synced source that's already configured in the project but isn't connected to the target model. No need to spelunk through _staging__sources.yml or guess what's been added since the last time anyone looked. (A one-line manual fallback for this schema check follows the list.)

    Describe the schema of [NEW_SOURCE]. Show me the columns, their types, the grain, and which column joins back to [ENTITY].
    Exercises: status, search, source-vs-model cross-referencing
  3. Validate the join

    This is the step that prevents the most common silent failure of cross-domain joins: a key that looks right by name but doesn't behave right by data. Verify grain, coverage, and join-key match before any SQL is touched.

    Run a quick check: count rows in [NEW_SOURCE], count distinct join keys, and count how many of those keys match an [ENTITY] already in [TARGET_MODEL]. Tell me whether the grain is one-to-one or one-to-many.
    If the grain is many-rows-per-[ENTITY], decide with dbt Wizard whether to aggregate (open tickets count, review count) or pick the latest row by timestamp (last ticket status). Name the aggregation choice out loud before Step 4. (A raw-SQL sketch of this check follows the list.)
    Exercises: describe, warehouse, join-key inspection
  4. Modify the existing model

    Edit the existing model file, not a new file. The skill being demonstrated is safe modification of an existing asset, not another bolt-on intermediate model.

    Update [TARGET_MODEL] to add [NEW_COLUMNS] from [NEW_SOURCE]. Use a LEFT JOIN so [ENTITY] rows without a match still appear, and aggregate [NEW_SOURCE] to one-row-per-[ENTITY] before joining if its grain is many-to-one. Preserve every column the model currently emits - only add new columns at the end.
    Two non-negotiables. Left join, not inner (inner-joining quietly drops [ENTITY] rows without a match). Preserve existing columns (downstream consumers select specific columns - renaming or removing them is a contract break). (A sketch of the edited model follows the list.)
    Exercises: file edits on the existing model
  5. Compile downstream + safe preview

    Compiling the downstream models too gives a compile-time guarantee that no consumer depends on a column that no longer exists or has changed type. Small cost, high catch rate. (CLI equivalents for this step and Step 6 follow the list.)

    Compile [TARGET_MODEL] and every downstream model that depends on it. Then preview 20 rows of [TARGET_MODEL] ordered deterministically. Do not materialize anything.
    Confirm: new columns populate for matched rows and null (not error) for unmatched. Total row count of [TARGET_MODEL] is unchanged. Every downstream model compiled without error.
    Exercises: dbt_compile across lineage, dbt_show on target
  6. Materialize

    Build into your dev schema. Step 5 already validated output and downstream compile, so the verification pass is scoped out for this timed lab.

    Materialize [TARGET_MODEL] into my dev schema. Skip the verification pass - the preview and downstream compile already confirmed the output.
    Exercises: dbt_run against dev schema
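
If you'd rather eyeball the schema yourself than take the summary in Step 2 on faith, there's a one-line manual fallback in raw Snowflake SQL. It assumes retail.RET_TICKETS resolves in your current database context; dbt Wizard's description should agree with its output.

    describe table retail.RET_TICKETS;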
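
A minimal sketch of Step 3's validation query, with Path A values filled in (swap the source, model, and join key for Paths B and C). The analytics schema and the customer_id column are assumptions - replace them with whatever Step 2 actually reported.

    -- Grain + coverage check, Path A values shown.
    -- Schema and column names are illustrative; use what Step 2 found.
    select
        count(*)                       as ticket_rows,
        count(distinct t.customer_id)  as distinct_customers,
        count(distinct s.customer_id)  as matched_customers
    from retail.RET_TICKETS as t
    left join analytics.int_customer_order_summary as s
        on s.customer_id = t.customer_id;
    -- ticket_rows > distinct_customers: many rows per customer, so
    -- aggregate (or pick the latest row by timestamp) before joining.
    -- matched_customers < distinct_customers: some tickets reference
    -- customers the model doesn't have; investigate before Step 4.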
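
And a sketch of the shape Step 4's edit should take, again for Path A. It assumes the existing model ends in a CTE named final, that the source is registered as retail / RET_TICKETS, and that the ticket columns are named status and opened_at - confirm all three before trusting it. The pre-aggregation CTE is the part that matters: collapsing tickets to one row per customer before the left join is what keeps the model's row count stable.

    -- ...the model's existing CTEs stay exactly as they are, then add:

    tickets_per_customer as (
        -- Collapse to one row per customer BEFORE joining, so the left
        -- join cannot fan out the row count. count_if and max_by are
        -- Snowflake aggregates; adjust for another warehouse.
        select
            customer_id,
            count_if(status = 'open')  as open_tickets_count,
            max_by(status, opened_at)  as last_ticket_status,
            max(opened_at)             as last_ticket_opened_at
        from {{ source('retail', 'RET_TICKETS') }}
        group by customer_id
    )

    select
        final.*,  -- every existing column, untouched
        t.open_tickets_count,
        t.last_ticket_status,
        t.last_ticket_opened_at
    from final
    left join tickets_per_customer as t
        on t.customer_id = final.customer_id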
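
Finally, the stock dbt CLI equivalents of Steps 5 and 6, for double-checking dbt Wizard's work outside the chat. Path A's model is shown; the trailing + selects the model plus everything downstream of it.

    dbt compile --select int_customer_order_summary+           # model + all consumers
    dbt show --select int_customer_order_summary --limit 20    # preview, no build
    dbt run --select int_customer_order_summary                # materialize into dev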

Placeholder substitutions

Path A

Customer 360

[TARGET_MODEL]: int_customer_order_summary
[NEW_SOURCE]: retail.RET_TICKETS
[ENTITY]: customer
[NEW_COLUMNS]: open_tickets_count, last_ticket_status, last_ticket_opened_at

Path B

Operations

[TARGET_MODEL]: int_orders_enriched
[NEW_SOURCE]: retail.RET_TICKETS
[ENTITY]: order
[NEW_COLUMNS]: ticket_count, has_open_ticket_flag, last_ticket_status

Path C

Merchandising

[TARGET_MODEL]: int_product_sales_summary
[NEW_SOURCE]: retail.RET_PRODUCT_REVIEWS
[ENTITY]: product
[NEW_COLUMNS]: avg_rating, review_count, low_rating_count

What you'll have at the end

Final artifact

[TARGET_MODEL] now emits [NEW_COLUMNS] from [NEW_SOURCE] in your dev schema. Existing column contract is preserved, downstream models still compile, row count is unchanged from the pre-edit baseline.
References + Next

Keep going

GitHub

Open SKILL.md

The canonical scenario-2 skill definition with all three paths and instructor notes.

Reference

Printable prompt sheet

One-page lab handout with every prompt in this scenario, formatted for print.

Next

Scenario 3

High-value customer segmentation. Build a two-model design: reusable activity layer plus a segment model for Marketing.