End-to-End Airbnb Analytics Engineering
CSV -> S3 -> Snowflake -> dbt Bronze/Silver/Gold models with analytics-ready outputs.
Snowflake, dbt, AWS S3, Python, SQL
Overview
An end-to-end data engineering pipeline for Airbnb listings, hosts, and bookings using Snowflake and dbt. The project implements medallion modeling, incremental processing, and historical tracking for analytics.
Problem
- Airbnb source data arrives as separate raw files that are not analytics-ready.
- Business users need consistent, reliable metrics with historical tracking across changing entities.
Solution
- Ingest source CSVs through AWS S3 into Snowflake staging tables, then model Bronze/Silver/Gold layers in dbt.
- Use incremental models to process only new/changed records and improve runtime efficiency.
- Implement dbt snapshots (SCD Type 2) for bookings, hosts, and listings to preserve history.
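The incremental bullet can be illustrated with a minimal sketch in plain Python. This is not the project's dbt code: the `booking_id` key, the `updated_at` column, and the function name `incremental_merge` are hypothetical, chosen only to show a high-watermark filter followed by an upsert. That is roughly the pattern a dbt incremental model expresses with an `is_incremental()` filter and a unique key.

```python
from datetime import datetime


def incremental_merge(target, source, key="booking_id", ts="updated_at"):
    """Upsert only source rows newer than the target's high watermark."""
    # High watermark: latest timestamp already loaded (None on the first run).
    watermark = max((row[ts] for row in target.values()), default=None)
    # Keep only new or changed rows; on the first run, take everything.
    fresh = [r for r in source if watermark is None or r[ts] > watermark]
    for row in fresh:
        target[row[key]] = row  # insert new keys, overwrite changed ones
    return len(fresh)


# Hypothetical sample data: the second batch repeats old rows plus one update.
target = {}
batch_1 = [
    {"booking_id": 1, "status": "pending", "updated_at": datetime(2024, 1, 1)},
    {"booking_id": 2, "status": "confirmed", "updated_at": datetime(2024, 1, 2)},
]
batch_2 = batch_1 + [
    {"booking_id": 1, "status": "confirmed", "updated_at": datetime(2024, 1, 3)},
]

first_run = incremental_merge(target, batch_1)   # processes both rows
second_run = incremental_merge(target, batch_2)  # processes only the update
```

The second run skips the two rows it has already seen, which is where the runtime saving comes from. A timestamp watermark assumes the source reliably updates `updated_at`; late-arriving rows with older timestamps would need a lookback window.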
Architecture
- Source CSV data -> AWS S3 -> Snowflake staging tables
- dbt Bronze models for raw structured ingestion
- dbt Silver models for cleaning, standardization, and enrichment
- dbt Gold models (`fact`, `obt`) for analytics and BI consumption
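The layer responsibilities listed above can be sketched as three small Python functions. This is an illustration under stated assumptions, not the project's models: the sample CSV, the column names, and the cleaning rules are all hypothetical, and the Gold step here is a simple per-city aggregate standing in for the `fact` and `obt` models.

```python
import csv
import io

# Hypothetical raw extract: messy whitespace, a missing price, a duplicate row.
RAW_CSV = """listing_id,host_id,price,city
1,10, 120.50 ,  paris
2,11,,London
1,10, 120.50 ,  paris
"""


def bronze(raw_text):
    """Bronze: load rows as-is, every value still a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(rows):
    """Silver: deduplicate, drop rows without a price, cast and standardize."""
    seen, cleaned = set(), []
    for row in rows:
        price = row["price"].strip()
        if row["listing_id"] in seen or not price:
            continue
        seen.add(row["listing_id"])
        cleaned.append({
            "listing_id": int(row["listing_id"]),
            "host_id": int(row["host_id"]),
            "price": float(price),
            "city": row["city"].strip().title(),
        })
    return cleaned


def gold(rows):
    """Gold: aggregate to one analytics-ready row per city."""
    totals = {}
    for row in rows:
        stats = totals.setdefault(row["city"], {"listings": 0, "price_sum": 0.0})
        stats["listings"] += 1
        stats["price_sum"] += row["price"]
    return {
        city: {"listings": s["listings"], "avg_price": s["price_sum"] / s["listings"]}
        for city, s in totals.items()
    }


report = gold(silver(bronze(RAW_CSV)))
```

Keeping Bronze free of business logic means a cleaning rule can change in Silver without reloading the raw data.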
Metrics
- Produced analytics-ready Gold datasets (`fact` and `obt`) for downstream reporting.
- Reduced rebuild overhead via incremental model execution in Bronze/Silver layers.
- Improved trust in the data through dbt tests, source checks, and lineage visibility.
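The kind of check behind the testing bullet can be sketched in Python. dbt ships generic tests such as `unique` and `not_null`; the functions below only mimic their pass/fail idea and are not dbt's implementation. The `hosts` rows and column names are hypothetical.

```python
from collections import Counter


def check_not_null(rows, column):
    """Return rows where the column is missing or None (empty list = pass)."""
    return [r for r in rows if r.get(column) is None]


def check_unique(rows, column):
    """Return values that appear more than once (empty list = pass)."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(v for v, n in counts.items() if n > 1)


# Hypothetical sample with one null name and one duplicated key.
hosts = [
    {"host_id": 10, "host_name": "Ana"},
    {"host_id": 11, "host_name": None},
    {"host_id": 10, "host_name": "Ana B."},
]

null_failures = check_not_null(hosts, "host_name")
duplicate_ids = check_unique(hosts, "host_id")
```

Both checks return the offending records rather than a boolean, so a failure points directly at the rows to investigate.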
Highlights
- Medallion architecture with clear layer boundaries and ownership.
- SCD Type 2 snapshots for historical point-in-time analysis.
- Reusable macros and Jinja templating to keep transformations maintainable.
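The SCD Type 2 behavior mentioned above can be illustrated with a minimal Python sketch. It is not the project's snapshot configuration: the `host_id` key, the `is_superhost` attribute, and both function names are hypothetical, and the `valid_from` / `valid_to` columns play the role that `dbt_valid_from` / `dbt_valid_to` play in a dbt snapshot. Deleted source records are not handled here.

```python
from datetime import date


def apply_snapshot(history, current_rows, as_of, key="host_id"):
    """Close changed versions and append new ones (SCD Type 2)."""
    # Index the open version (valid_to is None) for each key.
    open_rows = {r[key]: r for r in history if r["valid_to"] is None}
    for row in current_rows:
        active = open_rows.get(row[key])
        if active is not None:
            tracked = {k: v for k, v in active.items()
                       if k not in ("valid_from", "valid_to")}
            if tracked == row:
                continue  # unchanged: keep the open version as-is
            active["valid_to"] = as_of  # changed: close the old version
        history.append({**row, "valid_from": as_of, "valid_to": None})
    return history


def as_of_lookup(history, key_value, when, key="host_id"):
    """Return the version of a record that was valid on a given date."""
    for r in history:
        if (r[key] == key_value and r["valid_from"] <= when
                and (r["valid_to"] is None or when < r["valid_to"])):
            return r
    return None


# Hypothetical host whose superhost flag changes between two snapshot runs.
history = []
apply_snapshot(history, [{"host_id": 10, "is_superhost": False}], date(2024, 1, 1))
apply_snapshot(history, [{"host_id": 10, "is_superhost": True}], date(2024, 3, 1))

past = as_of_lookup(history, 10, date(2024, 2, 1))  # version before the change
now = as_of_lookup(history, 10, date(2024, 6, 1))   # currently open version
```

Because the old version is closed rather than overwritten, a point-in-time query for February still sees the host as a non-superhost.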
