Join us on an exhilarating exploration of the data landscape as we delve into the phenomenon that is dbt. It has taken the world by storm and is now the most popular data transformation tool. Let’s dive into this new era and witness the renaissance of SQL at the core of data analytics, bringing it back to those who know the data best.
What we will do
In this workshop, we’ll get to experience the first steps into dbt and build our first dbt project. The workshop focuses on hands-on exercises. If you’re here to sit back and relax, then this workshop is not for you.
We will cover the following dbt essential concepts:
Models
Sources
Data documentation
Tests
Macros
Jinja
Packages
The workshop ends with some bits and pieces to help you to grow on your journey to becoming a jinja ninja!
Requirements for attendees
Laptop
Schedule
14h30 - 18h00
Photos
From the event
41 photos
SQL RESURGENCE Sam Debruyn Data Makers Fest September 2024 UNLEASHING DATA POTENTIAL WITH DBT WORKSHOP
Who am I? Sam Debruyn 📍 Heist-op-den-Berg, BE 💼 Consultant / Data & Cloud Architect 5⃣ years in data 🔟 years in software / architecture / cloud 🫶 dbt, Microsoft, modern data stack
What we'll talk about Why SQL for data transformations? Introducing dbt Analytics engineering dbt features Workshop Your next steps
Di#erent ways to transform data Programming languages Python and Scala. High learning curves and often creates a boundary between business users and specialized engineers. Very powerful and easy to maintain. Declarative languages SQL, SAS, and the likes. Code is easy to write and understand but oCers limited flexibility and can be hard to maintain (adopting software eng. best practices). Low-code / UI-based Easy to adopt, use, and achieve results. Very high vendor lock-in and limited flexibility and modularity.
Scan the QR code
A quick survey done at local meetups How would you rate these statements? 0 = Strongly disagree / 5 = Strongly agree
Programming languages 2023 source: Stack Overflow worked with / wants to work with
The common language of data transformations is not drag-and- drop Data architects ERWIN Wherescape Data engineers Informatica Matillion Analytics engineers Alteryx Talend BI developers Tableau Qlik Analysts Excel / Sheets Power BI
The common language of data transformations is SQL Data architects ERWIN Wherescape SQL Data engineers Informatica Matillion SQL Analytics engineers Alteryx Talend SQL BI developers Tableau Qlik SQL Analysts Excel / Sheets Power BI SQL
Introducing dbt = the T in ELT
Analytics engineering
Analytics engineering Source
a Data Engineer DEs focus more on building data platforms, ingesting data, and are more technical. AEs focus more on building the end data products. a Data Analyst AEs will also apply software engineering best practices (modular code, versioning, automated testing, …), and are comfortable with git, usage of the terminal, and CI/CD. DAs are less technical and are not able to transform data as eJiciently, taking into account best practices w.r.t. writing code. The Analytics Engineer vs.
dbt adoption past 6 years 2020 2019 2021 2022 2018 2017 October 2023: 30000+ weekly active projects
3 things to know No compute dbt requires a data warehouse to function, it only sends SQL queries SQL with Jinja dbt is built for SQL, in some cases you can also use Python Free/self-hosted or cloud dbt Core is free but requires "plumbing" (e.g. an orchestrator) dbt Cloud is paid, but will be cheaper than building everything around it manually
📦 dbt Core Open source & free to use Complete dbt experience with all capabilities with regards to transformations, database engine compatibility ☁ dbt Cloud Free tier: 1 user Teams: starts at 100 USD/user/month Everything in dbt Core for the most common database engines Browser-based IDE CI/CD & job scheduling Hosted data docs Easy setup & configuration dbt Mesh dbt Explorer …
Compatibility
Getting started dbt Cloud
Getting started dbt Core
Workshop introduction The great Drew Banin (creator of dbt) watched the movie Synecdoche, New York, and was inspired by the idea of creating a complete simulation of a world. Rather than using discrete rules to generate synthetic data, instead setting up entities with behavior patterns and letting them loose to interact with each other. This was the birth of the JaDle Shop. There are customers, stores, products, and more, all with their own behaviors and interactions as time passes. A ja%le is an Australian dish. In other countries it’s often called a toastie, a panini, or a croque monsieur. A ja%le is toasted in an iron/toaster and is 2 slices of bread with something in between. 50 ja%le recipes for when you’re pressed for time source 1 | source 2
Modular development Write transformations in separate version-controlled files SQL on steroids with Jinja: control logic, loops Customize and parametrize with variables Reusable code blocks with macros Easy to follow DRY principles
Manage data sources and monitor data freshness Sources
Sources Dynamic schema selection Start tracking lineage from the source
Data lineage Understand the flow of data Impact of modifying a transformation How a dimension/fact is constructed
Data lineage Spot and detect bad data model design
Data tests & unit tests Automated testing for your code, as well as for your data Tests can be integrated in other tooling to get a good view on your data quality Simple YAML- or SQL-based syntax to define tests
Documentation and tests
dbt docs Clear convention- based data documentation Good step-up to a data catalog
dbt packages: don’t reinvent the wheel Similar to libraries in software development Benefit from global knowledge by using pre-built common data transformations and data modelling techniques Share publicly or privately within your organization Can contain models (transformations), macros, tests, …
Date dimension in 1 line
Workshop setup https://debruyn.dev/dbtdmf
Workshop setup
Workshop setup https://www.getdbt.com/signup
Concepts
CTEs (Common Table Expressions)
dbt models
Materialization
dbt source
Staging models
Data marts
Jinja
dbt macro
DAG (Directed Acyclic Graph)
Compile time / run time
Introspective queries
dbt seeds
dbt snapshots https://en.wikipedia.org/wiki/Slowly_chan ging_dimension
dbt analyses
dbt commands
dbt init
dbt compile
dbt run
dbt test
dbt seed
dbt build
dbt docs generate
Recommended packages
There is more Hooks & operations Run Python models Manage access with grants Track dataset usage in BI & ML with exposures Data contracts …
Accomplish great things Version controlled and reproducible ↗ Collaboration within the team & other teams Built-in docs & lineage ↗ Know and understand your data Test code & data ↗ Deploy & run with confidence Modular & easy to use ↗ Easy to extend and maintain
Your next steps
Questions? sam@debruyn.dev https://debruyn.dev
Stay in the loop
See you at the next one?
I announce upcoming talks on LinkedIn — that's also where most of the conference chatter happens. Slides and recordings land right here on the speaking page. If you'd rather follow along quietly, the RSS feed has every new post and talk.