
SQL Resurgence: Unleashing data potential with dbt

Fabric Winterfest — Snow World, Antwerp, Belgium
About this talk

The largest Microsoft Fabric community event in Belgium

Join us on an exhilarating exploration of the data landscape as we delve into the phenomenon that is dbt. It has taken the world by storm and is now the most popular data transformation tool. Let’s dive into this new era and witness the renaissance of SQL at the core of data analytics, bringing it back to those who know the data best.

In these slides
  1. Different Ways to Transform Data
  2. Why SQL?
  3. Introducing dbt
  4. dbt Core vs dbt Cloud
  5. Compatibility
  6. Getting Started
  7. Modular Development
  8. Sources
  9. Data Lineage
  10. Data Tests & Unit Tests
  11. Documentation
SQL Resurgence: Unleashing data potential with dbt
Sam Debruyn, Fabric Winterfest

Who am I?
  Sam Debruyn
  📍 Heist-op-den-Berg, BE
  💼 Freelance Data Platform Architect / Data Engineer
  6 years in data
  10+ years in software / architecture / cloud
  🫶 dbt, Fabric, modern data platforms

Different ways to transform data
  Programming languages: Python and Scala. Steep learning curve, which often creates a boundary between business users and specialized engineers. Very powerful and easy to maintain.
  Declarative languages: SQL, SAS, and the like. Code is easy to write and understand, but offers limited flexibility and can be hard to maintain (adopting software engineering best practices).
  Low-code / UI-based: Easy to adopt, use, and get results with. Very high vendor lock-in, and limited flexibility and modularity.

Programming languages 2024

Introducing dbt
  Open-source Python utility for building data transformations
  dbt = T in ELT
  Helps you build and manage your data transformations in SQL
  Often seen as an alternative to stored procedures, Spark jobs, …

Analytics engineering
  Your entire analytics engineering workflow
  Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.

  While data scientists and analysts are writing a lot of code, being great software engineers isn’t what they’ve been trained for and it often isn’t their first priority. Similarly, while data engineers are great software engineers, they don’t have training in how the data are actually used and so can’t always partner effectively with analysts and data scientists. I believe this gap should be filled in by analytics engineers.
  Michael Kaminsky, 2019

3 things to know
  No compute: dbt requires a data warehouse to function; it only sends SQL queries.
  SQL with Jinja: dbt is built for SQL; in some cases you can also use Python.
  Free/self-hosted or cloud: dbt Core is free but requires "plumbing" (e.g. an orchestrator). dbt Cloud is paid, but will be cheaper than building everything around it manually.

dbt adoption
  Chart by year, 2017 to 2022
  October 2023: 30000+ weekly active projects
  80K+ teams using dbt
  30% YoY growth in paying customers
  Spark’s popularity is not increasing anymore

Modular development
  Write transformations in separate version-controlled files
  SQL on steroids with Jinja: control logic, loops
  Customize and parametrize with variables
  Reusable code blocks with macros
  Easy to follow DRY principles

Sources
  Manage data sources and monitor data freshness
  Dynamic schema selection
  Start tracking lineage from the source

Data lineage
  Understand the flow of data
  Impact of modifying a transformation
  How a dimension/fact is constructed
  Spot and detect bad data model design

Data tests & unit tests
  Automated testing for your code, as well as for your data
  Tests can be integrated in other tooling to get a good view on your data quality
  Simple YAML- or SQL-based syntax to define tests

Documentation and tests
  dbt docs
  Clear convention-based data documentation
  Good step-up to a data catalog

dbt packages: don’t reinvent the wheel
  Similar to libraries in software development
  Benefit from global knowledge by using pre-built common data transformations and data modelling techniques
  Share publicly or privately within your organization
  Can contain models (transformations), macros, tests, …

Compatibility
  dbt-fabric: a walk down memory lane

The new dbt-fabric has arrived
  Easier authentication and configuration
  MERGE in incremental and microbatch models
  Python/PySpark models
  dbt Core 1.10 support
  Bugfixes 🐛
  🏗 More coming soon!
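
The idea behind the "Data tests" slide can be tried out without dbt. In dbt, a data test is a query that selects the rows violating an expectation, and the test passes when that query returns no rows. The sketch below imitates two of dbt's built-in generic tests (not_null and unique) with plain SQL on an in-memory SQLite database. The table `orders`, its columns, the test names, and the helper `run_test` are all made up for illustration; this is not how dbt is implemented internally.

```python
import sqlite3

# Hypothetical sample data; in a real project this would be a dbt model
# living in the data warehouse, not a SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (3, None)],
)

# Each test is a query that returns the rows violating an expectation.
TESTS = {
    "not_null_orders_customer_id":
        "SELECT * FROM orders WHERE customer_id IS NULL",
    "unique_orders_order_id":
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
}


def run_test(connection, sql):
    """Return the failing rows; an empty list means the test passed."""
    return connection.execute(sql).fetchall()


results = {name: run_test(conn, sql) for name, sql in TESTS.items()}
for name, failures in results.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
    print(f"{name}: {status}")
```

Both tests fail on this sample data: one order has no customer, and order_id 2 appears twice. Because every test is just a query, the same pattern extends to any rule that can be written as "select the bad rows".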
Install the new dbt-fabric
  pip install dbt-fabric-samdebruyn
  📚 https://dbt-fabric.debruyn.dev

Accomplish great things
  Version controlled and reproducible ↗ Collaboration within the team & other teams
  Built-in docs & lineage ↗ Know and understand your data
  Test code & data ↗ Deploy & run with confidence
  Modular & easy to use ↗ Easy to extend and maintain

Your next steps
  Questions?
  sam@debruyn.dev
  https://debruyn.dev
  https://debruyn.dev/snow-dbt
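
For readers who want to see the lineage idea from the slides in miniature: dbt models refer to each other through `ref()`, and those references form the dependency graph that lineage is built on. The sketch below is a toy version of that idea under loud assumptions. The model names and SQL are invented, and it extracts `ref()` calls with a regular expression, whereas dbt itself renders the Jinja templates. It shows a valid build order and the downstream impact of changing one model.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: model name -> SQL text using dbt-style ref() calls.
MODELS = {
    "stg_customers": "select * from raw.customers",
    "stg_orders": "select * from raw.orders",
    "fct_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
    "customer_revenue": "select * from {{ ref('fct_orders') }}",
}

# Simplified pattern for {{ ref('model_name') }}; not a real Jinja parser.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def parents(sql):
    """Names of the models this SQL text selects from via ref()."""
    return set(REF_PATTERN.findall(sql))


# Map each model to the set of models it depends on.
graph = {name: parents(sql) for name, sql in MODELS.items()}

# A valid build order: every model comes after all of its parents.
build_order = list(TopologicalSorter(graph).static_order())


def downstream(model):
    """All models that directly or indirectly depend on `model`."""
    impacted = set()
    frontier = [model]
    while frontier:
        current = frontier.pop()
        for name, deps in graph.items():
            if current in deps and name not in impacted:
                impacted.add(name)
                frontier.append(name)
    return impacted


print("build order:", build_order)
print("impacted by stg_orders:", sorted(downstream("stg_orders")))
```

Calling `downstream("stg_orders")` answers the "impact of modifying a transformation" question from the lineage slide for this toy project.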