
From Fabric to Fantastic: How dbt Makes Your Lakehouses and Warehouses Shine

Microsoft Fabric Global Online Conference — online
About this talk

I gave a session on dbt at the Microsoft Fabric Global Online Conference. If you want to learn more about Microsoft Fabric, get your ticket for this free online event: https://microsoftfabric.global/

Unlock the full potential of your Microsoft Fabric Data Warehouse or Lakehouse. This session dives into dbt, a tool that streamlines SQL development within Fabric. It lowers the barrier to entry into data analytics for anyone who has ever written a line of SQL. Learn how dbt gives data teams data documentation, automated testing, and data lineage for reliable and insightful analytics.

In these slides
  1. dbt-fabric lookback
  2. Lakehouse vs. Warehouse
  3. Where does SQL fit in?
  4. Introducing dbt
  5. Analytics Engineering
  6. Modular Development
  7. Sources
  8. Data Lineage
  9. Data Tests & Unit Tests
  10. Documentation
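
Modular development and data lineage (items 6 and 8) rest on the same idea: each dbt model declares which models or sources it selects from, and those references form a directed acyclic graph. The sketch below is not dbt code. It uses Python's standard `graphlib` module and hypothetical model names (`stg_orders`, `dim_customers`, `fct_orders`, and so on) to illustrate how a build order and the downstream impact of a change can be derived from such a graph.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the models/sources it selects from.
DEPENDENCIES = {
    "stg_orders": {"raw.orders"},
    "stg_customers": {"raw.customers"},
    "dim_customers": {"stg_customers"},
    "fct_orders": {"stg_orders", "dim_customers"},
}


def build_order(deps):
    """Return nodes so that every node comes after its upstream dependencies."""
    return list(TopologicalSorter(deps).static_order())


def downstream_of(node, deps):
    """Return every model that directly or indirectly depends on `node`."""
    impacted = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for model, parents in deps.items():
            if current in parents and model not in impacted:
                impacted.add(model)
                frontier.append(model)
    return impacted


if __name__ == "__main__":
    # One valid build order: sources first, facts last.
    print(build_order(DEPENDENCIES))
    # Models affected by a change to stg_customers.
    print(sorted(downstream_of("stg_customers", DEPENDENCIES)))
```

In a real project dbt derives this graph itself from the references between models, so nobody maintains a dictionary like `DEPENDENCIES` by hand.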
From Fabric to Fantastic: How dbt Makes Lakehouses and Warehouses Shine
  Sam Debruyn
  Fabric Global Online Conference, September 2024

Who am I?
  Sam Debruyn
  📍 Heist-op-den-Berg, BE
  💼 Consultant / Data & Cloud Architect
  5️⃣ years in data
  🔟 years in software / architecture / cloud
  🫶 dbt, Microsoft, modern data stack

dbt-fabric: a quick lookback

Lakehouse/Warehouse

Where does SQL fit in?

Different ways to transform data
  Programming languages: Python and Scala. High learning curves that often create a boundary between business users and specialized engineers. Very powerful and easy to maintain.
  Declarative languages: SQL, SAS, and the like. Code is easy to write and understand but offers limited flexibility and can be hard to maintain (adopting software eng. best practices).
  Low-code / UI-based: Easy to adopt, use, and achieve results. Very high vendor lock-in and limited flexibility and modularity.

A quick survey done at local meetups

Programming languages 2023
  (chart; source: Stack Overflow; worked with / wants to work with)

The common language of data transformations is not drag-and-drop
  Data architects: ERWIN, Wherescape
  Data engineers: Informatica, Matillion
  Analytics engineers: Alteryx, Talend
  BI developers: Tableau, Qlik
  Analysts: Excel / Sheets, Power BI

The common language of data transformations is SQL
  Data architects: ERWIN, Wherescape, SQL
  Data engineers: Informatica, Matillion, SQL
  Analytics engineers: Alteryx, Talend, SQL
  BI developers: Tableau, Qlik, SQL
  Analysts: Excel / Sheets, Power BI, SQL

Introducing dbt
  Open-source Python utility for building data transformations
  Free/OSS version: dbt Core / version with all the bells & whistles included: dbt Cloud
  The de facto default tool for analytics engineering

3 things to know
  No compute: dbt requires a data warehouse to function; it only sends SQL queries
  SQL with Jinja: dbt is built for SQL; in some cases you can also use Python
  Free/self-hosted or cloud: dbt Core is free but requires "plumbing" (e.g. an orchestrator); dbt Cloud is paid, but will be cheaper than building everything around it manually

dbt adoption past 6 years
  (chart: 2017 to 2022)
  October 2023: 30,000+ weekly active projects

Modular development
  Write transformations in separate version-controlled files
  SQL on steroids with Jinja: control logic, loops
  Customize and parametrize with variables
  Reusable code blocks with macros
  Easy to follow DRY principles

Sources
  Manage data sources and monitor data freshness
  Dynamic schema selection
  Start tracking lineage from the source

Data lineage
  Understand the flow of data
  Impact of modifying a transformation
  How a dimension/fact is constructed
  Spot and detect bad data model design

Data tests & unit tests
  Automated testing for your code, as well as for your data
  Tests can be integrated in other tooling to get a good view on your data quality
  Simple YAML- or SQL-based syntax to define tests

Documentation and tests

dbt docs
  Clear convention-based data documentation
  Good step-up to a data catalog

dbt packages: don’t reinvent the wheel
  Similar to libraries in software development
  Benefit from global knowledge by using pre-built common data transformations and data modelling techniques
  Share publicly or privately within your organization
  Can contain models (transformations), macros, tests, …

Date dimension in 1 line

There is more
  Implement SCD with snapshots
  Incremental loads
  Hooks & operations
  Run Python models through Spark (coming soon on Fabric)
  Manage access with grants
  Track dataset usage in BI & ML with exposures
  Data contracts
  …

Compatibility

Accomplish great things
  Version controlled and reproducible ↗ Collaboration within the team & other teams
  Built-in docs & lineage ↗ Know and understand your data
  Test code & data ↗ Deploy & run with confidence
  Modular & easy to use ↗ Easy to extend and maintain

Your next steps

Questions?
  sam@debruyn.dev
  https://debruyn.dev
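
To make the "Data tests" slide concrete: a dbt data test is, in essence, a SQL query that selects the rows violating an expectation, and the test passes when that query returns no rows. The sketch below mimics that idea outside dbt, using Python's built-in `sqlite3` as a stand-in for the warehouse. The `orders` table, its columns, and the helpers `failing_rows_not_null`, `failing_rows_unique`, and `make_demo_connection` are hypothetical and not part of dbt's API.

```python
import sqlite3


def failing_rows_not_null(conn, table, column):
    """Rows where `column` is NULL; an empty result means the check passes."""
    # Identifiers are interpolated for brevity; only use trusted names here.
    return conn.execute(
        f"SELECT * FROM {table} WHERE {column} IS NULL"
    ).fetchall()


def failing_rows_unique(conn, table, column):
    """Values of `column` that occur more than once, with their counts."""
    return conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1"
    ).fetchall()


def make_demo_connection():
    """In-memory stand-in for a warehouse table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, 11), (2, 12), (3, None)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    checks = [
        ("not_null(customer_id)", failing_rows_not_null(conn, "orders", "customer_id")),
        ("unique(order_id)", failing_rows_unique(conn, "orders", "order_id")),
    ]
    for name, rows in checks:
        # A check fails as soon as its query returns at least one row.
        print(name, "PASS" if not rows else f"FAIL {rows}")
```

In dbt itself, these two checks would typically be declared in YAML as the built-in `unique` and `not_null` tests on a column rather than written by hand.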