Soda, Data Quality, Data Governance, Microsoft Fabric, Data Warehouse, Data Lakehouse, OneLake, Apache Spark, Notebooks

From Flat to Sparkling: Monitoring Data Quality with Soda in Microsoft Fabric

Cloudbrew — Mechelen, Belgium
About this talk

After loading all of your data into OneLake, you might be wondering how to ensure its quality. In this session, we’ll explore and demonstrate how Soda can help you monitor data quality with automated data testing and alerting, write and monitor data contracts, and browse data quality metrics in a user-friendly dashboard.

The first issue data engineers run into after loading their data is its low quality. This talk teaches them how to get a clear view of that quality and what measures to take to improve it.

In these slides
  1. Data quality
  2. Data quality management
  3. What is Soda?
  4. Soda in Fabric

FROM FLAT TO SPARKLING
MONITORING DATA QUALITY WITH SODA IN MICROSOFT FABRIC
Sam Debruyn
Cloudbrew, December 2024

Who am I?
  Sam Debruyn
  📍 Heist-op-den-Berg, BE
  💼 Consultant / Data & Cloud Architect
  5⃣ years in data
  🔟 years in software / architecture / cloud
  🫶 Fabric, Azure, modern data stack

What we'll talk about
  1. Data quality
  2. Data quality management
  3. What is Soda?
  4. Soda in Fabric

Why proactive data quality monitoring is important (source 1, source 2)
  NASA (1999)
  American Airlines (2017)
  Samsung Securities (2018)
  Spanish Navy (2013)
  British Post Office / Fujitsu (1999-2015)
  JR’s Demolition (2020)

Data quality vs. data quality management

Data quality dimensions

What is data quality management?

Different ways to perform data quality monitoring in Fabric

Soda concepts (2 slides)

Different flavours of Soda
  Soda Core
    Open-source
    Free
    Python package and CLI
  Soda Library
    Closed source
    Paid
    Python package and CLI
    Complex checks
  Soda Agent
    Containerized version of Soda Library meant to run checks and submit results to Soda Cloud
    Pull architecture (no open ports needed)
  Soda Cloud
    SaaS / cloud product
    Extensive dashboard
    Actionable alerting

How Soda integrates with Fabric (3 slides)

Getting started

Demo: running Soda scans in Fabric

There is more
  Monitor the characteristics of a column with distribution monitoring
  Write and verify data contracts
  Integrate with Soda Agent
  Data discussions
  Integrate with Slack, Teams, … for alerting
  Integrate with data catalogs like Purview, Atlan, …
  Use SodaGPT/AskAI to add new checks
  …

Accomplish great things
  Version controlled ↗ Collaboration within the team
  Connect to the most common data sources ↗ Monitor your data where it is today
  Simple syntax to add new checks ↗ Make DQ accessible to less tech-savvy users
  Store results & metrics ↗ Gain insight in long-term data quality

Slides
  Slides available at https://debruyn.dev/cb24

Questions?
  sam@debruyn.dev
  https://debruyn.dev