The largest Microsoft Fabric community event in Belgium
What does it take to roll out Microsoft Fabric in a real-life data team? In this session, we’ll walk through the journey of implementing Fabric at a mid-sized organization, from first pilot to full adoption. You’ll hear what worked, what didn’t, and what we learned along the way.
We’ll cover how we approached access control, workspace setup, setup of pipelines and notebooks, dbt, and monitoring, and where we had to get creative with workarounds. But we’ll also show how Fabric’s simplicity and rapid evolution made it a solid choice, even for teams without massive engineering resources. Whether you’re considering Fabric or already using it, this talk will help you steer clear of common pitfalls and get the most out of what Fabric has to offer.
In these slides
Context
Design phase
Pilot
Implementation phase
What did not go well
The future
Disclaimer You are NOT allowed to reuse, modify, or redistribute these slides without my prior written consent. Email me at sam@debruyn.dev to request permission. Why this notice? I presented these slides myself, and they rely heavily on the verbal explanations I provided. Not everything is written down, especially details about future, undocumented features that Microsoft mentioned verbally at conferences or on social media. Because these items may still change, I want to avoid spreading inaccurate or outdated information.
Lessons from Migrating to Fabric Fabric Winterfest Sam Debruyn Thread by Thread
Who am I?
Sam Debruyn
📍 Heist-op-den-Berg, BE
💼 Freelance Data Platform Architect / Data Engineer
6️⃣ years in data
🔟+ years in software / architecture / cloud
🫶 Fabric, Microsoft, modern data platforms
What we'll talk about Context Design phase Pilot Implementation phase What did not go well The future Slides available at the end
Context
• Mid-sized organization
• Centralized data team
• ± 15 people
• Most use cases were marketing related
• Lots of PII data
• Some things migrated to Synapse
• Medallion structure
• Low cloud adoption maturity
• No software engineering (SE) skills
Why Fabric SaaS Learning curve Power BI Simplify AI adoption Integration with Microsoft stack
Design phase
1. Workspace and Capacity design
👉 Impact of data consumption on data loads
👉 No impact of one data team on the other
👉 Non-prod / prod environment separation

Capacity Units   Development   Testing   Production
Team A           8             4         16
Team B           2             2         8
Team C           4             2         8
* Examples not based on reality
To Medallion or… not to Medallion?
The 3 Layers of the Medallion Architecture
The 3 Layers of the Medallion Architecture
Curated/gold
Purpose: high-quality data supporting business reporting and advanced analytics. Pre-aggregated and tailored to analytical needs.
Nobody actually does medallion
Medallion in Fabric… my approach
• Raw: data landing in its raw form, no modifications applied
• Cleansed / staging: column renames, data type corrections, filter invalid records
• Base: rename tables, join tables together, build reusable components
• Curated: dimensions & facts
• Semantic: structure in the data model, linking together, metrics/measures
• Consumption: reporting, activation, AI
2. Architecture design
3. Pilot
• Setup of a few Workspaces
• Access with Entra ID Groups + PIM
• OneLake Shortcuts to data already loaded
• Shortcuts to Shortcuts to Shortcuts to …
Limitations:
• The maximum number of shortcuts in a single OneLake path is 10.
• The maximum number of direct shortcuts to shortcut links is 5.
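The chaining limit above is easy to hit once shortcuts point at other shortcuts across Workspaces. A minimal sketch of a pre-flight check, assuming a hypothetical inventory that maps each shortcut path to its target path (the paths and the `shortcuts` dictionary are invented for illustration, and the limit is read here as "at most 5 links whose target is itself a shortcut"):

```python
MAX_SHORTCUT_TO_SHORTCUT_LINKS = 5  # limit quoted on the slide


def shortcut_to_shortcut_links(start: str, shortcuts: dict[str, str]) -> int:
    """Count links in the chain from `start` whose target is itself a shortcut."""
    links = 0
    seen: set[str] = set()
    current = start
    while current in shortcuts:
        if current in seen:
            raise ValueError(f"shortcut cycle detected at {current!r}")
        seen.add(current)
        target = shortcuts[current]
        if target in shortcuts:
            links += 1
        current = target
    return links


def within_limit(start: str, shortcuts: dict[str, str]) -> bool:
    """True when the chain stays within the assumed limit."""
    return shortcut_to_shortcut_links(start, shortcuts) <= MAX_SHORTCUT_TO_SHORTCUT_LINKS


# Hypothetical inventory: prod reads from test, test from dev, dev from the source.
inventory = {
    "prod/hub/customers": "test/hub/customers",
    "test/hub/customers": "dev/hub/customers",
    "dev/hub/customers": "source/lake/customers",
}
chain_links = shortcut_to_shortcut_links("prod/hub/customers", inventory)
chain_ok = within_limit("prod/hub/customers", inventory)
```

Running a check like this against an exported shortcut inventory flags chains before Fabric rejects them.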
Shortcuts
• Foundational concept of Microsoft Fabric
• Once you understand Shortcuts, you understand OneLake
• Currently only available in Lakehouses
• In our SQL-first design, we used Lakehouses as ‘Shortcut Hubs’
• Can be used to provide source data cross-environment
Lakehouse or Warehouse
• Owner of the Delta Lake tables: 👤
• Main engine: Polaris
• Automatic performance tuning (VACUUM, OPTIMIZE, V-ORDER): Lakehouse ❌, Warehouse ✅
• Configurable time travel: Lakehouse ✅, Warehouse ❌ (always 30 days)
• Security: Lakehouse: OneLake Security (+ SQL security in SQL Analytics Endpoint); Warehouse: SQL security. Future: OneLake Security?
• T-SQL transformations: Lakehouse: read-only (views); Warehouse ✅
• Shortcuts: Lakehouse ✅, Warehouse ❌
Implementation phase
Infrastructure as Code - Terraform
Why?
• Reproducible environments
• Simplify Workspace setup, provide domain teams with a few default components
• Wire up access control, Shortcuts, CI/CD, …
Why not?
• Data users should focus on and be proficient in data skills; IaC skills are a distraction
• Ease of use: creating components in the UI
💡 When something is not available through the Fabric Terraform Provider, you can use the Mastercard-built REST API Terraform provider.
⚠ There is Rate Limiting on the API
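Rate limiting also matters for any script that calls the REST API next to Terraform. A minimal retry sketch, assuming throttled calls come back as HTTP 429 with a `Retry-After` header in seconds; `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, so the helper is not tied to a specific HTTP client:

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry `send` while it returns 429, waiting as the server asks."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts:
            break
        # Fall back to exponential backoff when the header is missing or not numeric.
        fallback = default_wait * 2 ** (attempt - 1)
        try:
            wait = float(headers.get("Retry-After", fallback))
        except ValueError:
            wait = fallback
        sleep(wait)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.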
Spark is often not a good match because
1. Few users actually understand Spark clusters and data distribution techniques
→ Spark jobs often run with few threads while a multitude of CPU cores are provisioned
→ Waste of cloud resources
→ Impact on cost and sustainability 💸 🌳
2. Few users actually follow SE best practices (Python, linting, formatting, OOP/FP, testing, devops, …)
→ Notebook spaghetti 🍝
SQL-first
dbt
(Side note: I built the dbt support for Fabric)
• Low learning curve
• Bring the essentials of SE into DE
• dbt-fabric-samdebruyn is feature-complete and battle-tested
Entra ID & PIM integration
OneLake Security: before and after
Monitoring
Living on the edge
• The Microsoft Fabric we worked with during our Pilot was very different from the Fabric at the end
• Great contact with Microsoft Support, Microsoft Belgium, Fabric Community, SuperUsers, MVPs, Fabric CAT team, Fabric product groups
• Feedback opportunities
• Weekly releases
Basically, your data platform becomes better while you sleep.
Working on the edge of innovation means accepting change as part of the architecture.
What did not go well
Security context and ownership
• Very unpredictable when you run a Notebook / Data Pipeline / Spark Job
• Sometimes the “Submitter” is correct, most of the time it isn’t
• I was told “the person who last modified the item”
• It seems that opening something is enough to “modify” it
Orchestration
Data warehouse: collations
• Default collation: case sensitive (CS)
• Introduced in 2024: option to set case insensitive collation (CI)
• However… this then only sets the collation for the data warehouse
• Data loaded through SQL Analytics Endpoints (Lakehouse – Shortcuts) is still CS
→ You have to start casting everything
select * from dbo.dim_customer
select * from dbo.Dim_Customer
Data warehouse: CU explosions 💥
Example: F4 Capacity in WE (€327,50/month @ RI)
1. Friday afternoon: SQL dev writes monster query (e.g. complex cartesian product of multiple tables with each 100M rows)
2. State before query: 99.99% CU used
3. Check succeeds
4. Query starts executing – no other queries executing
5. Max nodes potentially assigned: 4 CU Baseline + 32 CU Burst
6. SQL dev disconnects for the weekend – query continues to process in background
7. Monday morning: email 100% Capacity reached
8. Pause & Resume Capacity (max CU = no possibility to cancel running query to smooth out usage)
9. Consumed CU = 4+32 CU @ 2.5 days
10. Billed overage: 36 CU @ PAYG cost for 2.5 days = ± €413,07

• You submit a query
• Check: did you consume 100% of your CU?
  Yes → Error: max CU reached
  No → Query plan → SQL nodes start processing until completion
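The arithmetic behind this example can be written out as a short sketch. The admission check is modeled as a simple "below 100%" test, which is my reading of the flow on the slide and not a documented algorithm. The hourly rate is not stated anywhere in the slides, so it is back-derived from the quoted total instead of looked up:

```python
BASELINE_CU = 4          # F4 baseline, from the slide
BURST_CU = 32            # burst quoted on the slide
RUNTIME_HOURS = 2.5 * 24 # Friday afternoon to Monday morning
BILLED_EUR = 413.07      # overage quoted on the slide
MONTHLY_RI_EUR = 327.50  # reserved price quoted on the slide


def admit(used_fraction: float) -> bool:
    """Assumed check: a query is admitted while usage is below 100%."""
    return used_fraction < 1.0


# 99.99% used still passes the check; nothing stops the query afterwards.
admitted = admit(0.9999)

# 36 CU running for 60 hours.
cu_hours = (BASELINE_CU + BURST_CU) * RUNTIME_HOURS

# Rate implied by the slide's total, in EUR per CU-hour (derived, not official).
implied_rate_eur = BILLED_EUR / cu_hours

# One runaway weekend query costs more than a month of the reserved capacity.
exceeds_monthly_price = BILLED_EUR > MONTHLY_RI_EUR

print(f"{cu_hours:.0f} CU-hours at about EUR {implied_rate_eur:.4f} per CU-hour")
```

The check only looks at usage before the query starts, so a capacity at 99.99% admits a query that can then burst to 36 CU for the whole weekend.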
Capacity Management
• When CU usage exceeds 100%, many of the tools for monitoring usage stop working
• Why is this an App and not a built-in page?
• Slow, unresponsive, uses a human’s account
Startup pools
ALM (DevOps / CI/CD)
During the early days of Fabric, this was often seen as an afterthought
Today: lots of options, but no perfection yet
• Deployment Pipelines 🐌
• Git integration 🌟
• fabric-cicd 🐍
• Terraform 🤓
• Fabric CLI 🛠
• SDKs 🐍
• REST APIs 🛠
• …
Areas for improvement
The future is… bright
Questions? sam@debruyn.dev https://debruyn.dev https://debruyn.dev/snow
Stay in the loop
See you at the next one?
I announce upcoming talks on LinkedIn — that's also where most of the conference chatter happens. Slides and recordings land right here on the speaking page. If you'd rather follow along quietly, the RSS feed has every new post and talk.