Medallion for Data Mesh: Exploring Workspace, Capacity, and Domain Design in Microsoft Fabric
dataMinds Connect, Mechelen, Belgium
09 October 2024
About this talk
Designing a medallion architecture is more than just setting up a few Lakehouses and Data Warehouses. In this session, data architects and engineers learn how to design a medallion architecture with Fabric Workspaces, Capacities, and Domains. We’ll cover the segregation of responsibilities, granular access control, and how to pick and choose the right Capacities for every workload. By the end of this talk, you’ll be ready to design a scalable and secure data platform with Microsoft Fabric.
In these slides
• Data Mesh
• Medallion
• Workspaces & Capacities
• Medallion & Data Mesh on Fabric
• Capacity design for scalability
• Access control & Domains
MEDALLION FOR DATA MESH
EXPLORING WORKSPACE, CAPACITY, AND DOMAIN DESIGN
Sam Debruyn
DataMinds Connect, October 2024
Thank you, partners 💖
Who am I?
Sam Debruyn
📍 Heist-op-den-Berg, BE
💼 Consultant / Data & Cloud Architect
5⃣ years in data
🔟 years in software / architecture / cloud
🫶 Fabric, Azure, modern data stack
What we'll talk about
• Data Mesh
• Medallion
• Workspaces & Capacities
• Medallion & Data Mesh on Fabric
• Capacity design for scalability
• Access control & Domains
Data mesh
Data Mesh?
• First introduced in 2019 by Zhamak Dehghani at ThoughtWorks
• Aims to overcome the challenges of the monolithic data lake
(Diagram © ThoughtWorks, 2020)
The 4 Principles of the Data Mesh
• DOMAIN-ORIENTED DECENTRALIZED DATA OWNERSHIP
• DATA PRODUCT THINKING
• SELF-SERVICE ANALYTICS
• FEDERATED GOVERNANCE
More content on Data Mesh
• Microsoft Cloud Adoption Framework
• Initial blog post on data mesh
• Second blog post on data mesh
• Free PDF copy of the Data Mesh book
Medallion layers
The 3 Layers of the Medallion Architecture
Curated/gold
Purpose: high-quality data supporting business reporting and advanced analytics. Pre-aggregated and tailored to analytical needs.
Data platform architectural design questions
So… how do we bring these concepts together? Data mesh and medallion with Fabric
Let’s look at Workspace design for medallion and data mesh in Microsoft Fabric
The “Hello World” version
Bronze
Shortcuts: Fabric cornerstones
Other ways to ingest data into Bronze
Bronze layer layout
Silver
Silver layer layout
Gold
Overview: entire platform (example)
Did I invent this? No, this is also how Microsoft recommends it
Easy to extend
Platinum
What about Advanced Analytics? Or specific use cases that don't fit into the regular Gold Workspaces?
Hyper-specialized Workspaces can be treated much like Azure resource groups or “project folders”.
Workspaces & Capacities
Why should you create separate Workspaces?
Fabric concepts: Workspaces & Capacities
Capacity
• pool of Capacity Units
• matches a certain amount of compute power
• to be spread amongst one or more Workspaces
Workspace
• logical grouping of items
• Lakehouses, Warehouses, Reports, KQL, …
• possible access control boundary
Capacities
Capacity SKUs
Bursting & smoothing
SKU | CUs | Available CUs for interactive workloads (10 min) | Available CUs for background workloads (24 h) | Actual workload duration & consumption
F2 | 2 | 1,200 | 172,800 | ASAP*
F4 | 4 | 2,400 | 345,600 | ASAP*
F8 | 8 | 4,800 | 691,200 | ASAP*
F16 | 16 | 9,600 | 1,382,400 | ASAP*
F32 | 32 | 19,200 | 2,764,800 | ASAP*
F64 | 64 | 38,400 | 5,529,600 | ASAP*
F128 | 128 | 76,800 | 11,059,200 | ASAP*
… | … | … | … | …
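The figures in this table are consistent with multiplying the CU count by the length of the smoothing window in seconds: 600 s for the 10-minute interactive window and 86,400 s for the 24-hour background window. A minimal Python sketch under that assumption follows. The function and constant names are illustrative and not part of any Fabric API.

```python
# Smoothing windows in seconds, matching the two windows named in the table.
INTERACTIVE_WINDOW_S = 10 * 60        # 10 minutes
BACKGROUND_WINDOW_S = 24 * 60 * 60    # 24 hours


def smoothing_budget(capacity_units: int) -> dict[str, int]:
    """Return the CU-seconds available per window.

    Assumption: budget = capacity units x window length in seconds,
    which reproduces the numbers shown on the slide.
    """
    return {
        "interactive": capacity_units * INTERACTIVE_WINDOW_S,
        "background": capacity_units * BACKGROUND_WINDOW_S,
    }


if __name__ == "__main__":
    for cus in (2, 4, 8, 16, 32, 64, 128):
        budget = smoothing_budget(cus)
        print(f"F{cus}: {budget['interactive']:,} interactive, "
              f"{budget['background']:,} background")
```

Doubling the SKU doubles both budgets, so the choice is mostly about how much headroom a workload needs inside each window.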
Impact of SKU choice
Capacities determine feature availability
• E.g. Copilot, Power BI: only F64 or higher
Capacities determine how features are available
• Nodes and cores/node in Spark (2 vCores per CU, burst factor 3 | 0.25 nodes per CU)
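To make the Spark ratios concrete, here is a small Python sketch that applies the three numbers from the slide (2 vCores per CU, burst factor 3, 0.25 nodes per CU) to a given SKU. The slide does not say whether the node ratio applies before or after bursting, so the sketch simply applies it to the base CU count. All names are hypothetical.

```python
from dataclasses import dataclass

# Ratios taken from the slide; how they combine is an assumption.
VCORES_PER_CU = 2
BURST_FACTOR = 3
NODES_PER_CU = 0.25


@dataclass(frozen=True)
class SparkSizing:
    base_vcores: int    # vCores without bursting
    burst_vcores: int   # vCores with the burst factor applied
    nodes: float        # node count derived from the per-CU ratio


def spark_sizing(capacity_units: int) -> SparkSizing:
    """Estimate Spark compute for a capacity with the given number of CUs."""
    base = capacity_units * VCORES_PER_CU
    return SparkSizing(
        base_vcores=base,
        burst_vcores=base * BURST_FACTOR,
        nodes=capacity_units * NODES_PER_CU,
    )


if __name__ == "__main__":
    for cus in (2, 8, 64):
        print(f"F{cus}: {spark_sizing(cus)}")
```

Under these assumptions an F2 yields only half a node's worth of Spark compute, which illustrates why the SKU affects how a feature behaves and not only whether it is available.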
supported regions Capacity level settings
Throttling
Capacity Metrics App
Why should you create separate Workspaces?
How access can be managed in Fabric
• Workspace level roles: Admin, Member, Contributor, Viewer
• Item sharing: Read, Edit, Share
• Data sharing: Read, ReadData, ReadAll
• OneLake RBAC (preview)
Note: this will probably be improved with the introduction of OneSecurity
Domains
The problem: how to get an overview of tens (hundreds?) of Workspaces
Domains
Domains: OneLake Data Hub
Recap
RECAP: The 4 Principles of the Data Mesh
• DOMAIN-ORIENTED DECENTRALIZED DATA OWNERSHIP
• DATA PRODUCT THINKING
• SELF-SERVICE ANALYTICS
• FEDERATED GOVERNANCE
RECAP: Medallion layers: bronze, silver, gold
Recap
• Split Workspaces by type of workload and role in the data fabric
• Single Capacities are good for trials, but we should avoid them for actual implementations
• Access control can be complex, start by managing access on the Workspace level
• Bundle Workspaces in Domains
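One way to picture these recap points is as a small planning script that generates one Workspace per domain and medallion layer and spreads them over more than one Capacity. This is only a sketch: the naming scheme, the per-layer capacity mapping, and the domain and capacity names are all hypothetical and not taken from the talk or from any Fabric API.

```python
from dataclasses import dataclass
from itertools import product

LAYERS = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class Workspace:
    name: str       # hypothetical naming scheme: <domain>-<layer>
    domain: str     # Fabric Domain the Workspace would be bundled into
    layer: str      # medallion layer this Workspace serves
    capacity: str   # Capacity the Workspace would be assigned to


def plan_workspaces(domains: list[str],
                    capacity_for_layer: dict[str, str]) -> list[Workspace]:
    """Build one Workspace per (domain, layer) pair.

    Assumption: capacities are chosen per layer; a real design could
    just as well choose them per domain or per workload.
    """
    return [
        Workspace(f"{domain}-{layer}", domain, layer, capacity_for_layer[layer])
        for domain, layer in product(domains, LAYERS)
    ]


if __name__ == "__main__":
    # Illustrative inputs only.
    capacities = {"bronze": "cap-ingest", "silver": "cap-ingest", "gold": "cap-serve"}
    for ws in plan_workspaces(["sales", "finance"], capacities):
        print(ws)
```

Keeping the plan as data makes it easy to review which Workspaces share a Capacity before anything is created in the tenant.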
Sam’s 5 golden rules for Workspace & Capacity design in Fabric
Slides
Slides available at https://debruyn.dev/dmc24
Thank you, partners 💖
Session Feedback 💖 https://bit.ly/dMC2024_SessionFeedback
Questions? sam@debruyn.dev https://debruyn.dev