
Medallion for Data Mesh: Exploring Workspace, Capacity, and Domain Design in Microsoft Fabric

dataMinds Connect — Mechelen, Belgium
About this talk

Designing a medallion architecture is more than just setting up a few Lakehouses and Data Warehouses. In this session, data architects and engineers learn how to design a medallion architecture with Fabric Workspaces, Capacities, and Domains. We’ll cover the segregation of responsibilities, granular access control, and how to pick and choose the right Capacities for every workload. By the end of this talk, you’ll be ready to design a scalable and secure data platform with Microsoft Fabric.

In these slides
  1. Data Mesh
  2. Medallion
  3. Workspaces & Capacities
  4. Medallion & Data Mesh on Fabric
  5. Capacity design for scalability
  6. Access control & Domains
MEDALLION FOR DATA MESH: EXPLORING WORKSPACE, CAPACITY, AND DOMAIN DESIGN
Sam Debruyn
DataMinds Connect, October 2024

Thank you, partners 💖

Who am I?
Sam Debruyn
  📍 Heist-op-den-Berg, BE
  💼 Consultant / Data & Cloud Architect
  5⃣ years in data
  🔟 years in software / architecture / cloud
  🫶 Fabric, Azure, modern data stack

What we'll talk about
  • Data Mesh
  • Medallion
  • Workspaces & Capacities
  • Medallion & Data Mesh on Fabric
  • Capacity design for scalability
  • Access control & Domains

Data mesh

Data Mesh?
  • First introduced in 2019 by Zhamak Dehghani at ThoughtWorks
  • Overcoming challenges of the monolithic data lake
  © ThoughtWorks, 2020

The 4 Principles of the Data Mesh
  • DOMAIN-ORIENTED DECENTRALIZED DATA OWNERSHIP
  • DATA PRODUCT THINKING
  • SELF-SERVICE ANALYTICS
  • FEDERATED GOVERNANCE

More content on Data Mesh
  • Microsoft Cloud Adoption Framework
  • Initial blog post on data mesh
  • Second blog post on data mesh
  • Free PDF copy of the Data Mesh book

Medallion layers

The 3 Layers of the Medallion Architecture
  Curated/gold
  Purpose: high-quality data supporting business reporting, advanced analytics. Pre-aggregated and tailored to analytical needs.

Data platform architectural design questions
So… how do we bring these concepts together?

Data mesh and medallion with Fabric

Let’s look at Workspace design for medallion and data mesh in Microsoft Fabric

The “Hello World” version

Bronze
  • Shortcuts: Fabric cornerstones
  • Other ways to ingest data into Bronze
  • Bronze layer layout

Silver
  • Silver layer layout

Gold

Overview
Overview: entire platform (example)

Did I invent this?
No, this is also how Microsoft recommends it

Easy to extend

Platinum
What about Advanced Analytics? Or specific use-cases not fitting into regular Gold Workspaces?
Hyper-specialized Workspaces can be conceived similarly to Azure resource groups / “project folders”

Workspaces & Capacities

Why should you create separate Workspaces?

Fabric concepts: Workspaces & Capacities
  Capacity
  • pool of Capacity Units
  • matches a certain amount of compute power
  • to be spread amongst one or more Workspaces
  Workspace
  • logical grouping of items
  • Lakehouses, Warehouses, Reports, KQL, …
  • possible access control boundary

Capacities

Capacity SKUs

Bursting & smoothing

  SKU    CUs   Available CUs for         Available CUs for        Actual workload
               interactive 10min         background 24h           duration &
               workloads                 workloads                consumption
  F2       2       1,200                    172,800               ASAP*
  F4       4       2,400                    345,600               ASAP*
  F8       8       4,800                    691,200               ASAP*
  F16     16       9,600                  1,382,400               ASAP*
  F32     32      19,200                  2,764,800               ASAP*
  F64     64      38,400                  5,529,600               ASAP*
  F128   128      76,800                 11,059,200               ASAP*
  …        …           …                          …               …

Impact of SKU choice
  • Capacities determine feature availability
    E.g. Copilot, Power BI only F64 or higher
  • Capacities determine how features are available
    Nodes and cores/node in Spark (2 vCores per CU, burst factor 3 | 0.25 nodes per CU)
  • supported regions
  • Capacity level settings

Throttling

Capacity Metrics App

Why should you create separate Workspaces?

How access can be managed in Fabric
  • Workspace level roles: Admin, Member, Contributor, Viewer
  • Item sharing: Read, Edit, Share
  • Data sharing: Read, ReadData, ReadAll
  • OneLake RBAC (preview)
  Note: this will probably be improved with the introduction of OneSecurity

Domains

The problem: how to get an overview of tens (hundreds?) of Workspaces

Domains
Domains: OneLake Data Hub

Recap

RECAP: The 4 Principles of the Data Mesh
  • DOMAIN-ORIENTED DECENTRALIZED DATA OWNERSHIP
  • DATA PRODUCT THINKING
  • SELF-SERVICE ANALYTICS
  • FEDERATED GOVERNANCE

RECAP: Medallion layers: bronze, silver, gold

Recap
  • Split Workspaces by type of workload and role in the data fabric
  • Single Capacities are good for trials, but we should avoid them for actual implementations
  • Access control can be complex; start by managing access at the Workspace level
  • Bundle Workspaces in Domains

Sam’s 5 golden rules for Workspace & Capacity design in Fabric

Slides
Slides available at https://debruyn.dev/dmc24

Session Feedback 💖
https://bit.ly/dMC2024_SessionFeedback

Questions?
sam@debruyn.dev
https://debruyn.dev
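
The numbers in the bursting & smoothing table of the slides appear to follow a simple pattern: the CU count of the SKU multiplied by the length of the smoothing window in seconds (600 s for the 10-minute interactive window, 86,400 s for the 24-hour background window). The sketch below reproduces the table under that assumption, and also applies the "2 vCores per CU, burst factor 3" figures from the SKU slide. The slides do not state the arithmetic or the unit explicitly, so reading the values as CU-seconds is an inference, and all names in the code (`CapacityBudget`, `capacity_budget`, the constants) are hypothetical and not part of any Fabric API.

```python
from dataclasses import dataclass

# Assumed window lengths, inferred from the column headers in the slides.
INTERACTIVE_WINDOW_S = 10 * 60       # 10-minute window for interactive workloads
BACKGROUND_WINDOW_S = 24 * 60 * 60   # 24-hour window for background workloads

# Figures quoted on the "Impact of SKU choice" slide.
SPARK_VCORES_PER_CU = 2
SPARK_BURST_FACTOR = 3


@dataclass(frozen=True)
class CapacityBudget:
    """Hypothetical container for the per-SKU figures shown in the slides."""
    sku: str
    capacity_units: int
    interactive_cu_seconds: int
    background_cu_seconds: int
    spark_vcores_base: int
    spark_vcores_burst: int


def capacity_budget(capacity_units: int) -> CapacityBudget:
    """Derive the smoothing budgets for an F SKU with the given CU count."""
    if capacity_units <= 0:
        raise ValueError("capacity_units must be a positive integer")
    base_vcores = capacity_units * SPARK_VCORES_PER_CU
    return CapacityBudget(
        sku=f"F{capacity_units}",
        capacity_units=capacity_units,
        interactive_cu_seconds=capacity_units * INTERACTIVE_WINDOW_S,
        background_cu_seconds=capacity_units * BACKGROUND_WINDOW_S,
        spark_vcores_base=base_vcores,
        spark_vcores_burst=base_vcores * SPARK_BURST_FACTOR,
    )


if __name__ == "__main__":
    # Print one row per SKU listed in the slides.
    for cu in (2, 4, 8, 16, 32, 64, 128):
        b = capacity_budget(cu)
        print(
            f"{b.sku:<5} {b.capacity_units:>4} "
            f"{b.interactive_cu_seconds:>8,} {b.background_cu_seconds:>12,} "
            f"{b.spark_vcores_base:>4} {b.spark_vcores_burst:>5}"
        )
```

A calculation like this can help compare a single large Capacity against several smaller ones per workload type, which is the trade-off the recap points at. Actual throttling behaviour depends on details the slides only name, so treat the output as a rough sizing aid.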