Preparing a migration to Microsoft Fabric: from Azure Synapse Serverless SQL
If all those posts about Microsoft Fabric have made you curious, you might want to consider it as your next data platform. Since it is very new, not all features are available yet and most are still in preview. You could already adopt it, but if you want to deploy this to a production scenario, you’ll want to wait a bit longer. In the meantime, you can already start preparing for the migration. Let’s dive into the paths to migrate to Microsoft Fabric. Today: starting from Synapse Serverless SQL Pools.
All posts in this series
- Preparing a migration to Microsoft Fabric: from Azure Synapse Serverless SQL
- Migrating Azure Synapse Dedicated SQL to Microsoft Fabric
Are Azure Synapse Serverless SQL Pools going away?
No! Microsoft is very clear about this. They currently have no plans at all to deprecate Azure Synapse Serverless SQL Pools or even Azure Synapse Analytics Workspaces. You don’t have to migrate to Microsoft Fabric and you can just stay a happy camper in the Serverless SQL Pools. Fabric is built as some kind of operating system on which a lot of applications can run: Synapse Lakehouse, Synapse Warehouse, Synapse Serverless SQL, and more. However, if you want to take advantage of the new features in Microsoft Fabric, this post might be for you.
Microsoft Fabric and Synapse Serverless: a shared engine
Microsoft Fabric’s Lakehouse and Warehouse engines and Synapse Serverless SQL all share the same engine at their core: Polaris. This engine was built from the ground up to serve the needs of today’s data platforms. It is a distributed, columnar, in-memory engine that is optimized for analytical workloads. With Synapse Serverless SQL, Microsoft released the first half of its engine: the compute layer, Polaris. With Microsoft Fabric’s OneLake, Microsoft released the second big half of its modern data platform engine: the storage layer.
|Synapse Dedicated SQL Pool||PDW with MPP||proprietary + PolyBase|
|Synapse Serverless SQL Pool||Polaris||ADLS|
|Synapse Lakehouse in Fabric||Polaris||OneLake|
|Synapse Warehouse in Fabric||Polaris||OneLake|
Typical Azure Synapse Serverless SQL Pools usage patterns and their migration paths
There are a few common ways that Synapse Serverless SQL Pools are used. Let’s look at the migration path for each of them.
Using Synapse Serverless as the serving layer
The most common one. Regardless of how you’re transforming the data in your data lake, at one point, the data needs to be served to your end users in the form of SQL. More often than not, through Power BI. Users in this scenario typically store the data in their data lake in Parquet or Delta Lake and then use
OPENROWSET-based views or direct
SELECT queries to expose this to SQL.
At the moment the
OPENROWSET functionality is not (yet) available in Microsoft Fabric. But probably, we won’t even need it. Fabric’s Lakehouse feature can automatically discover Delta Lake tables in your data lake and automatically exposes these through T-SQL. All you have to do is to create Shortcuts to these tables. Don’t worry, I’ve written an entire blog post
about this specific topic. 😉
Not using Delta Lake yet? Fabric uses Delta Lake at its core, so if you’re considering a migration, your data will end up as Delta Lake at one point anyway. There are a few routes you can take to convert your data to Delta Lake. This can be using Spark jobs or notebooks (keep reading for even easier options 😁) to read the Parquet data and then write the same DataFrame to a Delta Lake table.
Often, you can go about this a lot easier. Fabric supports ingesting your Parquet datasets into the Lakehouse as Delta using a simple right-click. This should show you the Load to table option. Note that this approach is only available for single-file Parquet datasets, not for partitioned Parquet datasets.
A previous version of this blog post had a paragraph here about using Azure Synapse Serverless to create the Delta Lake tables. That was incorrect and has been removed. Thanks to Andy Cutler for pointing this out 💙!
Using Synapse Serverless as the transformation layer
Data transformations in Synapse Serverless are often done with SQL Views or with the CETAS statements mentioned above.
In the case of SQL views, all you’ll have to do is to get your data in OneLake as Delta Lake datasets. That way you won’t have to rely on
OPENROWSET anymore and you can just use the Delta Lake tables directly in your SQL views.
If your code is full of CETAS statements, there is more work to do. Synapse SQL in Fabric does not support the use of external tables. However, Fabric can easily detect your tables as long as they are created in the Delta Lake format. The sections above explain how to achieve this. My recommended approach here would be to switch to the Delta Lake format as soon as possible and consider replacing the CETAS statements with simple SQL Views. This brings you a bit closer to Fabric and makes the migration easier.
In this post, we’ve looked at the migration paths from Synapse Serverless SQL Pools to Microsoft Fabric. In future posts in this series, I will look at migrations from other platforms, but Synapse Serverless SQL is definitely one of the easiest workloads to migrate to Microsoft Fabric. Follow me on social media to stay up to date with the other upcoming posts in this series.
You might also like
Welcome to the 3rd generation: SQL in Microsoft Fabric
While typing this blog post, I’m flying back from the Data Platform Next Step conference where I gave a talk about using dbt with Microsoft Fabric . DP Next Step was the first conference focussed on Microsoft data services right after the announcement of Microsoft Fabric so a lot of speakers were Microsoft employees and most of the talks had some Fabric content. Fabric Fabric Fabric, what is it all about? In this post I’ll go deeper into what it is, why you should care and focus specifically on the SQL aspect of Fabric.
Microsoft Fabric's Auto Discovery: a closer look
In previous posts , I dug deeper into Microsoft Fabric’s SQL-based features and we even explored OneLake using Azure Storage Explorer . In this post, I’ll take a closer look at Fabric’s auto-discovery feature using Shortcuts. Auto-discovery, what’s that? Fabric’s Lakehouses can automatically discover all the datasets already present in your data lake and expose these as tables in Lakehouses (and Warehouses). Cool, right? At the time of writing, there is a single condition: the tables must be stored in the Delta Lake format. Let’s take a closer look.