How to Read Data from Microsoft Fabric and Load It into Azure Synapse Dedicated SQL Pool

Elite-Intelx · Aug 17, 2024

In today’s data-driven world, efficiently managing and transferring data between different platforms is crucial for businesses to harness insights and make informed decisions. This article will guide you through the process of reading data from Microsoft Fabric and loading it into an Azure Synapse Dedicated SQL Pool using PySpark. This approach is especially useful for handling large datasets, ensuring scalability, and optimizing performance.

Prerequisites

Before we dive into the code, make sure you have the following prerequisites:

  • Azure Synapse Analytics Workspace: A Synapse dedicated SQL pool created within your Azure Synapse workspace.
  • Microsoft Fabric Account: Access to a Microsoft Fabric account where your data is stored.
  • Spark Environment: A working environment with PySpark and the necessary libraries installed (a quick sanity check is shown after this list).
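As a minimal sanity check, assuming you are running inside an Azure Synapse or Microsoft Fabric Spark notebook (where the spark session and mssparkutils are pre-defined), you can run:

# spark and mssparkutils are pre-defined in Synapse/Fabric notebook sessions;
# adjust accordingly if you run this outside a notebook
print(spark.version)   # confirm the Spark runtime version
mssparkutils.fs.help() # confirm the filesystem utilities are available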

Step 1: Listing Files in Microsoft Fabric

First, we’ll list all the files available in the Fabric storage location. This allows us to identify the tables we want to load into Synapse.

result = mssparkutils.fs.ls("abfss://<container>@<fabric_storage>.dfs.fabric.microsoft.com/<folder>/Tables/")
names = [file_info.name for file_info in result]

In this step, the mssparkutils.fs.ls command lists the contents of the specified Fabric directory. Replace <container>, <fabric_storage>, and <folder> with your actual values; for a Fabric Lakehouse backed by OneLake, the path typically takes the form abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/<lakehouse_name>.Lakehouse/Tables/.
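Because each Delta table is stored as its own folder under Tables/, it can help to keep only directory entries and skip any stray files. The FileInfo entries returned by mssparkutils.fs.ls expose an isDir flag for this:

# Keep only directories: each Delta table lives in its own folder under Tables/
names = [file_info.name for file_info in result if file_info.isDir]
print(names)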

Step 2: Setting Up the Spark Session

Next, we create a Spark session and set up various configurations required for loading the data.

from pyspark.sql import SparkSession
from pyspark.sql import DataFrame
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants


spark = SparkSession.builder.appName("WriteDataFrame").getOrCreate()

# The configurations below handle date-related issues and enable Delta schema auto-merge
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

Here, we configure the Spark session to use legacy date/time parsing, to write Parquet INT96 timestamps with the corrected rebase mode, and to enable Delta schema auto-merge, which helps when source table schemas evolve.
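If you want to confirm that these session-level settings took effect, they can be read back at runtime:

# Optional: read the session-level settings back to confirm they are in place
print(spark.conf.get("spark.sql.legacy.timeParserPolicy"))
print(spark.conf.get("spark.sql.parquet.int96RebaseModeInWrite"))
print(spark.conf.get("spark.databricks.delta.schema.autoMerge.enabled"))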

Step 3: Defining the Load Function

Next, we define a function that reads a single table from Fabric and writes it to the Synapse Dedicated SQL Pool using the dedicated SQL pool connector.

def load_table_to_synapse(table_name: str):
    # Path to the source table in Microsoft Fabric (replace the placeholders)
    source_path = f"abfss://<container>@<fabric_storage>.dfs.fabric.microsoft.com/<folder>/Tables/{table_name}"

    # Tables are assumed to be stored in Delta format
    df = spark.read.format("delta").load(source_path)

    # A temporary staging location (<temp_container>, <temp_storage>, <temp_folder>)
    # must already exist in Azure Storage and be accessible to the connector
    (df.write
        .option(Constants.SERVER, "<synapse_server_name>.sql.azuresynapse.net")
        .option(Constants.TEMP_FOLDER, "abfss://<temp_container>@<temp_storage>.dfs.core.windows.net/<temp_folder>")
        .mode("overwrite")
        .synapsesql(f"<database_name>.<schema_name>.{table_name}"))

Key Points:

  • Source Path: The source_path points to the table folder in Microsoft Fabric. Replace the placeholders with your specific details.
  • Synapse Server: The Constants.SERVER option should be the fully qualified name of your Synapse dedicated SQL endpoint, e.g. <workspace_name>.sql.azuresynapse.net.
  • Temporary Storage: The connector stages data in this ADLS Gen2 location before loading it into the pool, so make sure both Spark and the dedicated SQL pool can access it.
  • Delta Format: We assume the data is in Delta format, which is the default format for Lakehouse tables in Fabric and supports ACID transactions.
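After a table has been written, the same connector can read it back from the dedicated SQL pool, which is a convenient way to spot-check a load. The snippet below is a sketch; the placeholder names are illustrative and mirror those used above:

# Read a loaded table back from the dedicated SQL pool to verify the row count
check_df = (spark.read
    .option(Constants.SERVER, "<synapse_server_name>.sql.azuresynapse.net")
    .synapsesql("<database_name>.<schema_name>.<table_name>"))
print(check_df.count())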

Step 4: Executing the Load Process

Finally, we iterate over the list of table names and load each one into Synapse.

for table_name in names:
    print(table_name)
    load_table_to_synapse(table_name)

print("Data loading completed for all tables.")

This loop ensures that all identified tables are processed and loaded into the Synapse Dedicated SQL Pool.
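If one table fails (for example, because of a schema mismatch or a permissions issue on the staging folder), the simple loop above stops at that table. A slightly more defensive variant, sketched below, records failures and keeps going:

# Continue past individual failures and report them at the end
failed = []
for table_name in names:
    try:
        load_table_to_synapse(table_name)
    except Exception as exc:
        print(f"Failed to load {table_name}: {exc}")
        failed.append(table_name)

print(f"Completed: {len(names) - len(failed)} table(s) loaded, {len(failed)} failed: {failed}")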

Conclusion

By following these steps, you can efficiently read data from Microsoft Fabric and load it into an Azure Synapse Dedicated SQL Pool. This process not only centralizes your data but also ensures that it is ready for advanced analytics and reporting.

Feel free to modify the code to suit your specific requirements, and remember to replace placeholders with your actual values.
