5 min read
Saved February 14, 2026
This article details how to ingest, parse, and query Excel files using Azure Databricks. It explains features like schema inference, reading specific sheets, and using Auto Loader for streaming. Key limitations include unsupported password protection and merged cells.
The feature discussed is currently in Beta; workspace admins can manage access to it via the Previews page in Azure Databricks. The article details how users can ingest, parse, and query Excel files using Databricks SQL and Spark APIs, which simplify working with Excel by automatically inferring the schema and data types. This eliminates the need for external libraries or manual conversions. Users can upload files directly through the UI or access them from cloud storage.
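As a rough illustration, a batch read of an Excel file might look like the sketch below. The `excel` format name, the `sheetName` option, and the file path are assumptions based on the features the article describes, not verified API details.

```python
# Hypothetical sketch of reading an Excel file on Databricks.
# The "excel" format name, option names, and path are assumptions,
# not verified API details of the Beta feature.
df = (
    spark.read.format("excel")
    .option("sheetName", "Sales")   # read one specific sheet (assumed option name)
    .option("inferSchema", "true")  # let Databricks infer column types
    .load("/Volumes/main/default/files/report.xlsx")
)
df.printSchema()
```

Because schema inference is built in, the resulting DataFrame can be queried immediately without converting the workbook to CSV first.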
Key features include the ability to read both .xls and .xlsx files, access specific sheets within multi-sheet files, and specify cell ranges. Ingestion returns evaluated formula results, and users can leverage Auto Loader for structured streaming of Excel files. To create or modify tables from Excel files, users can use a straightforward UI where they upload a file, select a sheet, and set header rows if needed. The files can be queried with Spark's batch and streaming APIs, either inferring the schema or setting it manually.
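The Auto Loader path might be sketched as follows. Auto Loader's `cloudFiles` source and its `cloudFiles.format` and `cloudFiles.schemaLocation` options are standard Databricks conventions, but the `"excel"` format value, paths, and table name here are assumptions for illustration only.

```python
# Hypothetical Auto Loader stream over a directory of Excel files.
# cloudFiles is Auto Loader's standard source; the "excel" format value,
# paths, and table name are assumptions, not verified details.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/excel_ingest")
    .load("/Volumes/main/default/landing/")
)
(
    stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/excel_ingest")
    .trigger(availableNow=True)   # process files present now, then stop
    .toTable("main.default.excel_bronze")
)
```

The schema and checkpoint locations let Auto Loader track inferred schemas and already-ingested files across runs, so new workbooks dropped into the landing directory are picked up incrementally.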
The article highlights several parsing options, such as `dataAddress` for specifying a cell range and `headerRows` for defining how many rows make up the header. For complex Excel layouts, a specific cell range can be extracted to form a Spark DataFrame. Limitations include the inability to read password-protected files, and merged cells populate only their top-left cell. Users can also list the sheets in a file but can read only one sheet at a time.
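Combining these options to pull a sub-range out of a busy sheet might look like the sketch below. The `dataAddress` and `headerRows` option names come from the article; the A1-style range syntax, sheet name, and path are assumptions.

```python
# Extract a specific cell range into a DataFrame.
# dataAddress and headerRows are named in the article; the
# "'Summary'!B3:F20" address syntax, sheet name, and path are assumptions.
df = (
    spark.read.format("excel")
    .option("dataAddress", "'Summary'!B3:F20")  # hypothetical range syntax
    .option("headerRows", "1")                  # first row of the range is the header
    .load("/Volumes/main/default/files/report.xlsx")
)
```

This pattern is useful when a sheet mixes titles, notes, and data: only the rectangular data region is parsed, and everything outside the range is ignored.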