Power BI Data Analytics Data Engineering

Unzip Files Power Query: Step-by-Step Guide

Unzip Files Power Query: Step-by-Step Guide
Power BI

How to Unzip Files in Power Query: A Step-by-Step Guide for Data Teams

⏱️ 6 min read
Power Query custom M function to unzip ZIP files in Power BI and Excel — step-by-step guide for data teams

Unzip files directly inside Power Query using a custom M function — no external tools, no scripting environments required

ZIP files are everywhere in enterprise data workflows. Vendors drop compressed archives into SFTP folders. Finance teams email zipped CSV exports. ERP systems produce bundled reports that need extraction before ingestion. For most organisations, unzipping those files means a separate preprocessing step — a Python script, a scheduled task, or worse, a manual intervention before the data pipeline can proceed.

What many Power BI and Excel teams don't realise is that Power Query can unzip files natively, using a custom M function that reads the ZIP binary directly and decompresses each file inside it. No external tools. No additional infrastructure. No dependency on the machine running the refresh. This technique makes it possible to unzip files in Power Query as part of a self-contained data transformation workflow — which is particularly valuable for teams operating in locked-down enterprise environments where installing additional software is restricted.

Why Unzipping Inside Power Query Matters for Enterprise Data Teams

The business case is straightforward: every preprocessing step that sits outside your Power Query pipeline is a point of failure, a maintenance obligation, and a dependency that can break without warning. A Python script that unzips files before Power BI refresh works — until the Python environment changes, the scheduled task fails silently, or a new team member doesn't know the dependency exists.

Keeping the extraction logic inside Power Query eliminates that class of failure entirely. The transformation logic lives in the same place as everything else, versioned and visible in Power Query Editor. When the data source changes, the analyst who maintains the report can see and update the full pipeline without needing to track down a separate script or server-side process.

"Every preprocessing step outside Power Query is a hidden dependency. Consolidating ZIP extraction inside the M query eliminates an entire category of pipeline failure — and keeps your transformation logic in one place, visible and maintainable."

For organisations running Power BI in gateway-connected environments or using Power BI consulting services to build enterprise-grade reporting infrastructure, self-contained query logic also simplifies gateway configuration. There is no pre-extraction script to schedule or maintain on the gateway machine — the query handles everything end-to-end.

How the Power Query ZIP Extraction Function Works

ZIP files follow a well-defined binary specification. Every archive contains a Central Directory — a metadata block at the end of the file that lists every compressed entry, including file names, sizes, compression methods, and the byte offset where each file's data begins. The custom M function exploits this structure directly using Power Query's binary format primitives.

Reading the End of Central Directory Record

The function begins by locating the End of Central Directory (EOCD) record, a 22-byte structure at the very end of the ZIP file. This record tells the function where the Central Directory begins and how many file entries it contains. Power Query's BinaryFormat library provides the tools to parse this precisely — reading unsigned 16-bit and 32-bit integers in little-endian byte order, which is how the ZIP specification defines them.

Iterating the Central Directory Entries

With the Central Directory located, the function iterates each entry to collect file metadata: the file name, the compressed size, and critically, the byte offset of the local file header within the archive. This offset is used in the next step to jump directly to each compressed file's data without scanning the entire binary stream.

Decompressing with DEFLATE

ZIP files typically use the DEFLATE compression algorithm. Power Query's native Binary.Decompress function supports DEFLATE directly via Compression.Deflate, which means no custom decompression logic is required. The function reads each file's compressed binary payload from the correct offset and decompresses it in a single step, returning the original file content as a binary value that downstream query steps can consume normally.

The Complete Custom M Function to Unzip Files in Power Query

The function below accepts a ZIP file loaded as a binary value and returns a list of records — one per file inside the archive — each containing the file name and its decompressed content. Load this into a blank query in Power Query Editor via the Advanced Editor.

Power Query M — Custom Unzip Function
(ZIPFile) =>
let
    ushort = BinaryFormat.ByteOrder(
    BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
    uint   = BinaryFormat.ByteOrder(
    BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    // Step 1: Parse End of Central Directory record (last 22 bytes)
    EDOCfn = BinaryFormat.Record([
        ZipContent = BinaryFormat.Binary(
        Binary.Length(ZIPFile) - 22),
        Magic = uint,
        DiskNum = ushort,
        CDirectoryDiskId = ushort,
        CDirectoryRecordCountOnDisk = ushort,
        CDirectoryRecordCount = ushort,
        SizeOfCentralDirectory = uint,
        CentralDirectoryOffset = uint,
        CommendLength = ushort
    ]),
    EDOC = EDOCfn(ZIPFile),

    // Step 2: Parse Central Directory entries
    BeforeCentralDirectory = BinaryFormat.Binary(
    EDOC[CentralDirectoryOffset]),
    CentralDirectory = BinaryFormat.Length(
    BinaryFormat.Record([
    ZipContent = BeforeCentralDirectory,
    Items = BinaryFormat.List(
    BinaryFormat.Record([
    Magic = uint,
    CurrentVersion = ushort,
    MinVersion = ushort,
    Flags = ushort,
    CompressionMethod = ushort,
    FileModificationTime = ushort,
    FileModificationDate = ushort,
    CRC32 = uint,
    BinarySize = uint,
    FileSize = uint,
    FileInfo = BinaryFormat.Choice(
    BinaryFormat.Record([
        Len = ushort,
        FieldsLen = ushort,
        FileCommentLength = ushort,
        Disk = ushort,
        InternalFileAttr = ushort,
        ExternalAttr = uint,
        PosOfFileHeader = uint
        ]),
        (fileInfo) => BinaryFormat.Record([
        FileName     = BinaryFormat.Text(
                       fileInfo[Len],
                       TextEncoding.Ascii),
                       Fields = BinaryFormat.Binary(
                       fileInfo[FieldsLen]),
                       FileComment = BinaryFormat.Text(
                       fileInfo[FileCommentLength],
                       TextEncoding.Ascii),
                       Disk = BinaryFormat.Transform(
                       BinaryFormat.Null,
                       each fileInfo[Disk]),
                       InternalFileAttr = BinaryFormat.Transform(
                       BinaryFormat.Null,
                       each fileInfo[Disk]),
                       ExternalAttr     = BinaryFormat.Transform(
                       BinaryFormat.Null,
                       each fileInfo[InternalFileAttr]),
                       PosOfFileHeader  = BinaryFormat.Transform(
                       BinaryFormat.Null,
                       each fileInfo[PosOfFileHeader])
                        ])
                    )
                ]),
                EDOC[CDirectoryRecordCount]
            )
        ]),
        EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]
    ),

    // Step 3: Decompress each file using DEFLATE
    Contents = List.Transform(
        CentralDirectory(ZIPFile)[Items],
        (cdEntry) =>
            let
                ZipEntry = BinaryFormat.Record([
                    PreviousData = BinaryFormat.Binary(
                    cdEntry[FileInfo][PosOfFileHeader]),
                    Magic = uint,
                    ZipVersion = ushort,
                    ZipFlags = ushort,
                    CompressionMethod = ushort,
                    FileModificationTime = ushort,
                    FileModificationDate = ushort,
                    CRC32 = uint,
                    BinarySize = uint,
                    FileSize = uint,
                    FileName = BinaryFormat.Choice(
                    BinaryFormat.Record([
                    Len = ushort,
                    FieldsLen = ushort
                        ]),
                        (fileInfo) => BinaryFormat.Record([
                            FileName = BinaryFormat.Text(
                            fileInfo[Len],
                            TextEncoding.Ascii),
                            Fields   = BinaryFormat.Binary(
                            fileInfo[FieldsLen])
                        ])
                    ),
                    FileContent = BinaryFormat.Transform(
                    BinaryFormat.Binary(cdEntry[BinarySize]),
                    each Binary.Decompress(_, Compression.Deflate)
                    )
                ])(ZIPFile)
                in
                [ FileName = ZipEntry[FileName][FileName],
                  Content  = ZipEntry[FileContent] ]
    )
in
    Contents

How to Use the Function: Step-by-Step

With the function loaded into Power Query, invoking it against a ZIP file is a straightforward four-step process. The example below uses a local file path, but the same pattern applies to files sourced from SharePoint, Azure Blob Storage, or any binary-returning connector.

  1. Open Power Query Editor in Power BI Desktop or Excel, then navigate to Home → New Source → Blank Query.
  2. Open the Advanced Editor (View → Advanced Editor) and paste the full function code above. Name the query fnUnzipFile and click Done.
  3. Create a new blank query to invoke the function. Use Binary.Buffer(File.Contents("C:\Path\To\File.zip")) to load the ZIP binary into memory, then pass it to fnUnzipFile.
  4. Expand the results — the function returns a list of records with FileName and Content fields. Convert to a table, then expand or filter to the specific file you need, and apply further transformations as normal.

The complete invocation query for a local file looks like this:

Power Query M — Invoking the Unzip Function
let
    ZIPFile = Binary.Buffer
    (File.Contents("C:\Path\To\Your\File.zip")),
    UnzippedContents = fnUnzipFile(ZIPFile)
    in
    UnzippedContents

Binary.Buffer loads the entire ZIP into memory before processing, which significantly improves performance for larger archives by preventing repeated reads from disk or network. Replace the file path with a dynamic reference — such as a SharePoint document library URL or an Azure Blob Storage binary — to make the query fully source-agnostic.

When to Use Power Query vs Other ZIP Extraction Methods

The Power Query approach is not universally the right choice. Understanding where it fits — and where it does not — helps data leaders make the right architectural decision for each scenario.

Scenario Power Query M Function Python / Script Azure Data Factory
Self-service BI report refresh ✅deal — fully self-contained ⚠️ External dependency ⚠️ Over-engineered for simple cases
Scheduled gateway refresh ✅ No gateway-side scripts needed ❌ Script must run on gateway machine ✅ Managed separately from Power BI
Very large ZIP files (>500 MB) ⚠️ Memory-constrained; may time out ✅ Handles large files efficiently ✅ Built for large-scale ingestion
Multi-file archives (>50 files) ⚠️ Works but filtering logic needed ✅ Full control over extraction ✅ Purpose-built for batch processing
Locked-down enterprise environment ✅ No additional software required ❌ Requires Python install & permissions ⚠️ Requires Azure subscription access
Low-code / analyst-owned pipeline ✅ Maintainable by Power Query users ❌ Requires developer involvement ❌ Requires platform engineering skills

Enterprise Considerations: Governance, Performance, and Scale

Before deploying this technique in a production Power BI environment, data leaders should account for three operational factors that affect how it performs at enterprise scale.

Memory and Refresh Time

The Binary.Buffer call loads the entire ZIP file into the Power Query engine's memory before decompression begins. For archives under 100 MB, this is typically inconsequential. For larger files — particularly those sourced from network locations — memory pressure can slow refresh times or cause failures in environments with constrained Power BI gateway resources. In those cases, consider moving the extraction step to a dedicated data engineering pipeline using Azure Data Factory or Microsoft Fabric Data Flows, and delivering pre-extracted files to Power Query.

Handling Non-DEFLATE Compression

The function as written assumes DEFLATE compression, which covers the vast majority of real-world ZIP files. However, some ZIP archives use STORE (no compression) or older methods like SHRINK or IMPLODE. Files using STORE compression do not need decompression and will produce an error with the current function. A production-hardened version should inspect the CompressionMethod field from the Central Directory entry and branch accordingly — returning the raw binary directly for STORE-compressed entries.

Source Control and Reuse

Store the fnUnzipFile function as a shared Power Query template that your team imports into new reports. If your organisation uses TMDL-based Power BI project files, you can version-control the query alongside your semantic model definitions — ensuring that the extraction logic is auditable and consistently applied across reports. This approach also makes it straightforward to update the function centrally when requirements change, rather than hunting down individual reports that each contain their own copy.

Key Takeaways
  • Power Query can unzip files natively using a custom M function — no external tools, scripts, or preprocessing steps required.
  • The function reads the ZIP binary's Central Directory structure and decompresses each file using Power Query's built-in
  • This approach is best suited to self-service BI pipelines, gateway-connected refreshes, and locked-down enterprise environments where installing additional software is restricted.
  • For archives larger than ~100 MB or containing many files, consider Azure Data Factory or Microsoft Fabric Data Flows as a more scalable extraction layer.
  • Wrap the function in a shared query template and version-control it alongside your semantic model to ensure consistent, auditable use across your Power BI estate.

Next Steps: Streamlining Your Data Pipelines

The ability to unzip files in Power Query is one of many techniques that separate a basic Power BI implementation from a genuinely production-grade analytics pipeline. Teams that invest in native M capabilities — rather than patching gaps with external scripts and manual preprocessing — build reporting infrastructure that is more reliable, easier to maintain, and less likely to fail between scheduled refreshes.

If your organisation is working to reduce pipeline complexity, improve refresh reliability, or build out a Power BI estate that your team can maintain confidently without constant developer involvement, Numlytics can help. Our Power BI consulting practice covers everything from query optimisation and semantic model architecture to full-scale ETL pipeline development on Microsoft Fabric. Speak with a certified Power BI consultant to discuss what a structured improvement programme would look like for your team.