Fast Copy in Dataflows Gen2: A Complete Guide to Microsoft Fabric's High-Speed Data Ingestion

Fast Copy in Dataflows Gen2 is now Generally Available — bringing the Copy Activity engine's parallel data movement into the Power Query authoring experience for large-scale Fabric ingestion workflows.

Dataflows Gen2 in Microsoft Fabric occupies a specific position in the data engineering toolkit: it provides a Power Query-based, code-light interface for building data ingestion and transformation workflows, making it accessible to analysts and data engineers who prefer a visual experience over writing Spark code or ADF pipeline JSON. Its historical limitation was performance at scale: the Power Query Mashup Engine that powers the data transformation layer is optimised for flexibility and expressiveness, not for high-throughput bulk data movement.

Fast Copy in Dataflows Gen2, now Generally Available, directly addresses this limitation by automatically switching to a Copy Activity-based execution engine for large data loads. It delivers the performance of Azure Data Factory's Copy Activity within the Dataflows Gen2 authoring experience, without requiring the developer to leave the Power Query canvas.

Why Fast Copy Is a Meaningful GA for Data Engineers

Before Fast Copy reached GA, enterprise data engineering teams on Microsoft Fabric faced a recurring design decision for large ingestion workloads: use Dataflows Gen2 for its familiar Power Query authoring experience and Fabric-native integration, or use Fabric Data Pipelines with a Copy Activity for the performance needed to move tens of gigabytes within acceptable refresh windows. That decision had a real cost — Data Pipelines require a different authoring paradigm (JSON-based pipeline definition), different monitoring (Pipeline run history rather than Dataflow refresh history), and a more complex governance approach for teams that primarily work in the Dataflows environment.

Fast Copy in GA collapses this decision for most large ingestion scenarios. Teams that prefer the Dataflows Gen2 authoring experience can now use it for the same large-scale ingestion workloads that previously required Data Pipelines — with the Fast Copy engine handling the performance-critical movement transparently, and the Power Query transformations applied in the staging layer after the bulk copy completes.

"Fast Copy is the feature that makes Dataflows Gen2 a serious choice for enterprise-scale ingestion. Before GA, the choice between Dataflows and ADF pipelines was partly a performance trade-off. Fast Copy removes that trade-off for the ingestion layer — the transformation capabilities remain Power Query, but the movement capabilities are now Copy Activity."

The Two Engines Behind Dataflows Gen2

To see why Fast Copy delivers its performance improvements, it helps to know that Dataflows Gen2 has always had two execution engine options, not one.

The Power Query Mashup Engine is the standard execution layer — the same engine that powers Power BI Dataflows and Power BI Desktop's Power Query editor. It is highly capable for data transformation: merges, pivots, column derivations, conditional logic, custom functions, and the full range of M-language operations. Its architectural characteristic relevant to performance is that it processes data through a series of transformation steps sequentially, applying each M query transformation to the data before writing the output to the destination. This is well-optimised for moderate data volumes but does not fully exploit parallel I/O for large bulk transfers.

The Copy Activity engine — the same engine that powers Copy Activity in Azure Data Factory and Fabric Data Pipelines — is architecturally designed for bulk data movement. It uses a serverless, parallelised architecture that maximises network bandwidth, data store IOPS, and compute scaling simultaneously. It does not apply Power Query transformations during the copy operation — it moves data from source to staging destination at maximum throughput, then hands off to the SQL DW compute layer for any post-staging transformations. This separation of concerns is what enables the performance improvement: the copy operation is not throttled by transformation processing.

Fast Copy is the mechanism by which Dataflows Gen2 automatically selects the Copy Activity engine for the data movement phase of large ingestion queries, while preserving the Power Query authoring experience for transformation logic.

How Fast Copy Activates: The 100 MB / 5 Million Row Threshold

Fast Copy activates automatically: it is enabled by default for new dataflows, so no explicit opt-in is required. When a Dataflows Gen2 query refreshes and the data load exceeds either 100 MB of data volume or 5 million rows, Dataflows automatically switches to the Fast Copy backend for the ingestion phase. Below both thresholds, the standard Power Query Mashup Engine handles the full operation; above either one, Fast Copy takes over the movement phase.
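The activation rule described above can be modelled in a few lines. This is a minimal sketch of the rule as stated in this article, not Fabric's internal logic: the function name is hypothetical, "100 MB" is assumed to mean 100 × 1024 × 1024 bytes, and a load sitting exactly on a threshold is assumed not to qualify because the text says "exceeds".

```python
# Hypothetical model of the threshold rule described in the text.
# Assumption: "100 MB" is read as 100 * 1024 * 1024 bytes.
MB = 1024 * 1024
SIZE_THRESHOLD_BYTES = 100 * MB
ROW_THRESHOLD = 5_000_000


def exceeds_fast_copy_threshold(size_bytes: int, row_count: int) -> bool:
    """Return True when the load exceeds either threshold.

    Assumption: a load exactly at a threshold does not qualify,
    because the text says the load must "exceed" it.
    """
    return size_bytes > SIZE_THRESHOLD_BYTES or row_count > ROW_THRESHOLD


# A 6 GB file qualifies on size alone; a 20 MB, 1.2 million row load does not.
print(exceeds_fast_copy_threshold(6 * 1024 * MB, 0))      # True
print(exceeds_fast_copy_threshold(20 * MB, 1_200_000))    # False
```

A helper like this is only useful for estimating which queries in a dataflow are likely candidates; the Refresh History remains the authoritative check.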

This automatic switching has an important implication for transformation logic. Steps that fold (M operations that Power Query translates into native source SQL) are executed at the source before Fast Copy runs. Steps that cannot fold, and would otherwise have to execute in the Mashup Engine, are deferred: the raw data is fast-copied into a staging area within the Fabric destination, and the unfolded transformations are then applied against the staged data using SQL DW compute. This two-phase approach (bulk copy to staging, then SQL-based transformation) is the same pattern used by Dataflows Gen2's staging feature, and Fast Copy leverages it to decouple movement performance from transformation complexity.

For data engineering teams, the practical consequence is that Power Query transformations that can fold to the source (filter predicates, column selection, join conditions expressed in native SQL) contribute to Fast Copy performance by reducing the data volume before the copy runs. Transformations that cannot fold are applied post-staging at SQL DW cost, rather than blocking the ingestion operation.
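The split between folded and post-staging work can be pictured as cutting the query's step list at the first step that cannot fold. The sketch below is a simplified model under stated assumptions: the `Step` type, the `foldable` flag (supplied by the caller here), and the function name are all hypothetical, and it assumes that once one step fails to fold, every later step also runs post-staging.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    foldable: bool  # assumption: supplied by the caller for illustration


def split_for_staging(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Split a query's steps at the first non-foldable step.

    Steps before the cut are pushed to the source and shrink the copied
    volume; the first non-foldable step and everything after it are
    applied against the staged data.
    """
    for i, step in enumerate(steps):
        if not step.foldable:
            return steps[:i], steps[i:]
    return list(steps), []


query = [
    Step("Filter rows by order date", True),
    Step("Select columns", True),
    Step("Add column with custom M function", False),
    Step("Rename columns", True),
]
at_source, post_staging = split_for_staging(query)
print([s.name for s in at_source])
print([s.name for s in post_staging])
```

Under this model, moving foldable filters and column selections ahead of any non-foldable step reduces what the copy has to move.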

Performance Benchmarks: What 8× Faster Means in Practice

Microsoft's published benchmark for Fast Copy in Dataflows Gen2 showed that loading a 6 GB CSV file from Azure Blob Storage into a Fabric Lakehouse table with Fast Copy enabled produced an approximately 8× improvement in processing time and a 3× reduction in cost compared to the standard Power Query engine path for the same operation.

These figures are directionally accurate for large flat-file ingestion scenarios, where the Copy Activity engine's parallel I/O advantages over the Mashup Engine are most pronounced. The actual improvement varies by source type, destination, network conditions, data schema complexity, and the volume of post-staging transformation required. For structured source systems like Azure SQL Database or Snowflake, where query folding is well-supported and the source database handles filtering efficiently, the performance difference may be smaller because the Mashup Engine can offload much of the work to the source. For flat-file sources (CSV or Parquet files in ADLS Gen2 or Azure Blob Storage) where no folding is possible, the Copy Activity engine's parallel I/O advantage is largest.

The cost reduction is a consequence of shorter compute time: Fabric Capacity Units are consumed for the duration of the Dataflow refresh, and an 8× reduction in processing time for the ingestion phase translates directly into fewer CU-seconds consumed for that phase of the refresh.
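The two benchmark figures can be related with simple arithmetic. Since consumption is rate multiplied by time, an 8× shorter run that costs 3× less implies the average consumption rate over the measured run was higher on the faster path. The sketch below only rearranges the two quoted ratios; the function name is hypothetical, and the source does not break the cost down by phase, so the result is an implied average rather than a published rate.

```python
def implied_rate_ratio(speedup: float, cost_reduction: float) -> float:
    """Ratio of average CU consumption rate, new path over old path.

    cost = rate * time, so
      cost_old / cost_new = (rate_old / rate_new) * (time_old / time_new).
    With speedup = time_old / time_new and
    cost_reduction = cost_old / cost_new, rearranging gives
      rate_new / rate_old = speedup / cost_reduction.
    """
    if speedup <= 0 or cost_reduction <= 0:
        raise ValueError("ratios must be positive")
    return speedup / cost_reduction


# Figures quoted in the benchmark: about 8x faster and 3x cheaper.
ratio = implied_rate_ratio(speedup=8.0, cost_reduction=3.0)
print(round(ratio, 2))  # 2.67
```

In other words, the saving comes from the shorter duration outweighing a higher implied average rate, which is why comparing both refresh duration and CU consumption is more informative than comparing duration alone.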

Supported Connectors and On-Premises Gateway

Fast Copy is available for a specific list of connectors that have been validated for Copy Activity backend compatibility. The supported connectors at GA include ADLS Gen2, Azure Blob Storage, Azure SQL Database, On-Premises SQL Server, Oracle, Fabric Lakehouse, Fabric Warehouse, PostgreSQL, and Snowflake. The connector list will expand in subsequent Fabric updates as additional source types are validated for Fast Copy support.

The on-premises data gateway is supported for Fast Copy operations against on-premises sources — specifically On-Premises SQL Server and Oracle. This is a significant capability for enterprise organisations that operate hybrid data environments: large on-premises database extracts, which have historically been among the most time-consuming Dataflow refresh operations due to gateway throughput constraints, benefit from the Copy Activity engine's optimised data movement even when routing through the on-premises gateway.

Connectors That Do Not Yet Support Fast Copy

For connectors not on the supported list — REST APIs, SharePoint, Salesforce, SAP OData, and many other Power Query connectors — Dataflows Gen2 continues to use the standard Power Query Mashup Engine regardless of data volume. For large-scale ingestion from these sources, Data Pipelines with a Copy Activity (where the connector is supported) or a custom extraction pattern remains the recommended approach. Checking the Refresh History for the Fast Copy indicator (described in the next section) is the simplest way to confirm whether Fast Copy is active for a specific source configuration.

Verifying Fast Copy Is Active in Refresh History

After a Dataflows Gen2 refresh completes, the Refresh History shows the status of each output entity in the dataflow. When Fast Copy was used for a specific entity, the entity status in the Refresh History includes a Fast Copy indicator — a visual flag that confirms the Copy Activity backend was used for that entity's ingestion, rather than the standard Mashup Engine path.

This indicator is the primary diagnostic tool for confirming Fast Copy is operating. If a large-volume entity does not show the Fast Copy indicator after a refresh, it indicates one of three conditions: the connector does not support Fast Copy, the data volume did not exceed the 100 MB / 5 million row threshold, or a pre-staging Power Query transformation that cannot fold to the source is preventing Fast Copy activation. In the last case, reviewing the query for unfoldable steps and either removing them or moving them to the post-staging SQL transformation layer may restore Fast Copy eligibility.
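The three conditions can be checked in order as a small triage routine. This is a sketch under assumptions: the function name and return strings are hypothetical, the connector names are the GA list given earlier in this article, and the inputs (size, row count, whether an unfoldable pre-staging step exists) are values the engineer gathers manually rather than anything read from a Fabric API.

```python
# Connector list as stated in this article for GA; names are illustrative labels.
SUPPORTED_CONNECTORS = frozenset({
    "ADLS Gen2", "Azure Blob Storage", "Azure SQL Database",
    "On-Premises SQL Server", "Oracle", "Fabric Lakehouse",
    "Fabric Warehouse", "PostgreSQL", "Snowflake",
})
SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024  # assumption: 100 MB as MiB
ROW_THRESHOLD = 5_000_000


def explain_missing_fast_copy(
    connector: str,
    size_bytes: int,
    row_count: int,
    has_unfoldable_pre_staging_step: bool,
) -> str:
    """Return the first of the three listed conditions that applies."""
    if connector not in SUPPORTED_CONNECTORS:
        return "connector not supported"
    if size_bytes <= SIZE_THRESHOLD_BYTES and row_count <= ROW_THRESHOLD:
        return "below threshold"
    if has_unfoldable_pre_staging_step:
        return "unfoldable pre-staging step"
    return "none of the three listed conditions applies"


print(explain_missing_fast_copy("SharePoint", 10**10, 0, False))
print(explain_missing_fast_copy("Oracle", 10**10, 0, True))
```

Checking the connector first mirrors the cheapest thing to verify; the unfoldable-step check comes last because it requires reviewing the query itself.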

Fast Copy in Dataflows Gen2 vs ADF Copy Activity: When to Use Which

Fast Copy narrows the performance gap between Dataflows Gen2 and pipeline-based Copy Activity (in ADF or Fabric Data Pipelines) for ingestion workloads, but it does not eliminate all differences between the two approaches. Understanding the remaining distinctions guides the right tool selection for specific scenarios.

Use Dataflows Gen2 with Fast Copy when the data source is on the supported connector list, the transformation logic is expressible in Power Query (including staging-based SQL transformations), the target audience for the dataflow authoring and maintenance is a team comfortable with Power Query, and the operational monitoring model preferred is the Dataflow refresh history rather than a Data Pipeline run monitor. Fast Copy makes Dataflows Gen2 the right choice for large ingestion workloads from supported sources that previously required a pipeline due to performance constraints.

Use Fabric Data Pipelines with Copy Activity when the source connector is not supported by Fast Copy in Dataflows, when the ingestion workflow requires orchestration logic beyond what Dataflows supports (conditional branching, parallel execution of multiple activities, failure retry patterns), when the ingestion must be triggered by an event rather than a schedule, or when the team's primary data engineering tooling is pipeline-oriented rather than Power Query-based.
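The two paragraphs above reduce to a short decision rule. The sketch below encodes it as written here; the function name, parameter names, and return labels are hypothetical, and a real decision would weigh more factors than four booleans.

```python
DATAFLOW = "Dataflows Gen2 with Fast Copy"
PIPELINE = "Fabric Data Pipeline with Copy Activity"


def choose_ingestion_tool(
    connector_supported: bool,
    needs_orchestration: bool,
    event_triggered: bool,
    team_prefers_power_query: bool,
) -> str:
    """Pick a tool using the criteria listed in the text.

    Any one pipeline criterion is enough to choose a Data Pipeline;
    otherwise Dataflows Gen2 with Fast Copy is the suggested default.
    """
    if (
        not connector_supported
        or needs_orchestration
        or event_triggered
        or not team_prefers_power_query
    ):
        return PIPELINE
    return DATAFLOW


# Scheduled Snowflake load maintained by a Power Query team.
print(choose_ingestion_tool(True, False, False, True))
# Same source, but the load must start when a file lands.
print(choose_ingestion_tool(True, False, True, True))
```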

Enterprise Data Engineering Patterns That Benefit Most

Three specific enterprise data engineering patterns see the highest impact from Fast Copy reaching GA status.

Daily ERP extract patterns. Organisations extracting large transaction tables from Azure SQL Database, On-Premises SQL Server, or Oracle ERP systems into Fabric Lakehouses — a common Bronze layer ingestion pattern — see direct refresh time reductions for their largest extract queries. Extracts that previously exceeded acceptable refresh windows can now complete within them, removing the need to split the extraction into multiple smaller dataflows or migrate to a Data Pipeline.

Large flat-file ingestion from data lake sources. ADLS Gen2 and Azure Blob Storage sources containing large CSV or Parquet files, a common staging pattern in organisations that receive data from third-party providers, benefit most from Fast Copy's parallel I/O optimisation. The performance improvement is largest for flat-file sources, where no query folding is possible, and the cost savings follow from the reduction in processing time.

Snowflake and cross-cloud ingestion. Organisations integrating Snowflake data into Fabric Lakehouses for analytics — a common pattern for organisations that standardised on Snowflake for their cloud data warehouse before adopting Fabric — can use Fast Copy in Dataflows Gen2 rather than Snowflake's own data sharing or a Data Pipeline Copy Activity, simplifying the integration into a single Dataflows-based pattern.

Ingestion Method Comparison

| Method | Authoring Experience | Large-Scale Performance | Transformation Flexibility | Best For |
| --- | --- | --- | --- | --- |
| Dataflows Gen2 with Fast Copy | Power Query (visual, M language) | High (Copy Activity engine for large loads) | Full Power Query + post-staging SQL | Large ingestion from supported sources; Power Query teams |
| Dataflows Gen2 (standard) | Power Query (visual, M language) | Medium (Mashup Engine only) | Full Power Query | Small to medium volumes; all connector types |
| Fabric Data Pipeline (Copy Activity) | JSON pipeline definition | High (Copy Activity engine always) | Limited (copy and basic mapping only) | Event-triggered; orchestration complexity; unsupported connectors |
| Fabric Spark Notebook | PySpark / Scala / SQL (code) | Very High (distributed compute) | Maximum (full Spark transformation capability) | Complex transformations; very large volumes; data science integration |

Next Steps for Fabric Data Engineering Teams

For data engineering teams that have been using Fabric Data Pipelines specifically because Dataflows Gen2 was too slow for their large-volume ingestion workloads, the GA of Fast Copy in Dataflows Gen2 is worth a direct re-evaluation. The practical test is straightforward: migrate one high-volume ingestion pipeline to a Dataflows Gen2 equivalent, confirm the Fast Copy indicator in Refresh History after the first full refresh, and compare the refresh duration and CU consumption against the Pipeline run.

For teams already using Dataflows Gen2 across all ingestion workloads, no action is required — Fast Copy is enabled automatically for all new dataflows and activates transparently for qualifying loads. Reviewing the Refresh History for existing large-volume dataflows to confirm the Fast Copy indicator is active is the only recommended validation step.

For organisations designing their Fabric data engineering architecture and weighing tool choices for ingestion — Dataflows Gen2, Data Pipelines, Spark Notebooks, or a combination — the comparison table above provides the framework for matching each tool to its appropriate workload. If your team needs guidance on designing a Fabric ingestion architecture that right-sizes the tooling for different source types and volume tiers, speak with a certified Microsoft Fabric data engineer at Numlytics. We work with data engineering teams across the US, UK, Australia, and UAE to design Fabric architectures that are efficient, maintainable, and proportionate to the operational requirements of each data source. For the capacity context relevant to Dataflows Gen2 performance, see our posts on limiting capacity utilisation in Microsoft Fabric and Fabric capacity overage.