Microsoft Fabric Throttling, Bursting & Smoothing Explained

Microsoft Fabric's bursting and smoothing model — why an F2 can handle workloads that look like they need an F8, what happens when the smoothing window runs out, and how to plan capacity around these mechanics.

When organisations first encounter Microsoft Fabric throttling, the instinctive response is to size up — purchase the next SKU tier to get more headroom. That response is often more expensive than necessary, because it ignores two mechanisms built into Fabric's capacity model specifically to make peak workloads survivable without a SKU upgrade: bursting and smoothing. Understanding how these two mechanisms interact with each other — and with the different treatment of interactive versus scheduled workloads — is the difference between capacity planning based on peak demand and capacity planning based on average demand. For most real-world Fabric workloads, the difference is a significant SKU cost reduction.

How Fabric's Capacity Unit Model Actually Works

Microsoft Fabric capacities are sold in SKU tiers — F2, F4, F8, F16, F32, F64, and above — where each tier provides a defined allocation of Capacity Units (CUs) per second. An F2 provides 2 CUs/second; an F4 provides 4 CUs/second; an F8 provides 8 CUs/second, and so on, doubling with each tier.

The key insight into the CU model is that it is a time-based budget, not a hard per-second ceiling in the way a CPU core count is a hard ceiling. Every second that a workload runs below the SKU's CU allocation, the unused CUs accumulate as future-available capacity in a smoothing pool. Every second that a workload exceeds the SKU's CU allocation, it draws from that smoothing pool rather than immediately throttling. The smoothing pool has a bounded depth — it accumulates over 24 hours — and once it is depleted, throttling begins. This is the architectural reason why Fabric throttling is less disruptive than most teams expect from a per-second capacity model.
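To make the time-based budget concrete, the short Python sketch below converts a SKU's CU rate into a 24-hour budget expressed in CU-seconds. The names `SKU_CUS` and `daily_budget_cu_seconds` are illustrative only and are not part of any Fabric API; the CU values simply follow the doubling pattern described above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# CU rate per SKU, following the doubling pattern described in the text.
SKU_CUS = {"F2": 2, "F4": 4, "F8": 8, "F16": 16, "F32": 32, "F64": 64}


def daily_budget_cu_seconds(sku: str) -> int:
    """Return the 24-hour budget for a SKU, in CU-seconds."""
    return SKU_CUS[sku] * SECONDS_PER_DAY


for sku in SKU_CUS:
    print(f"{sku}: {daily_budget_cu_seconds(sku):,} CU-seconds per day")
```

Expressing the budget in CU-seconds (rate multiplied by time) keeps the units consistent when bursts of different lengths are compared against it.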

"Fabric's capacity model is not a per-second speed limit — it is a daily energy budget. You can sprint far above the posted limit, as long as the total energy used across the day stays within the total budget your SKU provides."

What Throttling Is and When It Occurs

Throttling in Microsoft Fabric occurs when the smoothed CU consumption rate exceeds the SKU's allocation: not at the moment of peak consumption, but once the smoothing mechanism's forward-looking window is exhausted. Fabric does not throttle the moment a workload exceeds the per-second CU limit; it throttles when smoothed consumption is projected to fill the next 10 minutes, 60 minutes, or 24 hours of available allocation, with each threshold triggering a progressively stricter response.

This distinction is important for understanding when throttling becomes visible to users. A single spike — a large pipeline run that consumes 5× the F2 CU limit for 10 minutes — may not cause any visible throttling if the rest of the day's workload is light enough that the 24-hour total stays within budget. Throttling becomes a practical operational problem when the workload profile is high-consumption for sustained periods that exhaust the daily smoothing budget.

Bursting: Borrowing from a Higher SKU Without Paying for One

Bursting is Fabric's mechanism for allowing a workload to consume more CUs per second than the purchased SKU's per-second allocation — without immediately throttling and without paying for a higher SKU tier. When a workload exceeds the F2's 2 CUs/second limit and consumes, say, 10 CUs/second for a period, Fabric allows this burst to execute as long as the daily CU budget is not yet exhausted.

Bursting is not free compute — it is time-shifted compute. The extra CUs consumed during the burst are "borrowed" from the capacity's future allocation within the same 24-hour window. Fabric's model does not provide extra CUs beyond the SKU's daily total; it provides flexibility in when within the day those CUs are consumed. A burst that consumes 5 hours' worth of CU allocation in 30 minutes is supported — but it means the remaining 23.5 hours of the day have proportionally less CU budget available.
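The "5 hours' worth in 30 minutes" case can be worked through numerically. The sketch below is a simplified model under the article's own assumption that the daily total is fixed; `burst_summary` and its parameters are hypothetical names chosen for illustration.

```python
SECONDS_PER_HOUR = 3600


def burst_summary(sku_cus: float, hours_of_budget: float, burst_hours: float) -> dict:
    """Describe a burst that spends `hours_of_budget` of allocation in `burst_hours`."""
    burst_cu_seconds = sku_cus * hours_of_budget * SECONDS_PER_HOUR
    # Rate at which the capacity runs while the burst is in progress.
    burst_rate = burst_cu_seconds / (burst_hours * SECONDS_PER_HOUR)
    # Whatever the burst did not spend is left for the rest of the day.
    remaining_cu_seconds = sku_cus * 24 * SECONDS_PER_HOUR - burst_cu_seconds
    remaining_hours = 24 - burst_hours
    return {
        "burst_rate_cus": burst_rate,
        "remaining_avg_cus": remaining_cu_seconds / (remaining_hours * SECONDS_PER_HOUR),
    }


summary = burst_summary(sku_cus=2, hours_of_budget=5, burst_hours=0.5)
print(summary)
```

On an F2, this burst runs at 20 CUs, and the remaining 23.5 hours can average roughly 1.62 CUs instead of 2 before the daily total is exceeded.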

Smoothing: How Fabric Repays the Borrowed Compute

Smoothing is the accounting mechanism that tracks bursting. When a workload bursts above the per-second CU allocation, Fabric records the excess CUs consumed and applies them against the forward smoothing window. The smoothing window is different for different workload types — which is one of the most practically important aspects of Fabric's capacity model.

For interactive workloads (ad hoc user queries, report loads, on-demand Notebook runs), excess CUs are smoothed over the next 5 minutes. For background workloads (scheduled pipelines, dataset refreshes, Dataflows Gen2 scheduled runs), excess CUs are smoothed over the next 24 hours. This asymmetry is intentional: interactive workloads get a short smoothing window because they directly affect user experience; scheduled background workloads get the full 24-hour window because they are designed to run outside peak hours and can tolerate a longer budget horizon.
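The effect of the two windows is easiest to see by spreading the same amount of work over each. The sketch below assumes, as a simplification, that an operation's CU-seconds are spread evenly across its window; the 600 CU-second operation and the name `smoothed_rate` are hypothetical.

```python
INTERACTIVE_WINDOW_S = 5 * 60        # 5 minutes, as stated in the text
BACKGROUND_WINDOW_S = 24 * 60 * 60   # 24 hours, as stated in the text


def smoothed_rate(cu_seconds: float, window_seconds: int) -> float:
    """Spread an operation's total CU-seconds evenly over a smoothing window."""
    return cu_seconds / window_seconds


operation_cu_seconds = 600  # hypothetical operation: 20 CUs for 30 seconds

print(smoothed_rate(operation_cu_seconds, INTERACTIVE_WINDOW_S))  # treated as interactive
print(smoothed_rate(operation_cu_seconds, BACKGROUND_WINDOW_S))   # treated as background
```

Treated as interactive, the operation contributes 2 CUs for five minutes, which is the whole of an F2's rate. Treated as background, the same work contributes under 0.01 CUs for a day.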

The F2 Worked Example: 10 CUs for an Hour Without Upgrading

The following example, preserved from the original post and expanded, shows how an F2 capacity can support a workload whose peak appears to require a far larger SKU, because the average daily consumption stays within the F2's daily budget.

Setup: An organisation purchases an F2 capacity, which provides 2 CUs/second. For 23 hours of the day, the typical workload averages 1.5 CUs/second. Once per day, a full data load pipeline runs for 60 minutes at 10 CUs/second.

F2 Capacity: Daily CU Budget Calculation
F2 daily budget = 2 CUs × 86,400 seconds = 172,800 CU-seconds/day

Average workload (23 hours) = 1.5 CUs × 82,800 sec = 124,200 CU-seconds
Peak burst (1 hour at 10 CUs) = 10 CUs × 3,600 sec = 36,000 CU-seconds

Total consumption = 124,200 + 36,000 = 160,200 CU-seconds

Remaining budget = 172,800 − 160,200 = 12,600 CU-seconds (7.3% headroom)

The key result: the 10 CU burst ran for a full hour on an F2 capacity, at five times the F2's per-second allocation, without triggering sustained throttling, because the workload across the remaining 23 hours was low enough that total daily CU consumption stayed within the F2's daily budget. An organisation that sized its Fabric SKU on the 10 CU peak would have purchased an F16, since the peak exceeds even the F8's 8 CUs. The smoothing model made the F2 sufficient for the actual workload profile.
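The same arithmetic can be reproduced in a few lines of Python, which makes it easy to substitute other workload profiles. This is a minimal sketch of the worked example only; the variable names are illustrative, and it models the daily total rather than Fabric's actual smoothing engine.

```python
SECONDS_PER_HOUR = 3600

sku_cus = 2                                   # F2
daily_budget = sku_cus * 24 * SECONDS_PER_HOUR

baseline = 1.5 * 23 * SECONDS_PER_HOUR        # 1.5 CUs for 23 hours
burst = 10 * 1 * SECONDS_PER_HOUR             # 10 CUs for 1 hour
total_used = baseline + burst

remaining = daily_budget - total_used
headroom_pct = 100 * remaining / daily_budget
average_cus = total_used / (24 * SECONDS_PER_HOUR)

print(f"Total used:  {total_used:,.0f} CU-seconds")
print(f"Remaining:   {remaining:,.0f} CU-seconds ({headroom_pct:.1f}% headroom)")
print(f"Day average: {average_cus:.3f} CUs against an allocation of {sku_cus}")
```

The day averages about 1.85 CUs, which is why it fits under the F2's allocation of 2 despite the 10 CU hour.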

Interactive vs Scheduled Workloads: Different Smoothing Windows

The smoothing window asymmetry between interactive and scheduled workloads has a direct and important implication for how Fabric handles mixed workload environments — which describes virtually every enterprise Fabric deployment, where scheduled pipelines and background refreshes run alongside user-facing Power BI reports and ad hoc Notebook queries.

For interactive workloads, the 5-minute smoothing window means that Fabric is effectively looking at the next 5 minutes of projected consumption when deciding whether to throttle an interactive query. A burst that is heavy but brief — a complex DAX query that spikes CU consumption for 30 seconds — is unlikely to exhaust the 5-minute smoothing window and is unlikely to be throttled. A sustained period of high interactive query load that fills the 5-minute forward window will trigger throttling on new interactive requests.

For scheduled background workloads, the 24-hour smoothing window means Fabric is looking at projected consumption over the full day. A scheduled pipeline that consumes a large burst of CUs will have its consumption averaged across 24 hours in the smoothing accounting — the same pipeline that causes a momentary spike in the per-second metric is smoothed to a small per-second average over 24 hours, making it far less likely to trigger throttling than a naive per-second view would suggest.

The Three-Tier Throttling Cascade

When projected consumption fills the forward smoothing window, Microsoft Fabric applies throttling in a defined three-tier cascade that escalates from delay to rejection, based on how far into the future the capacity is already committed.

| Throttling Tier | Trigger Condition | Effect on Interactive Workloads | Effect on Scheduled Workloads |
| --- | --- | --- | --- |
| Tier 1: Delay | Next 10-minute forward window is full | Ad hoc interactive queries are queued and delayed | Scheduled workloads continue unaffected |
| Tier 2: Rejection | Next 60-minute forward window is full | Ad hoc interactive queries are rejected with an error | Scheduled workloads continue unaffected |
| Tier 3: Scheduled rejection | Full 24-hour forward window is full | All queries rejected | Scheduled workloads are rejected and must be retried |

The cascade design reflects the priority hierarchy Fabric applies to different workload types. Interactive workloads — which directly affect user experience — are the first to be delayed or rejected when capacity is constrained, because delaying a user's query is less disruptive than failing a scheduled pipeline mid-execution. Scheduled workloads are protected until Tier 3, which requires the full 24-hour window to be projected as fully consumed — a condition that indicates genuinely sustained overuse, not a temporary spike.
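The cascade can be expressed as a small classification function. The sketch below assumes that "window is full" can be measured as minutes of future SKU capacity already consumed by carried-forward usage, and that reaching a threshold exactly counts as full; both are assumptions made for illustration, and the function names are hypothetical rather than part of any Fabric API.

```python
def future_minutes_consumed(carryforward_cu_seconds: float, sku_cus: float) -> float:
    """Convert carried-forward CU-seconds into minutes of future SKU capacity."""
    return carryforward_cu_seconds / (sku_cus * 60)


def throttle_tier(future_minutes: float) -> str:
    """Map minutes of future capacity already consumed to the cascade tiers."""
    # Assumption: hitting a threshold exactly is treated as the window being full.
    if future_minutes >= 24 * 60:
        return "Tier 3 (scheduled rejection)"
    if future_minutes >= 60:
        return "Tier 2 (interactive rejection)"
    if future_minutes >= 10:
        return "Tier 1 (interactive delay)"
    return "No throttling"


# Hypothetical carry-forward amounts on an F2 (2 CUs).
for carry in (600, 3_000, 12_000, 180_000):
    minutes = future_minutes_consumed(carry, sku_cus=2)
    print(f"{carry:>7} CU-seconds -> {minutes:7.1f} min -> {throttle_tier(minutes)}")
```

Checking the tiers from most to least severe keeps each condition a single comparison and makes the escalation order explicit.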

What This Means for Capacity Planning

The practical implication of bursting and smoothing for Fabric capacity planning is that the right metric for SKU sizing is average daily CU consumption, not peak per-second CU consumption. A workload with a 10 CU peak but a 1.5 CU average does not need a 10-CU SKU — it needs a SKU whose daily budget can accommodate the total CU consumption of the full 24-hour workload profile, with sufficient headroom to absorb variability.
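The difference between the two sizing approaches can be sketched as follows. The 20% headroom margin is a hypothetical planning figure, not a Fabric recommendation, and `smallest_sku` and `size_by_average` are illustrative helper names.

```python
SKU_CUS = {"F2": 2, "F4": 4, "F8": 8, "F16": 16, "F32": 32, "F64": 64}


def smallest_sku(required_cus: float) -> str:
    """Return the smallest listed SKU whose CU rate covers `required_cus`."""
    for name, cus in SKU_CUS.items():  # insertion order is ascending here
        if cus >= required_cus:
            return name
    raise ValueError("requirement exceeds the SKUs listed in this sketch")


def size_by_average(average_cus: float, headroom: float) -> str:
    """Size on the daily average, padded by a fractional headroom margin."""
    return smallest_sku(average_cus * (1 + headroom))


peak_cus, average_cus = 10, 1.5
print("Sized on peak second:", smallest_sku(peak_cus))
print("Sized on average day:", size_by_average(average_cus, headroom=0.2))
```

For the 10 CU peak and 1.5 CU average described above, peak-based sizing lands on an F16 while average-based sizing with this margin lands on an F2.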

The Fabric Capacity Metrics App is the primary tool for measuring actual daily CU consumption patterns — not the per-second peak, but the rolling 24-hour average that the smoothing model actually uses to determine throttling risk. An organisation that has been experiencing occasional throttling events should review the Metrics App to understand whether the issue is peak per-second consumption (manageable through workload scheduling) or sustained daily over-budget consumption (which requires a SKU upgrade or workload reduction).

Bursting and Smoothing vs Fabric Capacity Overage

Bursting and smoothing are built into every Fabric SKU and operate automatically without configuration — they are the standard capacity model, not an optional feature. Fabric Capacity Overage (covered in our post on Microsoft Fabric Capacity Overage) is a separate opt-in mechanism that extends beyond what bursting and smoothing provide: it allows consumption beyond the daily CU budget by drawing additional CUs at a pay-as-you-go rate, rather than time-shifting within the existing daily budget.

The relationship between the two is sequential: bursting and smoothing operate first, allowing peak consumption within the daily budget. When the daily budget itself is exhausted — which bursting and smoothing cannot prevent, only defer — overage (if enabled) allows continued operation at additional cost. Understanding this sequence helps clarify why bursting and smoothing reduce throttling risk for most workloads, and why overage is the mechanism for the rare cases where the daily budget is genuinely insufficient.
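The sequence can be illustrated by splitting a day's consumption into the part the daily budget covers and the part that would fall to overage. This sketch only does the arithmetic; it makes no assumption about how overage is priced or metered, and `split_daily_consumption` is a hypothetical name.

```python
SECONDS_PER_DAY = 86_400


def split_daily_consumption(total_cu_seconds: float, sku_cus: float) -> tuple[float, float]:
    """Split a day's consumption into (covered by daily budget, beyond daily budget)."""
    budget = sku_cus * SECONDS_PER_DAY
    within_budget = min(total_cu_seconds, budget)
    beyond_budget = max(total_cu_seconds - budget, 0)
    return within_budget, beyond_budget


print(split_daily_consumption(160_200, sku_cus=2))  # the F2 worked example's day
print(split_daily_consumption(200_000, sku_cus=2))  # a hypothetical heavier day
```

The first day stays entirely within the F2's budget, so bursting and smoothing are sufficient. The second exceeds it by 27,200 CU-seconds, which is the portion that only overage, a SKU upgrade, or workload reduction can address.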

Next Steps for Fabric Capacity Management

For Fabric capacity administrators who are experiencing throttling events, the diagnostic starting point is the Fabric Capacity Metrics App — specifically the smoothed CU consumption view that shows whether throttling is occurring because of peak per-second consumption being smoothed into a short forward window, or because of genuine sustained daily over-budget consumption. The appropriate response is different in each case: peak-spike throttling is addressed through workload scheduling (moving large pipeline runs to off-peak hours) or workload optimisation; daily budget exhaustion requires either a SKU upgrade, capacity overage enablement, or workload reduction.
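The two diagnoses can be separated with a simple check once consumption figures are in hand. The sketch below assumes a hypothetical input of 24 hourly average CU readings; it does not reflect the Metrics App's actual export format, and `diagnose` is an illustrative name.

```python
def diagnose(hourly_avg_cus: list[float], sku_cus: float) -> str:
    """Classify a day of hourly average CU readings against a SKU."""
    daily_total = sum(rate * 3600 for rate in hourly_avg_cus)
    daily_budget = sku_cus * 3600 * len(hourly_avg_cus)
    if daily_total > daily_budget:
        return "daily budget exhausted"
    if max(hourly_avg_cus) > sku_cus:
        return "peak spike within daily budget"
    return "within allocation"


spiky_day = [1.5] * 23 + [10.0]   # the F2 worked example's profile
heavy_day = [2.5] * 24            # hypothetical sustained load above an F2

print(diagnose(spiky_day, sku_cus=2))
print(diagnose(heavy_day, sku_cus=2))
```

A "peak spike" result points towards rescheduling or optimisation, while a "daily budget exhausted" result points towards a larger SKU, overage, or less work.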

For teams doing initial Fabric SKU sizing, the bursting and smoothing model means the correct sizing question is: what is the expected average daily CU consumption across all workloads, and does the target SKU's daily CU budget cover that with adequate headroom for variability? Sizing for the peak second instead of the average day typically results in overspending on Fabric capacity that sits idle most of the time. For further guidance on Fabric capacity management, see our posts on Microsoft Fabric Capacity Overage and limiting capacity utilisation in Microsoft Fabric. If your organisation needs help right-sizing its Fabric capacity based on actual workload measurement, speak with a certified Microsoft Fabric consultant at Numlytics.