
Most accurate gas oracle, live gap vs realized priority fee

Absolute gap in gwei between each oracle's predicted priority-fee tier and the realized percentile in the next mined block, measured per chain. Ranked on the p99 gap, the worst 1% of blocks, because that is where gas spikes hurt integrations.

Read this carefully

Lower gap is NOT the same as "best oracle". Inclusion-confidence oracles (Etherscan) over-predict by design to guarantee inclusion; percentile trackers (PublicNode, Owlracle) hug the realized number by construction. The table therefore ranks on the p99 gap (typical p50 gaps are micro-gwei noise) and pairs it with the covered column: share of time the prediction sat at or above the realized p50, the inclusion-side risk a gap alone cannot show.

This benchmark answers the question wallets, swap routers and bridge UIs all face before sending a transaction: which gas oracle actually matches what the next block will charge, on the chain my product runs on? Marketing pages quote "fast / standard / slow" without ever publishing the gap between prediction and reality. We normalize each oracle's tiers onto a unified p25 / p50 / p75 / p90 / p99 scheme, take the predicted priority fee in gwei, then compare it against the realized percentile computed directly from the actual transactions in the next mined block. The headline is the p99 of the absolute error (|predicted - realized|, in gwei) over 24h, the worst 1% of blocks, because typical-minute errors are micro-gwei noise while gas spikes are where predictions actually diverge. A covered-rate column shows the share of time each prediction sat at or above the realized p50 (the inclusion-side risk an absolute gap cannot show).

Coverage: Ethereum mainnet (PublicNode feeHistory + Owlracle + Etherscan v2) and Polygon (same three). Use the chain tab above to slice the leaderboard. Avalanche C-Chain was dropped because its auto-tuning fee market drives the priority fee to ~0 by design, which collapses prediction-error to ~0 across all oracles and makes the bench non-informative.

What this bench does NOT capture: (a) measured inclusion latency (the covered-rate column is a proxy for under-bidding risk, not an observed wait time); (b) over-pay cost in USD (depends on gas_used and ETH price); (c) fundamental oracle reliability outside the priority-fee dimension. Read this number as "how close to the realized percentile, in gwei", not as "which oracle should I integrate end-to-end".

Methodology

We measure how accurately each gas oracle predicts the priority fee a transaction must pay to land in the next block, on two EIP-1559 chains (Ethereum, Polygon). Every oracle is polled at its tier-tolerant cadence, and its predicted priority fee is buffered per chain together with the predicted block height. When that block is mined, the harness pulls the full block via `eth_getBlockByNumber(.., true)` on the chain's PublicNode RPC, computes the realized priority percentile from the actual `maxPriorityFeePerGas` values across all included transactions, and records the absolute error per (oracle, tier, chain) as both a gauge and a histogram. p50 / p90 / p99 are computed via Prometheus `quantile_over_time` over the 24 h window.

Per-chain coverage: Ethereum and Polygon both have all three oracles (Etherscan v2 free tier covers chainid 1 and 137). The Etherscan call goes through a global rate-gate (≥6s between any two Etherscan requests across chains) because the no-key limit of 1 req/5s is shared per IP. Pending-buffer growth is surfaced per (oracle, chain) so a temporarily backlogged realizer on one chain doesn't silently inflate that chain's scores.

Two chains are deliberately excluded: BNB Chain (not fully EIP-1559, no dynamic base fee) and Avalanche C-Chain (auto-tuning fee market drives the priority fee to ~0, collapsing prediction-error to ~0 across all oracles and making the bench non-informative).
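The global Etherscan rate-gate mentioned above can be pictured with a small sketch. This is not the harness's code: the class name `MinIntervalGate` and its injectable `clock` / `sleep` parameters are hypothetical, chosen only to show how a single shared gate might enforce a minimum spacing between requests regardless of which chain triggers them.

```python
import threading
import time


class MinIntervalGate:
    """Hypothetical gate: consecutive acquisitions start at least min_interval_s apart."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        # Earliest time the next request may start; -inf means "no request yet".
        self._next_allowed = float("-inf")

    def acquire(self):
        """Block until a request is allowed; return the seconds spent waiting."""
        # One shared lock means one gate for every chain's poller.
        with self._lock:
            now = self._clock()
            wait = max(0.0, self._next_allowed - now)
            if wait > 0:
                self._sleep(wait)
            # The next slot opens min_interval_s after this request starts.
            self._next_allowed = max(now, self._next_allowed) + self._min_interval_s
            return wait
```

Under these assumptions, every poller would share one instance, for example `etherscan_gate = MinIntervalGate(6.0)`, and call `etherscan_gate.acquire()` immediately before each Etherscan request.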

Frequently asked

What does this benchmark actually measure?

The absolute difference, in gwei, between each gas oracle's predicted p50 priority fee and the realized p50 priority fee in the next mined block, ranked on the p99 of that gap over 24h (the worst 1% of blocks, where gas spikes actually cost money). Owlracle currently leads the active chain tab at 0.109 gwei. The covered column adds the inclusion-side view: the share of time the prediction sat at or above the realized p50. What it does NOT measure: USD over-pay cost (depends on gas_used × ETH price) or fundamental oracle reliability beyond priority-fee accuracy.
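As an illustration of ranking on the p99 gap, here is a minimal sketch over an in-memory list of per-block gaps. The live numbers come from Prometheus, not from this code. The `quantile` helper interpolates linearly between neighbouring ranks, which is only assumed to approximate what `quantile_over_time` reports, and the oracle names are placeholders.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between neighbouring ranks (0 <= q <= 1)."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def rank_by_p99(gaps_by_oracle):
    """Return (oracle, p99 gap in gwei) pairs, smallest p99 gap first."""
    scored = [(name, quantile(gaps, 0.99)) for name, gaps in gaps_by_oracle.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical data: oracle_a is tighter most of the time but blows out in spikes.
sample = {
    "oracle_a": [0.1] * 90 + [5.0] * 10,
    "oracle_b": [0.2] * 100,
}
leaderboard = rank_by_p99(sample)
```

In this made-up sample the oracle with the smaller typical gap ranks second, because its worst blocks are far worse; that is the behaviour a p99 ranking is meant to expose.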

Why rank on the p99 gap instead of the typical p50 gap?

Because at current fee levels the p50 gaps across oracles are tiny. On a 100k-gas transaction, even a 0.001 gwei error adds up to only 100 gwei in total (1e-7 of the native token), a small fraction of a cent, so a p50 ranking orders economically indistinguishable noise. The p99 gap captures the volatile minutes (mempool spikes, NFT mints, MEV bursts) where predictions genuinely diverge and a wrong number either overpays or misses the block. The typical p50 and p90 gaps remain visible as secondary columns.
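To put a gap into money terms for your own chain, the arithmetic is short enough to sketch. The function name is hypothetical, and the token price is a placeholder you would replace with a current quote; the bench itself does not publish USD figures.

```python
GWEI_PER_NATIVE_TOKEN = 10**9  # 1 ETH (or 1 POL) equals 1e9 gwei


def gap_cost_usd(gap_gwei, gas_used, token_price_usd):
    """USD value of a priority-fee gap of gap_gwei (per gas) over gas_used gas."""
    cost_gwei = gap_gwei * gas_used  # the fee is charged per unit of gas
    return cost_gwei / GWEI_PER_NATIVE_TOKEN * token_price_usd


# Placeholder price, NOT a quoted figure: substitute the current token price.
hypothetical_price_usd = 3000.0
cost = gap_cost_usd(gap_gwei=0.001, gas_used=100_000, token_price_usd=hypothetical_price_usd)
```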

What is the covered rate column?

The share of time over 24h that the oracle's posted p50-tier prediction was at or above the realized p50 priority fee of the latest mined block. Absolute gap treats over-prediction and under-prediction the same, but they are not symmetric for users: over-predicting means slightly overpaying, under-predicting means a transaction paying exactly the predicted fee would have ranked below the block's median and risks waiting. Inclusion-confidence oracles (Etherscan) should score high here by design; percentile trackers sit near 50% by construction. Read gap and covered together: low p99 gap + high covered is the sweet spot.
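The covered rate reduces to a simple count. The sketch below assumes the samples are already paired up as (predicted, realized p50) values in gwei; how the harness actually stores and weights them is not shown here, and the function name is hypothetical.

```python
def covered_rate(samples):
    """Share of (predicted_gwei, realized_p50_gwei) pairs where predicted >= realized."""
    pairs = list(samples)
    if not pairs:
        raise ValueError("no samples in window")
    covered = sum(1 for predicted, realized in pairs if predicted >= realized)
    return covered / len(pairs)


# Hypothetical window of four samples: two covered, two under-bid.
window = [(1.0, 0.9), (1.0, 1.0), (0.5, 0.8), (2.0, 2.5)]
rate = covered_rate(window)
```

Note that a prediction exactly equal to the realized p50 counts as covered, matching the "at or above" wording.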

Why these specific chains (Ethereum, Polygon)?

Both are EIP-1559 chains with a proper dynamic base fee, which makes 'priority-fee prediction error' an apples-to-apples metric across them. Avalanche C-Chain was dropped because its auto-tuning fee market collapses the priority fee to ~0 by design, which makes the prediction-error metric uniformly ~0 across all oracles and non-informative. BNB Chain is excluded because it isn't EIP-1559 (effective fee = gasPrice only, priority is structural noise). Optimism, Base, Arbitrum and other L2 rollups are excluded because their priority fee is ~0 (centralised sequencer, no MEV) and the relevant cost is the L1 data fee, a different metric that belongs in a separate bench. Solana uses lamports per CU with Jito MEV, which is a different fee model entirely.

Which oracle should I pick for my wallet?

Read the p99 gap (does it blow out during spikes?) together with the covered rate (does it err on the side of inclusion or under-bid?) for the chain your product runs on. Owlracle currently leads the active tab at 0.109 gwei (p99), but the right oracle depends on whether you can tolerate over-pay (then prefer high covered), whether you optimise for tail behaviour, and whether the oracle's free quota fits your call volume. The bench gives you the live numbers; it cannot tell you which trade-off your product wants.

What about gas prediction on Optimism / Base / Arbitrum?

These L2 rollups have priority fee ≈ 0 because the sequencer is centralised (no MEV competition, no public mempool). The meaningful cost on an L2 is the L1 data fee, what the sequencer pays Ethereum mainnet to post the batch, which is a different prediction problem. We're considering a separate `l2-data-fee-prediction` bench for that. Including L2s in this bench would silently produce flat ~0 numbers that pollute the comparison.

How is gas prediction error measured here, technically?

Per chain, every oracle is polled at its tier-tolerant cadence and the predicted priority fee per tier is buffered with the predicted block height. When that block is mined, the harness pulls the full block via `eth_getBlockByNumber(.., true)` on the chain's PublicNode RPC and computes the realized percentile from the actual `maxPriorityFeePerGas` values across every included transaction. Absolute error (|predicted - realized|, in gwei) is recorded per (oracle, tier, chain) as both a gauge and a histogram. p50 / p90 / p99 are computed via Prometheus `quantile_over_time` over the last 24 h. Empty or low-tx blocks are flagged separately because the realized percentile is noisy when transaction count is near zero.
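The realization step can be sketched as follows, working on the JSON object that `eth_getBlockByNumber(.., true)` returns. Several details are assumptions rather than the harness's confirmed behaviour: the function names are hypothetical, transactions without a `maxPriorityFeePerGas` field (legacy transactions) are simply skipped, and the percentile uses the nearest-rank method.

```python
import math

WEI_PER_GWEI = 10**9


def priority_fees_gwei(block):
    """Collect maxPriorityFeePerGas (hex wei strings) from a full block, in gwei."""
    fees = []
    for tx in block.get("transactions", []):
        raw = tx.get("maxPriorityFeePerGas")
        if raw is None:
            # ASSUMPTION: legacy transactions carry no such field and are skipped.
            continue
        fees.append(int(raw, 16) / WEI_PER_GWEI)
    return fees


def realized_percentile(fees, pct):
    """Nearest-rank percentile (pct in 0..100); None for an empty block."""
    if not fees:
        return None
    ordered = sorted(fees)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def absolute_error_gwei(predicted_gwei, block, pct):
    """|predicted - realized| in gwei, or None when the block has no usable txs."""
    realized = realized_percentile(priority_fees_gwei(block), pct)
    if realized is None:
        return None
    return abs(predicted_gwei - realized)
```

Returning `None` for an empty block, rather than an error of zero, keeps such blocks out of the error series so they can be flagged separately.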

Why is gas prediction so hard?

EIP-1559 makes the base fee deterministic (it moves by at most 12.5% per block, up or down, depending on how far the previous block's gas used was from the gas target), so every oracle agrees on base fee within a fraction of a gwei. The hard part is predicting the priority-fee distribution in the *next* block. Priority fees are set by users in response to mempool congestion, which shifts on swap activity, MEV bot deployments, NFT mints and DEX volume in a way no historical-lookback model can fully anticipate. The leaderboard surfaces which oracle's lookback / inference scheme tracks reality best per chain, sustained over 24 h.
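For reference, the deterministic base-fee update can be written in a few lines. This sketch follows the update rule in the EIP-1559 specification using Ethereum mainnet's constants; other chains may use different values, and the function name is ours rather than anything from the harness.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # caps the per-block change at 1/8 = 12.5%
ELASTICITY_MULTIPLIER = 2            # gas target = gas limit / 2


def next_base_fee(parent_base_fee, parent_gas_used, parent_gas_limit):
    """Base fee (wei) of the next block, given the parent block's header fields."""
    gas_target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == gas_target:
        return parent_base_fee
    if parent_gas_used > gas_target:
        # Fuller than target: raise the fee, by at least 1 wei.
        delta = (parent_base_fee * (parent_gas_used - gas_target)
                 // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
        return parent_base_fee + max(delta, 1)
    # Emptier than target: lower the fee.
    delta = (parent_base_fee * (gas_target - parent_gas_used)
             // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return parent_base_fee - delta
```

A completely full parent block raises the base fee by 12.5% and a completely empty one lowers it by 12.5%; anything in between scales linearly. Because every input is already on chain, there is nothing to predict here, which is why the bench focuses on the priority fee.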

Source code github.com/ChainBench/OpenChainBench/tree/main/harnesses/gas-estimation