Methodology
How every benchmark is measured, reported and reproduced.
Design principles
- I. Identical inputs. Every provider sees the same request: same pair, same notional, same destination, submitted at the same moment from the same region. If inputs differ, we say so (see the sketch after this list).
- II. Honest aggregates. We report p50, p90 and p99 latency along with success rate. Means are reported but never used as a headline; tail behaviour is what users feel.
- III. Auditable runs. Raw metrics are stored in Prometheus and exposed publicly. Anyone can re-run the harness against the same endpoints and verify the numbers match.
- IV. No cherry-picking. The benchmark plan is committed before each run: providers, routes, cadence, timeout. Adding or removing providers after seeing results requires a published correction.
- V. Neutral presentation. Nothing in the spec marks a winner ahead of time. Tables sort mechanically by p50; readers compare the columns themselves.
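To make principle I concrete, here is a minimal sketch of one benchmark round: a single request spec, fanned out unchanged to every provider at the same moment. The field names, provider list and the submit() stub are illustrative assumptions, not the harness's actual API.

```python
# Sketch of principle I: one immutable request spec, submitted to every
# provider concurrently from a single region. All names here are hypothetical.
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSpec:
    pair: str         # e.g. "ETH/USDC"
    notional: float   # size in the quote asset
    destination: str  # destination chain
    region: str       # region the request is submitted from


async def submit(provider: str, spec: RequestSpec) -> float:
    """Placeholder for the real quote request; returns latency in ms."""
    start = time.monotonic()
    await asyncio.sleep(0.05)  # stand-in for the actual HTTP round trip
    return (time.monotonic() - start) * 1000


async def run_round(providers: list[str], spec: RequestSpec) -> dict[str, float]:
    # Every provider receives the identical spec at the same moment.
    latencies = await asyncio.gather(*(submit(p, spec) for p in providers))
    return dict(zip(providers, latencies))


if __name__ == "__main__":
    spec = RequestSpec(pair="ETH/USDC", notional=10_000, destination="arbitrum", region="eu-west")
    print(asyncio.run(run_round(["provider-a", "provider-b"], spec)))
```

Freezing the spec up front is also what makes principle IV checkable: the plan exists before any result does.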
Statistical conventions
- Latency aggregates: reported as p50, p90, p99 and arithmetic mean over the run window. Failed requests (timeout, 5xx, malformed response) are excluded from latency aggregates and counted toward success rate. A worked sketch of these conventions follows this list.
- 24h range: min and max of p50 observed across the rolling 24-hour window; captures the volatility of each provider, not just its central tendency.
- Δ field: each provider's p50 expressed as a percentage delta from the field mean. Negative is below the field, positive is above.
- Success rate: share of requests returning a usable result within the published timeout. The only metric that includes failures.
- Region normalisation: wherever a benchmark is multi-region, the headline figure is the cross-region median. Per-region figures appear in Fig. 3 of each report.
- Significance: differences smaller than the within-provider standard deviation are noted but not framed as a ranking.
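The conventions above reduce to a short calculation. The sketch below assumes each sample is a (latency_ms, ok) pair, where ok is False for a timeout, 5xx or malformed response; the helper names are illustrative, not the harness's own code.

```python
# Hedged sketch of the statistical conventions; function and field names are assumptions.
from statistics import mean, median, quantiles


def latency_aggregates(samples: list[tuple[float, bool]]) -> dict[str, float]:
    ok_latencies = [ms for ms, ok in samples if ok]          # failures excluded from latency
    p = quantiles(ok_latencies, n=100, method="inclusive")   # 99 percentile cut points
    return {
        "p50": p[49],
        "p90": p[89],
        "p99": p[98],
        "mean": mean(ok_latencies),
        # Success rate is the only aggregate that counts failed requests.
        "success_rate": sum(1 for _, ok in samples if ok) / len(samples),
    }


def range_24h(p50_series: list[float]) -> tuple[float, float]:
    # 24h range: min and max of p50 observed across the rolling window.
    return min(p50_series), max(p50_series)


def delta_vs_field(p50_by_provider: dict[str, float]) -> dict[str, float]:
    # Δ field: each provider's p50 as a percentage delta from the field mean.
    field = mean(p50_by_provider.values())
    return {name: 100 * (p50 - field) / field for name, p50 in p50_by_provider.items()}


def headline_latency(p50_by_region: dict[str, float]) -> float:
    # Region normalisation: the headline figure is the cross-region median of p50.
    return median(p50_by_region.values())
```

Note that failures never enter the percentile calculation but do lower the success rate, which is why success rate is the only metric that includes them.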
Reproducing a result
- Clone the harness from the link at the bottom of any benchmark report.
- Set API keys for the providers you want to include. Public endpoints work for most aggregators; some bridges require allow-listing.
- Run the harness. It exposes /metrics over HTTP. Point a local Prometheus at it, or query the public OpenChainBench Prometheus directly (a hedged query sketch follows these steps).
- Run for at least 24 hours to get a comparable sample size (n typically ≥ 1,000 per provider per region).
- Compare your aggregates to the published numbers. If they diverge, file a provider correction with a reproducer.
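As a hedged sketch of the last two steps, the snippet below pulls each provider's 24-hour p50 out of Prometheus over its HTTP API. The metric name, label names, port and URL are assumptions; substitute whatever your harness checkout actually exports.

```python
# Sketch of the comparison step: query p50 per provider from Prometheus and print
# it next to nothing but itself, for manual comparison against the published tables.
# Metric and label names are hypothetical.
import requests

PROM = "http://localhost:9090"  # your local Prometheus; point at the public one to cross-check
QUERY = (
    "histogram_quantile(0.5, "
    "sum by (provider, le) (rate(ocb_request_duration_seconds_bucket[24h])))"
)


def p50_by_provider(prom_url: str = PROM) -> dict[str, float]:
    resp = requests.get(f"{prom_url}/api/v1/query", params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    # Prometheus reports seconds; convert to milliseconds to match the reports.
    return {r["metric"]["provider"]: float(r["value"][1]) * 1000 for r in results}


if __name__ == "__main__":
    for provider, p50_ms in sorted(p50_by_provider().items()):
        print(f"{provider:20s} p50 = {p50_ms:.1f} ms")
```

Running it once against your local Prometheus and once against the public endpoint gives the side-by-side comparison the final step asks for.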
Corrections
Found a number you can't reproduce? File a data-quality issue (the published figure looks wrong) or a provider correction (your service measures a different value). Material errors are corrected in place with a dated note.