Methodology
How every benchmark is measured, reported and reproduced.
Design principles
- I. Identical inputs. Every provider sees the same request: same pair, same notional, same destination, submitted at the same moment from the same region. If inputs differ, we say so (see the sketch after this list).
- II. Honest aggregates. We report p50, p90 and p99 latency along with success rate. Means are reported but never used as a headline; tail behaviour is what users feel.
- III. Auditable runs. Raw metrics are stored in Prometheus and exposed publicly. Anyone can re-run the harness against the same endpoints and verify the numbers match.
- IV. No cherry-picking. The benchmark plan is committed before each run: providers, routes, cadence, timeout. Adding or removing providers after seeing results requires a published correction.
- V. Neutral presentation. Nothing in the spec marks a winner ahead of time. Tables sort mechanically by p50; readers compare the columns themselves.
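To make principle I concrete, here is a minimal sketch of one benchmark round: a single request spec, fanned out unchanged to every provider at the same moment. The field names, provider list and the submit() stub are illustrative assumptions, not the harness's actual API.

```python
# Sketch of principle I: one immutable request spec, submitted to every
# provider concurrently from a single region. All names here are hypothetical.
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSpec:
    pair: str         # e.g. "ETH/USDC"
    notional: float   # size in the quote asset
    destination: str  # destination chain
    region: str       # region the request is submitted from


async def submit(provider: str, spec: RequestSpec) -> float:
    """Placeholder for the real quote request; returns latency in ms."""
    start = time.monotonic()
    await asyncio.sleep(0.05)  # stand-in for the actual HTTP round trip
    return (time.monotonic() - start) * 1000


async def run_round(providers: list[str], spec: RequestSpec) -> dict[str, float]:
    # Every provider receives the identical spec at the same moment.
    latencies = await asyncio.gather(*(submit(p, spec) for p in providers))
    return dict(zip(providers, latencies))


if __name__ == "__main__":
    spec = RequestSpec(pair="ETH/USDC", notional=10_000, destination="arbitrum", region="eu-west")
    print(asyncio.run(run_round(["provider-a", "provider-b"], spec)))
```

Freezing the spec up front is also what makes principle IV checkable: the plan exists before any result does.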
Statistical conventions
- Latency aggregates: reported as p50, p90, p99 and arithmetic mean over the run window. Failed requests (timeout, 5xx, malformed response) are excluded from latency aggregates and counted toward success rate. A worked sketch of these conventions follows this list.
- 24h range: min and max of p50 observed across the rolling 24-hour window; captures the volatility of each provider, not just its central tendency.
- Δ field: each provider's p50 expressed as a percentage delta from the field mean. Negative is below the field, positive is above.
- Success rate: share of requests returning a usable result within the published timeout. The only metric that includes failures.
- Region normalisation: wherever a benchmark is multi-region, the headline figure is the cross-region median. Per-region figures appear in Fig. 3 of each report.
- Significance: differences smaller than the within-provider standard deviation are noted but not framed as a ranking.
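The conventions above reduce to a short calculation. The sketch below assumes each sample is a (latency_ms, ok) pair, where ok is False for a timeout, 5xx or malformed response; the helper names are illustrative, not the harness's own code.

```python
# Hedged sketch of the statistical conventions; function and field names are assumptions.
from statistics import mean, median, quantiles


def latency_aggregates(samples: list[tuple[float, bool]]) -> dict[str, float]:
    ok_latencies = [ms for ms, ok in samples if ok]          # failures excluded from latency
    p = quantiles(ok_latencies, n=100, method="inclusive")   # 99 percentile cut points
    return {
        "p50": p[49],
        "p90": p[89],
        "p99": p[98],
        "mean": mean(ok_latencies),
        # Success rate is the only aggregate that counts failed requests.
        "success_rate": sum(1 for _, ok in samples if ok) / len(samples),
    }


def range_24h(p50_series: list[float]) -> tuple[float, float]:
    # 24h range: min and max of p50 observed across the rolling window.
    return min(p50_series), max(p50_series)


def delta_vs_field(p50_by_provider: dict[str, float]) -> dict[str, float]:
    # Δ field: each provider's p50 as a percentage delta from the field mean.
    field = mean(p50_by_provider.values())
    return {name: 100 * (p50 - field) / field for name, p50 in p50_by_provider.items()}


def headline_latency(p50_by_region: dict[str, float]) -> float:
    # Region normalisation: the headline figure is the cross-region median of p50.
    return median(p50_by_region.values())
```

Note that failures never enter the percentile calculation but do lower the success rate, which is why success rate is the only metric that includes them.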
Reproducing a result
- Clone the harness from the link at the bottom of any benchmark report.
- Set API keys for the providers you want to include. Public endpoints work for most aggregators; some bridges require allow-listing.
- Run the harness. It exposes /metrics over HTTP. Point a local Prometheus at it, or query the public OpenChainBench Prometheus directly (a hedged query sketch follows these steps).
- Run for at least 24 hours to get a comparable sample size (n typically ≥ 1,000 per provider per region).
- Compare your aggregates to the published numbers. If they diverge, file a provider correction with a reproducer.
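As a hedged sketch of the last two steps, the snippet below pulls each provider's 24-hour p50 out of Prometheus over its HTTP API. The metric name, label names, port and URL are assumptions; substitute whatever your harness checkout actually exports.

```python
# Sketch of the comparison step: query p50 per provider from Prometheus and print
# it next to nothing but itself, for manual comparison against the published tables.
# Metric and label names are hypothetical.
import requests

PROM = "http://localhost:9090"  # your local Prometheus; point at the public one to cross-check
QUERY = (
    "histogram_quantile(0.5, "
    "sum by (provider, le) (rate(ocb_request_duration_seconds_bucket[24h])))"
)


def p50_by_provider(prom_url: str = PROM) -> dict[str, float]:
    resp = requests.get(f"{prom_url}/api/v1/query", params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    # Prometheus reports seconds; convert to milliseconds to match the reports.
    return {r["metric"]["provider"]: float(r["value"][1]) * 1000 for r in results}


if __name__ == "__main__":
    for provider, p50_ms in sorted(p50_by_provider().items()):
        print(f"{provider:20s} p50 = {p50_ms:.1f} ms")
```

Running it once against your local Prometheus and once against the public endpoint gives the side-by-side comparison the final step asks for.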
Corrections
Found a number you can't reproduce? File a data-quality issue (the published figure looks wrong) or a provider correction (your service measures a different value). Material errors are corrected in place with a dated note.