How routing is scored
Every routing decision is a deterministic 0–100 composite over the factors below — version methodology_v1 (scoring v0, effective 2026-06-11). This page, the machine endpoint, and the running scoring engine all read the same versioned config, and a CI test fails the build if they ever disagree. What you read here is what runs.
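One way to picture the drift check is as a fingerprint comparison. Each surface exports its copy of the versioned config, and the build fails unless all three hash identically. The sketch below is illustrative only: the helper names and the choice of SHA-256 over canonical JSON are assumptions, not the published CI test.

```python
import hashlib
import json
from typing import Any, Dict


def config_fingerprint(config: Dict[str, Any]) -> str:
    """SHA-256 of the config in canonical JSON (sorted keys, compact separators),
    so key order and whitespace differences do not register as drift."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_no_drift(page: Dict[str, Any], endpoint: Dict[str, Any], engine: Dict[str, Any]) -> None:
    """Hypothetical CI helper: raise unless all three surfaces carry the same config."""
    prints = {
        "page": config_fingerprint(page),
        "endpoint": config_fingerprint(endpoint),
        "engine": config_fingerprint(engine),
    }
    if len(set(prints.values())) != 1:
        raise AssertionError(f"methodology config drift detected: {prints}")
```

Hashing a canonical form keeps the comparison about content rather than serialization, so a reordered but otherwise identical config passes.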
Factors and weights
| Factor | Weight | What it measures | When evidence is null |
|---|---|---|---|
downstream_task_success (Downstream task success) | 20% | Measured rate at which calls to this service produced a successful downstream task, from probe results and verified outcome telemetry. | Neutral midpoint (0.5). Unmeasured services gain no advantage and suffer no fabricated penalty. |
schema_conformance (Schema conformance) | 15% | Measured rate at which responses conform to the service's declared schema (probe harness and outcome reports). | Neutral midpoint (0.5). |
cost_per_successful_task (Cost per successful task) | 15% | Price estimate relative to the requesting policy's price ceiling; cheaper within budget scores higher. Computed per request against the caller's own max_price_usdc. | Neutral midpoint (0.5) when the service has no price estimate or the request sets no ceiling. |
p95_latency (p95 latency) | 15% | Measured p95 latency relative to the requesting policy's latency ceiling; faster within bound scores higher. Latency evidence comes from the probe harness. | Neutral midpoint (0.5) when latency is unmeasured or the request sets no ceiling. |
failure_mode_legibility (Failure-mode legibility) | 10% | Whether the service's failure behavior has been observed and classified. Cards above the seed tier have probe-classified failure modes. | Seed (unmeasured) cards score 0.4; measured tiers score 0.7. Tier is the only input. |
provenance_quality (Provenance / receipt quality) | 10% | Whether the service issues receipts for paid calls (receipt_issuer on the card). | Services declaring no receipt issuer score 0.3; any declared issuer scores 0.7. |
idempotency_replay_safety (Idempotency / replay safety) | 5% | Declared and verified idempotency and replay safety on the service card. | Full marks only when both are 'yes'; anything unknown scores the neutral midpoint (0.5). |
policy_fit (Policy fit) | 5% | How cleanly the candidate fits the requesting policy after hard filters (rails, flags, tier floors) have run. | Defaults to full fit (1.0) once a candidate survives the hard policy filters. |
freshness (Freshness / source confidence) | 5% | Whether the evidence on the card is recent; services with a recent probe score higher. | Never-probed cards score 0.4; recently probed cards score 0.9. |
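The rule-based rows in the table translate directly into small functions. This is a minimal sketch assuming a hypothetical ServiceCard shape (only tier and receipt_issuer appear on this page). Freshness is left out because the score for a card that was probed, but not recently, is not published.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceCard:
    # Hypothetical card shape; only `tier` and `receipt_issuer` are named on this page.
    tier: str                      # "seed", "probed", "verified", or "attested"
    receipt_issuer: Optional[str]  # None when the service declares no issuer
    idempotent: str                # "yes", "no", or "unknown" (assumed encoding)
    replay_safe: str               # "yes", "no", or "unknown" (assumed encoding)


def failure_mode_legibility(card: ServiceCard) -> float:
    # Tier is the only input: seed cards 0.4, every measured tier 0.7.
    return 0.4 if card.tier == "seed" else 0.7


def provenance_quality(card: ServiceCard) -> float:
    # No declared receipt issuer scores 0.3; any declared issuer scores 0.7.
    return 0.7 if card.receipt_issuer is not None else 0.3


def idempotency_replay_safety(card: ServiceCard) -> float:
    # Full marks only when both properties are "yes". The page documents 0.5 for
    # unknowns; treating an explicit "no" as 0.5 as well is an assumption here.
    if card.idempotent == "yes" and card.replay_safe == "yes":
        return 1.0
    return 0.5


def policy_fit(survived_hard_filters: bool) -> Optional[float]:
    # Candidates that fail rails, flags, or tier floors are filtered out, not scored.
    return 1.0 if survived_hard_filters else None
```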
Evidence fields stay null until measured — Stackbroker does not invent benchmarks. Null evidence scores at the documented neutral treatment per factor, never at fabricated strength.
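The composite itself is a weighted sum. The sketch below stores the table's weights as integer percentage points, so a factor value in [0, 1] times its weight lands directly on the 0–100 scale. It assumes the rule-based factors have already been resolved to numbers (their null treatment is not 0.5) and applies the 0.5 midpoint only to the four factors the table documents it for. The function and constant names are hypothetical.

```python
from typing import Mapping, Optional

# Weights from the table, in percentage points (they sum to 100).
WEIGHTS = {
    "downstream_task_success": 20,
    "schema_conformance": 15,
    "cost_per_successful_task": 15,
    "p95_latency": 15,
    "failure_mode_legibility": 10,
    "provenance_quality": 10,
    "idempotency_replay_safety": 5,
    "policy_fit": 5,
    "freshness": 5,
}
assert sum(WEIGHTS.values()) == 100

# Factors whose documented null treatment is the neutral midpoint.
NEUTRAL_WHEN_NULL = {
    "downstream_task_success",
    "schema_conformance",
    "cost_per_successful_task",
    "p95_latency",
}
NEUTRAL = 0.5


def composite_score(factors: Mapping[str, Optional[float]]) -> float:
    """Weighted 0-100 composite; each factor value must lie in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = factors.get(name)
        if value is None:
            if name not in NEUTRAL_WHEN_NULL:
                # Rule-based factors have their own fixed fallbacks (e.g. 0.4 for seed cards).
                raise ValueError(f"{name} must be resolved before scoring")
            value = NEUTRAL
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
        total += weight * value
    return total
```

With every measured and ceiling-relative factor null and seed-card values for the rest (0.4, 0.3, 0.5, 1.0, 0.4), this sketch returns 49.0: the unmeasured card lands near the middle rather than at either extreme.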
How attestation tiers are earned
| Tier | How it is earned |
|---|---|
seed — Claimed | Asserted by a source or hand-curated; Stackbroker has not measured it yet. |
probed — Observed | Stackbroker observed or measured this through the probe harness or public inspection. |
verified — Verified | Passed a defined Stackbroker verification workflow. |
attested — Reviewed | A Stackbroker reviewer checked the evidence and a signed evidence window exists. |
Every step up the ladder is free to providers. Reaching and holding the attested tier additionally requires a clean (or remediated) trust scan inside the evidence window and a live provider connection point; see the trust pre-flight scope and the neutrality policy.
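Because the ladder is ordered, a policy floor can be read as a rank comparison plus the extra conditions that attested carries. This is a minimal sketch under stated assumptions: the TrustState fields are hypothetical, the evidence window is a parameter because its length is not published here, and treating lower floors as a pure rank check is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    SEED = 0      # Claimed
    PROBED = 1    # Observed
    VERIFIED = 2  # Verified
    ATTESTED = 3  # Reviewed


@dataclass
class TrustState:
    """Hypothetical per-service state; field names are illustrative."""
    tier: Tier
    last_clean_scan: Optional[datetime]  # time of the last clean or remediated trust scan
    connection_live: bool


def satisfies_floor(state: TrustState, floor: Tier,
                    evidence_window: timedelta, now: datetime) -> bool:
    """True if the service meets a policy's minimum tier."""
    if state.tier < floor:
        return False
    if floor == Tier.ATTESTED:
        # Holding attested also needs a recent clean scan and a live connection point.
        scan_ok = (state.last_clean_scan is not None
                   and now - state.last_clean_scan <= evidence_window)
        return scan_ok and state.connection_live
    # Assumption: lower floors compare tier rank only.
    return True
```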
How risk_flags are produced
| Flag | Source | Meaning |
|---|---|---|
replay_safety_unknown | service card | The service's replay safety is undeclared or unverified; retried paid calls may double-charge. |
unprobed_seed_card | service card | The card is seed-tier: hand-curated or externally indexed, with no Stackbroker measurement behind it yet. |
security_finding | trust scan rubric | A confirmed finding from the trust-scan pipeline (static description/schema analysis, schema drift, endpoint reputation), mapped through the versioned rubric — never raw scanner output. Specific flags are namespaced security_flags entries on the card. |
trust_scan_stale | trust scan staleness | The service's most recent trust manifest is past its expiry window. A stale scan is displayed as stale, never hidden, and stops satisfying attested-tier policy floors until the service is re-scanned. |
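Read as code, the flag table is a set of independent checks over card fields. This sketch assumes a hypothetical Card with boolean replay-safety fields, a list of already rubric-mapped security_flags, and a manifest expiry timestamp. It does not model the rubric itself, and it leaves a card with no manifest unflagged because the page does not say how that case is treated.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Card:
    """Hypothetical card shape for illustration."""
    tier: str                     # "seed", "probed", "verified", or "attested"
    replay_safety_declared: bool
    replay_safety_verified: bool
    security_flags: List[str] = field(default_factory=list)  # rubric-mapped entries
    trust_manifest_expires: Optional[datetime] = None


def risk_flags(card: Card, now: datetime) -> List[str]:
    flags: List[str] = []
    if not (card.replay_safety_declared and card.replay_safety_verified):
        flags.append("replay_safety_unknown")
    if card.tier == "seed":
        flags.append("unprobed_seed_card")
    if card.security_flags:
        # The specific namespaced entries stay on the card; this only surfaces that one exists.
        flags.append("security_finding")
    if card.trust_manifest_expires is not None and now > card.trust_manifest_expires:
        flags.append("trust_scan_stale")
    return flags
```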
What is NOT a factor
- Subscription status or tier of the requesting agent or operator
- Any payment from any party — providers cannot pay Stackbroker anything, ever
- Provider relationship status (claimed vs. unclaimed cards rank under identical rules)
- Advertising, sponsorship, or placement of any kind (none exists)
- Outcome-report credits (they reduce the reporter's own bill; reports feed evidence identically regardless of who reports)
providers can't pay us · agents' payments can't tilt results
Neutrality invariants
providers_never_pay — Stackbroker accepts no money from providers: no listing fees, no verification fees, no expedited tiers, no paid placement, no advertising. Probing, claim validation, security scanning, and attestation progression are free — conditional only on claiming the card and maintaining a verified connection point. Enforcement: No provider-billing tables or foreign keys exist in the schema; CI runs a static isolation check and a behavioral test asserting provider records carry no billing linkage.
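The schema half of the providers_never_pay enforcement can be approximated with SQLite's foreign-key introspection. This is a sketch only: the table names are hypothetical, SQLite stands in for whatever database actually runs, and it covers the foreign-key check but not the check that no provider-billing tables exist.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical table names; a real check would read them from the schema config.
PROVIDER_TABLES = {"providers", "service_cards"}
BILLING_TABLES = {"billing_accounts", "invoices"}


def isolation_violations(conn: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    """Return (table, column, referenced_table) for every foreign key that links
    a provider table to a billing table, in either direction."""
    violations = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            referenced, column = fk[2], fk[3]
            if ((table in PROVIDER_TABLES and referenced in BILLING_TABLES)
                    or (table in BILLING_TABLES and referenced in PROVIDER_TABLES)):
                violations.append((table, column, referenced))
    return violations
```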
subscriptions_buy_access_not_outcomes — Agent subscriptions gate request volume, throughput, console features, and support. They never affect scores, rankings, verdicts, attestation tiers, risk flags, or audit records. Two subscribers issuing the identical request get identical decisions. Enforcement: Scoring/attestation/scanning modules have no import path or query access to billing tables; CI runs a static import check plus a behavioral test comparing decisions across subscription tiers.
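A static import check like the one described for subscriptions_buy_access_not_outcomes can be written with Python's ast module: parse each scoring, attestation, or scanning module and reject any import whose dotted path contains a billing package. The package name billing and the function below are assumptions for illustration; the behavioral test that compares decisions across subscription tiers is not shown.

```python
import ast
from typing import List

FORBIDDEN_PACKAGES = {"billing"}  # hypothetical package name


def forbidden_imports(source: str) -> List[str]:
    """Return the dotted import paths in `source` that touch a forbidden package."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            paths = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Covers `from billing import x`, `from app import billing`, `from . import billing`.
            prefix = node.module
            paths = [f"{prefix}.{alias.name}" if prefix else alias.name for alias in node.names]
        else:
            continue
        hits.extend(p for p in paths if FORBIDDEN_PACKAGES & set(p.split(".")))
    return hits
```

Matching on whole dotted segments rather than substrings avoids false positives on names such as billing_free.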
Decisions are stamped, hash-chained, and anchored with daily Ed25519 signatures. Verify with the audit walkthrough; the public key is served at /.well-known/stackbroker-signing-key.
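Hash chaining means each decision record commits to the one before it, so editing any past record breaks every later link. The sketch below assumes SHA-256 over canonical JSON and an all-zero genesis hash, neither of which is specified here, and it omits the daily Ed25519 anchoring. Checking those signatures would use the public key served at /.well-known/stackbroker-signing-key with an Ed25519 library.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed starting value for the first link


def record_hash(prev_hash: str, decision: Dict[str, Any]) -> str:
    payload = json.dumps(decision, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(decisions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    chain, prev = [], GENESIS
    for decision in decisions:
        digest = record_hash(prev, decision)
        chain.append({"decision": decision, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    prev = GENESIS
    for entry in chain:
        # Each entry must point at its predecessor and hash to its stored value.
        if entry["prev_hash"] != prev or record_hash(prev, entry["decision"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```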
Changelog
| Version | Date | Change | Reason |
|---|---|---|---|
methodology_v1 | 2026-06-11 | First published methodology: nine-factor composite (scoring v0 weights, unchanged), evidence-null treatment documented per factor, neutrality invariants published and machine-readable. | Verifiable neutrality: the scoring that runs in production must be the scoring that is published, with CI preventing drift. |