Methodology

How routing is scored.

Every routing decision is a deterministic 0–100 composite over the factors below — version methodology_v1 (scoring v0, effective 2026-06-11). This page, the machine endpoint, and the running scoring engine all read the same versioned config, and a CI test fails the build if they ever disagree. What you read here is what runs.

The composite

Factors and weights

Factor | Weight | What it measures | When evidence is null
downstream_task_success (Downstream task success) | 20% | Measured rate at which calls to this service produced a successful downstream task, from probe results and verified outcome telemetry. | Neutral midpoint (0.5). Unmeasured services gain no advantage and suffer no fabricated penalty.
schema_conformance (Schema conformance) | 15% | Measured rate at which responses conform to the service's declared schema (probe harness + outcome reports). | Neutral midpoint (0.5).
cost_per_successful_task (Cost per successful task) | 15% | Price estimate relative to the requesting policy's price ceiling; cheaper within budget scores higher. Computed per request against the caller's own max_price_usdc. | Neutral midpoint (0.5) when the service has no price estimate or the request sets no ceiling.
p95_latency (p95 latency) | 15% | Measured p95 latency relative to the requesting policy's latency ceiling; faster within bound scores higher. Latency evidence comes from the probe harness. | Neutral midpoint (0.5) when unmeasured or the request sets no ceiling.
failure_mode_legibility (Failure-mode legibility) | 10% | Whether the service's failure behavior has been observed and classified. Cards above the seed tier have probe-classified failure modes. | Seed (unmeasured) cards score 0.4; measured tiers score 0.7. Tier is the only input.
provenance_quality (Provenance / receipt quality) | 10% | Whether the service issues receipts for paid calls (receipt_issuer on the card). | Services declaring no receipt issuer score 0.3; any declared issuer scores 0.7.
idempotency_replay_safety (Idempotency / replay safety) | 5% | Declared + verified idempotency and replay safety on the service card. | Full marks only when both are 'yes'; anything unknown scores the neutral midpoint (0.5).
policy_fit (Policy fit) | 5% | How cleanly the candidate fits the requesting policy after hard filters (rails, flags, tier floors) have run. | Defaults to full fit (1.0) once a candidate survives the hard policy filters.
freshness (Freshness / source confidence) | 5% | Whether the evidence on the card is recent: services with a recent probe score higher. | Never-probed cards score 0.4; recently probed cards score 0.9.
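
Several of the factors above are simple lookups on the service card, so they can be sketched directly. The following is a minimal illustration, not the engine's code: the function and parameter names are hypothetical, the treatment of an explicit 'no' for idempotency is an assumption (the table only covers the unknown case), and the linear shape of the cost ramp is an assumption, since the table only says that cheaper within budget scores higher.

```python
from typing import Optional


def failure_mode_legibility(tier: str) -> float:
    """Tier is the only input: seed cards score 0.4, measured tiers 0.7."""
    return 0.4 if tier == "seed" else 0.7


def provenance_quality(receipt_issuer: Optional[str]) -> float:
    """No declared receipt issuer scores 0.3; any declared issuer scores 0.7."""
    return 0.7 if receipt_issuer else 0.3


def idempotency_replay_safety(idempotency: Optional[str],
                              replay_safety: Optional[str]) -> float:
    """Full marks only when both fields are 'yes'.

    Assumption: every other combination, including an explicit 'no',
    falls back to the neutral midpoint.
    """
    if idempotency == "yes" and replay_safety == "yes":
        return 1.0
    return 0.5


def cost_per_successful_task(price_usdc: Optional[float],
                             max_price_usdc: Optional[float]) -> float:
    """Neutral 0.5 when there is no price estimate or no ceiling.

    Assumption: a linear ramp from 1.0 at zero cost to 0.0 at the ceiling,
    with a non-positive ceiling treated as "no ceiling".
    """
    if price_usdc is None or max_price_usdc is None or max_price_usdc <= 0:
        return 0.5
    return max(0.0, min(1.0, 1.0 - price_usdc / max_price_usdc))
```

Under these assumptions, a service priced at a quarter of the caller's ceiling scores 0.75 on cost, and a seed card with no receipt issuer scores 0.4 and 0.3 on the two card-derived factors.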

Evidence fields stay null until measured — Stackbroker does not invent benchmarks. Null evidence scores at the documented neutral treatment per factor, never at fabricated strength.
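
To make the weighting concrete, here is a minimal sketch of how a 0–100 composite could be assembled from per-factor scores in the range 0 to 1. It assumes the composite is a plain weighted sum, which this page implies but does not spell out; the function name and the input dictionary are hypothetical. Only the four measured factors accept a null here, because the other factors already have documented defaults derived from the card.

```python
# Weights in percent, copied from the factor table (they sum to 100).
WEIGHTS = {
    "downstream_task_success": 20,
    "schema_conformance": 15,
    "cost_per_successful_task": 15,
    "p95_latency": 15,
    "failure_mode_legibility": 10,
    "provenance_quality": 10,
    "idempotency_replay_safety": 5,
    "policy_fit": 5,
    "freshness": 5,
}

# Factors whose documented null treatment is the neutral midpoint.
NEUTRAL_WHEN_NULL = {
    "downstream_task_success",
    "schema_conformance",
    "cost_per_successful_task",
    "p95_latency",
}


def composite(scores: dict) -> float:
    """Weighted sum of factor scores (each 0..1), returned on a 0..100 scale.

    Assumption: the composite is linear in the factor scores.
    """
    total = 0.0
    for factor, weight in WEIGHTS.items():  # fixed order keeps it deterministic
        value = scores.get(factor)
        if value is None:
            if factor not in NEUTRAL_WHEN_NULL:
                raise ValueError(f"{factor} must be derived from the card")
            value = 0.5
        total += weight * value
    return round(total, 2)


# A never-probed seed card with no receipt issuer and unknown replay safety.
unmeasured_seed = {
    "downstream_task_success": None,
    "schema_conformance": None,
    "cost_per_successful_task": None,
    "p95_latency": None,
    "failure_mode_legibility": 0.4,
    "provenance_quality": 0.3,
    "idempotency_replay_safety": 0.5,
    "policy_fit": 1.0,
    "freshness": 0.4,
}
seed_score = composite(unmeasured_seed)
```

With these inputs the sum is 10 + 7.5 + 7.5 + 7.5 + 4 + 3 + 2.5 + 5 + 2 = 49, so an entirely unmeasured card lands just below the midpoint rather than at zero or at a flattering default.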

Trust evidence

How attestation tiers are earned

Tier | How it is earned
seed (Claimed) | Asserted by a source or hand-curated; Stackbroker has not measured it yet.
probed (Observed) | Stackbroker observed or measured this through the probe harness or public inspection.
verified (Verified) | Passed a defined Stackbroker verification workflow.
attested (Reviewed) | A Stackbroker reviewer checked the evidence and a signed evidence window exists.

Every step up the ladder is free to providers. Reaching and holding attested additionally requires a clean (or remediated) trust scan inside the evidence window and a live provider connection point — see the trust pre-flight scope and the neutrality policy.
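
The ladder can be read as an ordered list, which makes a policy tier floor a simple comparison. The sketch below is illustrative only: the function names and boolean parameters are hypothetical, and the rule that a stale trust scan stops satisfying an attested floor is taken from the trust_scan_stale flag description on this page.

```python
# Tiers in ascending order, as listed in the table above.
TIER_ORDER = ["seed", "probed", "verified", "attested"]


def satisfies_floor(card_tier: str, floor: str,
                    trust_scan_stale: bool = False) -> bool:
    """True when the card's tier is at or above the policy's tier floor."""
    if floor == "attested" and trust_scan_stale:
        # A stale scan stops satisfying attested-tier floors until re-scanned.
        return False
    return TIER_ORDER.index(card_tier) >= TIER_ORDER.index(floor)


def can_hold_attested(scan_clean_or_remediated: bool,
                      connection_point_live: bool) -> bool:
    """Extra conditions for reaching and holding the attested tier."""
    return scan_clean_or_remediated and connection_point_live
```

For example, a verified card satisfies a probed floor, while a seed card does not.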

Risk flags

How risk_flags are produced

Flag | Source | Meaning
replay_safety_unknown | service card | The service's replay safety is undeclared/unverified; retried paid calls may double-charge.
unprobed_seed_card | service card | The card is seed-tier: hand-curated or externally indexed, with no Stackbroker measurement behind it yet.
security_finding | trust scan rubric | A confirmed finding from the trust-scan pipeline (static description/schema analysis, schema drift, endpoint reputation), mapped through the versioned rubric, never raw scanner output. Specific flags are namespaced security_flags entries on the card.
trust_scan_stale | trust scan staleness | The service's most recent trust manifest is past its expiry window. Stale is displayed as stale, never hidden, and stops satisfying attested-tier policy floors until re-scanned.
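
As a rough illustration of how the four flags could be derived from a card, consider the sketch below. Apart from security_flags, which the table names, every card field here (tier, replay_safety, trust_manifest_expires_at) is a hypothetical name chosen for the example, and treating only a missing or "unknown" value as undeclared is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timezone


def risk_flags(card: dict, now: datetime) -> list[str]:
    """Return the risk flags that apply to a card at time `now`."""
    flags = []
    # Assumption: a missing or "unknown" value means undeclared/unverified.
    if card.get("replay_safety") in (None, "unknown"):
        flags.append("replay_safety_unknown")
    if card.get("tier") == "seed":
        flags.append("unprobed_seed_card")
    # Rubric-mapped findings live on the card as security_flags entries.
    if card.get("security_flags"):
        flags.append("security_finding")
    expires_at = card.get("trust_manifest_expires_at")
    if expires_at is not None and expires_at < now:
        flags.append("trust_scan_stale")
    return flags


now = datetime(2026, 6, 11, tzinfo=timezone.utc)
seed_card = {
    "tier": "seed",
    "replay_safety": None,
    "security_flags": [],
    "trust_manifest_expires_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
seed_flags = risk_flags(seed_card, now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the result reproducible for a given card and timestamp.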

Stated bluntly

What is NOT a factor

  • Subscription status or tier of the requesting agent or operator
  • Any payment from any party — providers cannot pay Stackbroker anything, ever
  • Provider relationship status (claimed vs. unclaimed cards rank under identical rules)
  • Advertising, sponsorship, or placement of any kind (none exists)
  • Outcome-report credits (they reduce the reporter's own bill; reports feed evidence identically regardless of who reports)

providers can't pay us · agents' payments can't tilt results

Machine-readable

Neutrality invariants

providers_never_pay
Stackbroker accepts no money from providers: no listing fees, no verification fees, no expedited tiers, no paid placement, no advertising. Probing, claim validation, security scanning, and attestation progression are free, conditional only on claiming the card and maintaining a verified connection point.
Enforcement: No provider-billing tables or foreign keys exist in the schema; CI runs a static isolation check and a behavioral test asserting provider records carry no billing linkage.

subscriptions_buy_access_not_outcomes
Agent subscriptions gate request volume, throughput, console features, and support. They never affect scores, rankings, verdicts, attestation tiers, risk flags, or audit records. Two subscribers issuing the identical request get identical decisions.
Enforcement: Scoring/attestation/scanning modules have no import path or query access to billing tables; CI runs a static import check plus a behavioral test comparing decisions across subscription tiers.
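
A static import check of the kind described can be approximated with Python's standard ast module. This is a sketch under stated assumptions, not Stackbroker's CI check: the module name "billing" and the function name are hypothetical, and it inspects only the direct imports of one source file, not transitive ones.

```python
import ast


def forbidden_imports(source: str, forbidden: str = "billing") -> list:
    """Return imported module paths containing `forbidden` as a component."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            # Cover both `from billing import x` and `from app import billing`.
            candidates = [module] + [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in candidates:
            if forbidden in name.split("."):
                hits.add(name)
    return sorted(hits)


clean_source = "import math\nfrom scoring import weights\n"
leaky_source = "import billing.tables\nfrom app import billing\n"
```

A CI job could run such a check over every scoring, attestation, and scanning module and fail the build when the returned list is non-empty.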

Decisions are stamped, hash-chained, and anchored with daily Ed25519 signatures. Verify with the audit walkthrough; the public key is served at /.well-known/stackbroker-signing-key.
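
The hash-chaining idea can be illustrated with the standard library alone. Everything about the record layout below is an assumption made for the example: the entry fields, the all-zero genesis value, the canonical JSON encoding, and the choice of SHA-256 are not specified on this page. Checking the daily Ed25519 signature against the published key is omitted because it needs a third-party cryptography library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed predecessor value for the first entry


def entry_hash(prev_hash: str, decision: dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding."""
    payload = json.dumps(decision, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(decisions: list) -> list:
    """Link each decision to its predecessor by hash."""
    entries, prev = [], GENESIS
    for decision in decisions:
        digest = entry_hash(prev, decision)
        entries.append({"decision": decision, "prev_hash": prev, "hash": digest})
        prev = digest
    return entries


def verify_chain(entries: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in entries:
        if entry["prev_hash"] != prev:
            return False
        if entry_hash(prev, entry["decision"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, altering any past decision changes every later hash, so a signature over the latest hash vouches for the whole history before it.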

Versioned

Changelog

Version | Date | Change | Reason
methodology_v1 | 2026-06-11 | First published methodology: nine-factor composite (scoring v0 weights, unchanged), evidence-null treatment documented per factor, neutrality invariants published and machine-readable. | Verifiable neutrality: the scoring that runs in production must be the scoring that is published, with CI preventing drift.