Methodology

How routing is scored.

Every routing decision is a deterministic 0–100 composite over the factors below — version methodology_v1 (scoring v0, effective 2026-06-11). This page, the machine endpoint, and the running scoring engine all read the same versioned config, and a CI test fails the build if they ever disagree. What you read here is what runs.

GET /v1/methodology · Verify the audit chain · Neutrality policy

The composite

Factors and weights

Factor	Weight	What it measures	When evidence is null
`downstream_task_success` Downstream task success	20%	Measured rate at which calls to this service produced a successful downstream task, from probe results and verified outcome telemetry.	Neutral midpoint (0.5). Unmeasured services gain no advantage and suffer no fabricated penalty.
`schema_conformance` Schema conformance	15%	Measured rate at which responses conform to the service's declared schema (probe harness + outcome reports).	Neutral midpoint (0.5).
`cost_per_successful_task` Cost per successful task	15%	Price estimate relative to the requesting policy's price ceiling — cheaper within budget scores higher. Computed per request against the caller's own max_price_usdc.	Neutral midpoint (0.5) when the service has no price estimate or the request sets no ceiling.
`p95_latency` p95 latency	15%	Measured p95 latency relative to the requesting policy's latency ceiling — faster within bound scores higher. Latency evidence comes from the probe harness.	Neutral midpoint (0.5) when unmeasured or the request sets no ceiling.
`failure_mode_legibility` Failure-mode legibility	10%	Whether the service's failure behavior has been observed and classified. Cards above the seed tier have probe-classified failure modes.	Seed (unmeasured) cards score 0.4; measured tiers score 0.7. Tier is the only input.
`provenance_quality` Provenance / receipt quality	10%	Whether the service issues receipts for paid calls (receipt_issuer on the card).	Services declaring no receipt issuer score 0.3; any declared issuer scores 0.7.
`idempotency_replay_safety` Idempotency / replay safety	5%	Declared and verified idempotency and replay safety on the service card.	Full marks only when both idempotency and replay safety are 'yes'; any unknown value scores the neutral midpoint (0.5).
`policy_fit` Policy fit	5%	How cleanly the candidate fits the requesting policy after hard filters (rails, flags, tier floors) have run.	Defaults to full fit (1.0) once a candidate survives the hard policy filters.
`freshness` Freshness / source confidence	5%	Whether the evidence on the card is recent — services with a recent probe score higher.	Never-probed cards score 0.4; recently probed cards score 0.9.

Evidence fields stay null until measured — Stackbroker does not invent benchmarks. Null evidence scores at the documented neutral treatment per factor, never at fabricated strength.
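One plausible reading of the table is a weighted sum: each factor yields a score in [0, 1], null evidence takes its documented treatment, and the percentage weights scale the total to 0–100. The sketch below assumes exactly that. The function name `composite` and the input shape are illustrative, not part of the published API; the authoritative formula is whatever the versioned config behind GET /v1/methodology specifies.

```python
from typing import Mapping, Optional

# Weights copied from the factor table, in percent. They sum to 100.
WEIGHTS = {
    "downstream_task_success": 20,
    "schema_conformance": 15,
    "cost_per_successful_task": 15,
    "p95_latency": 15,
    "failure_mode_legibility": 10,
    "provenance_quality": 10,
    "idempotency_replay_safety": 5,
    "policy_fit": 5,
    "freshness": 5,
}

# Score used when a factor has no evidence, per the "When evidence is null" column.
NULL_SCORE = {
    "downstream_task_success": 0.5,
    "schema_conformance": 0.5,
    "cost_per_successful_task": 0.5,
    "p95_latency": 0.5,
    "failure_mode_legibility": 0.4,    # seed (unmeasured) card
    "provenance_quality": 0.3,         # no receipt issuer declared
    "idempotency_replay_safety": 0.5,
    "policy_fit": 1.0,                 # survived the hard policy filters
    "freshness": 0.4,                  # never probed
}


def composite(scores: Mapping[str, Optional[float]]) -> float:
    """Return a 0-100 composite, assuming a plain weighted sum of factor scores."""
    total = 0.0
    for factor, weight in WEIGHTS.items():
        value = scores.get(factor)
        if value is None:
            value = NULL_SCORE[factor]  # documented neutral treatment
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{factor} score must lie in [0, 1], got {value}")
        total += weight * value
    return total
```

Under these assumptions a card with no evidence at all lands near the middle of the scale rather than at either extreme, which matches the stated intent that unmeasured services gain no advantage and suffer no fabricated penalty.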

Trust evidence

How attestation tiers are earned

Tier	How it is earned
`seed` — Claimed	Asserted by a source or hand-curated; Stackbroker has not measured it yet.
`probed` — Observed	Stackbroker observed or measured this through the probe harness or public inspection.
`verified` — Verified	Passed a defined Stackbroker verification workflow.
`attested` — Reviewed	A Stackbroker reviewer checked the evidence and a signed evidence window exists.

Every step up the ladder is free to providers. Reaching and holding attested additionally requires a clean (or remediated) trust scan inside the evidence window and a live provider connection point — see the trust pre-flight scope and the neutrality policy.
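The ladder can be modeled as an ordered enum, which makes tier floors a simple comparison. This is a minimal sketch: the names `Tier`, `meets_floor`, and `attested_requirements_met`, and the `scan_status` values, are hypothetical and chosen only to mirror the wording above.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Attestation tiers in ladder order, lowest first."""
    SEED = 0
    PROBED = 1
    VERIFIED = 2
    ATTESTED = 3


def meets_floor(tier: Tier, floor: Tier) -> bool:
    """A candidate satisfies a policy's tier floor if it sits at or above it."""
    return tier >= floor


def attested_requirements_met(
    *,
    reviewer_checked: bool,
    signed_evidence_window: bool,
    scan_status: str,            # hypothetical values: "clean", "remediated", "open"
    live_connection_point: bool,
) -> bool:
    """Check the conditions the text lists for reaching and holding attested."""
    return (
        reviewer_checked
        and signed_evidence_window
        and scan_status in {"clean", "remediated"}
        and live_connection_point
    )
```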

Risk flags

How risk_flags are produced

Flag	Source	Meaning
`replay_safety_unknown`	service card	The service's replay safety is undeclared or unverified; retried paid calls may double-charge.
`unprobed_seed_card`	service card	The card is seed-tier: hand-curated or externally indexed, with no Stackbroker measurement behind it yet.
`security_finding`	trust scan rubric	A confirmed finding from the trust-scan pipeline (static description/schema analysis, schema drift, endpoint reputation), mapped through the versioned rubric — never raw scanner output. Specific flags are namespaced security_flags entries on the card.
`trust_scan_stale`	trust scan staleness	The service's most recent trust manifest is past its expiry window. Stale is displayed as stale, never hidden, and stops satisfying attested-tier policy floors until re-scanned.
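The flag rules in the table could be derived from a service card roughly as follows. The card is treated as a plain dictionary, and the field names `replay_safety`, `tier`, and `trust_manifest_expires_at` are assumptions made for illustration; only `security_flags` and the flag names themselves come from the table. Treating any value other than an explicit 'yes' or 'no' as unknown is also an assumption.

```python
from datetime import datetime, timezone
from typing import Any, List, Mapping


def risk_flags(card: Mapping[str, Any], now: datetime) -> List[str]:
    """Derive risk flags from a (hypothetical) service card dictionary."""
    flags: List[str] = []

    # Anything other than an explicit declaration counts as unknown (assumption).
    if card.get("replay_safety") not in ("yes", "no"):
        flags.append("replay_safety_unknown")

    # Seed-tier cards have no Stackbroker measurement behind them yet.
    if card.get("tier") == "seed":
        flags.append("unprobed_seed_card")

    # Confirmed, rubric-mapped findings appear as security_flags entries.
    if card.get("security_flags"):
        flags.append("security_finding")

    # A trust manifest past its expiry window is stale.
    expires_at = card.get("trust_manifest_expires_at")
    if expires_at is not None and expires_at < now:
        flags.append("trust_scan_stale")

    return flags
```

A card carrying only `{"tier": "seed"}` would come back with `replay_safety_unknown` and `unprobed_seed_card`.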

Stated bluntly

What is NOT a factor

Subscription status or tier of the requesting agent or operator
Any payment from any party — providers cannot pay Stackbroker anything, ever
Provider relationship status (claimed vs. unclaimed cards rank under identical rules)
Advertising, sponsorship, or placement of any kind (none exists)
Outcome-report credits (they reduce the reporter's own bill; reports feed evidence identically regardless of who reports)

providers can't pay us · agents' payments can't tilt results

Machine-readable

Neutrality invariants

providers_never_pay — Stackbroker accepts no money from providers: no listing fees, no verification fees, no expedited tiers, no paid placement, no advertising. Probing, claim validation, security scanning, and attestation progression are free — conditional only on claiming the card and maintaining a verified connection point. Enforcement: No provider-billing tables or foreign keys exist in the schema; CI runs a static isolation check and a behavioral test asserting provider records carry no billing linkage.

subscriptions_buy_access_not_outcomes — Agent subscriptions gate request volume, throughput, console features, and support. They never affect scores, rankings, verdicts, attestation tiers, risk flags, or audit records. Two subscribers issuing the identical request get identical decisions. Enforcement: Scoring/attestation/scanning modules have no import path or query access to billing tables; CI runs a static import check plus a behavioral test comparing decisions across subscription tiers.
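A static import check of the kind described in the enforcement notes can be approximated with Python's standard `ast` module. This is a sketch only: the forbidden package name `billing` is hypothetical, Stackbroker's actual check and module layout are not published here, and relative imports without a module name are not covered.

```python
import ast
from typing import Set

# Hypothetical package that scoring, attestation, and scanning code must never import.
FORBIDDEN_PREFIXES = ("billing",)


def imported_modules(source: str) -> Set[str]:
    """Collect the module names a piece of Python source imports."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def forbidden_imports(source: str) -> Set[str]:
    """Return imports that match a forbidden package or one of its submodules."""
    return {
        name
        for name in imported_modules(source)
        if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)
    }
```

A CI job could run `forbidden_imports` over every file in the scoring package and fail the build on any non-empty result. The behavioral test that compares decisions across subscription tiers would be a separate check.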

Decisions are stamped, hash-chained, and anchored with daily Ed25519 signatures. Verify with the audit walkthrough; the public key is served at /.well-known/stackbroker-signing-key.
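To illustrate what hash-chaining buys, the sketch below links each record to its predecessor with SHA-256, so altering any earlier record invalidates every later hash. The record layout (`payload`, `prev_hash`, `hash`), the all-zero genesis value, and the canonical JSON encoding are assumptions for illustration; the real format is defined by the audit walkthrough. Checking the daily Ed25519 signature against the published key is a further step that needs an Ed25519 library and is not shown.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed starting value for the first record's prev_hash


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous hash together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a record linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, payload)}
    )


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, reordered, or dropped record breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```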

Versioned

Changelog

Version	Date	Change	Reason
`methodology_v1`	2026-06-11	First published methodology: nine-factor composite (scoring v0 weights, unchanged), evidence-null treatment documented per factor, neutrality invariants published and machine-readable.	Verifiable neutrality: the scoring that runs in production must be the scoring that is published, with CI preventing drift.