Why the browser matters
Browser and edge runtimes make the Four Fs concrete. They force frugality because memory is limited. They reward speed because users feel latency immediately. They require flexibility because device capabilities vary. They support federation because private local data can remain local.
A browser model-breeding system should not try to load a large generalist and then improvise. It should treat each skill as a package with a declared footprint, contract, runtime backend, and fallback path.
Runtime layers
| Layer | Responsibility | Typical artifact |
|---|---|---|
| Loader | fetch, verify, cache, and unload packages | manifest + digest |
| Runtime | execute model or adapter | WASM or WebGPU engine for ONNX or GGUF-compatible weights |
| Router | select the smallest adequate skill | policy + score table |
| Budget | track memory, latency, and battery cost | local resource ledger |
| Evaluator | detect low-confidence or unsafe output | local validator or remote review path |
| Sync | share only approved summaries or updates | federated/distilled records |
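The Router and Budget rows work together: the router should only consider skills the budget can afford, then prefer the smallest one that is still adequate. The TypeScript sketch below assumes a hypothetical `SkillCandidate` record carrying a precomputed fitness `score`, and a policy-supplied adequacy threshold `minScore`; none of these names come from a specific runtime.

```typescript
// Hypothetical shapes: field names are illustrative assumptions,
// not a published schema.
interface SkillCandidate {
  id: string;
  bytes: number;     // resident memory once loaded
  score: number;     // router's estimate of task fitness, 0..1
  latencyMs: number; // expected latency on this device
}

interface Budget {
  freeBytes: number;    // memory left under the policy ceiling
  maxLatencyMs: number; // latency the caller will tolerate
  minScore: number;     // "adequate" threshold from the policy
}

// Pick the smallest candidate that clears the adequacy threshold and fits
// the remaining budget. Returns null so the caller can take an explicit
// fallback path instead of silently loading something larger.
function selectSmallestAdequate(
  candidates: readonly SkillCandidate[],
  budget: Budget,
): SkillCandidate | null {
  const eligible = candidates.filter(
    (c) =>
      c.score >= budget.minScore &&
      c.bytes <= budget.freeBytes &&
      c.latencyMs <= budget.maxLatencyMs,
  );
  // Smallest footprint wins; ties go to the higher score.
  eligible.sort((a, b) => a.bytes - b.bytes || b.score - a.score);
  return eligible[0] ?? null;
}

// Example with made-up numbers.
const candidates: SkillCandidate[] = [
  { id: "summarize-q4", bytes: 300_000_000, score: 0.82, latencyMs: 400 },
  { id: "summarize-q8", bytes: 600_000_000, score: 0.86, latencyMs: 650 },
  { id: "generalist", bytes: 2_000_000_000, score: 0.9, latencyMs: 1500 },
];
const pick = selectSmallestAdequate(candidates, {
  freeBytes: 1_000_000_000,
  maxLatencyMs: 800,
  minScore: 0.8,
});
console.log(pick?.id); // prints summarize-q4
```

Returning `null` rather than the next-largest model keeps the escalation decision with the caller's fallback policy instead of burying it in the router.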
Package-first execution
A skill package needs enough metadata for the runtime to say no before loading it. The manifest should include size, required backend, expected latency class, input contract, output contract, risk tier, and unload conditions.
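Those fields are enough to reject a package from metadata alone. The sketch below is a hypothetical TypeScript rendering: the field names, the two backend values, and the `interactiveOnly` policy flag are assumptions for illustration, and signature and digest verification are omitted.

```typescript
// Hypothetical manifest shape mirroring the fields listed above.
// Names, enum values, and units are illustrative assumptions.
type Backend = "wasm" | "webgpu";
type LatencyClass = "interactive" | "background";

interface SkillManifest {
  id: string;
  bytes: number;           // resident size once loaded
  requiredBackend: Backend;
  latencyClass: LatencyClass;
  inputContract: string;   // e.g. a schema identifier
  outputContract: string;
  riskTier: number;        // higher means riskier
  unloadCondition: string; // e.g. "idle>60s"
}

interface DeviceState {
  backends: ReadonlySet<Backend>;
  loadedBytes: number;
}

interface Policy {
  memoryCeiling: number;
  allowedRiskTier: number;
  interactiveOnly: boolean; // hypothetical: reject background-class skills
}

type Admission = { ok: true } | { ok: false; reason: string };

// Decide from metadata alone, before any weights are fetched or loaded.
function admit(m: SkillManifest, device: DeviceState, policy: Policy): Admission {
  if (!device.backends.has(m.requiredBackend)) {
    return { ok: false, reason: "backend unavailable" };
  }
  if (device.loadedBytes + m.bytes > policy.memoryCeiling) {
    return { ok: false, reason: "memory ceiling" };
  }
  if (m.riskTier > policy.allowedRiskTier) {
    return { ok: false, reason: "risk tier" };
  }
  if (policy.interactiveOnly && m.latencyClass !== "interactive") {
    return { ok: false, reason: "latency class" };
  }
  return { ok: true };
}
```

Because every check reads only the manifest and local state, a rejection costs no network transfer and no memory.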
```
FUNCTION load_skill_if_affordable(skill_manifest, device_state, policy)
    // Trust the manifest before trusting any of its fields
    IF NOT VERIFY_SIGNATURE(skill_manifest)
        RETURN REJECT("bad signature")
    // Cheap metadata checks run before any weights are fetched
    IF skill_manifest.required_backend NOT_IN device_state.backends
        RETURN REJECT("backend unavailable")
    projected_memory <- device_state.loaded_bytes + skill_manifest.bytes
    IF projected_memory > policy.memory_ceiling
        RETURN REJECT("memory ceiling")
    IF skill_manifest.risk_tier > policy.allowed_risk_tier
        RETURN REJECT("risk tier")
    IF NOT VERIFY_DIGEST(skill_manifest.weights)
        RETURN REJECT("digest mismatch")
    LOAD_WEIGHTS(skill_manifest)
    REGISTER_UNLOAD_RULE(skill_manifest.unload_condition)
    RETURN READY(skill_manifest.id)
END FUNCTION
```
Quantization as a design primitive
Quantization is not merely compression applied after training. It is an architectural primitive because it decides which combinations of skills can be resident on a device at the same time. A slightly weaker quantized specialist may be more valuable than a stronger model whose footprint prevents the router from loading complementary skills.
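A rough way to see this is to compute weight-only footprints (parameters × bits ÷ 8 bytes) for candidate combinations. The sketch below uses hypothetical parameter counts and a hypothetical 1.2 GB ceiling; it ignores activations, KV cache, and runtime overhead, so real footprints will be larger.

```typescript
// Weight-only footprint: parameters × bits ÷ 8. Ignores activations,
// KV cache, and runtime overhead, so treat it as a lower bound.
function weightBytes(params: number, bits: number): number {
  return Math.ceil((params * bits) / 8);
}

interface SkillSpec {
  id: string;
  params: number;
  bits: number;
}

// True if the combined weight footprint stays under the ceiling.
function fitsTogether(skills: readonly SkillSpec[], ceilingBytes: number): boolean {
  const total = skills.reduce((sum, s) => sum + weightBytes(s.params, s.bits), 0);
  return total <= ceilingBytes;
}

// Hypothetical numbers: a 1B-parameter and a 0.5B-parameter specialist.
const ceiling = 1_200_000_000; // 1.2 GB budget for resident weights
const pairAt = (bitsA: number, bitsB: number): SkillSpec[] => [
  { id: "specialist-a", params: 1_000_000_000, bits: bitsA },
  { id: "specialist-b", params: 500_000_000, bits: bitsB },
];

console.log(fitsTogether(pairAt(8, 8), ceiling)); // false: 1.5 GB
console.log(fitsTogether(pairAt(8, 4), ceiling)); // false: 1.25 GB
console.log(fitsTogether(pairAt(4, 4), ceiling)); // true: 0.75 GB
```

In this example, both pairings that keep specialist A at 8 bits exceed the ceiling, while the all-4-bit pairing fits with room to spare.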
Privacy boundary
Local execution does not automatically make a system private. Telemetry, cache keys, prompts, embeddings, and sync payloads can leak. Treat every outbound event as a data product that needs minimization, purpose limitation, and user-visible policy.
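One way to make minimization and purpose limitation mechanical is an allowlist keyed by declared purpose, applied to every event before it leaves the device. The purposes and field names below are hypothetical; the point of the sketch is that an undeclared purpose yields nothing rather than the full event.

```typescript
// Hypothetical outbound event: an open bag of fields from the app.
type EventRecord = Record<string, unknown>;

// Purpose limitation: each declared purpose names the only fields it may
// carry. These purposes and field lists are illustrative assumptions.
const ALLOWED_FIELDS: ReadonlyMap<string, readonly string[]> = new Map([
  ["crash-report", ["skillId", "backend", "errorCode"]],
  ["latency-stats", ["skillId", "latencyClass", "durationMs"]],
]);

// Returns a minimized copy, or null for an undeclared purpose, so unknown
// purposes send nothing rather than everything.
function minimize(purpose: string, event: EventRecord): EventRecord | null {
  const allowed = ALLOWED_FIELDS.get(purpose);
  if (allowed === undefined) return null;
  const out: EventRecord = {};
  for (const key of allowed) {
    if (key in event) out[key] = event[key];
  }
  return out;
}
```

Dropping fields is only part of minimization: an allowlisted free-text field can still carry prompt content, so allowlisted values should be coarse codes where possible.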
Failure modes
The main edge failures are memory spikes, cold-start latency, stale cached packages, mismatched tokenizers, unsupported operators, and silent evaluator bypass when the device is offline. The safe default is an explicit fallback, not an ungoverned best effort.
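The explicit-fallback rule can be sketched as a wrapper that refuses to return unchecked output. This hypothetical TypeScript version assumes the evaluator is passed in as a function, or `null` when neither a local validator nor a remote review path is available, and it covers three of the listed failure modes: mismatched tokenizers, runtime errors such as unsupported operators, and evaluator bypass while offline.

```typescript
// Hypothetical types; names are assumptions for illustration.
interface LoadedSkill {
  id: string;
  tokenizerId: string; // tokenizer the weights expect
}

type Outcome =
  | { kind: "answer"; text: string }
  | { kind: "fallback"; reason: string };

// Local validator or remote review path; null when neither is reachable.
type Evaluator = ((text: string) => boolean) | null;

function runGoverned(
  skill: LoadedSkill,
  runtimeTokenizerId: string,
  generate: () => string,
  evaluate: Evaluator,
): Outcome {
  // A tokenizer mismatch can yield plausible-looking garbage, so check first.
  if (skill.tokenizerId !== runtimeTokenizerId) {
    return { kind: "fallback", reason: "tokenizer mismatch" };
  }
  // No evaluator means no governed output: never return unchecked text.
  if (evaluate === null) {
    return { kind: "fallback", reason: "evaluator unavailable" };
  }
  let text: string;
  try {
    text = generate();
  } catch {
    // Unsupported operators or allocation failures may surface as throws.
    return { kind: "fallback", reason: "runtime error" };
  }
  return evaluate(text)
    ? { kind: "answer", text }
    : { kind: "fallback", reason: "evaluator rejected output" };
}
```

Each fallback carries a reason, so the caller can decide between retrying, escalating to a remote path, or telling the user, rather than receiving unreviewed output.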
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.