Architecture · Advanced · 2 minute read · Updated 2026-06-26 UTC

Browser and edge runtime architecture

How memory ceilings, quantization, WebAssembly, WebGPU, and local privacy shape model breeding systems.

Research status: Engineering synthesis from browser and edge sections of source reports
Publication state: Published
Reviewed by: Michael Kappel
Source reports: 3

Why the browser matters

Browser and edge runtimes make the Four Fs concrete. They force frugality because memory is limited. They reward speed because users feel latency immediately. They require flexibility because device capabilities vary. They support federation because private local data can remain local.

A browser model-breeding system should not try to load a large generalist and then improvise. It should treat each skill as a package with a declared footprint, contract, runtime backend, and fallback path.

Runtime layers

| Layer | Responsibility | Typical artifact |
| --- | --- | --- |
| Loader | fetch, verify, cache, and unload packages | manifest + digest |
| Runtime | execute model or adapter | WASM, WebGPU, ONNX, GGUF-compatible engine |
| Router | select the smallest adequate skill | policy + score table |
| Budget | track memory, latency, and battery cost | local resource ledger |
| Evaluator | detect confidence failure or unsafe output | local validator or remote review path |
| Sync | share only approved summaries or updates | federated/distilled records |
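
The Router and Budget rows can be read together: among the skills whose score clears a policy threshold, pick the one with the smallest footprint that still fits in free memory. The following is a minimal sketch of that rule, not a reference implementation. The `SkillEntry` fields, the `min_score` threshold, and the example skill names and sizes are all assumptions made for illustration; the source does not specify the shape of the score table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkillEntry:
    # Hypothetical score-table row; field names are illustrative.
    skill_id: str
    size_bytes: int
    score: float  # expected quality for the task, 0.0 to 1.0


def select_smallest_adequate(
    table: list[SkillEntry], min_score: float, free_bytes: int
) -> Optional[SkillEntry]:
    """Return the smallest skill that meets the score threshold and fits in memory."""
    adequate = [
        e for e in table if e.score >= min_score and e.size_bytes <= free_bytes
    ]
    if not adequate:
        return None  # caller must take an explicit fallback path
    # Smallest footprint wins; ties go to the higher score.
    return min(adequate, key=lambda e: (e.size_bytes, -e.score))


table = [
    SkillEntry("generalist-large", 900_000_000, 0.93),
    SkillEntry("summarize-small", 120_000_000, 0.81),
    SkillEntry("summarize-tiny", 40_000_000, 0.62),
]
choice = select_smallest_adequate(table, min_score=0.75, free_bytes=500_000_000)
print(choice.skill_id if choice else "fallback")
```

Returning `None` rather than the closest match keeps the "no adequate skill" case visible to the caller, which then has to choose a fallback deliberately.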

Package-first execution

A skill package needs enough metadata for the runtime to say no before loading it. The manifest should include size, required backend, expected latency class, input contract, output contract, risk tier, and unload conditions.

pseudocode
FUNCTION load_skill_if_affordable(skill_manifest, device_state, policy)
    // A failed verification is an explicit rejection, never a silent pass.
    IF NOT VERIFY_SIGNATURE(skill_manifest)
        RETURN REJECT("invalid signature")
    IF NOT VERIFY_DIGEST(skill_manifest.weights)
        RETURN REJECT("digest mismatch")

    IF skill_manifest.required_backend NOT_IN device_state.backends
        RETURN REJECT("backend unavailable")

    projected_memory <- device_state.loaded_bytes + skill_manifest.bytes
    IF projected_memory > policy.memory_ceiling
        RETURN REJECT("memory ceiling")

    IF skill_manifest.risk_tier > policy.allowed_risk_tier
        RETURN REJECT("risk tier")

    LOAD_WEIGHTS(skill_manifest)
    REGISTER_UNLOAD_RULE(skill_manifest.unload_condition)
    RETURN READY(skill_manifest.id)
END FUNCTION
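
For readers who want something executable, the admission check above can be sketched in Python. This is an illustration under stated assumptions, not the guide's implementation: the `Manifest` and `DeviceState` types and the `admit` function are hypothetical, SHA-256 is assumed as the digest algorithm, signature verification is left out, and the sketch chooses to run the cheap manifest checks before hashing the weights.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    # Illustrative subset; the guide also lists latency class, contracts, unload conditions.
    skill_id: str
    size_bytes: int
    required_backend: str
    risk_tier: int
    weights_sha256: str


@dataclass
class DeviceState:
    backends: set[str]
    loaded_bytes: int = 0


def admit(manifest: Manifest, weights: bytes, device: DeviceState,
          memory_ceiling: int, allowed_risk_tier: int) -> tuple[bool, str]:
    """Return (accepted, reason). Signature checking is out of scope for this sketch."""
    if manifest.required_backend not in device.backends:
        return False, "backend unavailable"
    if device.loaded_bytes + manifest.size_bytes > memory_ceiling:
        return False, "memory ceiling"
    if manifest.risk_tier > allowed_risk_tier:
        return False, "risk tier"
    digest = hashlib.sha256(weights).hexdigest()
    # Constant-time comparison of the computed and declared digests.
    if not hmac.compare_digest(digest, manifest.weights_sha256):
        return False, "digest mismatch"
    device.loaded_bytes += manifest.size_bytes  # record the new resident footprint
    return True, "ready"


weights = b"\x00" * 1024  # stand-in for real weight bytes
manifest = Manifest("summarize-small", len(weights), "wasm", 1,
                    hashlib.sha256(weights).hexdigest())
device = DeviceState(backends={"wasm"})
print(admit(manifest, weights, device, memory_ceiling=4096, allowed_risk_tier=2))
print(admit(manifest, weights, device, memory_ceiling=1500, allowed_risk_tier=2))
```

The second call is rejected because the first load already consumed 1024 bytes, so the projected total exceeds the tighter ceiling.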

Quantization as a design primitive

Quantization is not merely compression applied after training. It is an architectural primitive because it determines which combinations of skills can be resident on a device at the same time. A slightly weaker quantized specialist may be more valuable than a stronger model whose footprint prevents the router from loading complementary skills.
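
The coexistence argument is easy to check with arithmetic. Weight storage is roughly the parameter count times bits per weight, divided by eight. The sketch below uses hypothetical numbers (a 300M-parameter specialist and a 1 GB weight budget) and ignores activations, caches, runtime overhead, and the per-block scale data that real quantization formats add, so treat the results as rough upper bounds.

```python
def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Approximate weight storage only; ignores activations, caches, and overhead."""
    return n_params * bits_per_weight // 8


def max_coresident(n_params: int, bits_per_weight: int, ceiling_bytes: int) -> int:
    """How many same-sized skills fit under the ceiling at a given precision."""
    return ceiling_bytes // weight_bytes(n_params, bits_per_weight)


CEILING = 1_000_000_000   # hypothetical 1 GB budget for model weights
PARAMS = 300_000_000      # hypothetical 300M-parameter specialist

for bits in (16, 8, 4):
    size_mb = weight_bytes(PARAMS, bits) / 1e6
    count = max_coresident(PARAMS, bits, CEILING)
    print(f"{bits}-bit: {size_mb:.0f} MB each, {count} fit")
```

Under these assumptions one 16-bit specialist fits under the ceiling, while six 4-bit specialists do, which is the trade the router is actually making.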

Privacy boundary

Local execution does not automatically make a system private. Telemetry, cache keys, prompts, embeddings, and sync payloads can each leak user data off the device. Treat every outbound event as a data product that needs minimization, purpose limitation, and user-visible policy.
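
One way to make minimization and purpose limitation concrete is a per-purpose allowlist applied to every event before it leaves the device. The sketch below is an assumption-laden illustration: the purposes, field names, and the `minimize` helper are hypothetical and do not come from the source reports.

```python
# Hypothetical purposes and the only fields each may send off-device.
ALLOWED_FIELDS = {
    "crash_report": {"skill_id", "error_code", "backend"},
    "usage_metric": {"skill_id", "latency_ms"},
}


def minimize(event: dict, purpose: str) -> dict:
    """Keep only fields allowlisted for this purpose; unknown purposes are refused."""
    allowed = ALLOWED_FIELDS.get(purpose)
    if allowed is None:
        raise ValueError(f"no outbound policy for purpose: {purpose!r}")
    return {k: v for k, v in event.items() if k in allowed}


raw = {
    "skill_id": "summarize-small",
    "latency_ms": 182,
    "prompt": "private user text",
    "embedding": [0.12, -0.4],
}
print(minimize(raw, "usage_metric"))
```

An allowlist fails closed: a newly added field such as a prompt or embedding stays local until someone deliberately adds it to a purpose.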

Failure modes

The main edge failures are memory spikes, cold-start latency, stale cached packages, mismatched tokenizers, unsupported operators, and silent evaluator bypass when the device is offline. The safe default is an explicit fallback, not an ungoverned best effort.
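
The evaluator-bypass failure can be guarded against by making "no validator available" a distinct outcome rather than an implicit pass. The following is a minimal sketch, assuming a hypothetical `govern` function and a stand-in validator; a real evaluator would check confidence and safety rather than length.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    DEFERRED = "deferred"  # held until a validator is reachable


def govern(output: str, validator: Optional[Callable[[str], bool]]) -> Outcome:
    """Never release an output that no evaluator has checked."""
    if validator is None:
        # Offline or validator failed to load: defer instead of bypassing review.
        return Outcome.DEFERRED
    return Outcome.ACCEPTED if validator(output) else Outcome.REJECTED


def short_enough(text: str) -> bool:
    # Stand-in validator used only to exercise the control flow.
    return len(text) <= 20


print(govern("brief answer", short_enough))
print(govern("brief answer", None))
```

Whether a deferred output is queued, replaced by a canned response, or surfaced with a warning is a policy decision; the point of the sketch is only that the offline case is handled explicitly.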

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.