Architecture

The WebNN implementation in Firefox consists of multiple layers, from the JavaScript API exposed to web content down to the platform backend (ONNX Runtime or CoreML) that executes neural network operations.

Overview

The WebNN implementation spans six distinct layers:

  1. JavaScript API Layer - Web-facing API (navigator.ml, MLContext, MLGraphBuilder, MLGraph, MLOperand, MLTensor)

  2. WebIDL Layer - Interface definition language defining the JavaScript API surface

  3. C++ DOM Implementation - Core implementation in the dom/webnn/ directory

  4. Rust FFI Bridge - Foreign Function Interface in dom/webnn/rustnn_bridge/

  5. Rustnn Library - Rust implementation in third_party/rust/rustnn/

  6. Backend - Platform-specific backend (ONNX Runtime, CoreML, etc.) for neural network execution with hardware acceleration

Architecture Diagram

    graph TB
    subgraph "Web Content (JavaScript)"
        A[navigator.ml]
        B[MLContext]
        C[MLGraphBuilder]
        D[MLGraph]
        E[MLOperand]
        F[MLTensor]
    end

    subgraph "WebIDL Layer"
        G[WebNN.webidl]
    end

    subgraph "C++ DOM Implementation (dom/webnn/)"
        H[ML.cpp]
        I[MLContext.cpp]
        J[MLGraphBuilder.cpp]
        K[MLGraph.cpp]
        L[MLOperand.cpp]
        M[MLTensor.cpp]
    end

    subgraph "Rust FFI Bridge (dom/webnn/rustnn_bridge/)"
        N[lib.rs]
        O[rustnn_context_create]
        P[rustnn_context_destroy]
        Q[rustnn_graph_build]
        R[rustnn_graph_compute]
        S[rustnn_tensor_create]
        T[rustnn_tensor_write]
        U[rustnn_tensor_read]
    end

    subgraph "Rustnn Library (third_party/rust/rustnn/)"
        V[Context]
        W[GraphBuilder]
        X[Graph]
        Y[Operation Types]
        Z[Backend Converter]
        AA[Executor]
    end

    subgraph "Backend (Platform-Specific)"
        AB[ONNX Runtime / CoreML]
        AC[Hardware Acceleration]
    end

    A --> G
    B --> G
    C --> G
    D --> G
    E --> G
    F --> G

    G --> H
    G --> I
    G --> J
    G --> K
    G --> L
    G --> M

    H --> N
    I --> O
    I --> P
    J --> Q
    K --> R
    M --> S
    M --> T
    M --> U

    O --> V
    P --> V
    Q --> W
    Q --> X
    R --> AA
    S --> V
    T --> V
    U --> V

    W --> Y
    W --> Z
    X --> Z
    Z --> AA
    AA --> AB
    AB --> AC
    

Layer Descriptions

JavaScript API Layer

The WebNN API is exposed to web content through navigator.ml and provides:

  • MLContext - Represents a compute context for neural network operations

  • MLGraphBuilder - Builds computational graphs by composing operations

  • MLGraph - Compiled computational graph ready for execution

  • MLOperand - Represents an operand in the graph (input, output, or intermediate value)

  • MLTensor - Represents tensor data that can be used across multiple graph executions

WebIDL Layer

Located in dom/webidl/WebNN.webidl, this layer:

  • Defines all operations, data types, and options available to web content

  • Acts as the contract between JavaScript and C++ implementations

  • Controls which features are exposed to web content, gated by the dom.ml.enabled preference

C++ DOM Implementation

Located in dom/webnn/, this layer includes:

  • ML.cpp - Implements the navigator.ml interface and context creation

  • MLContext.cpp - Manages the WebNN context lifecycle and tensor operations

  • MLGraphBuilder.cpp - Implements all graph building operations (add, mul, conv2d, matmul, etc.)

  • MLGraph.cpp - Represents compiled graphs and handles inference execution

  • MLOperand.cpp - Manages operand metadata (shape, data type)

  • MLTensor.cpp - Handles tensor data storage and transfers

Rust FFI Bridge

Located in dom/webnn/rustnn_bridge/src/lib.rs, this layer provides:

  • C-compatible FFI functions for context, graph, and tensor operations

  • Pointer management for Rust objects exposed to C++

  • Data marshaling between C++ and Rust types

  • Error handling and propagation across the FFI boundary

Key FFI functions:

  • Context: rustnn_context_create, rustnn_context_destroy

  • Graph: rustnn_graph_build, rustnn_graph_compute, rustnn_graph_destroy

  • Tensors: rustnn_tensor_create, rustnn_tensor_write, rustnn_tensor_read
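
To make the shape of such functions concrete, here is a minimal sketch of a C-compatible entry point that marshals a caller-owned buffer into a Rust-owned tensor and reports failure through a status code. The names DemoTensor, demo_tensor_write, and the DEMO_* constants are hypothetical and chosen for illustration; they are not the actual rustnn_bridge signatures.

```rust
use std::slice;

/// Hypothetical status codes returned across the FFI boundary.
pub const DEMO_OK: i32 = 0;
pub const DEMO_ERR_NULL: i32 = 1;
pub const DEMO_ERR_SIZE: i32 = 2;

/// Hypothetical Rust-side tensor: a flat f32 buffer with a fixed length.
pub struct DemoTensor {
    pub data: Vec<f32>,
}

/// Copies `len` floats from a caller-owned buffer into the tensor.
/// A symbol exported to C++ would additionally carry a no_mangle attribute.
///
/// # Safety
/// `tensor` must be null or point to a valid `DemoTensor`, and `src` must be
/// null or point to at least `len` readable f32 values.
pub unsafe extern "C" fn demo_tensor_write(
    tensor: *mut DemoTensor,
    src: *const f32,
    len: usize,
) -> i32 {
    // Reject null pointers before dereferencing anything.
    if tensor.is_null() || src.is_null() {
        return DEMO_ERR_NULL;
    }
    let tensor = unsafe { &mut *tensor };
    // The caller must supply exactly as many elements as the tensor holds.
    if len != tensor.data.len() {
        return DEMO_ERR_SIZE;
    }
    // Borrow the foreign buffer as a slice, then copy it into Rust-owned memory.
    let input = unsafe { slice::from_raw_parts(src, len) };
    tensor.data.copy_from_slice(input);
    DEMO_OK
}
```

Returning an integer status rather than panicking keeps unwinding from crossing the language boundary, which is one common way to handle the error propagation listed above.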

Rustnn Library

Located in third_party/rust/rustnn/, this library provides:

  • Context - Manages backend sessions (ONNX Runtime or CoreML) and their resources

  • GraphBuilder - Constructs computational graphs from operations

  • Graph - Internal graph representation with operations and tensors

  • Operation Types - Implementations of all supported operations

  • Backend Converter - Converts rustnn graphs to backend-specific format (ONNX or CoreML)

  • Executor - Manages inference execution through selected backend

Backend Layer

The rustnn library selects the appropriate backend based on the platform:

ONNX Runtime (Windows, Linux, Android):

  • Inference execution - Runs the actual neural network computations

  • Hardware acceleration - Utilizes CPU SIMD on all platforms, DirectML (Windows), CUDA (Linux with NVIDIA GPUs)

  • Optimization - Applies graph optimizations and operator fusion

  • Cross-platform support - Consistent behavior across Windows, Linux, and Android

CoreML (macOS, iOS):

  • Native Apple ML framework integration

  • Hardware acceleration - Utilizes Apple Neural Engine, GPU, or CPU

  • Optimized for Apple silicon

  • Native performance on macOS and iOS devices

The backend is selected automatically by rustnn at runtime, based on platform availability and capabilities.
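
The platform split described above can be sketched as a small selection function. This is an illustration only: Backend, select_backend, and current_backend are hypothetical names, and the real logic inside rustnn may also probe capabilities rather than looking at the OS name alone.

```rust
/// Backends named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    OnnxRuntime,
    CoreMl,
}

/// Picks a backend from an OS name in the style of `std::env::consts::OS`.
/// Hypothetical helper: Apple platforms map to CoreML, everything else to
/// ONNX Runtime, mirroring the platform lists above.
pub fn select_backend(os: &str) -> Backend {
    match os {
        "macos" | "ios" => Backend::CoreMl,
        _ => Backend::OnnxRuntime,
    }
}

/// Backend for the platform this code is running on.
pub fn current_backend() -> Backend {
    select_backend(std::env::consts::OS)
}
```

Taking the OS name as a parameter keeps the decision testable on any host; current_backend simply feeds in the value for the running platform.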

Supported Operations

The WebNN implementation supports the following operations, grouped by category:

Binary Operations:

  • add, sub, mul, div, min - Element-wise arithmetic

  • matmul - Matrix multiplication

  • gemm - General matrix multiply with optional transpose

Unary Operations:

  • relu, sigmoid, tanh, softmax - Common activation functions

  • gelu, elu, leakyRelu, hardSwish - Advanced activations

Element-wise Operations:

  • abs, ceil, floor, round - Rounding and absolute value

  • neg, sign, reciprocal - Negation, sign, and reciprocal

  • exp, log, sqrt, pow - Mathematical functions

Trigonometric Operations:

  • sin, cos, tan, asin, acos, atan - Trigonometric and inverse trigonometric functions

Hyperbolic Operations:

  • sinh, cosh, asinh, acosh, atanh - Hyperbolic and inverse hyperbolic functions (tanh is listed under Unary Operations)

Convolution Operations:

  • conv2d - 2D convolution with configurable strides, dilations, padding, and groups

Pooling Operations:

  • averagePool2d, maxPool2d - 2D pooling with configurable window and strides
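
For conv2d and the pooling operations, the output size along each spatial axis follows from the window, stride, dilation, and padding. The sketch below uses the usual floor-rounding formula; window_output_size is a hypothetical helper, and whether the Firefox implementation computes shapes exactly this way is an assumption.

```rust
/// Output length along one spatial axis for a sliding window (conv2d or a
/// 2D pool), using floor rounding:
///   out = floor((input + pad_begin + pad_end - effective_kernel) / stride) + 1
/// where effective_kernel = dilation * (kernel - 1) + 1.
/// Returns None if any of kernel, stride, or dilation is zero, or if the
/// dilated window does not fit inside the padded input.
pub fn window_output_size(
    input: usize,
    kernel: usize,
    stride: usize,
    dilation: usize,
    pad_begin: usize,
    pad_end: usize,
) -> Option<usize> {
    if kernel == 0 || stride == 0 || dilation == 0 {
        return None;
    }
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + pad_begin + pad_end;
    if padded < effective_kernel {
        return None;
    }
    // Integer division on unsigned values performs the floor.
    Some((padded - effective_kernel) / stride + 1)
}
```

For example, a 224-wide input with a 7-wide kernel, stride 2, no dilation, and padding of 3 on each side yields 112, while a 2-wide pooling window with stride 2 over 28 elements yields 14.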

Normalization Operations:

  • batchNormalization - Batch normalization with scale and bias

  • instanceNormalization - Instance normalization

  • layerNormalization - Layer normalization

Reduction Operations:

  • reduceSum, reduceMean, reduceMax, reduceMin, reduceProduct - Reduction along specified axes
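
As an illustration of what "reduction along specified axes" means, the following sketch sums a row-major 2-D tensor along one axis. reduce_sum_2d is a hypothetical helper written for this document, not part of rustnn; the real operations accept tensors of arbitrary rank and multiple axes.

```rust
/// Sums a row-major `rows x cols` tensor along `axis`.
/// axis 0 collapses the rows (result has `cols` elements);
/// axis 1 collapses the columns (result has `rows` elements).
/// Returns None if the buffer length does not match the shape or the axis
/// is out of range.
pub fn reduce_sum_2d(data: &[f32], rows: usize, cols: usize, axis: usize) -> Option<Vec<f32>> {
    if data.len() != rows * cols || axis > 1 {
        return None;
    }
    let out_len = if axis == 0 { cols } else { rows };
    let mut out = vec![0.0f32; out_len];
    for r in 0..rows {
        for c in 0..cols {
            // The surviving index is the one that is not being reduced.
            let idx = if axis == 0 { c } else { r };
            out[idx] += data[r * cols + c];
        }
    }
    Some(out)
}
```

With the 2x3 input [1, 2, 3, 4, 5, 6], reducing along axis 0 gives [5, 7, 9] and reducing along axis 1 gives [6, 15].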

Shape Operations:

  • reshape - Change tensor shape

  • transpose - Permute tensor dimensions

  • concat - Concatenate tensors along an axis (full implementation is still pending; see Future Enhancements)

Data Flow

Graph Building Phase

  1. Web content calls navigator.ml.createContext()

  2. C++ creates backend context via Rust FFI (ONNX Runtime or CoreML depending on platform)

  3. Web content creates MLGraphBuilder and defines operations

  4. Each operation creates an MLOperand representing the result

  5. Web content calls builder.build() with output operands

  6. C++ serializes operations to JSON and calls Rust FFI

  7. Rustnn converts the graph to backend-specific format (ONNX or CoreML)

  8. Backend creates an optimized execution session

  9. The resulting graph ID is wrapped in an MLGraph and returned to web content
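
Step 6 above hands the recorded operations to Rust as JSON. The exact schema is not described in this document, so the sketch below invents one purely for illustration: OpRecord, its fields, and ops_to_json are all hypothetical. It is written in Rust for consistency with the other examples, although in Firefox the serialization happens on the C++ side.

```rust
/// One recorded builder call, e.g. add(a, b) producing a new operand.
/// Hypothetical layout; operands are referred to by numeric id.
pub struct OpRecord {
    pub op: &'static str,
    pub inputs: Vec<u32>,
    pub output: u32,
}

/// Serializes recorded operations into a JSON array (hypothetical schema).
/// Assumes operation names contain no characters that need JSON escaping.
pub fn ops_to_json(ops: &[OpRecord]) -> String {
    let items: Vec<String> = ops
        .iter()
        .map(|o| {
            let inputs: Vec<String> = o.inputs.iter().map(|i| i.to_string()).collect();
            format!(
                "{{\"op\":\"{}\",\"inputs\":[{}],\"output\":{}}}",
                o.op,
                inputs.join(","),
                o.output
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

Serializing the whole graph once, at build() time, means the per-operation builder calls never cross the FFI boundary; only the final description does.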

    sequenceDiagram
    participant JS as JavaScript
    participant CPP as C++ (MLGraphBuilder)
    participant FFI as Rust FFI Bridge
    participant RustNN as Rustnn Library

    JS->>CPP: createContext()
    CPP->>FFI: rustnn_context_create()
    FFI->>RustNN: Context::new()
    RustNN-->>FFI: Context handle
    FFI-->>CPP: context_id
    CPP-->>JS: MLContext

    JS->>CPP: new MLGraphBuilder(context)
    CPP-->>JS: MLGraphBuilder

    JS->>CPP: input('x', {shape, dataType})
    CPP-->>JS: MLOperand

    JS->>CPP: add(a, b)
    CPP-->>JS: MLOperand

    JS->>CPP: build({output: operand})
    CPP->>FFI: rustnn_graph_build(ops_json)
    FFI->>RustNN: GraphBuilder::build()
    RustNN->>RustNN: Convert to backend format
    RustNN->>RustNN: Create backend session
    RustNN-->>FFI: Graph handle
    FFI-->>CPP: graph_id
    CPP-->>JS: MLGraph
    

Inference Phase

  1. Web content calls context.compute(graph, inputs, outputs)

  2. C++ marshals input data and calls Rust FFI with graph ID

  3. Rustnn retrieves the backend session and prepares input tensors

  4. Backend (ONNX Runtime or CoreML) executes the computational graph

  5. Hardware acceleration is automatically utilized when available

  6. Output tensors are returned through Rust FFI

  7. C++ copies output data to JavaScript-provided buffers

  8. Promise resolves, indicating inference completion
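
The inference steps above can be sketched as a lookup by graph ID followed by a run that writes into a caller-provided buffer. Everything here is a simplified stand-in: AddSession plays the role of a compiled backend session for a single element-wise add, and Registry is a hypothetical ID-to-session map, not the rustnn data structure.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled backend session: element-wise a + b.
pub struct AddSession;

impl AddSession {
    pub fn run(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// Hypothetical registry mapping graph IDs to sessions.
pub struct Registry {
    sessions: HashMap<u32, AddSession>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { sessions: HashMap::new() }
    }

    pub fn insert(&mut self, graph_id: u32, session: AddSession) {
        self.sessions.insert(graph_id, session);
    }

    /// Looks up the session, validates buffer sizes, runs it, and copies the
    /// result into the caller-provided output buffer.
    pub fn compute(
        &self,
        graph_id: u32,
        a: &[f32],
        b: &[f32],
        out: &mut [f32],
    ) -> Result<(), String> {
        let session = self
            .sessions
            .get(&graph_id)
            .ok_or_else(|| format!("unknown graph id {}", graph_id))?;
        if a.len() != b.len() || out.len() != a.len() {
            return Err("input/output length mismatch".to_string());
        }
        let result = session.run(a, b);
        out.copy_from_slice(&result);
        Ok(())
    }
}
```

Writing into a buffer supplied by the caller mirrors step 7, where output data ends up in JavaScript-provided buffers.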

    sequenceDiagram
    participant JS as JavaScript
    participant CPP as C++ (MLContext)
    participant FFI as Rust FFI Bridge
    participant RustNN as Rustnn Library
    participant Backend as Backend (ONNX/CoreML)

    JS->>CPP: compute(graph, inputs, outputs)
    CPP->>FFI: rustnn_graph_compute(graph_id, inputs, outputs)
    FFI->>RustNN: Graph::compute()
    RustNN->>Backend: session.run()
    Backend->>Backend: Execute operations
    Backend-->>RustNN: Output tensors
    RustNN-->>FFI: Results
    FFI-->>CPP: Output data
    CPP-->>JS: Promise resolves
    

Memory Management

C++ Side

  • Uses RefPtr for DOM objects (MLContext, MLGraph, etc.)

  • Automatic reference counting for lifetime management

  • RAII pattern ensures resource cleanup

Rust Side

  • Uses Box for heap allocation of contexts and graphs

  • Raw pointers passed across FFI boundary with explicit lifetime management

  • Destroy functions ensure proper cleanup

  • Unsafe blocks are isolated to the FFI boundary
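
The Box-and-raw-pointer pattern described above can be sketched as a create/destroy pair. DemoContext, demo_context_create, and demo_context_destroy are hypothetical names used for illustration; the live-object counter exists only so the example can observe that cleanup happens.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts live contexts so the example can observe cleanup.
static LIVE_CONTEXTS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical context object owned by Rust.
pub struct DemoContext {
    pub id: u32,
}

impl Drop for DemoContext {
    fn drop(&mut self) {
        LIVE_CONTEXTS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Allocates a context on the heap and hands ownership to the caller as a
/// raw pointer. The caller must eventually pass it to demo_context_destroy.
pub extern "C" fn demo_context_create(id: u32) -> *mut DemoContext {
    LIVE_CONTEXTS.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(DemoContext { id }))
}

/// Reclaims ownership and drops the context. Null is accepted and ignored.
///
/// # Safety
/// `ctx` must be null or a pointer returned by demo_context_create that has
/// not already been destroyed.
pub unsafe extern "C" fn demo_context_destroy(ctx: *mut DemoContext) {
    if !ctx.is_null() {
        // Rebuilding the Box transfers ownership back to Rust; dropping it
        // frees the allocation and runs Drop.
        drop(unsafe { Box::from_raw(ctx) });
    }
}

/// Number of contexts created but not yet destroyed.
pub fn live_contexts() -> usize {
    LIVE_CONTEXTS.load(Ordering::SeqCst)
}
```

On the C++ side, a destructor of the owning DOM object would typically be the place to call the destroy function, which ties the Rust allocation to the RefPtr-managed lifetime.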

Data Transfer

  • JavaScript TypedArrays map to C++ ArrayBufferView objects, which in turn map to Rust slices

  • Zero-copy when possible through pointer passing

  • Ownership is clearly defined at each boundary

Thread Safety

  • MLContext operations can be called from main thread or web workers

  • The backend (for example, ONNX Runtime) handles internal threading for inference

  • Graph-building calls are synchronous, while builder.build() returns a promise that resolves asynchronously

  • Tensor operations use promises for async data transfer

Error Handling

Errors propagate up the stack:

  1. Backend errors (from ONNX Runtime or CoreML) are captured as Rust Result types

  2. Rust errors are converted to FFI error codes with messages

  3. C++ checks error codes and throws DOMException objects

  4. JavaScript receives rejected promises or caught exceptions

Common error types:

  • DataError - Invalid tensor shapes or incompatible data types

  • OperationError - Graph building or execution failures

  • NotSupportedError - Unsupported operations or features
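
The propagation chain above can be sketched as a Rust error enum that is flattened into a code and message for the FFI boundary, then mapped back to a DOMException name. The numeric codes, NnError, to_ffi, and dom_exception_name are assumptions made for illustration; the actual bridge may encode errors differently.

```rust
/// Hypothetical error type mirroring the common error categories above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NnError {
    Data(String),
    Operation(String),
    NotSupported(String),
}

impl NnError {
    /// Hypothetical numeric code passed across the FFI boundary (0 = success).
    pub fn code(&self) -> i32 {
        match self {
            NnError::Data(_) => 1,
            NnError::Operation(_) => 2,
            NnError::NotSupported(_) => 3,
        }
    }

    /// Human-readable message carried alongside the code.
    pub fn message(&self) -> &str {
        match self {
            NnError::Data(m) | NnError::Operation(m) | NnError::NotSupported(m) => m.as_str(),
        }
    }
}

/// Flattens a Result into the (code, message) pair an FFI function could return.
pub fn to_ffi<T>(result: Result<T, NnError>) -> (i32, String) {
    match result {
        Ok(_) => (0, String::new()),
        Err(e) => (e.code(), e.message().to_string()),
    }
}

/// DOMException name the C++ layer would throw for a given error code.
pub fn dom_exception_name(code: i32) -> Option<&'static str> {
    match code {
        1 => Some("DataError"),
        2 => Some("OperationError"),
        3 => Some("NotSupportedError"),
        _ => None,
    }
}
```

Keeping the mapping in one place on each side of the boundary makes it easier to verify that every Rust error category surfaces as the intended exception in JavaScript.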

Performance Considerations

Graph Compilation

  • One-time cost when building graphs through builder.build()

  • Backend (ONNX Runtime or CoreML) applies optimizations during session creation

  • Compiled graphs can be reused for multiple inferences

Inference Execution

  • Optimized by the backend with operator fusion and memory planning

  • Benefits from graph execution caching

  • Hardware acceleration provides significant performance improvements

  • Backend selection is automatic based on platform (CoreML on macOS/iOS, ONNX Runtime elsewhere)

Memory Usage

  • Tensors can be created once and reused across multiple inferences

  • Intermediate values are managed by the backend

  • Efficient memory pooling reduces allocation overhead

Hardware Acceleration

  • Automatically detected and utilized when available

  • CPU: SIMD instructions (SSE, AVX on x86; NEON on ARM)

  • GPU: DirectML (Windows), Metal (macOS via CoreML), CUDA (Linux with NVIDIA GPUs)

  • NPU: Apple Neural Engine (macOS/iOS via CoreML)

  • Backend selection optimizes for available hardware on each platform

Future Enhancements

Areas marked with TODO comments for future development:

  • Context options - Power preference implementation (high-performance vs low-power)

  • Shape inference - More accurate shape calculation for pooling operations

  • Concat operation - Full implementation of tensor concatenation

  • Additional operations - max, where, gather, scatter, and other operations from the W3C spec

  • WebGPU integration - Potential direct GPU execution path for improved performance

Detailed Architecture

For a more detailed view of the architecture including sequence diagrams and data flow visualizations, see the architecture.md file in the dom/webnn/docs/ directory. That file contains Mermaid diagrams showing the complete interaction between layers during graph building and inference.