Architecture
The WebNN implementation in Firefox consists of multiple layers, from the JavaScript API exposed to web content down to the platform backend (ONNX Runtime or CoreML) that executes neural network operations.
Overview
The WebNN implementation spans six distinct layers:
JavaScript API Layer - Web-facing API (navigator.ml, MLContext, MLGraphBuilder, MLGraph, MLOperand, MLTensor)
WebIDL Layer - Interface definition language defining the JavaScript API surface
C++ DOM Implementation - Core implementation in the dom/webnn/ directory
Rust FFI Bridge - Foreign Function Interface in dom/webnn/rustnn_bridge/
Rustnn Library - Rust implementation in third_party/rust/rustnn/
Backend - Platform-specific backend (ONNX Runtime, CoreML, etc.) for neural network execution with hardware acceleration
Architecture Diagram
```mermaid
graph TB
subgraph "Web Content (JavaScript)"
A[navigator.ml]
B[MLContext]
C[MLGraphBuilder]
D[MLGraph]
E[MLOperand]
F[MLTensor]
end
subgraph "WebIDL Layer"
G[WebNN.webidl]
end
subgraph "C++ DOM Implementation (dom/webnn/)"
H[ML.cpp]
I[MLContext.cpp]
J[MLGraphBuilder.cpp]
K[MLGraph.cpp]
L[MLOperand.cpp]
M[MLTensor.cpp]
end
subgraph "Rust FFI Bridge (dom/webnn/rustnn_bridge/)"
N[lib.rs]
O[rustnn_context_create]
P[rustnn_context_destroy]
Q[rustnn_graph_build]
R[rustnn_graph_compute]
S[rustnn_tensor_create]
T[rustnn_tensor_write]
U[rustnn_tensor_read]
end
subgraph "Rustnn Library (third_party/rust/rustnn/)"
V[Context]
W[GraphBuilder]
X[Graph]
Y[Operation Types]
Z[Backend Converter]
AA[Executor]
end
subgraph "Backend (Platform-Specific)"
AB[ONNX Runtime / CoreML]
AC[Hardware Acceleration]
end
A --> G
B --> G
C --> G
D --> G
E --> G
F --> G
G --> H
G --> I
G --> J
G --> K
G --> L
G --> M
H --> N
I --> O
I --> P
J --> Q
K --> R
M --> S
M --> T
M --> U
O --> V
P --> V
Q --> W
Q --> X
R --> AA
S --> V
T --> V
U --> V
W --> Y
W --> Z
X --> Z
Z --> AA
AA --> AC
```
Layer Descriptions
JavaScript API Layer
The WebNN API is exposed to web content through navigator.ml and provides:
MLContext - Represents a compute context for neural network operations
MLGraphBuilder - Builds computational graphs by composing operations
MLGraph - Compiled computational graph ready for execution
MLOperand - Represents an operand in the graph (input, output, or intermediate value)
MLTensor - Represents tensor data that can be used across multiple graph executions
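A minimal sketch of how these objects fit together (the operand name, shape, and chosen operation are illustrative, not taken from the implementation):

```javascript
// Sketch only: shows how the WebNN interfaces relate to each other.
const context = await navigator.ml.createContext();                    // MLContext
const builder = new MLGraphBuilder(context);                           // MLGraphBuilder
const x = builder.input('x', { dataType: 'float32', shape: [2, 2] });  // MLOperand
const y = builder.relu(x);                                             // MLOperand
const graph = await builder.build({ y });                              // MLGraph
```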
WebIDL Layer
Located in dom/webidl/WebNN.webidl, this layer:
Defines all operations, data types, and options available to web content
Acts as the contract between JavaScript and C++ implementations
Specifies which features are exposed with the dom.ml.enabled preference
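Because the API is gated on this preference, web content typically feature-detects it before use; a minimal sketch (assuming an async/module context):

```javascript
// When dom.ml.enabled is false, navigator.ml is not exposed to content.
if ('ml' in navigator) {
  const context = await navigator.ml.createContext();
  // ... build and run WebNN graphs ...
} else {
  // Fall back to another inference path (e.g. WebAssembly).
}
```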
C++ DOM Implementation
Located in dom/webnn/, this layer includes:
ML.cpp - Implements the navigator.ml interface and context creation
MLContext.cpp - Manages the WebNN context lifecycle and tensor operations
MLGraphBuilder.cpp - Implements all graph building operations (add, mul, conv2d, matmul, etc.)
MLGraph.cpp - Represents compiled graphs and handles inference execution
MLOperand.cpp - Manages operand metadata (shape, data type)
MLTensor.cpp - Handles tensor data storage and transfers
Rust FFI Bridge
Located in dom/webnn/rustnn_bridge/src/lib.rs, this layer provides:
C-compatible FFI functions for context, graph, and tensor operations
Pointer management for Rust objects exposed to C++
Data marshaling between C++ and Rust types
Error handling and propagation across the FFI boundary
Key FFI functions:
Context: rustnn_context_create, rustnn_context_destroy
Graph: rustnn_graph_build, rustnn_graph_compute, rustnn_graph_destroy
Tensors: rustnn_tensor_create, rustnn_tensor_write, rustnn_tensor_read
Rustnn Library
Located in third_party/rust/rustnn/, this library provides:
Context - Manages ONNX Runtime sessions and resources
GraphBuilder - Constructs computational graphs from operations
Graph - Internal graph representation with operations and tensors
Operation Types - Implementations of all supported operations
Backend Converter - Converts rustnn graphs to backend-specific format (ONNX or CoreML)
Executor - Manages inference execution through selected backend
Backend Layer
The rustnn library selects the appropriate backend based on the platform:
ONNX Runtime (Windows, Linux, Android):
Inference execution - Runs the actual neural network computations
Hardware acceleration - Utilizes CPU SIMD on all platforms, DirectML (Windows), CUDA (Linux with NVIDIA GPUs)
Optimization - Applies graph optimizations and operator fusion
Cross-platform support - Consistent behavior across Windows, Linux, and Android
CoreML (macOS, iOS):
Native Apple ML framework integration
Hardware acceleration - Utilizes Apple Neural Engine, GPU, or CPU
Optimized for Apple silicon
Native performance on macOS and iOS devices
The backend is automatically selected by rustnn at runtime based on platform availability and capabilities
Supported Operations
The WebNN implementation supports a comprehensive set of operations:
Binary Operations:
add, sub, mul, div, min - Element-wise arithmetic
matmul - Matrix multiplication
gemm - General matrix multiply with optional transpose
Unary Operations:
relu, sigmoid, tanh, softmax - Common activation functions
gelu, elu, leakyRelu, hardSwish - Advanced activations
Element-wise Operations:
abs, ceil, floor, round - Rounding and absolute value
neg, sign, reciprocal - Sign operations
exp, log, sqrt, pow - Mathematical functions
Trigonometric Operations:
sin, cos, tan, asin, acos, atan - Standard trigonometric functions
Hyperbolic Operations:
sinh, cosh, asinh, acosh, atanh - Hyperbolic functions
Convolution Operations:
conv2d - 2D convolution with configurable strides, dilations, padding, and groups
Pooling Operations:
averagePool2d, maxPool2d - 2D pooling with configurable window and strides
Normalization Operations:
batchNormalization - Batch normalization with scale and bias
instanceNormalization - Instance normalization
layerNormalization - Layer normalization
Reduction Operations:
reduceSum, reduceMean, reduceMax, reduceMin, reduceProduct - Reduction along specified axes
Shape Operations:
reshape - Change tensor shape
transpose - Permute tensor dimensions
concat - Concatenate tensors along an axis
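As a sketch of how several of these categories compose in a single graph (shapes, layout, and option values are illustrative, not taken from the implementation):

```javascript
const builder = new MLGraphBuilder(context);

// NCHW input: batch 1, 3 channels, 224x224 (illustrative shape).
const input = builder.input('input', { dataType: 'float32', shape: [1, 3, 224, 224] });
const filter = builder.input('filter', { dataType: 'float32', shape: [16, 3, 3, 3] });

// Convolution -> activation -> pooling, all from the categories above.
const conv = builder.conv2d(input, filter, { padding: [1, 1, 1, 1], strides: [1, 1] });
const activated = builder.relu(conv);
const pooled = builder.maxPool2d(activated, { windowDimensions: [2, 2], strides: [2, 2] });

const graph = await builder.build({ pooled });
```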
Data Flow
Graph Building Phase
1. Web content calls navigator.ml.createContext()
2. C++ creates a backend context via Rust FFI (ONNX Runtime or CoreML, depending on platform)
3. Web content creates an MLGraphBuilder and defines operations
4. Each operation creates an MLOperand representing the result
5. Web content calls builder.build() with output operands
6. C++ serializes the operations to JSON and calls Rust FFI
7. Rustnn converts the graph to the backend-specific format (ONNX or CoreML)
8. The backend creates an optimized execution session
9. A graph ID is returned to web content as an MLGraph
```mermaid
sequenceDiagram
participant JS as JavaScript
participant CPP as C++ (MLGraphBuilder)
participant FFI as Rust FFI Bridge
participant RustNN as Rustnn Library
JS->>CPP: createContext()
CPP->>FFI: rustnn_context_create()
FFI->>RustNN: Context::new()
RustNN-->>FFI: Context handle
FFI-->>CPP: context_id
CPP-->>JS: MLContext
JS->>CPP: new MLGraphBuilder(context)
CPP-->>JS: MLGraphBuilder
JS->>CPP: input('x', {shape, dataType})
CPP-->>JS: MLOperand
JS->>CPP: add(a, b)
CPP-->>JS: MLOperand
JS->>CPP: build({output: operand})
CPP->>FFI: rustnn_graph_build(ops_json)
FFI->>RustNN: GraphBuilder::build()
RustNN->>RustNN: Convert to backend format
RustNN->>RustNN: Create backend session
RustNN-->>FFI: Graph handle
FFI-->>CPP: graph_id
CPP-->>JS: MLGraph
```
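In web-content terms, the same flow looks roughly like this (a sketch; operand and output names are illustrative):

```javascript
const context = await navigator.ml.createContext();            // rustnn_context_create
const builder = new MLGraphBuilder(context);

const a = builder.input('a', { dataType: 'float32', shape: [2, 2] });
const b = builder.input('b', { dataType: 'float32', shape: [2, 2] });
const sum = builder.add(a, b);                                  // produces an MLOperand

// Serializes the recorded operations and creates the backend session
// via rustnn_graph_build.
const graph = await builder.build({ output: sum });
```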
Inference Phase
1. Web content calls context.compute(graph, inputs, outputs)
2. C++ marshals the input data and calls Rust FFI with the graph ID
3. Rustnn retrieves the backend session and prepares the input tensors
4. The backend (ONNX Runtime or CoreML) executes the computational graph
5. Hardware acceleration is used automatically when available
6. Output tensors are returned through the Rust FFI
7. C++ copies the output data into the JavaScript-provided buffers
8. The promise resolves, indicating inference completion
```mermaid
sequenceDiagram
participant JS as JavaScript
participant CPP as C++ (MLContext)
participant FFI as Rust FFI Bridge
participant RustNN as Rustnn Library
participant Backend as Backend (ONNX/CoreML)
JS->>CPP: compute(graph, inputs, outputs)
CPP->>FFI: rustnn_graph_compute(graph_id, inputs, outputs)
FFI->>RustNN: Graph::compute()
RustNN->>Backend: session.run()
Backend->>Backend: Execute operations
Backend-->>RustNN: Output tensors
RustNN-->>FFI: Results
FFI-->>CPP: Output data
CPP-->>JS: Promise resolves
```
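From the JavaScript side, this phase is a single compute() call; a sketch continuing the 2x2 add graph from the previous example (result handling here follows the spec's compute() shape):

```javascript
const inputs = {
  a: new Float32Array([1, 2, 3, 4]),
  b: new Float32Array([5, 6, 7, 8]),
};
const outputs = { output: new Float32Array(4) };

// Maps to rustnn_graph_compute; the promise resolves once the backend has
// finished and the output data has been copied back.
const results = await context.compute(graph, inputs, outputs);
console.log(results.outputs.output); // expected: Float32Array [6, 8, 10, 12]
```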
Memory Management
C++ Side
Uses RefPtr for DOM objects (MLContext, MLGraph, etc.)
Automatic reference counting for lifetime management
RAII pattern ensures resource cleanup
Rust Side
Uses Box for heap allocation of contexts and graphs
Raw pointers passed across the FFI boundary with explicit lifetime management
Destroy functions ensure proper cleanup
Unsafe blocks are isolated to the FFI boundary
Data Transfer
JavaScript TypedArrays map to C++ ArrayBufferView, which map to Rust slices
Zero-copy when possible through pointer passing
Ownership is clearly defined at each boundary
Thread Safety
MLContext operations can be called from main thread or web workers
ONNX Runtime handles internal threading for inference
Graph building is synchronous but returns promises for async completion
Tensor operations use promises for async data transfer
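A sketch of running inference off the main thread (the worker file name and message shape are illustrative):

```javascript
// inference-worker.js (illustrative): the same API is available in dedicated workers.
self.onmessage = async ({ data }) => {
  const context = await navigator.ml.createContext();
  // ... build the graph and call context.compute() exactly as on the main thread ...
  self.postMessage({ done: true });
};
```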
Error Handling
Errors propagate up the stack:
1. ONNX Runtime errors are captured as Rust Result types
2. Rust errors are converted to FFI error codes with messages
3. C++ checks the error codes and throws DOMException objects
4. JavaScript receives rejected promises or caught exceptions
Common error types:
DataError - Invalid tensor shapes or incompatible data types
OperationError - Graph building or execution failures
NotSupportedError - Unsupported operations or features
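Web content observes these errors as thrown exceptions or rejected promises; a sketch (the exact exception name depends on the failure, per the list above):

```javascript
try {
  // Mismatched, non-broadcastable shapes should surface as an error
  // (e.g. DataError) either here or when the graph is built.
  const a = builder.input('a', { dataType: 'float32', shape: [2, 2] });
  const b = builder.input('b', { dataType: 'float32', shape: [3] });
  const graph = await builder.build({ output: builder.add(a, b) });
} catch (e) {
  console.error(`${e.name}: ${e.message}`);
}
```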
Performance Considerations
Graph Compilation
One-time cost when building graphs through builder.build()
The backend (ONNX Runtime or CoreML) applies optimizations during session creation
Compiled graphs can be reused for multiple inferences
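A sketch of amortizing that one-time cost across many inferences (the frame loop, names, and sizes are illustrative):

```javascript
// Build once: pays the session-creation and optimization cost.
const graph = await builder.build({ output });

// Reuse the compiled graph for every subsequent inference.
for (const frame of frames) {
  const outputs = { output: new Float32Array(outputSize) };
  await context.compute(graph, { input: frame }, outputs);
}
```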
Inference Execution
Optimized by the backend with operator fusion and memory planning
Benefits from graph execution caching
Hardware acceleration provides significant performance improvements
Backend selection is automatic based on platform (CoreML on macOS/iOS, ONNX Runtime elsewhere)
Memory Usage
Tensors can be created once and reused across multiple inferences
Intermediate values are managed by the backend
Efficient memory pooling reduces allocation overhead
Hardware Acceleration
Automatically detected and utilized when available
CPU: SIMD instructions (SSE, AVX on x86; NEON on ARM)
GPU: DirectML (Windows), Metal (macOS via CoreML), CUDA (Linux with NVIDIA GPUs)
NPU: Apple Neural Engine (macOS/iOS via CoreML)
Backend selection optimizes for available hardware on each platform
Future Enhancements
Areas marked with TODO comments for future development:
Context options - Power preference implementation (high-performance vs low-power)
Shape inference - More accurate shape calculation for pooling operations
Concat operation - Full implementation of tensor concatenation
Additional operations - Max, where, gather, scatter, and other operations from the W3C spec
WebGPU integration - Potential direct GPU execution path for improved performance
Detailed Architecture
For a more detailed view of the architecture, including sequence diagrams and data flow visualizations,
see the architecture.md file in the dom/webnn/docs/ directory. That file contains Mermaid
diagrams showing the complete interaction between layers during graph building and inference.