Architecture
The WebNN implementation in Firefox consists of multiple layers, from the JavaScript API exposed to web content down to the platform backend (ONNX Runtime or CoreML) that executes neural network operations.
Overview
The WebNN implementation spans six distinct layers:
JavaScript API Layer - Web-facing API (navigator.ml, MLContext, MLGraphBuilder, MLGraph, MLOperand, MLTensor)
WebIDL Layer - Interface definition language defining the JavaScript API surface
C++ DOM Implementation - Core implementation in the dom/webnn/ directory
Rust FFI Bridge - Foreign Function Interface in dom/webnn/rustnn_bridge/
Rustnn Library - Rust implementation in third_party/rust/rustnn/
Backend - Platform-specific backend (ONNX Runtime, CoreML, etc.) for neural network execution with hardware acceleration
Architecture Diagram
```mermaid
graph TB
subgraph "Web Content (JavaScript)"
A[navigator.ml]
B[MLContext]
C[MLGraphBuilder]
D[MLGraph]
E[MLOperand]
F[MLTensor]
end
subgraph "WebIDL Layer"
G[WebNN.webidl]
end
subgraph "C++ DOM Implementation (dom/webnn/)"
H[ML.cpp]
I[MLContext.cpp]
J[MLGraphBuilder.cpp]
K[MLGraph.cpp]
L[MLOperand.cpp]
M[MLTensor.cpp]
end
subgraph "Rust FFI Bridge (dom/webnn/rustnn_bridge/)"
N[lib.rs]
O[rustnn_context_create]
P[rustnn_context_destroy]
Q[rustnn_graph_build]
R[rustnn_graph_compute]
S[rustnn_tensor_create]
T[rustnn_tensor_write]
U[rustnn_tensor_read]
end
subgraph "Rustnn Library (third_party/rust/rustnn/)"
V[Context]
W[GraphBuilder]
X[Graph]
Y[Operation Types]
Z[Backend Converter]
AA[Executor]
end
subgraph "Backend (Platform-Specific)"
AB[ONNX Runtime / CoreML]
AC[Hardware Acceleration]
end
A --> G
B --> G
C --> G
D --> G
E --> G
F --> G
G --> H
G --> I
G --> J
G --> K
G --> L
G --> M
H --> N
I --> O
I --> P
J --> Q
K --> R
M --> S
M --> T
M --> U
O --> V
P --> V
Q --> W
Q --> X
R --> AA
S --> V
T --> V
U --> V
W --> Y
W --> Z
X --> Z
Z --> AA
AA --> AC
```
Layer Descriptions
JavaScript API Layer
The WebNN API is exposed to web content through navigator.ml and provides:
MLContext - Represents a compute context for neural network operations
MLGraphBuilder - Builds computational graphs by composing operations
MLGraph - Compiled computational graph ready for execution
MLOperand - Represents an operand in the graph (input, output, or intermediate value)
MLTensor - Represents tensor data that can be used across multiple graph executions
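A minimal sketch of how these objects fit together (the operand name, shape, and chosen operation are illustrative, not taken from the implementation):

```javascript
// Sketch only: shows how the WebNN interfaces relate to each other.
const context = await navigator.ml.createContext();                    // MLContext
const builder = new MLGraphBuilder(context);                           // MLGraphBuilder
const x = builder.input('x', { dataType: 'float32', shape: [2, 2] });  // MLOperand
const y = builder.relu(x);                                             // MLOperand
const graph = await builder.build({ y });                              // MLGraph
```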
WebIDL Layer
Located in dom/webidl/WebNN.webidl, this layer:
Defines all operations, data types, and options available to web content
Acts as the contract between JavaScript and C++ implementations
Specifies which features are exposed with the dom.ml.enabled preference
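Because the API is gated on this preference, web content typically feature-detects it before use; a minimal sketch (assuming an async/module context):

```javascript
// When dom.ml.enabled is false, navigator.ml is not exposed to content.
if ('ml' in navigator) {
  const context = await navigator.ml.createContext();
  // ... build and run WebNN graphs ...
} else {
  // Fall back to another inference path (e.g. WebAssembly).
}
```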
C++ DOM Implementation
Located in dom/webnn/, this layer includes:
ML.cpp - Implements the navigator.ml interface and context creation
MLContext.cpp - Manages the WebNN context lifecycle and tensor operations
MLGraphBuilder.cpp - Implements all graph building operations (add, mul, conv2d, matmul, etc.)
MLGraph.cpp - Represents compiled graphs and handles inference execution
MLOperand.cpp - Manages operand metadata (shape, data type)
MLTensor.cpp - Handles tensor data storage and transfers
Rust FFI Bridge
Located in dom/webnn/rustnn_bridge/src/lib.rs, this layer provides:
C-compatible FFI functions for context, graph, and tensor operations
Pointer management for Rust objects exposed to C++
Data marshaling between C++ and Rust types
Error handling and propagation across the FFI boundary
Key FFI functions:
Context: rustnn_context_create, rustnn_context_destroy
Graph: rustnn_graph_build, rustnn_graph_compute, rustnn_graph_destroy
Tensors: rustnn_tensor_create, rustnn_tensor_write, rustnn_tensor_read
Rustnn Library
Located in third_party/rust/rustnn/, this library provides:
Context - Manages ONNX Runtime sessions and resources
GraphBuilder - Constructs computational graphs from operations
Graph - Internal graph representation with operations and tensors
Operation Types - Implementations of all supported operations
Backend Converter - Converts rustnn graphs to backend-specific format (ONNX or CoreML)
Executor - Manages inference execution through selected backend
Backend Layer
The rustnn library selects the appropriate backend based on the platform:
ONNX Runtime (Windows, Linux, Android):
Inference execution - Runs the actual neural network computations
Hardware acceleration - Utilizes CPU SIMD on all platforms, DirectML (Windows), CUDA (Linux with NVIDIA GPUs)
Optimization - Applies graph optimizations and operator fusion
Cross-platform support - Consistent behavior across Windows, Linux, and Android
CoreML (macOS, iOS):
Native Apple ML framework integration
Hardware acceleration - Utilizes Apple Neural Engine, GPU, or CPU
Optimized for Apple silicon
Native performance on macOS and iOS devices
The backend is automatically selected by rustnn at runtime based on platform availability and capabilities
Supported Operations
The WebNN implementation supports a comprehensive set of operations:
Binary Operations:
add, sub, mul, div, min - Element-wise arithmetic
matmul - Matrix multiplication
gemm - General matrix multiply with optional transpose
Unary Operations:
relu, sigmoid, tanh, softmax - Common activation functions
gelu, elu, leakyRelu, hardSwish - Advanced activations
Element-wise Operations:
abs, ceil, floor, round - Rounding and absolute value
neg, sign, reciprocal - Sign operations
exp, log, sqrt, pow - Mathematical functions
Trigonometric Operations:
sin, cos, tan, asin, acos, atan - Standard trigonometric functions
Hyperbolic Operations:
sinh, cosh, asinh, acosh, atanh - Hyperbolic functions
Convolution Operations:
conv2d - 2D convolution with configurable strides, dilations, padding, and groups
Pooling Operations:
averagePool2d, maxPool2d - 2D pooling with configurable window and strides
Normalization Operations:
batchNormalization - Batch normalization with scale and bias
instanceNormalization - Instance normalization
layerNormalization - Layer normalization
Reduction Operations:
reduceSum, reduceMean, reduceMax, reduceMin, reduceProduct - Reduction along specified axes
Shape Operations:
reshape - Change tensor shape
transpose - Permute tensor dimensions
concat - Concatenate tensors along an axis
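As a sketch of how several of these categories compose in a single graph (shapes, layout, and option values are illustrative, not taken from the implementation):

```javascript
const builder = new MLGraphBuilder(context);

// NCHW input: batch 1, 3 channels, 224x224 (illustrative shape).
const input = builder.input('input', { dataType: 'float32', shape: [1, 3, 224, 224] });
const filter = builder.input('filter', { dataType: 'float32', shape: [16, 3, 3, 3] });

// Convolution -> activation -> pooling, all from the categories above.
const conv = builder.conv2d(input, filter, { padding: [1, 1, 1, 1], strides: [1, 1] });
const activated = builder.relu(conv);
const pooled = builder.maxPool2d(activated, { windowDimensions: [2, 2], strides: [2, 2] });

const graph = await builder.build({ pooled });
```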
Data Flow
Graph Building Phase
1. Web content calls navigator.ml.createContext()
2. C++ creates a backend context via Rust FFI (ONNX Runtime or CoreML, depending on platform)
3. Web content creates an MLGraphBuilder and defines operations
4. Each operation creates an MLOperand representing the result
5. Web content calls builder.build() with output operands
6. C++ serializes the operations to JSON and calls Rust FFI
7. Rustnn converts the graph to the backend-specific format (ONNX or CoreML)
8. The backend creates an optimized execution session
9. A graph ID is returned to web content as an MLGraph
```mermaid
sequenceDiagram
participant JS as JavaScript
participant CPP as C++ (MLGraphBuilder)
participant FFI as Rust FFI Bridge
participant RustNN as Rustnn Library
JS->>CPP: createContext()
CPP->>FFI: rustnn_context_create()
FFI->>RustNN: Context::new()
RustNN-->>FFI: Context handle
FFI-->>CPP: context_id
CPP-->>JS: MLContext
JS->>CPP: new MLGraphBuilder(context)
CPP-->>JS: MLGraphBuilder
JS->>CPP: input('x', {shape, dataType})
CPP-->>JS: MLOperand
JS->>CPP: add(a, b)
CPP-->>JS: MLOperand
JS->>CPP: build({output: operand})
CPP->>FFI: rustnn_graph_build(ops_json)
FFI->>RustNN: GraphBuilder::build()
RustNN->>RustNN: Convert to backend format
RustNN->>RustNN: Create backend session
RustNN-->>FFI: Graph handle
FFI-->>CPP: graph_id
CPP-->>JS: MLGraph
```
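In web-content terms, the same flow looks roughly like this (a sketch; operand and output names are illustrative):

```javascript
const context = await navigator.ml.createContext();            // rustnn_context_create
const builder = new MLGraphBuilder(context);

const a = builder.input('a', { dataType: 'float32', shape: [2, 2] });
const b = builder.input('b', { dataType: 'float32', shape: [2, 2] });
const sum = builder.add(a, b);                                  // produces an MLOperand

// Serializes the recorded operations and creates the backend session
// via rustnn_graph_build.
const graph = await builder.build({ output: sum });
```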
Inference Phase
1. Web content calls context.compute(graph, inputs, outputs)
2. C++ marshals the input data and calls Rust FFI with the graph ID
3. Rustnn retrieves the backend session and prepares the input tensors
4. The backend (ONNX Runtime or CoreML) executes the computational graph
5. Hardware acceleration is used automatically when available
6. Output tensors are returned through the Rust FFI
7. C++ copies the output data into the JavaScript-provided buffers
8. The promise resolves, indicating inference completion
```mermaid
sequenceDiagram
participant JS as JavaScript
participant CPP as C++ (MLContext)
participant FFI as Rust FFI Bridge
participant RustNN as Rustnn Library
participant Backend as Backend (ONNX/CoreML)
JS->>CPP: compute(graph, inputs, outputs)
CPP->>FFI: rustnn_graph_compute(graph_id, inputs, outputs)
FFI->>RustNN: Graph::compute()
RustNN->>Backend: session.run()
Backend->>Backend: Execute operations
Backend-->>RustNN: Output tensors
RustNN-->>FFI: Results
FFI-->>CPP: Output data
CPP-->>JS: Promise resolves
```
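From the JavaScript side, this phase is a single compute() call; a sketch continuing the 2x2 add graph from the previous example (result handling here follows the spec's compute() shape):

```javascript
const inputs = {
  a: new Float32Array([1, 2, 3, 4]),
  b: new Float32Array([5, 6, 7, 8]),
};
const outputs = { output: new Float32Array(4) };

// Maps to rustnn_graph_compute; the promise resolves once the backend has
// finished and the output data has been copied back.
const results = await context.compute(graph, inputs, outputs);
console.log(results.outputs.output); // expected: Float32Array [6, 8, 10, 12]
```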
Memory Management
C++ Side
Uses RefPtr for DOM objects (MLContext, MLGraph, etc.)
Automatic reference counting for lifetime management
RAII pattern ensures resource cleanup
Rust Side
Uses Box for heap allocation of contexts and graphs
Raw pointers passed across the FFI boundary with explicit lifetime management
Destroy functions ensure proper cleanup
Unsafe blocks are isolated to the FFI boundary
Data Transfer
JavaScript TypedArrays map to C++ ArrayBufferView, which map to Rust slices
Zero-copy when possible through pointer passing
Ownership is clearly defined at each boundary
Thread Safety
MLContext operations can be called from main thread or web workers
ONNX Runtime handles internal threading for inference
Graph building is synchronous but returns promises for async completion
Tensor operations use promises for async data transfer
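A sketch of running inference off the main thread (the worker file name and message shape are illustrative):

```javascript
// inference-worker.js (illustrative): the same API is available in dedicated workers.
self.onmessage = async ({ data }) => {
  const context = await navigator.ml.createContext();
  // ... build the graph and call context.compute() exactly as on the main thread ...
  self.postMessage({ done: true });
};
```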
Error Handling
Errors propagate up the stack:
1. ONNX Runtime errors are captured as Rust Result types
2. Rust errors are converted to FFI error codes with messages
3. C++ checks the error codes and throws DOMException objects
4. JavaScript receives rejected promises or caught exceptions
Common error types:
DataError - Invalid tensor shapes or incompatible data types
OperationError - Graph building or execution failures
NotSupportedError - Unsupported operations or features
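Web content observes these errors as thrown exceptions or rejected promises; a sketch (the exact exception name depends on the failure, per the list above):

```javascript
try {
  // Mismatched, non-broadcastable shapes should surface as an error
  // (e.g. DataError) either here or when the graph is built.
  const a = builder.input('a', { dataType: 'float32', shape: [2, 2] });
  const b = builder.input('b', { dataType: 'float32', shape: [3] });
  const graph = await builder.build({ output: builder.add(a, b) });
} catch (e) {
  console.error(`${e.name}: ${e.message}`);
}
```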
Performance Considerations
Graph Compilation
One-time cost when building graphs through builder.build()
The backend (ONNX Runtime or CoreML) applies optimizations during session creation
Compiled graphs can be reused for multiple inferences
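A sketch of amortizing that one-time cost across many inferences (the frame loop, names, and sizes are illustrative):

```javascript
// Build once: pays the session-creation and optimization cost.
const graph = await builder.build({ output });

// Reuse the compiled graph for every subsequent inference.
for (const frame of frames) {
  const outputs = { output: new Float32Array(outputSize) };
  await context.compute(graph, { input: frame }, outputs);
}
```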
Inference Execution
Optimized by the backend with operator fusion and memory planning
Benefits from graph execution caching
Hardware acceleration provides significant performance improvements
Backend selection is automatic based on platform (CoreML on macOS/iOS, ONNX Runtime elsewhere)
Memory Usage
Tensors can be created once and reused across multiple inferences
Intermediate values are managed by the backend
Efficient memory pooling reduces allocation overhead
Hardware Acceleration
Automatically detected and utilized when available
CPU: SIMD instructions (SSE, AVX on x86; NEON on ARM)
GPU: DirectML (Windows), Metal (macOS via CoreML), CUDA (Linux with NVIDIA GPUs)
NPU: Apple Neural Engine (macOS/iOS via CoreML)
Backend selection optimizes for available hardware on each platform
Future Enhancements
Areas marked with TODO comments for future development:
Context options - Power preference implementation (high-performance vs low-power)
Shape inference - More accurate shape calculation for pooling operations
Concat operation - Full implementation of tensor concatenation
Additional operations - Max, where, gather, scatter, and other operations from the W3C spec
WebGPU integration - Potential direct GPU execution path for improved performance
Detailed Architecture
For a more detailed view of the architecture, including sequence diagrams and data flow visualizations,
see the architecture.md file in the dom/webnn/docs/ directory. That file contains Mermaid
diagrams showing the complete interaction between layers during graph building and inference.