.. _webnn-architecture:

Architecture
============

The WebNN implementation in Firefox consists of multiple layers, from the JavaScript API exposed to web content down to the platform backend (ONNX Runtime or CoreML) that executes neural network operations.

Overview
--------

The WebNN implementation spans six distinct layers:

1. **JavaScript API Layer** - Web-facing API (``navigator.ml``, ``MLContext``, ``MLGraphBuilder``, ``MLGraph``, ``MLOperand``, ``MLTensor``)
2. **WebIDL Layer** - Interface definition language defining the JavaScript API surface
3. **C++ DOM Implementation** - Core implementation in the ``dom/webnn/`` directory
4. **Rust FFI Bridge** - Foreign Function Interface in ``dom/webnn/rustnn_bridge/``
5. **Rustnn Library** - Rust implementation in ``third_party/rust/rustnn/``
6. **Backend** - Platform-specific backend (ONNX Runtime, CoreML, etc.) for neural network execution with hardware acceleration

Architecture Diagram
~~~~~~~~~~~~~~~~~~~~

.. mermaid::

    graph TB
        subgraph "Web Content (JavaScript)"
            A[navigator.ml]
            B[MLContext]
            C[MLGraphBuilder]
            D[MLGraph]
            E[MLOperand]
            F[MLTensor]
        end

        subgraph "WebIDL Layer"
            G[WebNN.webidl]
        end

        subgraph "C++ DOM Implementation (dom/webnn/)"
            H[ML.cpp]
            I[MLContext.cpp]
            J[MLGraphBuilder.cpp]
            K[MLGraph.cpp]
            L[MLOperand.cpp]
            M[MLTensor.cpp]
        end

        subgraph "Rust FFI Bridge (dom/webnn/rustnn_bridge/)"
            N[lib.rs]
            O[rustnn_context_create]
            P[rustnn_context_destroy]
            Q[rustnn_graph_build]
            R[rustnn_graph_compute]
            S[rustnn_tensor_create]
            T[rustnn_tensor_write]
            U[rustnn_tensor_read]
        end

        subgraph "Rustnn Library (third_party/rust/rustnn/)"
            V[Context]
            W[GraphBuilder]
            X[Graph]
            Y[Operation Types]
            Z[Backend Converter]
            AA[Executor]
        end

        subgraph "Backend (Platform-Specific)"
            AB[ONNX Runtime / CoreML]
            AC[Hardware Acceleration]
        end

        A --> G
        B --> G
        C --> G
        D --> G
        E --> G
        F --> G
        G --> H
        G --> I
        G --> J
        G --> K
        G --> L
        G --> M
        H --> N
        I --> O
        I --> P
        J --> Q
        K --> R
        M --> S
        M --> T
        M --> U
        O --> V
        P --> V
        Q --> W
        Q --> X
        R --> AA
        S --> V
        T --> V
        U --> V
        W --> Y
        W --> Z
        X --> Z
        Z --> AA
        AA --> AC

Layer Descriptions
------------------

JavaScript API Layer
~~~~~~~~~~~~~~~~~~~~

The WebNN API is exposed to web content through ``navigator.ml`` and provides:

* **MLContext** - Represents a compute context for neural network operations
* **MLGraphBuilder** - Builds computational graphs by composing operations
* **MLGraph** - Compiled computational graph ready for execution
* **MLOperand** - Represents an operand in the graph (input, output, or intermediate value)
* **MLTensor** - Represents tensor data that can be used across multiple graph executions

WebIDL Layer
~~~~~~~~~~~~

Located in ``dom/webidl/WebNN.webidl``, this layer:

* Defines all operations, data types, and options available to web content
* Acts as the contract between JavaScript and C++ implementations
* Specifies which features are gated by the ``dom.ml.enabled`` preference

C++ DOM Implementation
~~~~~~~~~~~~~~~~~~~~~~

Located in ``dom/webnn/``, this layer includes:

* **ML.cpp** - Implements the ``navigator.ml`` interface and context creation
* **MLContext.cpp** - Manages the WebNN context lifecycle and tensor operations
* **MLGraphBuilder.cpp** - Implements all graph building operations (add, mul, conv2d, matmul, etc.)
* **MLGraph.cpp** - Represents compiled graphs and handles inference execution
* **MLOperand.cpp** - Manages operand metadata (shape, data type)
* **MLTensor.cpp** - Handles tensor data storage and transfers

Rust FFI Bridge
~~~~~~~~~~~~~~~

Located in ``dom/webnn/rustnn_bridge/src/lib.rs``, this layer provides:

* C-compatible FFI functions for context, graph, and tensor operations
* Pointer management for Rust objects exposed to C++
* Data marshaling between C++ and Rust types
* Error handling and propagation across the FFI boundary

Key FFI functions:

* Context: ``rustnn_context_create``, ``rustnn_context_destroy``
* Graph: ``rustnn_graph_build``, ``rustnn_graph_compute``, ``rustnn_graph_destroy``
* Tensors: ``rustnn_tensor_create``, ``rustnn_tensor_write``, ``rustnn_tensor_read``

Rustnn Library
~~~~~~~~~~~~~~

Located in ``third_party/rust/rustnn/``, this library provides:

* **Context** - Manages ONNX Runtime sessions and resources
* **GraphBuilder** - Constructs computational graphs from operations
* **Graph** - Internal graph representation with operations and tensors
* **Operation Types** - Implementations of all supported operations
* **Backend Converter** - Converts rustnn graphs to backend-specific format (ONNX or CoreML)
* **Executor** - Manages inference execution through selected backend

Backend Layer
~~~~~~~~~~~~~

The rustnn library selects the appropriate backend based on the platform:

**ONNX Runtime** (Windows, Linux, Android):

* Inference execution - Runs the actual neural network computations
* Hardware acceleration - Utilizes CPU SIMD on all platforms, DirectML (Windows), CUDA (Linux with NVIDIA GPUs)
* Optimization - Applies graph optimizations and operator fusion
* Cross-platform support - Consistent behavior across Windows, Linux, and Android

**CoreML** (macOS, iOS):

* Native Apple ML framework integration
* Hardware acceleration - Utilizes Apple Neural Engine, GPU, or CPU
* Optimized for Apple silicon
* Native performance on macOS and iOS devices

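
The platform split listed above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not rustnn's actual selection logic: the ``Backend`` enum and the ``select_backend`` function are hypothetical names, and the sketch keys only on the operating system name, whereas the real library is described as also considering availability and capabilities.

```rust
/// Hypothetical backend identifiers for illustration; not the real rustnn types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    OnnxRuntime,
    CoreMl,
}

/// Pick a backend from an OS name in the format of `std::env::consts::OS`
/// (for example "macos", "ios", "windows", "linux", "android").
/// Assumption: Apple platforms map to CoreML, everything else to ONNX Runtime.
pub fn select_backend(os: &str) -> Backend {
    match os {
        "macos" | "ios" => Backend::CoreMl,
        _ => Backend::OnnxRuntime,
    }
}
```

Calling ``select_backend(std::env::consts::OS)`` picks a backend for the host the code is running on.
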
The backend is automatically selected by rustnn at runtime based on platform availability and capabilities.

Supported Operations
--------------------

The WebNN implementation supports the following operations:

**Binary Operations:**

* add, sub, mul, div, min - Element-wise arithmetic
* matmul - Matrix multiplication
* gemm - General matrix multiply with optional transpose

**Unary Operations:**

* relu, sigmoid, tanh, softmax - Common activation functions
* gelu, elu, leakyRelu, hardSwish - Advanced activations

**Element-wise Operations:**

* abs, ceil, floor, round - Rounding and absolute value
* neg, sign, reciprocal - Negation, sign, and reciprocal
* exp, log, sqrt, pow - Exponential, logarithm, square root, and power

**Trigonometric Operations:**

* sin, cos, tan, asin, acos, atan - Standard trigonometric functions

**Hyperbolic Operations:**

* sinh, cosh, asinh, acosh, atanh - Hyperbolic functions

**Convolution Operations:**

* conv2d - 2D convolution with configurable strides, dilations, padding, and groups

**Pooling Operations:**

* averagePool2d, maxPool2d - 2D pooling with configurable window and strides

**Normalization Operations:**

* batchNormalization - Batch normalization with scale and bias
* instanceNormalization - Instance normalization
* layerNormalization - Layer normalization

**Reduction Operations:**

* reduceSum, reduceMean, reduceMax, reduceMin, reduceProduct - Reduction along specified axes

**Shape Operations:**

* reshape - Change tensor shape
* transpose - Permute tensor dimensions
* concat - Concatenate tensors along an axis (implementation incomplete; see Future Enhancements)

Data Flow
---------

Graph Building Phase
~~~~~~~~~~~~~~~~~~~~

1. Web content calls ``navigator.ml.createContext()``
2. C++ creates backend context via Rust FFI (ONNX Runtime or CoreML depending on platform)
3. Web content creates ``MLGraphBuilder`` and defines operations
4. Each operation creates an ``MLOperand`` representing the result
5. Web content calls ``builder.build()`` with output operands
6. C++ serializes operations to JSON and calls Rust FFI
7. Rustnn converts the graph to backend-specific format (ONNX or CoreML)
8. Backend creates an optimized execution session
9. Graph ID is returned to web content as ``MLGraph``

.. mermaid::

    sequenceDiagram
        participant JS as JavaScript
        participant CPP as C++ (MLGraphBuilder)
        participant FFI as Rust FFI Bridge
        participant RustNN as Rustnn Library

        JS->>CPP: createContext()
        CPP->>FFI: rustnn_context_create()
        FFI->>RustNN: Context::new()
        RustNN-->>FFI: Context handle
        FFI-->>CPP: context_id
        CPP-->>JS: MLContext

        JS->>CPP: new MLGraphBuilder(context)
        CPP-->>JS: MLGraphBuilder

        JS->>CPP: input('x', {shape, dataType})
        CPP-->>JS: MLOperand

        JS->>CPP: add(a, b)
        CPP-->>JS: MLOperand

        JS->>CPP: build({output: operand})
        CPP->>FFI: rustnn_graph_build(ops_json)
        FFI->>RustNN: GraphBuilder::build()
        RustNN->>RustNN: Convert to backend format
        RustNN->>RustNN: Create backend session
        RustNN-->>FFI: Graph handle
        FFI-->>CPP: graph_id
        CPP-->>JS: MLGraph

Inference Phase
~~~~~~~~~~~~~~~

1. Web content calls ``context.compute(graph, inputs, outputs)``
2. C++ marshals input data and calls Rust FFI with graph ID
3. Rustnn retrieves the backend session and prepares input tensors
4. Backend (ONNX Runtime or CoreML) executes the computational graph
5. Hardware acceleration is used automatically when available
6. Output tensors are returned through Rust FFI
7. C++ copies output data to JavaScript-provided buffers
8. Promise resolves, indicating inference completion

.. mermaid::

    sequenceDiagram
        participant JS as JavaScript
        participant CPP as C++ (MLContext)
        participant FFI as Rust FFI Bridge
        participant RustNN as Rustnn Library
        participant Backend as Backend (ONNX/CoreML)

        JS->>CPP: compute(graph, inputs, outputs)
        CPP->>FFI: rustnn_graph_compute(graph_id, inputs, outputs)
        FFI->>RustNN: Graph::compute()
        RustNN->>Backend: session.run()
        Backend->>Backend: Execute operations
        Backend-->>RustNN: Output tensors
        RustNN-->>FFI: Results
        FFI-->>CPP: Output data
        CPP-->>JS: Promise resolves

Memory Management
-----------------

C++ Side
~~~~~~~~

* Uses ``RefPtr`` for DOM objects (MLContext, MLGraph, etc.)
* Automatic reference counting for lifetime management
* RAII pattern ensures resource cleanup

Rust Side
~~~~~~~~~

* Uses ``Box`` for heap allocation of contexts and graphs
* Raw pointers passed across FFI boundary with explicit lifetime management
* Destroy functions ensure proper cleanup
* Unsafe blocks are isolated to the FFI boundary

Data Transfer
~~~~~~~~~~~~~

* JavaScript TypedArrays map to C++ ``ArrayBufferView`` objects, which in turn map to Rust slices
* Zero-copy when possible through pointer passing
* Ownership is clearly defined at each boundary

Thread Safety
-------------

* MLContext operations can be called from the main thread or from web workers
* ONNX Runtime handles internal threading for inference
* Graph construction calls are synchronous; ``builder.build()`` returns a promise that resolves when the build completes
* Tensor operations use promises for async data transfer

Error Handling
--------------

Errors propagate up the stack:

1. ONNX Runtime errors are captured as Rust ``Result`` types
2. Rust errors are converted to FFI error codes with messages
3. C++ checks error codes and throws ``DOMException`` objects
4. JavaScript receives rejected promises or thrown exceptions

Common error types:

* **DataError** - Invalid tensor shapes or incompatible data types
* **OperationError** - Graph building or execution failures
* **NotSupportedError** - Unsupported operations or features

Performance Considerations
--------------------------

Graph Compilation
~~~~~~~~~~~~~~~~~

* One-time cost when building graphs through ``builder.build()``
* Backend (ONNX Runtime or CoreML) applies optimizations during session creation
* Compiled graphs can be reused for multiple inferences

Inference Execution
~~~~~~~~~~~~~~~~~~~

* Optimized by the backend with operator fusion and memory planning
* Benefits from graph execution caching
* Hardware acceleration provides significant performance improvements
* Backend selection is automatic based on platform (CoreML on macOS/iOS, ONNX Runtime elsewhere)

Memory Usage
~~~~~~~~~~~~

* Tensors can be created once and reused across multiple inferences
* Intermediate values are managed by the backend
* Efficient memory pooling reduces allocation overhead

Hardware Acceleration
~~~~~~~~~~~~~~~~~~~~~

* Automatically detected and utilized when available
* **CPU**: SIMD instructions (SSE, AVX on x86; NEON on ARM)
* **GPU**: DirectML (Windows), Metal (macOS via CoreML), CUDA (Linux with NVIDIA GPUs)
* **NPU**: Apple Neural Engine (macOS/iOS via CoreML)
* Backend selection optimizes for available hardware on each platform

Future Enhancements
-------------------

Areas marked with TODO comments for future development:

* **Context options** - Power preference implementation (high-performance vs low-power)
* **Shape inference** - More accurate shape calculation for pooling operations
* **Concat operation** - Full implementation of tensor concatenation
* **Additional operations** - Max, where, gather, scatter, and other operations from the W3C spec
* **WebGPU integration** - Potential direct GPU execution path for improved performance

Detailed Architecture
---------------------

For a more detailed view of the architecture, including sequence diagrams and data flow visualizations, see the ``architecture.md`` file in the ``dom/webnn/docs/`` directory. That file contains Mermaid diagrams showing the complete interaction between layers during graph building and inference.
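
As a closing illustration of the Rust-side rules listed under Memory Management (``Box`` allocation, raw pointers across the FFI boundary, explicit destroy functions), the following sketch shows one common way to implement such a handle lifecycle. It is an assumption-laden example rather than the bridge's code: ``DemoContext`` and the ``demo_context_*`` functions are hypothetical and are not the real ``rustnn_*`` functions, and the export attribute needed for real C++ linkage is omitted.

```rust
/// Hypothetical stand-in for a context object; the real rustnn type is not shown here.
pub struct DemoContext {
    pub graphs_built: u32,
}

/// Allocate a context on the heap and hand ownership to the caller as a raw pointer.
pub extern "C" fn demo_context_create() -> *mut DemoContext {
    Box::into_raw(Box::new(DemoContext { graphs_built: 0 }))
}

/// Borrow the context without taking ownership.
/// Returns the updated build count, or -1 if the pointer is null.
pub extern "C" fn demo_context_record_build(ctx: *mut DemoContext) -> i32 {
    // SAFETY: the caller must pass either null or a pointer obtained from
    // `demo_context_create` that has not yet been destroyed.
    match unsafe { ctx.as_mut() } {
        Some(context) => {
            context.graphs_built += 1;
            context.graphs_built as i32
        }
        None => -1,
    }
}

/// Reclaim ownership of the context and drop it; a null pointer is ignored.
pub extern "C" fn demo_context_destroy(ctx: *mut DemoContext) {
    if !ctx.is_null() {
        // SAFETY: the pointer came from `Box::into_raw` in `demo_context_create`
        // and the caller promises not to use it again after this call.
        drop(unsafe { Box::from_raw(ctx) });
    }
}
```

Keeping the ``unsafe`` blocks inside these three functions matches the stated goal of isolating unsafe code at the FFI boundary: the caller holds only an opaque pointer and must call the destroy function exactly once.
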