.. _webnn-architecture:

Architecture
============

The WebNN implementation in Firefox consists of multiple layers, from the JavaScript API exposed to web content down to the platform backend (ONNX Runtime or CoreML) that executes neural network operations.

Overview
--------

The WebNN implementation spans six distinct layers:

1. **JavaScript API Layer** - Web-facing API (navigator.ml, MLContext, MLGraphBuilder, MLGraph, MLOperand, MLTensor)
2. **WebIDL Layer** - Interface definition language defining the JavaScript API surface
3. **C++ DOM Implementation** - Core implementation in ``dom/webnn/`` directory
4. **Rust FFI Bridge** - Foreign Function Interface in ``dom/webnn/rustnn_bridge/``
5. **Rustnn Library** - Rust implementation in ``third_party/rust/rustnn/``
6. **Backend** - Platform-specific backend (ONNX Runtime, CoreML, etc.) for neural network execution with hardware acceleration

Architecture Diagram
~~~~~~~~~~~~~~~~~~~~

.. mermaid::

    graph TB
        subgraph "Web Content (JavaScript)"
            A[navigator.ml]
            B[MLContext]
            C[MLGraphBuilder]
            D[MLGraph]
            E[MLOperand]
            F[MLTensor]
        end

        subgraph "WebIDL Layer"
            G[WebNN.webidl]
        end

        subgraph "C++ DOM Implementation (dom/webnn/)"
            H[ML.cpp]
            I[MLContext.cpp]
            J[MLGraphBuilder.cpp]
            K[MLGraph.cpp]
            L[MLOperand.cpp]
            M[MLTensor.cpp]
        end

        subgraph "Rust FFI Bridge (dom/webnn/rustnn_bridge/)"
            N[lib.rs]
            O[rustnn_context_create]
            P[rustnn_context_destroy]
            Q[rustnn_graph_build]
            R[rustnn_graph_compute]
            S[rustnn_tensor_create]
            T[rustnn_tensor_write]
            U[rustnn_tensor_read]
        end

        subgraph "Rustnn Library (third_party/rust/rustnn/)"
            V[Context]
            W[GraphBuilder]
            X[Graph]
            Y[Operation Types]
            Z[Backend Converter]
            AA[Executor]
        end

        subgraph "Backend (Platform-Specific)"
            AB[ONNX Runtime / CoreML]
            AC[Hardware Acceleration]
        end

        A --> G
        B --> G
        C --> G
        D --> G
        E --> G
        F --> G

        G --> H
        G --> I
        G --> J
        G --> K
        G --> L
        G --> M

        H --> N
        I --> O
        I --> P
        J --> Q
        K --> R
        M --> S
        M --> T
        M --> U

        O --> V
        P --> V
        Q --> W
        Q --> X
        R --> AA
        S --> V
        T --> V
        U --> V

        W --> Y
        W --> Z
        X --> Z
        Z --> AA
        AA --> AB
        AB --> AC

Layer Descriptions
------------------

JavaScript API Layer
~~~~~~~~~~~~~~~~~~~~

The WebNN API is exposed to web content through ``navigator.ml`` and provides:

* **MLContext** - Represents a compute context for neural network operations
* **MLGraphBuilder** - Builds computational graphs by composing operations
* **MLGraph** - Compiled computational graph ready for execution
* **MLOperand** - Represents an operand in the graph (input, output, or intermediate value)
* **MLTensor** - Represents tensor data that can be used across multiple graph executions

WebIDL Layer
~~~~~~~~~~~~

Located in ``dom/webidl/WebNN.webidl``, this layer:

* Defines all operations, data types, and options available to web content
* Acts as the contract between JavaScript and C++ implementations
* Marks which features are gated behind the ``dom.ml.enabled`` preference

C++ DOM Implementation
~~~~~~~~~~~~~~~~~~~~~~

Located in ``dom/webnn/``, this layer includes:

* **ML.cpp** - Implements the navigator.ml interface and context creation
* **MLContext.cpp** - Manages the WebNN context lifecycle and tensor operations
* **MLGraphBuilder.cpp** - Implements all graph building operations (add, mul, conv2d, matmul, etc.)
* **MLGraph.cpp** - Represents compiled graphs and handles inference execution
* **MLOperand.cpp** - Manages operand metadata (shape, data type)
* **MLTensor.cpp** - Handles tensor data storage and transfers

Rust FFI Bridge
~~~~~~~~~~~~~~~

Located in ``dom/webnn/rustnn_bridge/src/lib.rs``, this layer provides:

* C-compatible FFI functions for context, graph, and tensor operations
* Pointer management for Rust objects exposed to C++
* Data marshaling between C++ and Rust types
* Error handling and propagation across the FFI boundary

Key FFI functions:

* Context: ``rustnn_context_create``, ``rustnn_context_destroy``
* Graph: ``rustnn_graph_build``, ``rustnn_graph_compute``, ``rustnn_graph_destroy``
* Tensors: ``rustnn_tensor_create``, ``rustnn_tensor_write``, ``rustnn_tensor_read``
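
The pointer-management pattern behind create/destroy pairs like these can be sketched in a few lines of Rust. The names ``sketch_context_create`` and ``sketch_context_destroy`` and the ``SketchContext`` type are hypothetical stand-ins, not the actual bridge signatures. The sketch only assumes that ownership is handed to C++ as an opaque pointer and reclaimed by the destroy call.

.. code-block:: rust

    use std::ffi::c_void;

    /// Hypothetical stand-in for the context type owned by the Rust side.
    pub struct SketchContext {
        pub backend_name: &'static str,
    }

    /// Allocates a context on the heap and transfers ownership to the caller
    /// as an opaque pointer. A real export would also need an unmangled symbol.
    pub extern "C" fn sketch_context_create() -> *mut c_void {
        let ctx = Box::new(SketchContext { backend_name: "cpu" });
        Box::into_raw(ctx) as *mut c_void
    }

    /// Takes ownership back and frees the context. A null pointer is ignored.
    ///
    /// # Safety
    /// `ctx` must be null or a pointer obtained from `sketch_context_create`
    /// that has not already been destroyed.
    pub unsafe extern "C" fn sketch_context_destroy(ctx: *mut c_void) {
        if !ctx.is_null() {
            // Rebuilding the Box lets Rust run the normal drop logic.
            drop(unsafe { Box::from_raw(ctx as *mut SketchContext) });
        }
    }

Each pointer returned by the create function must be passed to the destroy function exactly once; the C++ side would typically tie that call to the destructor of the owning DOM object.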

Rustnn Library
~~~~~~~~~~~~~~

Located in ``third_party/rust/rustnn/``, this library provides:

* **Context** - Manages backend sessions (ONNX Runtime or CoreML) and resources
* **GraphBuilder** - Constructs computational graphs from operations
* **Graph** - Internal graph representation with operations and tensors
* **Operation Types** - Implementations of all supported operations
* **Backend Converter** - Converts rustnn graphs to backend-specific format (ONNX or CoreML)
* **Executor** - Manages inference execution through selected backend

Backend Layer
~~~~~~~~~~~~~

The rustnn library selects the appropriate backend based on the platform:

**ONNX Runtime** (Windows, Linux, Android):

* Inference execution - Runs the actual neural network computations
* Hardware acceleration - Utilizes CPU SIMD on all platforms, DirectML (Windows), CUDA (Linux with NVIDIA GPUs)
* Optimization - Applies graph optimizations and operator fusion
* Cross-platform support - Consistent behavior across Windows, Linux, and Android

**CoreML** (macOS, iOS):

* Native Apple ML framework integration
* Hardware acceleration - Utilizes Apple Neural Engine, GPU, or CPU
* Optimized for Apple silicon
* Native performance on macOS and iOS devices

The backend is automatically selected by rustnn at runtime based on platform availability and capabilities.
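
The platform part of that selection can be illustrated with a small sketch. The ``Backend`` enum and both function names are assumptions made for illustration; how rustnn actually probes availability and capabilities is not described here, so the sketch maps only the operating system name to a backend.

.. code-block:: rust

    /// Hypothetical backend identifiers used only in this sketch.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Backend {
        OnnxRuntime,
        CoreMl,
    }

    /// Maps an OS name (in the form used by `std::env::consts::OS`)
    /// to the backend listed for that platform above.
    pub fn backend_for_os(os: &str) -> Backend {
        match os {
            "macos" | "ios" => Backend::CoreMl,
            // Windows, Linux, Android, and anything else fall back to ONNX Runtime.
            _ => Backend::OnnxRuntime,
        }
    }

    /// Picks the backend for the platform this code was compiled for.
    pub fn default_backend() -> Backend {
        backend_for_os(std::env::consts::OS)
    }

A capability check (for example, whether the backend library can be loaded) would sit on top of this mapping in a fuller implementation.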

Supported Operations
--------------------

The WebNN implementation currently supports the following operations:

**Binary Operations:**

* add, sub, mul, div, min - Element-wise arithmetic
* matmul - Matrix multiplication
* gemm - General matrix multiply with optional transpose

**Unary Operations:**

* relu, sigmoid, tanh, softmax - Common activation functions
* gelu, elu, leakyRelu, hardSwish - Advanced activations

**Element-wise Operations:**

* abs, ceil, floor, round - Rounding and absolute value
* neg, sign, reciprocal - Negation, sign, and reciprocal
* exp, log, sqrt, pow - Mathematical functions

**Trigonometric Operations:**

* sin, cos, tan, asin, acos, atan - Standard trigonometric functions

**Hyperbolic Operations:**

* sinh, cosh, asinh, acosh, atanh - Hyperbolic functions

**Convolution Operations:**

* conv2d - 2D convolution with configurable strides, dilations, padding, and groups

**Pooling Operations:**

* averagePool2d, maxPool2d - 2D pooling with configurable window and strides

**Normalization Operations:**

* batchNormalization - Batch normalization with scale and bias
* instanceNormalization - Instance normalization
* layerNormalization - Layer normalization

**Reduction Operations:**

* reduceSum, reduceMean, reduceMax, reduceMin, reduceProduct - Reduction along specified axes

**Shape Operations:**

* reshape - Change tensor shape
* transpose - Permute tensor dimensions
* concat - Concatenate tensors along an axis (implementation incomplete; see Future Enhancements)
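
For ``conv2d`` and the 2D pooling operations, the strides, dilations, and padding determine the output size along each spatial dimension. The sketch below uses the common floor-rounding formula; it is an illustration under that assumption, not a statement of how the Firefox implementation computes shapes, and ``output_size`` is a hypothetical helper name.

.. code-block:: rust

    /// Output length along one spatial dimension for a convolution or
    /// pooling window, using floor rounding. Returns `None` when the
    /// parameters are invalid or the window does not fit the padded input.
    pub fn output_size(
        input: usize,
        kernel: usize,
        stride: usize,
        dilation: usize,
        pad_begin: usize,
        pad_end: usize,
    ) -> Option<usize> {
        if kernel == 0 || stride == 0 || dilation == 0 {
            return None;
        }
        // A dilated kernel covers dilation * (kernel - 1) + 1 input elements.
        let effective_kernel = dilation * (kernel - 1) + 1;
        let padded = input + pad_begin + pad_end;
        if padded < effective_kernel {
            return None;
        }
        Some((padded - effective_kernel) / stride + 1)
    }

For example, a 224-element dimension with a 7-element kernel, stride 2, and padding 3 on both sides yields ``(230 - 7) / 2 + 1 = 112`` with integer division.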

Data Flow
---------

Graph Building Phase
~~~~~~~~~~~~~~~~~~~~

1. Web content calls ``navigator.ml.createContext()``
2. C++ creates backend context via Rust FFI (ONNX Runtime or CoreML depending on platform)
3. Web content creates ``MLGraphBuilder`` and defines operations
4. Each operation creates an ``MLOperand`` representing the result
5. Web content calls ``builder.build()`` with output operands
6. C++ serializes operations to JSON and calls Rust FFI
7. Rustnn converts the graph to backend-specific format (ONNX or CoreML)
8. Backend creates an optimized execution session
9. C++ wraps the resulting graph ID in an ``MLGraph``, which is returned to web content

.. mermaid::

    sequenceDiagram
        participant JS as JavaScript
        participant CPP as C++ (MLGraphBuilder)
        participant FFI as Rust FFI Bridge
        participant RustNN as Rustnn Library

        JS->>CPP: createContext()
        CPP->>FFI: rustnn_context_create()
        FFI->>RustNN: Context::new()
        RustNN-->>FFI: Context handle
        FFI-->>CPP: context_id
        CPP-->>JS: MLContext

        JS->>CPP: new MLGraphBuilder(context)
        CPP-->>JS: MLGraphBuilder

        JS->>CPP: input('x', {shape, dataType})
        CPP-->>JS: MLOperand

        JS->>CPP: add(a, b)
        CPP-->>JS: MLOperand

        JS->>CPP: build({output: operand})
        CPP->>FFI: rustnn_graph_build(ops_json)
        FFI->>RustNN: GraphBuilder::build()
        RustNN->>RustNN: Convert to backend format
        RustNN->>RustNN: Create backend session
        RustNN-->>FFI: Graph handle
        FFI-->>CPP: graph_id
        CPP-->>JS: MLGraph
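
The serialization step can be pictured with a minimal sketch written in Rust for consistency with the other examples. The JSON layout (``type``, ``inputs``, ``output``) and the ``Op`` struct are invented for illustration; the actual schema exchanged between C++ and the bridge is not specified in this document.

.. code-block:: rust

    /// Hypothetical record of one graph operation; operands are numeric ids.
    pub struct Op {
        pub op_type: &'static str,
        pub inputs: Vec<u32>,
        pub output: u32,
    }

    /// Formats a list of operand ids as a JSON array.
    fn ids_to_json(ids: &[u32]) -> String {
        let parts: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
        format!("[{}]", parts.join(","))
    }

    /// Serializes operations in build order. Operation names are assumed to
    /// be plain identifiers, so no string escaping is performed.
    pub fn ops_to_json(ops: &[Op]) -> String {
        let parts: Vec<String> = ops
            .iter()
            .map(|op| {
                format!(
                    "{{\"type\":\"{}\",\"inputs\":{},\"output\":{}}}",
                    op.op_type,
                    ids_to_json(&op.inputs),
                    op.output
                )
            })
            .collect();
        format!("[{}]", parts.join(","))
    }

With this layout, an ``add`` of operands 0 and 1 into operand 2 serializes to ``[{"type":"add","inputs":[0,1],"output":2}]``.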

Inference Phase
~~~~~~~~~~~~~~~

1. Web content calls ``context.compute(graph, inputs, outputs)``
2. C++ marshals input data and calls Rust FFI with graph ID
3. Rustnn retrieves the backend session and prepares input tensors
4. Backend (ONNX Runtime or CoreML) executes the computational graph
5. Hardware acceleration is automatically utilized when available
6. Output tensors are returned through Rust FFI
7. C++ copies output data to JavaScript-provided buffers
8. Promise resolves, indicating inference completion

.. mermaid::

    sequenceDiagram
        participant JS as JavaScript
        participant CPP as C++ (MLContext)
        participant FFI as Rust FFI Bridge
        participant RustNN as Rustnn Library
        participant Backend as Backend (ONNX/CoreML)

        JS->>CPP: compute(graph, inputs, outputs)
        CPP->>FFI: rustnn_graph_compute(graph_id, inputs, outputs)
        FFI->>RustNN: Graph::compute()
        RustNN->>Backend: session.run()
        Backend->>Backend: Execute operations
        Backend-->>RustNN: Output tensors
        RustNN-->>FFI: Results
        FFI-->>CPP: Output data
        CPP-->>JS: Promise resolves
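
To make the shape of this flow concrete, here is a toy sketch of an executor that runs a compiled list of steps over named operand slots and copies the result into a caller-provided buffer. Everything in it (``Step``, ``compute``, the slot numbering) is hypothetical; the real implementation delegates execution to ONNX Runtime or CoreML rather than interpreting operations itself.

.. code-block:: rust

    use std::collections::HashMap;

    /// Hypothetical compiled step; fields are operand slot numbers.
    pub enum Step {
        Add { a: usize, b: usize, out: usize },
        Relu { x: usize, out: usize },
    }

    /// Runs `steps` in order, then copies `output_slot` into `output`,
    /// mirroring the copy into the JavaScript-provided buffer.
    pub fn compute(
        steps: &[Step],
        inputs: &HashMap<usize, Vec<f32>>,
        output_slot: usize,
        output: &mut [f32],
    ) -> Result<(), String> {
        let mut slots = inputs.clone();
        for step in steps {
            match step {
                Step::Add { a, b, out } => {
                    let lhs = slots.get(a).ok_or("missing operand")?;
                    let rhs = slots.get(b).ok_or("missing operand")?;
                    if lhs.len() != rhs.len() {
                        return Err("shape mismatch".to_string());
                    }
                    let sum: Vec<f32> = lhs.iter().zip(rhs).map(|(x, y)| x + y).collect();
                    slots.insert(*out, sum);
                }
                Step::Relu { x, out } => {
                    let src = slots.get(x).ok_or("missing operand")?;
                    let activated: Vec<f32> = src.iter().map(|&v| v.max(0.0)).collect();
                    slots.insert(*out, activated);
                }
            }
        }
        let result = slots.get(&output_slot).ok_or("missing output")?;
        if result.len() != output.len() {
            return Err("output buffer size mismatch".to_string());
        }
        output.copy_from_slice(result);
        Ok(())
    }

The ``Err`` branch corresponds to the point where a failure would travel back through the FFI layer and reject the promise.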

Memory Management
-----------------

C++ Side
~~~~~~~~

* Uses ``RefPtr`` for DOM objects (MLContext, MLGraph, etc.)
* Automatic reference counting for lifetime management
* RAII pattern ensures resource cleanup

Rust Side
~~~~~~~~~

* Uses ``Box`` for heap allocation of contexts and graphs
* Raw pointers passed across FFI boundary with explicit lifetime management
* Destroy functions ensure proper cleanup
* Unsafe blocks are isolated to the FFI boundary

Data Transfer
~~~~~~~~~~~~~

* JavaScript TypedArrays map to C++ ``ArrayBufferView`` objects, which in turn map to Rust slices
* Zero-copy when possible through pointer passing
* Ownership is clearly defined at each boundary
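
The pointer-passing idea can be sketched as follows. ``SketchTensor`` and ``sketch_tensor_write`` are hypothetical names, and the sketch assumes ``f32`` data only. The caller's buffer is borrowed as a slice without an intermediate allocation, and the single copy is the one into the tensor's own storage.

.. code-block:: rust

    /// Hypothetical tensor that owns its storage on the Rust side.
    pub struct SketchTensor {
        pub values: Vec<f32>,
    }

    /// Writes `len` floats from a caller-owned buffer into the tensor.
    /// Returns `false` for a null pointer or a length mismatch.
    ///
    /// # Safety
    /// If non-null, `data` must point to at least `len` readable `f32`
    /// values that stay valid for the duration of the call.
    pub unsafe fn sketch_tensor_write(
        tensor: &mut SketchTensor,
        data: *const f32,
        len: usize,
    ) -> bool {
        if data.is_null() || tensor.values.len() != len {
            return false;
        }
        // Borrow the caller's memory in place; ownership stays with the caller.
        let src = unsafe { std::slice::from_raw_parts(data, len) };
        tensor.values.copy_from_slice(src);
        true
    }

Validating the pointer and length before building the slice keeps the unsafe portion small, in line with isolating unsafe code at the FFI boundary.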

Thread Safety
-------------

* MLContext operations can be called from main thread or web workers
* The backend (ONNX Runtime or CoreML) handles internal threading for inference
* Graph-building calls are synchronous; ``builder.build()`` returns a promise that resolves when compilation completes
* Tensor operations use promises for async data transfer

Error Handling
--------------

Errors propagate up the stack:

1. Backend errors (ONNX Runtime or CoreML) are captured as Rust ``Result`` types
2. Rust errors are converted to FFI error codes with messages
3. C++ checks error codes and throws ``DOMException`` objects
4. JavaScript receives rejected promises or caught exceptions

Common error types:

* **DataError** - Invalid tensor shapes or incompatible data types
* **OperationError** - Graph building or execution failures
* **NotSupportedError** - Unsupported operations or features
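
The propagation chain can be sketched on the Rust side as a conversion from a ``Result`` into a status code plus message. The ``SketchError`` and ``SketchStatus`` types and the numeric values are assumptions made for illustration; only the three exception names come from the list above.

.. code-block:: rust

    /// Hypothetical internal error type on the Rust side.
    #[derive(Debug, PartialEq)]
    pub enum SketchError {
        InvalidShape(String),
        Unsupported(String),
        Backend(String),
    }

    /// Hypothetical C-compatible status code returned across the FFI boundary.
    #[repr(C)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum SketchStatus {
        Ok = 0,
        DataError = 1,
        NotSupportedError = 2,
        OperationError = 3,
    }

    /// Flattens a `Result` into a status code and a message for the caller.
    pub fn to_status(result: &Result<(), SketchError>) -> (SketchStatus, String) {
        match result {
            Ok(()) => (SketchStatus::Ok, String::new()),
            Err(SketchError::InvalidShape(msg)) => (SketchStatus::DataError, msg.clone()),
            Err(SketchError::Unsupported(msg)) => (SketchStatus::NotSupportedError, msg.clone()),
            Err(SketchError::Backend(msg)) => (SketchStatus::OperationError, msg.clone()),
        }
    }

    /// Name of the DOMException the C++ side would throw, if any.
    pub fn dom_exception_name(status: SketchStatus) -> Option<&'static str> {
        match status {
            SketchStatus::Ok => None,
            SketchStatus::DataError => Some("DataError"),
            SketchStatus::NotSupportedError => Some("NotSupportedError"),
            SketchStatus::OperationError => Some("OperationError"),
        }
    }

Keeping the message alongside the code lets the C++ layer attach a useful description to the exception that JavaScript eventually sees.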

Performance Considerations
--------------------------

Graph Compilation
~~~~~~~~~~~~~~~~~

* One-time cost when building graphs through ``builder.build()``
* Backend (ONNX Runtime or CoreML) applies optimizations during session creation
* Compiled graphs can be reused for multiple inferences

Inference Execution
~~~~~~~~~~~~~~~~~~~

* Optimized by the backend with operator fusion and memory planning
* Benefits from graph execution caching
* Hardware acceleration provides significant performance improvements
* Backend selection is automatic based on platform (CoreML on macOS/iOS, ONNX Runtime elsewhere)

Memory Usage
~~~~~~~~~~~~

* Tensors can be created once and reused across multiple inferences
* Intermediate values are managed by the backend
* Efficient memory pooling reduces allocation overhead

Hardware Acceleration
~~~~~~~~~~~~~~~~~~~~~

* Automatically detected and utilized when available
* **CPU**: SIMD instructions (SSE, AVX on x86; NEON on ARM)
* **GPU**: DirectML (Windows), Metal (macOS via CoreML), CUDA (Linux with NVIDIA GPUs)
* **NPU**: Apple Neural Engine (macOS/iOS via CoreML)
* Backend selection optimizes for available hardware on each platform

Future Enhancements
-------------------

The following areas are marked with TODO comments in the code for future development:

* **Context options** - Power preference implementation (high-performance vs low-power)
* **Shape inference** - More accurate shape calculation for pooling operations
* **Concat operation** - Full implementation of tensor concatenation
* **Additional operations** - max, where, gather, scatter, and other operations from the W3C spec
* **WebGPU integration** - Potential direct GPU execution path for improved performance

Detailed Architecture
---------------------

For a more detailed view of the architecture including sequence diagrams and data flow visualizations,
see the ``architecture.md`` file in the ``dom/webnn/docs/`` directory. That file contains Mermaid
diagrams showing the complete interaction between layers during graph building and inference.