Layout Overview

Last update: May 2024

Introduction

Much of the layout code deals with operations on the frame tree (or rendering tree). In the frame tree, each node represents a rectangle (or, for SVG, other shapes). The frame tree has a shape similar to the content tree, since many content nodes have one corresponding frame, though it differs in a few ways: some content nodes have more than one frame or don’t have any frames at all. When elements are display:none in CSS or undisplayed for certain other reasons, they won’t have any frames. When elements are broken across lines or pages, they have multiple frames; elements may also have multiple frames when multiple frames nested inside each other are needed to display a single element (for example, <table>, or <video controls>).

Each node in the frame tree is an instance of a class derived from nsIFrame. As with the content tree, there is a substantial type hierarchy, but the type hierarchy is very different: it includes types like text frames, blocks and inlines, the various parts of tables, flex and grid containers, and the various types of HTML form controls.

Frames are allocated within an arena owned by the PresShell. Each frame is owned by its parent, created in nsCSSFrameConstructor and destroyed via nsIFrame::Destroy(). Frames are not reference counted, and code must not hold on to pointers to frames. To mitigate potential security bugs when pointers to destroyed frames are accessed, we use frame poisoning, which has two parts. When a frame is destroyed other than at the end of life of the presentation, we fill its memory with a pattern consisting of a repeated pointer to inaccessible memory, and then put the memory on a per-frame-class freelist. This means that if code accesses the memory through a dangling pointer, it will either crash quickly by dereferencing the poison pattern or find a valid frame of the same class.
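The two parts of frame poisoning can be illustrated with a minimal sketch. This is not the Gecko arena code: kPoisonValue, PoisonBlock, and FrameClassFreeList are hypothetical names, and the poison value is an arbitrary placeholder for an address assumed to be inaccessible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical poison value. A real implementation would choose an address
// that is known to be inaccessible on the current platform.
constexpr std::uintptr_t kPoisonValue = 0xF0DEAFFF;

// Part 1: fill a freed block with the poison value, once per pointer slot.
void PoisonBlock(void* aBlock, std::size_t aSize) {
  assert(aSize % sizeof(std::uintptr_t) == 0);
  const std::uintptr_t poison = kPoisonValue;
  auto* bytes = static_cast<unsigned char*>(aBlock);
  for (std::size_t i = 0; i < aSize; i += sizeof(poison)) {
    std::memcpy(bytes + i, &poison, sizeof(poison));
  }
}

// Part 2: one freelist per frame class (hypothetical type). A freed block is
// only ever handed out again for a frame of the same class and size.
class FrameClassFreeList {
 public:
  explicit FrameClassFreeList(std::size_t aFrameSize)
      : mFrameSize(aFrameSize) {}
  FrameClassFreeList(const FrameClassFreeList&) = delete;
  FrameClassFreeList& operator=(const FrameClassFreeList&) = delete;
  ~FrameClassFreeList() {
    for (void* block : mFree) ::operator delete(block);
  }

  void* Allocate() {
    if (!mFree.empty()) {
      void* block = mFree.back();
      mFree.pop_back();
      return block;
    }
    return ::operator new(mFrameSize);
  }

  void Free(void* aBlock) {
    PoisonBlock(aBlock, mFrameSize);
    mFree.push_back(aBlock);
  }

 private:
  std::size_t mFrameSize;
  std::vector<void*> mFree;
};
```

In this sketch a stale pointer read through a freed block yields the poison value in every pointer-sized slot, and the next Allocate() for the same class reuses that block.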

Like the content tree, frames must be accessed only from the main thread of their processes.

The frame tree should generally not store any data that cannot be recomputed on-the-fly. While the frame tree does usually persist while a page is being displayed, frames are often destroyed and recreated in response to certain style changes such as changing display:block to display:flex on an element.

The rectangle represented by the frame is what CSS calls the element’s border box. See the illustration in 8.1 Box dimensions in CSS2 spec. This is the outside edge of the border (or the inside edge of the margin). The margin lives outside the border; and the padding lives inside the border. In addition to nsIFrame::GetRect(), we also have the APIs nsIFrame::GetPaddingRect() to get the padding box (the outside edge of the padding, or inside edge of the border) and nsIFrame::GetContentRect() to get the content box (the outside edge of the content, or inside edge of the padding). These APIs may produce out of date results when reflow is needed (or has not yet occurred).
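The relationship between the three boxes can be sketched as successive deflation of the border box. Rect, Margin, Deflate, PaddingBox, and ContentBox below are simplified stand-ins chosen for this illustration, not the Gecko types or the bodies of the nsIFrame getters.

```cpp
#include <cassert>

// Simplified stand-ins for rectangle and side-width types.
struct Rect { int x, y, width, height; };
struct Margin { int top, right, bottom, left; };

// Shrink aRect inward by the given widths on each side.
Rect Deflate(const Rect& aRect, const Margin& aSides) {
  Rect result{aRect.x + aSides.left, aRect.y + aSides.top,
              aRect.width - aSides.left - aSides.right,
              aRect.height - aSides.top - aSides.bottom};
  assert(result.width >= 0 && result.height >= 0);
  return result;
}

// Padding box: the border box with the border widths removed.
Rect PaddingBox(const Rect& aBorderBox, const Margin& aBorder) {
  return Deflate(aBorderBox, aBorder);
}

// Content box: the padding box with the padding removed.
Rect ContentBox(const Rect& aBorderBox, const Margin& aBorder,
                const Margin& aPadding) {
  return Deflate(PaddingBox(aBorderBox, aBorder), aPadding);
}
```

The margin does not appear here because it lies outside the border box and is not part of the frame's rectangle.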

Ink Overflow vs Scrollable Overflow

In addition to tracking a rectangle, frames also track two overflow areas: ink overflow and scrollable overflow. These overflow areas represent the union of the area needed by the frame and by all its descendants. The ink overflow is used for painting-related optimizations: it is a rectangle covering all of the area that might be painted when the frame and all of its descendants paint. The scrollable overflow represents the area that the user should be able to reach by scrolling in order to see the frame and all of its descendants. In some cases differences between the frame’s rect and its overflow happen because of descendants that stick out of the frame; in other cases they occur because of some characteristic of the frame itself. The two overflow areas are similar, but there are differences: for example, margins are part of scrollable overflow but not ink overflow, whereas text-shadows are part of ink overflow but not scrollable overflow.
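The "union of the frame and all its descendants" step, common to both overflow areas, can be sketched as follows. Frame, Rect, Union, and ComputeOverflow are hypothetical simplifications: the sketch treats each frame's own contribution as just its rect, whereas real ink and scrollable overflow differ in what each frame contributes (shadows, margins, and so on).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { int x = 0, y = 0, width = 0, height = 0; };

// Smallest rectangle containing both inputs.
Rect Union(const Rect& a, const Rect& b) {
  const int left = std::min(a.x, b.x);
  const int top = std::min(a.y, b.y);
  const int right = std::max(a.x + a.width, b.x + b.width);
  const int bottom = std::max(a.y + a.height, b.y + b.height);
  return Rect{left, top, right - left, bottom - top};
}

// Hypothetical frame: a rect relative to the parent, plus children.
struct Frame {
  Rect rect;
  std::vector<Frame> children;
};

// Overflow of aFrame, expressed relative to aFrame's own top-left corner.
Rect ComputeOverflow(const Frame& aFrame) {
  Rect overflow{0, 0, aFrame.rect.width, aFrame.rect.height};
  for (const Frame& child : aFrame.children) {
    Rect childOverflow = ComputeOverflow(child);
    // Translate from the child's coordinate space into the parent's.
    childOverflow.x += child.rect.x;
    childOverflow.y += child.rect.y;
    overflow = Union(overflow, childOverflow);
  }
  return overflow;
}
```

A child that sticks out past the bottom-right corner of its parent enlarges the parent's overflow without changing the parent's own rect.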

Brief Intro to Fragmentation (or why we need frame continuations?)

When frames are broken across lines, columns, or pages, we create multiple frames representing the multiple rectangles of the element. The first one is called the primary frame, and the rest are called its continuation frames, or just continuations (which are more likely to be destroyed and recreated during reflow). These frames are linked together as continuations: they have a doubly-linked list that can be used to traverse the continuations using nsIFrame::GetPrevContinuation() and nsIFrame::GetNextContinuation(). (Currently continuations always have the same style data, though we may at some point want to break that invariant.)

Continuations are sometimes siblings of each other (i.e. nsIFrame::GetNextContinuation() and nsIFrame::GetNextSibling() might return the same frame), and sometimes not. For example, if a paragraph contains a span which contains a link, and the link is split across lines, then the continuations of the span are siblings (since they are both children of the paragraph), but the continuations of the link are not siblings (since each continuation of the link is descended from a different continuation of the span). Traversing the entire frame tree does not require explicit traversal of any frames’ continuations-list, since all of the continuations are descendants of the element containing the break.

We also use continuations in cases where the continuations should not be rewrapped during reflow; we call these continuations fixed rather than fluid. The most important such case is bidi reordering, where left-to-right text and right-to-left text need to be separated into different continuations since they may not form a contiguous rectangle. nsIFrame::GetNextInFlow() and nsIFrame::GetPrevInFlow() traverse only the fluid continuations and do not cross fixed continuation boundaries. We’ll explain more about continuations and fragmentation in the later Fragmentation section.

IB-splitting

If an inline frame has non-inline children, then we split the original inline frame into parts. The children of the original inline are grouped into runs of inline and non-inline frames; runs of inline frames get an inline parent, while runs of non-inline frames get an anonymous block parent. We call this “ib-splitting” or “block-inside-inline splitting.” This splitting proceeds recursively up the frame tree until all non-inlines inside inlines are descendants of a block frame with anonymous block wrappers in between. This splitting maintains the relative order between these child frames, and the relationship between the parts of a split inline is maintained using an ib-sibling chain. It is important to note that any wrappers created during frame construction (such as for tables) might not be included in the ib-sibling chain depending on when this wrapper creation takes place. See details in nsCSSFrameConstructor::CreateIBSiblings().
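The grouping step of ib-splitting can be sketched on its own. This is only an illustration of how children are partitioned into runs: Child, Run, and GroupIntoRuns are hypothetical names, and the sketch ignores the recursion up the tree and the ib-sibling chain.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical child description: a label plus whether it is inline-level.
struct Child {
  std::string name;
  bool isInline;
};

// A run of adjacent children that share the same inline-ness. Inline runs
// would get an inline parent; other runs an anonymous block parent.
struct Run {
  bool isInline;
  std::vector<Child> children;
};

// Partition the children into maximal runs, preserving their order.
std::vector<Run> GroupIntoRuns(const std::vector<Child>& aChildren) {
  std::vector<Run> runs;
  for (const Child& child : aChildren) {
    if (runs.empty() || runs.back().isInline != child.isInline) {
      runs.push_back(Run{child.isInline, {}});
    }
    runs.back().children.push_back(child);
  }
  return runs;
}
```

For an inline containing text, two blocks, and more text, this produces an inline run, a block run of two children, and a final inline run.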

Physical Coordinates vs Logical Coordinates

In Western scripts, text flows from left to right, and lines and block containers progress from top to bottom. To represent a rectangle in this writing mode, it is natural to place the origin at the top-left corner of the space, i.e. (x, y) at (0,0). The size of a frame (rectangle) is specified by its (width, height). These are called “physical coordinates.”

However, to support various international writing modes on the web, we need to generalize the concept. For example, in Chinese or Japanese vertical typesetting, the text can flow from top to bottom while the lines progress from right to left; in Mongolian script, text can flow from top to bottom while the lines progress from left to right. We define “abstract coordinates” or “logical coordinates” to unify the coordinates under different writing modes. The text flow direction is defined as the “inline direction” and the direction in which lines or block containers stack is defined as the “block direction”. CSS defines three properties to determine a writing mode: writing-mode, direction, and text-orientation.

Nearly all the physical CSS properties have their logical counterparts. For example, width and height correspond to inline-size and block-size. left, right, top, and bottom correspond to inline-start, inline-end, block-start, and block-end.

In layout, we have physical types, such as nsPoint, nsSize, nsRect, and nsMargin; their logical counterparts are LogicalPoint, LogicalSize, LogicalRect, and LogicalMargin. Ideally, we should all work on logical coordinates, and convert the code that still uses physical coordinates to logical ones, except when the physical coordinates might make more sense.
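As a rough illustration of the mapping between the two kinds of size, the sketch below converts between a logical (inline-size, block-size) pair and a physical (width, height) pair. The WritingMode enum and the struct and function names are simplified assumptions made for this example; they are not the interfaces of the Gecko LogicalSize or nsSize classes.

```cpp
#include <cassert>

// Hypothetical, reduced set of writing modes.
enum class WritingMode { HorizontalTb, VerticalRl, VerticalLr };

struct PhysicalSize { int width, height; };
struct LogicalSize { int iSize, bSize; };

// In vertical modes the inline axis runs along the physical vertical axis.
bool IsVertical(WritingMode aWM) {
  return aWM != WritingMode::HorizontalTb;
}

PhysicalSize ToPhysical(const LogicalSize& aSize, WritingMode aWM) {
  return IsVertical(aWM) ? PhysicalSize{aSize.bSize, aSize.iSize}
                         : PhysicalSize{aSize.iSize, aSize.bSize};
}

LogicalSize ToLogical(const PhysicalSize& aSize, WritingMode aWM) {
  return IsVertical(aWM) ? LogicalSize{aSize.height, aSize.width}
                         : LogicalSize{aSize.width, aSize.height};
}
```

Sizes only need to know whether the mode is vertical. Converting points or rectangles would additionally need the line progression direction and the size of the containing space, which this sketch leaves out.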

References

Code (note that most files in base and generic have useful one line descriptions at the top that show up when browsing a directory in searchfox):

layout/base/ contains objects that coordinate everything and a bunch of other miscellaneous things
layout/generic/ contains the basic frame classes as well as support code for their reflow methods (ReflowInput, ReflowOutput, nsReflowStatus)
layout/forms/ contains frame classes for HTML form controls
layout/tables/ contains frame classes for CSS/HTML tables
layout/mathml/ contains frame classes for MathML
layout/svg/ contains frame classes for SVG
layout/xul/ contains frame classes for the XUL box model and for various XUL widgets

Bugzilla: all of the components whose names begin with “Layout” in the “Core” product.

Further documentation:

Talk: An Overview of Gecko Layout (Cameron McCormack :heycam, 2018-06-13)

Frame Construction

Frame construction is the process of creating frames, which is handled by nsCSSFrameConstructor. This is done when styles change in ways that require frames to be created or recreated or when nodes are inserted into the document. The content tree and the frame tree don’t have quite the same shape, and the frame construction process does some of the work of creating the right shape for the frame tree. It handles the aspects of creating the right shape that don’t depend on layout information. So for example, frame construction handles the work needed to implement table anonymous objects but does not handle frames that need to be created when an element is broken across lines or pages.

The basic unit of frame construction is a run of contiguous children of a single parent element. When asked to construct frames for such a run of children, the frame constructor first determines, based on the siblings and parent of the nodes involved, where in the frame tree the new frames should be inserted. Then the frame constructor walks through the list of content nodes involved and for each one creates a temporary data structure called a frame construction item, i.e. FrameConstructionItem. The frame construction item encapsulates various information needed to create the frames for the content node: its style data, some metadata about how one would create a frame for this node based on its namespace, tag name, and styles, and some data about what sort of frame will be created. This list of frame construction items is then analyzed to see whether constructing frames based on it and inserting them at the chosen insertion point will produce a valid frame tree. If it will not, the frame constructor either fixes up the list of frame construction items so that the resulting frame tree would be valid or throws away the list of frame construction items and requests the destruction and re-creation of the frame for the parent element so that it has a chance to create a list of frame construction items that it can fix up. The re-creation for the parent element is called “reframing”, which is an expensive operation, and we’d love to avoid it if possible.

Once the frame constructor has a list of frame construction items and an insertion point that would lead to a valid frame tree, it goes ahead and creates frames based on those items. Creation of a non-leaf frame recursively attempts to create frames for the children of that frame’s element, so in effect frames are created in a depth-first traversal of the content tree.

The vast majority of the code in the frame constructor, therefore, falls into one of these categories:

Code to determine the correct insertion point in the frame tree for new frames.
Code to create, for a given content node, frame construction items. This involves some searches through static data tables for metadata about the frame to be created.
Code to analyze the list of frame construction items.
Code to fix up the list of frame construction items.
Code to create frames from frame construction items.

Reflow

Reflow is the process of computing the positions and sizes of frames. (After all, frames represent rectangles, and at some point we need to figure out exactly what rectangle.) Reflow is done recursively, with each frame’s Reflow() method calling the Reflow() methods on that frame’s descendants.

In many cases, the correct results are defined by CSS specifications (particularly CSS 2.2). In some cases, the details are not defined by CSS, though in some (but not all) of those cases we are constrained by Web compatibility. When the details are defined by CSS, however, the code to compute the layout is generally structured somewhat differently from the way it is described in the CSS specifications, since the CSS specifications are generally written in terms of constraints, whereas our layout code consists of algorithms optimized for incremental recomputation.

Where does reflow start? How do we avoid reflowing the world every time?

The reflow generally starts from the root of the frame tree, though some other types of frame can act as “reflow roots” and start a reflow from them (nsTextControlFrame is one example; see the NS_FRAME_REFLOW_ROOT frame state bit). Reflow roots must obey the invariant that a change inside one of their descendants never changes their rect or overflow areas (though currently scrollbars are reflow roots but don’t quite obey this invariant).

In many cases, we want to reflow a part of the frame tree, and we want this reflow to be efficient. For example, when content is added or removed from the document tree or when styles change, we want the amount of work we need to redo to be proportional to the amount of content that changed. We also want to efficiently handle a series of changes to the same content. To do this, we maintain two bits on frames: NS_FRAME_IS_DIRTY indicates that a frame and all of its descendants require reflow. NS_FRAME_HAS_DIRTY_CHILDREN indicates that a frame has a descendant that is dirty or has had a descendant removed (see its comment for details). These bits allow coalescing of multiple updates; this coalescing is done in PresShell, which tracks the set of reflow roots that require reflow. The bits are set during calls to PresShell::FrameNeedsReflow and are cleared during reflow.
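The way the two bits support coalescing can be sketched with a toy model. Frame, Shell, and the k-prefixed flags are hypothetical stand-ins for the real frame state bits and PresShell; the actual PresShell::FrameNeedsReflow takes more arguments and handles more cases than this.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Stand-ins for NS_FRAME_IS_DIRTY, NS_FRAME_HAS_DIRTY_CHILDREN and
// NS_FRAME_REFLOW_ROOT.
enum FrameState : std::uint32_t {
  kIsDirty = 1u << 0,
  kHasDirtyChildren = 1u << 1,
  kReflowRoot = 1u << 2,
};

struct Frame {
  Frame* parent = nullptr;
  std::uint32_t state = 0;
};

struct Shell {
  std::set<Frame*> dirtyRoots;  // reflow roots that still need a reflow

  void FrameNeedsReflow(Frame* aFrame) {
    aFrame->state |= kIsDirty;
    Frame* f = aFrame;
    // Walk up until we reach a reflow root or the root of the tree.
    while (!(f->state & kReflowRoot) && f->parent) {
      f = f->parent;
      if (f->state & (kIsDirty | kHasDirtyChildren)) {
        // An earlier request already marked this ancestor chain and
        // recorded its root, so this request coalesces with it.
        return;
      }
      f->state |= kHasDirtyChildren;
    }
    dirtyRoots.insert(f);
  }
};
```

A second request under an already marked ancestor stops at that ancestor, so repeated changes to the same subtree do not repeat the walk to the root.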

Reflow Contract

The layout algorithms used by many of the frame classes are those specified in CSS, which are based on the traditional document formatting model, where inline sizes (widths) are input and block sizes (heights) are output.

When an individual frame’s Reflow() method is called, most of the input is provided in ReflowInput, which is set up by the parent frame. The output is filled into ReflowOutput and nsReflowStatus. After reflow, the caller (usually the parent) is responsible for setting the frame’s size based on the metrics reported in ReflowOutput. The caller is also responsible for creating a continuation based on the completion status reported in nsReflowStatus. We will cover nsReflowStatus in more detail in the later Reflow Status section.

Compute intrinsic sizes

In some cases, inline sizes need to be determined based on the content, for example for an element with width:min-content or width:max-content. This depends on two intrinsic inline sizes: the minimum intrinsic inline size (see nsIFrame::GetMinISize()) and the preferred intrinsic inline size (see nsIFrame::GetPrefISize()). The concept of what these inline sizes represent is best explained by describing what they are on a paragraph containing only text: in such a paragraph the minimum intrinsic inline size is the inline size of the longest word, and the preferred intrinsic inline size is the inline size of the entire paragraph laid out on one line.
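The text-only paragraph example can be made concrete with a toy measurement. It assumes that every character, including a single space between words, is one unit wide and that whitespace is the only break opportunity; IntrinsicISizes and MeasureParagraph are hypothetical names, not Gecko APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

struct IntrinsicISizes {
  std::size_t min;   // inline size of the longest word
  std::size_t pref;  // inline size of the whole text on one line
};

// Toy model: one unit per character, breaks allowed only at whitespace.
IntrinsicISizes MeasureParagraph(const std::string& aText) {
  std::istringstream words(aText);
  std::string word;
  IntrinsicISizes result{0, 0};
  bool first = true;
  while (words >> word) {
    result.min = std::max(result.min, word.size());
    // Words after the first are preceded by a one-unit space.
    result.pref += word.size() + (first ? 0 : 1);
    first = false;
  }
  return result;
}
```

For the text "the quick brown fox" this toy model gives a minimum of 5 units (the longest word) and a preferred size of 19 units (the whole line).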

Intrinsic inline sizes are invalidated separately from the dirty bits described above. When a caller informs the pres shell that a frame needs reflow via PresShell::FrameNeedsReflow(), it passes one of the three options:

None indicates that no intrinsic inline sizes are dirty
FrameAndAncestors indicates that intrinsic inline sizes on it and its ancestors are dirty (which happens, for example, if new children are added to it)
FrameAncestorsAndDescendants indicates that intrinsic inline sizes on it, its ancestors, and its descendants are dirty (for example, if the font-size changes)

Painting

See Rendering Overview.

Fragmentation

Fragmentation (or pagination) is a concept used in printing, print-preview, and multicolumn layout.

Continuations in the Frame Tree

To render a DOM node, represented as an nsIContent object, Gecko creates zero or more frames (nsIFrame objects). Each frame represents a rectangular area usually corresponding to the node’s CSS box as described by the CSS specs. Simple elements are often representable with exactly one frame, but sometimes an element needs to be represented with more than one frame. For example, text breaking across lines:

  xxxxxx AAAA
  AAA xxxxxxx

The A element is a single DOM node but obviously a single rectangular frame isn’t going to represent its layout precisely.

Similarly, consider text breaking across pages:

  | BBBBBBBBBB |
  | BBBBBBBBBB |
  +------------+

  +------------+
  | BBBBBBBBBB |
  | BBBBBBBBBB |
  |            |

Again, a single rectangular frame cannot represent the layout of the node. A multi-column container with multiple columns is similar.

Another case where a single DOM node is represented by multiple frames is when a text node contains bidirectional text (e.g. both Hebrew and English text). In this case, the text node and its inline ancestors are split so that each frame contains only unidirectional text.

The first frame for an element is called the primary frame. The other frames are called continuation frames. Primary frames are created by nsCSSFrameConstructor in response to content insertion notifications. Continuation frames are created during bidi resolution, and during reflow, when reflow detects that a content element cannot be fully laid out within the constraints assigned (e.g., when inline text will not fit within a particular width constraint, or when a block cannot be laid out within a particular height constraint).

Continuation frames created during reflow are called “fluid” continuations (or “in-flows”). Other continuation frames (currently, those created during bidi resolution), are, in contrast, “non-fluid”. The NS_FRAME_IS_FLUID_CONTINUATION state bit indicates whether a continuation frame is fluid or not.

The frames for an element are put in a doubly-linked list. The links are accessible via nsIFrame::GetNextContinuation() and nsIFrame::GetPrevContinuation(). If only fluid continuations are to be accessed, nsIFrame::GetNextInFlow() and nsIFrame::GetPrevInFlow() are used instead.
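The difference between the two pairs of accessors can be sketched with a small model of the continuation chain. The Frame struct and helper functions are hypothetical; the isFluidContinuation flag stands in for the NS_FRAME_IS_FLUID_CONTINUATION state bit, which in this model describes the link between a frame and its previous continuation.

```cpp
#include <cassert>

struct Frame {
  Frame* prevContinuation = nullptr;
  Frame* nextContinuation = nullptr;
  // True when this frame is a fluid continuation of prevContinuation.
  bool isFluidContinuation = false;

  // Only follow the forward link when the next frame is a fluid continuation.
  Frame* NextInFlow() const {
    return (nextContinuation && nextContinuation->isFluidContinuation)
               ? nextContinuation
               : nullptr;
  }

  // Only follow the backward link when this frame is itself fluid.
  Frame* PrevInFlow() const {
    return (prevContinuation && isFluidContinuation) ? prevContinuation
                                                     : nullptr;
  }
};

void LinkContinuation(Frame& aPrev, Frame& aNext, bool aFluid) {
  aPrev.nextContinuation = &aNext;
  aNext.prevContinuation = &aPrev;
  aNext.isFluidContinuation = aFluid;
}

// First frame of the whole chain, crossing fixed boundaries.
Frame* FirstContinuation(Frame* aFrame) {
  while (aFrame->prevContinuation) aFrame = aFrame->prevContinuation;
  return aFrame;
}

// First frame of the current fluid run only.
Frame* FirstInFlow(Frame* aFrame) {
  while (Frame* prev = aFrame->PrevInFlow()) aFrame = prev;
  return aFrame;
}
```

With a chain a, b, c where b is a fluid continuation of a and c is a non-fluid continuation of b, walking by continuation from c reaches a, while walking by in-flow from c stays at c.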

The following diagram shows the relationship between the original frame tree considering just primary frames, and a possible layout with breaking and continuations:

Original frame tree       Frame tree with A broken into three parts
    Root                      Root
     |                      /  |  \
     A                     A1  A2  A3
    / \                   / |  |    |
   B   C                 B  C1 C2   C3
   |  /|\                |  |  | \   |
   D E F G               D  E  F G1  G2

Certain kinds of frames create multiple child frames for the same content element:

nsPageSequenceFrame creates multiple page children, each one associated with the entire document, separated by page breaks
nsColumnSetFrame creates multiple block children, each one associated with the column element, separated by column breaks
nsBlockFrame creates multiple inline children, each one associated with the same inline element, separated by line breaks, or by changes in text direction
nsTableColFrame creates non-fluid continuations for itself if it has span=“N” and N > 1
If a block frame is a multi-column container and has column-span:all children, it creates multiple nsColumnSetFrame children, which are linked together as non-fluid continuations. Similarly, if a block frame is within a multi-column formatting context and has column-span:all children, it is chopped into several flows, which are linked together as non-fluid continuations as well. See documentation and example frame trees in nsCSSFrameConstructor::ConstructBlock().

Overflow Container Continuations

Sometimes the content of a frame needs to break across pages even though the frame itself is complete. This usually happens if an element with fixed block size has overflow that doesn’t fit on one page. In this case, the completed frame is “overflow incomplete”, and special continuations are created to hold its overflow. These continuations are called “overflow containers”. They are invisible, and are kept on a special list in their parent. See documentation in nsContainerFrame.h and example trees in bug 379349 comment 3.

This infrastructure was extended in bug 154892 to also manage continuations for absolutely-positioned frames.

Relationship of continuations to frame tree structure

It is worth emphasizing two points about the relationship of the prev-continuation / next-continuation linkage to the existing frame tree structure.

First, if you want to traverse the frame tree or a subtree thereof to examine all the frames once, you do not want to traverse next-continuation links. All continuations are reachable by traversing the GetNextSibling() links from the result of GetFirstChild() for all child lists.

Second, the following property holds: consider two frames F1 and F2 where F1’s next-continuation is F2 and their respective parent frames are P1 and P2. Then either P1’s next continuation is P2, or P1 == P2, because P is responsible for breaking F1 and F2.

In other words, continuations are sometimes siblings of each other, and sometimes not. If their parent content was broken at the same point, then they are not siblings, since they are children of different continuations of the parent. So in the frame tree for the markup

<p>This is <b><i>some<br/>text</i></b>.</p>

the two continuations for the <b> element are siblings (unless the line break is also a page break), but the two continuations for the <i> element are not.

There is an exception to that property when F1 is a first-in-flow float placeholder. In that case F2’s parent will be the next-in-flow of F1’s containing block.
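Both points in this section can be checked on a toy model of the markup example. The Frame struct below is a hypothetical simplification with a single child list, and it does not model the float placeholder exception.

```cpp
#include <cassert>
#include <vector>

struct Frame {
  Frame* parent = nullptr;
  Frame* firstChild = nullptr;   // single child list in this toy model
  Frame* nextSibling = nullptr;
  Frame* nextContinuation = nullptr;
};

// First point: visit every frame once using only child and sibling links.
// The nextContinuation link is deliberately never followed.
void Visit(Frame* aFrame, std::vector<Frame*>& aOut) {
  aOut.push_back(aFrame);
  for (Frame* c = aFrame->firstChild; c; c = c->nextSibling) {
    Visit(c, aOut);
  }
}

// Second point: for F1 with next-continuation F2, either both share a
// parent, or F2's parent is the next-continuation of F1's parent.
bool ParentInvariantHolds(const Frame& aF1) {
  const Frame* f2 = aF1.nextContinuation;
  if (!f2) return true;
  if (aF1.parent == f2->parent) return true;
  return aF1.parent && aF1.parent->nextContinuation == f2->parent;
}
```

Modeling the paragraph with two sibling continuations of the b element, each holding one continuation of the i element, the traversal reaches all five frames exactly once without touching the continuation links.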

Reflow Status

Reflow status is found in the aStatus argument of Reflow(). IsComplete() means that we reflowed all the content and no more next-in-flows are needed. At that point there may still be next-in-flows, but the parent will delete them. IsIncomplete() means “some content did not fit in this frame”. IsOverflowIncomplete() means that the frame is itself complete, but some of its content didn’t fit: this triggers the creation of overflow containers for the frame’s continuations. IsIncomplete() and NextInFlowNeedsReflow() together mean “some content did not fit in this frame AND it must be reflowed”. These values are defined and documented in nsReflowStatus::Completion.
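How a parent might interpret these states can be sketched with a reduced model. ReflowStatus here is a hypothetical simplification that mirrors only the predicates named above; ParentAction and DecideParentAction are invented for this illustration and are not Gecko APIs.

```cpp
#include <cassert>

// Reduced model of the completion states described above.
enum class Completion { FullyComplete, OverflowIncomplete, Incomplete };

struct ReflowStatus {
  Completion completion = Completion::FullyComplete;
  bool nextInFlowNeedsReflow = false;

  bool IsIncomplete() const { return completion == Completion::Incomplete; }
  bool IsOverflowIncomplete() const {
    return completion == Completion::OverflowIncomplete;
  }
  // "Complete" covers both the fully complete and overflow-incomplete cases:
  // the frame itself needs no ordinary next-in-flow.
  bool IsComplete() const { return !IsIncomplete(); }
};

// Hypothetical summary of what the caller does next.
enum class ParentAction {
  DeleteNextInFlows,        // everything fit
  EnsureOverflowContainer,  // frame complete, but its overflow did not fit
  EnsureNextInFlow,         // some content did not fit in the frame
};

ParentAction DecideParentAction(const ReflowStatus& aStatus) {
  if (aStatus.IsIncomplete()) return ParentAction::EnsureNextInFlow;
  if (aStatus.IsOverflowIncomplete()) {
    return ParentAction::EnsureOverflowContainer;
  }
  return ParentAction::DeleteNextInFlows;
}
```

The nextInFlowNeedsReflow flag is orthogonal to the completion state in this model: it only tells the caller that an existing or newly created next-in-flow must be reflowed.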

Dynamic Reflow Considerations

When we reflow a frame F with fluid continuations, two things can happen:

Some child frames do not fit in the passed-in inline size or block size constraint. These frames must be “pushed” to F’s next-in-flow. If F has no next-in-flow, we must create one under F’s parent’s next-in-flow, or, if F’s parent is managing the breaking of F, directly under F’s parent. If F is a block, it pushes overflowing child frames to its “overflow” child list and forces F’s next-in-flow to be reflowed. When we reflow a block, we pull the child frames from the prev-in-flow’s overflow list into the current frame.
All child frames fit in the passed-in inline size or block size constraint. Then child frames must be “pulled” from F’s next-in-flow to fill in the available space. If F’s next-in-flow becomes empty, we may be able to delete it.
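The two outcomes above can be sketched with a toy block whose children are unbreakable and have fixed block sizes. Block and ReflowChildren are hypothetical, and the sketch moves children directly between the two frames instead of going through an overflow child list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block: children are unbreakable and described only by block size.
struct Block {
  std::vector<int> childBSizes;
  Block* nextInFlow = nullptr;
};

// Returns true when no content remains for later continuations.
bool ReflowChildren(Block& aBlock, int aAvailableBSize) {
  std::vector<int>& kids = aBlock.childBSizes;
  int used = 0;
  std::size_t fit = 0;
  while (fit < kids.size() && used + kids[fit] <= aAvailableBSize) {
    used += kids[fit];
    ++fit;
  }

  if (fit < kids.size()) {
    // Push: move the children that did not fit to the front of the
    // next-in-flow, which the caller must already have created.
    assert(aBlock.nextInFlow);
    std::vector<int>& pushedTo = aBlock.nextInFlow->childBSizes;
    auto firstPushed = kids.begin() + static_cast<std::ptrdiff_t>(fit);
    pushedTo.insert(pushedTo.begin(), firstPushed, kids.end());
    kids.erase(firstPushed, kids.end());
    return false;
  }

  // Pull: everything fit, so take children back while space remains.
  if (!aBlock.nextInFlow) return true;
  std::vector<int>& pulledFrom = aBlock.nextInFlow->childBSizes;
  while (!pulledFrom.empty() &&
         used + pulledFrom.front() <= aAvailableBSize) {
    used += pulledFrom.front();
    kids.push_back(pulledFrom.front());
    pulledFrom.erase(pulledFrom.begin());
  }
  return pulledFrom.empty();
}
```

Because the children in this toy model never break, it sidesteps the harder case where a pushed or pulled child has continuations of its own.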

In both of these situations we might end up with a frame F containing two child frames, one of which is a continuation of the other. This is incorrect. We might also create holes, where there are frames P1, P2, and P3, P1 has child F1 and P3 has child F2, but P2 has no F child.

A strategy for avoiding these issues is this: When pulling a frame F2 from parent P2 to prev-in-flow P1, if F2 is a breakable container, then:

If F2 has no prev-in-flow F1 in P1, then create a new primary frame F1 in P1 for F2’s content, with F2 as its next-in-flow.
Pull children from F2 to F1 until F2 is empty or we run out of space. If F2 goes empty, pull from the next non-empty next-in-flow. Empty continuations with no next-in-flows can be deleted.

When pushing a frame F1 from parent P1 to P2, where F1 has a next-in-flow F2 (which must be a child of P2):

Merge F2 into F1 by moving all F2’s children into F1, then deleting F2

For inline frames F, we have our own custom strategy that coalesces adjacent inline frames. This need not change.

We do need to implement this strategy when F is a normal in-flow block, a floating block, and eventually an absolutely positioned block.