Glean Interface For Firefox Telemetry (GIFFT)¶
To make migration from Firefox Telemetry to Glean easier, the C++ and JS Glean APIs can be configured (on a metric-by-metric basis) to mirror data collection to both the Glean metric and a Telemetry probe.
GIFFT should ideally be used only when the data you require for analysis still mostly lives in Telemetry, and should be removed promptly when no longer needed. Instrumentors are encouraged to have the Telemetry mirror probe expire within six versions. (As always, you can renew an expiring probe if you’re still using it, but this will help us get closer to the time when we eventually turn Telemetry off.)
Note: GIFFT only works for data provided via C++ or JS. Rust Glean metrics APIs will not mirror to Telemetry as Telemetry does not have a Rust API.
Note: Using the Glean API replaces the Telemetry API. Do not use any mix of the two APIs for the same probe.
How to Mirror a Glean Metric to a Firefox Telemetry Probe¶
For the mirror to work, you need three things:
A compatible Glean metric (defined in a metrics.yaml)
A compatible Telemetry probe (defined in Histograms.json, Scalars.yaml, or Events.yaml)
A telemetry_mirror property on the Glean metric definition identifying the Telemetry probe
Compatibility¶
This compatibility table explains which Telemetry probe types can be mirrors for which Glean metric types:
Glean Metric Type | Telemetry Probe Type |
---|---|
boolean | Scalar of kind: boolean |
labeled_boolean | Keyed scalar of kind: boolean |
counter | Scalar of kind: uint |
labeled_counter | Keyed Scalar of kind: uint |
string | Scalar of kind: string |
labeled_string | No Supported Telemetry Type |
string_list | Keyed Scalar of kind: boolean. The keys are the strings. The values are all true. Calling Set on the string_list is not mirrored (since there's no way to remove keys from a keyed scalar of kind boolean). Doing so will log a warning. |
timespan | Scalar of kind: uint. The value is in units of milliseconds. |
timing_distribution | Histogram of kind "linear" or "exponential". Samples will be in units of milliseconds. |
labeled_timing_distribution | Keyed Histogram of kind "linear" or "exponential". Samples will be in units of milliseconds. |
memory_distribution | Histogram of kind "linear" or "exponential". Samples will be in memory_unit units. |
labeled_memory_distribution | Keyed Histogram of kind "linear" or "exponential". Samples will be in memory_unit units. |
custom_distribution | Histogram of kind "linear" or "exponential". Samples will be used as is. Ensure the bucket count and range match. |
labeled_custom_distribution | Keyed Histogram of kind "linear" or "exponential". Samples will be used as is. Ensure the bucket count and range match. |
uuid | Scalar of kind: string. Value will be in canonical 8-4-4-4-12 format. Value is not guaranteed to be valid, and invalid values may be present in the mirrored scalar while the uuid metric remains empty. Calling GenerateAndSet on the uuid is not mirrored, and will log a warning. |
url | Scalar of kind: string. The stringified URL will be cropped to the maximum length allowed by the legacy type. |
datetime | Scalar of kind: string. Value will be in ISO8601 format. |
events | Events. The value field will be filled by the Glean extra named value if defined and present. |
quantity | Scalar of kind: uint |
rate | Keyed Scalar of kind: uint. The keys are "numerator" and "denominator". Does not work for rate metrics with external denominators. |
text | No Supported Telemetry Type |
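The string_list row is the least obvious mapping in the table. The following much-simplified JavaScript model shows the behaviour it describes: add() is mirrored as a key set to true, and set() changes only the Glean side. MirroredStringList and its fields are hypothetical names used for illustration. This is not FOG's implementation or API, and it ignores Glean's limits on list length and string size.

```javascript
// Hypothetical, simplified model of a GIFFT-mirrored string_list.
// It is not the real FOG code.
class MirroredStringList {
  constructor() {
    this.gleanValue = [];      // what the Glean string_list holds
    this.telemetryKeyed = {};  // what the keyed boolean scalar holds
  }

  // add() is mirrored: the string becomes a key whose value is true.
  add(str) {
    this.gleanValue.push(str);
    this.telemetryKeyed[str] = true;
  }

  // set() is NOT mirrored. Keys cannot be removed from a keyed boolean
  // scalar, so only the Glean side changes and a warning is logged.
  set(list) {
    this.gleanValue = [...list];
    console.warn("string_list set() is not mirrored to Telemetry");
  }
}
```

After add("a"), add("b"), set(["c"]), the Glean value is ["c"] while the mirror still holds both "a" and "b". This is one way the two can diverge.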
The telemetry_mirror property in metrics.yaml¶
You must use the C++ enum identifier of the Histogram, Scalar, or Event being mirrored to:
For Histograms, the Telemetry C++ enum identifier is the histogram’s name. E.g. the C++ enum identifier for WR_RENDERER_TIME is WR_RENDERER_TIME (see gfx/metrics.yaml).
For Scalars, the Telemetry C++ enum identifier is the Scalar category and name in SCREAMING_SNAKE_CASE with any . replaced with _. E.g. the enum identifier for extensions.startupCache.load_time is EXTENSIONS_STARTUPCACHE_LOAD_TIME (see toolkit/components/extensions/metrics.yaml).
For Events, the Telemetry C++ enum identifier is the Event category, method, and object rendered in Snakey_CamelCase. E.g. the enum identifier for page_load.toplevel#content is Page_load_Toplevel_Content (see dom/metrics.yaml).
If you use the wrong enum identifier, this will manifest as a build error.
If you are having trouble finding the correct identifier for the mirror Telemetry probe, you can find the specific value in the list of all Telemetry C++ enum identifiers in <objdir>/toolkit/components/telemetry/Telemetry{Histogram|Scalar|Event}Enums.h. (Choose the file appropriate to the type of the Telemetry mirror.)
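The three naming rules above can be expressed as small string transformations. The following helpers are hypothetical and only reproduce the rules as this page states them. They assume event categories contain no dots and throw otherwise. The generated Telemetry{Histogram|Scalar|Event}Enums.h files remain the authority.

```javascript
// Hypothetical helpers reproducing the naming rules described above.
// Always confirm against the generated *Enums.h files in your objdir.

// Histograms: the enum identifier is simply the histogram's name.
function histogramEnumId(histogramName) {
  return histogramName;
}

// Scalars: "category.name" in SCREAMING_SNAKE_CASE, with every "." -> "_".
function scalarEnumId(scalarName) {
  return scalarName.replace(/\./g, "_").toUpperCase();
}

// Events: "category.method#object" in Snakey_CamelCase. The first letter of
// each part is upper-cased and the parts are joined with "_".
// Assumption: only dot-free categories are handled here.
function eventEnumId(eventName) {
  const match = /^([^.#]+)\.([^.#]+)#([^.#]+)$/.exec(eventName);
  if (!match) {
    throw new Error(`Unsupported event name "${eventName}"; check TelemetryEventEnums.h`);
  }
  const capitalize = (s) => s.charAt(0).toUpperCase() + s.slice(1);
  return match.slice(1).map(capitalize).join("_");
}
```

For example, scalarEnumId("extensions.startupCache.load_time") gives EXTENSIONS_STARTUPCACHE_LOAD_TIME. Note that the camel-cased "startupCache" is simply upper-cased, not split into words.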
Artifact Build Support¶
Sadly, GIFFT does not support Artifact builds. You must build Firefox (not an Artifact build) when you add the mirrored metric so that the C++ enum value is present, even if you only use the metric from JavaScript.
Analysis Gotchas¶
Firefox Telemetry and the Glean SDK are very different. Though GIFFT bridges the differences as best it can, there are many things it cannot account for.
These are a few of the ways that differences between Firefox Telemetry and the Glean SDK might manifest as anomalies during analysis.
Processes, Products, and Channels¶
Like Firefox on Glean itself, GIFFT doesn’t know what process, product, or channel it is recording in. Telemetry does, and imposes restrictions on which probes can be recorded to and when.
Ensure that the following fields in any Telemetry mirror’s definition aren’t too restrictive for your use:
record_in_processes
products
release_channel_collection/releaseChannelCollection
A mismatch won’t result in an error. If you, for example, record to a Glean metric in a release channel that the Telemetry mirror probe doesn’t permit, then the Glean metric will have a value and the Telemetry mirror probe won’t.
Also recall that Telemetry probes split their values across processes. Glean metrics do not. This may manifest as curious anomalies when comparing the Glean metric to its Telemetry mirror probe. Ensure your analyses are aggregating Telemetry values from all processes, or define and use process-specific Glean metrics and Telemetry mirror probes to keep things separate.
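One way to follow the aggregation advice above is to total the mirrored scalar over every process before comparing it with the Glean metric. This sketch assumes a snapshot object keyed by process name, such as the one returned by Services.telemetry.getSnapshotForScalars(). sumAcrossProcesses is a hypothetical helper.

```javascript
// Hypothetical helper that sums a uint scalar across all processes in a
// snapshot shaped like { parent: { "cat.name": 3 }, content: { "cat.name": 2 } }.
function sumAcrossProcesses(snapshot, scalarName) {
  let total = 0;
  for (const perProcess of Object.values(snapshot)) {
    const value = perProcess[scalarName];
    // Processes that never recorded the scalar simply lack the key.
    if (typeof value === "number") {
      total += value;
    }
  }
  return total;
}
```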
Pings¶
Glean and Telemetry both send their built-in pings on their own schedules. This means the values present in these pings may not agree, since they reflect state at different times.
For example, suppose you are measuring “Number of Monitors” with a quantity sent by default in the Glean “metrics” ping, mirrored to a Scalar of kind: uint sent by default in the Telemetry “main” ping. If the user plugs in a second monitor between midnight (when Telemetry “main” pings with reason “daily” are sent) and 4AM (when Glean “metrics” pings with reason “today” are sent), the value in the quantity will be 2 while the value in the Scalar of kind: uint will be 1.
If the metric or mirrored probe are sent in Custom pings, the schedules could line up exactly or be entirely unrelated.
Labels¶
Labeled metrics supported by GIFFT adhere to the Glean SDK’s label format.
Keyed Scalars and Keyed Histograms, on the other hand, do not have a concept of an “Invalid key”. Firefox Telemetry will accept just about any sequence of bytes as a key.
This means that a label deemed invalid by the Glean SDK may appear in the mirrored probe’s data.
For example, consider a label made of 72 “1” characters, which doesn’t conform to the format (it is longer than 71 printable ASCII characters). The labeled_boolean metric correctly ascribes it to __other__, whereas the mirrored Keyed Scalar of kind boolean stores and retrieves it unchanged:
```javascript
// Record with a 72-character label, which Glean considers invalid.
Glean.testOnly.mirrorsForLabeledBools["1".repeat(72)].set(true);
Assert.equal(true, Glean.testOnly.mirrorsForLabeledBools.__other__.testGetValue());
// The above actually throws NS_ERROR_LOSS_OF_SIGNIFICANT_DATA because it also records
// an invalid_label error. But you get the idea.
const snapshot = Services.telemetry.getSnapshotForKeyedScalars().parent;
Assert.equal(true, snapshot["telemetry.test.mirror_for_labeled_bool"]["1".repeat(72)]);
```
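The divergence in the test above comes down to one side validating labels and the other not. The following is a simplified, hypothetical model. It treats a label as valid if it has at most 71 printable ASCII characters, as described above, and additionally assumes it must be non-empty. It does not model the invalid_label error that Glean also records. isValidGleanLabel and recordLabeledBool are illustrative names, not FOG APIs.

```javascript
// Simplified label rule from the text above. Non-emptiness is an assumption.
const MAX_LABEL_LENGTH = 71;

function isValidGleanLabel(label) {
  return (
    label.length > 0 &&
    label.length <= MAX_LABEL_LENGTH &&
    /^[\x20-\x7E]+$/.test(label) // printable ASCII only
  );
}

// Hypothetical mirror. Glean buckets invalid labels under "__other__",
// while the keyed Telemetry scalar stores whatever key it is given.
function recordLabeledBool(gleanValues, telemetryKeyed, label, value) {
  const gleanLabel = isValidGleanLabel(label) ? label : "__other__";
  gleanValues[gleanLabel] = value;
  telemetryKeyed[label] = value;
}
```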
Telemetry Events¶
A Glean event can be mirrored to a Telemetry Event.
In order to make use of the value field in Telemetry Events, you must first define an event extra named “value” in the metrics.yaml file. When the event is recorded with the “value” extra filled in, GIFFT will map it to the Telemetry Event value property and remove it from the list of extras so it is not duplicated.
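The mapping just described amounts to moving one key out of the extras object. The following hypothetical sketch is not the GIFFT implementation. It shows the shape of the transformation: the Glean extra named "value" becomes the Telemetry Event's value, and the remaining extras are passed through.

```javascript
// Hypothetical model. The "value" extra is promoted to the Telemetry
// Event's value field and removed from the extras.
function toTelemetryEvent(gleanExtras) {
  const extras = { ...gleanExtras }; // don't mutate the caller's object
  let value = null; // Telemetry events may have no value
  if (Object.prototype.hasOwnProperty.call(extras, "value")) {
    value = extras.value;
    delete extras.value; // avoid duplicating it in the extras
  }
  return { value, extras };
}
```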
Numeric Values¶
The arguments and storage formats for Glean’s numeric types (counter, labeled_counter, quantity, rate, and timespan) are different from Telemetry’s numeric type (Scalar of kind uint).
This results in a few notable differences.
Saturation and Overflow¶
counter, labeled_counter, and rate metrics are stored as 32-bit signed values. quantity metrics are stored as 64-bit signed values. timing_distribution samples can be 64-bit signed values.
All of these Glean numeric metric types saturate at their maximum representable value, or according to the Limits section of the Glean metric type documentation.
Scalars of kind uint are stored as 32-bit unsigned values. They will overflow if they exceed the value $2^{32} - 1$.
If a Glean numeric type saturates, it will record an error of type invalid_overflow. In your analyses, please check for these errors.
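The following sketch contrasts saturation with overflow. It is an illustrative model only. It assumes a counter saturates exactly at the signed 32-bit maximum and that the uint scalar wraps modulo $2^{32}$, as ordinary unsigned arithmetic does. gleanCounterAdd and telemetryUintAdd are hypothetical names.

```javascript
// Illustrative model: a Glean counter saturating at the i32 maximum versus
// a Telemetry uint scalar wrapping around modulo 2^32.
const I32_MAX = 2 ** 31 - 1;

// Glean side: clamp at I32_MAX. Real Glean would also record invalid_overflow.
function gleanCounterAdd(current, amount) {
  return Math.min(current + amount, I32_MAX);
}

// Telemetry side: `>>> 0` converts to an unsigned 32-bit integer, so anything
// past 2^32 - 1 wraps around to small values.
function telemetryUintAdd(current, amount) {
  return (current + amount) >>> 0;
}
```

A wrapped mirror value can look deceptively small, which is why checking the Glean side for invalid_overflow errors matters.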
Quantity Value Over-size¶
Values greater than $2^{32} - 1$ passed to a quantity metric’s set() method will be clamped to $2^{32} - 1$ before being passed to the metric’s Telemetry mirror.
Negative Values¶
Values less than 0 passed to any numeric metric type’s API will not be passed on to the Telemetry mirror. This avoids small negative numbers being cast into stunningly large numbers, and keeps the Telemetry mirror’s value closer to that of the Glean metric.
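Together, the two rules for quantity metrics (clamp large values, drop negative ones) can be modelled as below. BigInt is used because quantity values are 64-bit and JavaScript numbers lose precision above $2^{53}$. quantityMirrorValue is a hypothetical name. Returning null stands for "nothing is forwarded to the mirror".

```javascript
// Hypothetical model of what reaches a quantity's Telemetry mirror.
const U32_MAX = 2n ** 32n - 1n;

// Takes a BigInt in the signed 64-bit range.
// Returns the forwarded value, or null if nothing is forwarded.
function quantityMirrorValue(value) {
  if (value < 0n) {
    return null; // negative values are not passed to the mirror
  }
  return value > U32_MAX ? U32_MAX : value; // clamp to 2^32 - 1
}
```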
Long Time Spans¶
If the number of milliseconds between calls to a timespan metric’s start() and stop() methods exceeds $2^{32} - 1$, the value passed to the metric’s Telemetry mirror will be clamped to $2^{32} - 1$.
The same happens for samples in timing_distribution metrics: values passed to the Telemetry mirror histogram will saturate at $2^{32} - 1$ until they get past $2^{64}$, when they’ll overflow.
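The following is one reading of the saturate-then-overflow rule, as an illustrative model only. It assumes the elapsed milliseconds are held as an unsigned 64-bit value, which wraps past $2^{64}$, and are then clamped to $2^{32} - 1$ for the mirror. mirroredMillis is a hypothetical name.

```javascript
// Illustrative model of the rule above. The name and structure are hypothetical.
const MIRROR_MAX_MS = 2n ** 32n - 1n;

function mirroredMillis(elapsedMs) {
  // Treat the elapsed time as an unsigned 64-bit quantity: values past
  // 2^64 wrap around (the overflow mentioned above).
  const asU64 = BigInt.asUintN(64, elapsedMs);
  // Then saturate at 2^32 - 1 before handing it to the Telemetry mirror.
  return asU64 > MIRROR_MAX_MS ? MIRROR_MAX_MS : asU64;
}
```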
timing_distribution mirrors: Samples and Sums might be Different¶
A specific value in a timing_distribution metric will not always agree with the corresponding value in its mirrored-to histogram.
Though the calls to the clock are very close together in the code in Telemetry and Glean, Telemetry’s are not on the exact same instruction as Glean’s, and Telemetry uses a different clock source (TimeStamp::Now()) than Glean (time::precise_time_ns()).
Also, if these slight drifts happen to cross the boundary of a bucket in either system, samples might end up looking more different than you’d expect.
This shouldn’t affect analysis, but it can affect testing, so please bear this difference in mind when writing tests.
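In tests, one way to tolerate this drift is to compare sums with a per-sample allowance rather than expecting exact equality. This hypothetical helper assumes the Glean sum is reported in nanoseconds and the mirrored histogram's sum is in whole milliseconds. The default slack of 1 ms per sample is an arbitrary illustrative choice.

```javascript
// Hypothetical test helper. Assumes gleanSumNs is in nanoseconds and
// telemetrySumMs is in whole milliseconds.
function sumsRoughlyAgree(gleanSumNs, telemetrySumMs, sampleCount, slackMsPerSample = 1) {
  const gleanSumMs = gleanSumNs / 1e6;
  // Allow up to `slackMsPerSample` of drift/rounding for each recorded sample.
  const tolerance = sampleCount * slackMsPerSample;
  return Math.abs(gleanSumMs - telemetrySumMs) <= tolerance;
}
```

Comparing sums, rather than per-bucket counts, also sidesteps samples landing in neighbouring buckets on either side.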
timing_distribution mirrors: sample-based APIs are not recorded¶
Values stored with accumulate_samples and accumulate_single_sample are not passed to the Telemetry mirror histogram with GIFFT.
App Shutdown¶
Telemetry only works up to ShutdownPhase::AppShutdownTelemetry (aka profile-before-change-telemetry).
Telemetry data recorded after that phase is not persisted.
FOG presently shuts down Glean in a later phase, and so is able to collect data deeper into shutdown. (The particular phase is not presently something anyone’s asked us to guarantee, so that’s why I’m not being precise.)
What this means is that, for data recorded later in shutdown, Glean will report more complete information than Telemetry will.
Once-per-session Scalars¶
Legacy Telemetry Scalars are guaranteed to be submitted in Telemetry “main” pings at least once every session. The default metrics transport in Glean, the “metrics” ping, is submitted at least once a day.
This means that if your instrumentation code runs once per session, later sessions’ values will overwrite earlier ones in your Glean metrics until a Glean “metrics” ping is submitted.
Glean timespan metrics are slightly different
If your Glean metric is a timespan, later sessions’ values will not overwrite earlier ones.
Instead, the earliest one will persist and an invalid_state error will be recorded.
If you’d prefer it to silently overwrite instead, use a quantity instead of a timespan.
To preserve all sessions’ values, you can use different metric types:
For quantity metrics: if timing-related, use timing_distribution. If memory-related, use memory_distribution. Otherwise, use custom_distribution.
For string, uuid, url, or datetime metrics, you can use string_list. Note: string_list has a fixed limit on the number of values.
For boolean metrics, use a labeled_counter with labels “true” and “false”.
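The boolean suggestion works because counting per label accumulates across sessions, while a plain value is overwritten. This hypothetical simulation only illustrates that difference in plain JavaScript and does not use any Glean API.

```javascript
// Hypothetical comparison between a boolean overwritten each session and
// a labeled_counter with labels "true"/"false" that accumulates.
function simulateSessions(sessionValues) {
  let lastWriteWins = null;             // like a boolean metric
  const counts = { true: 0, false: 0 }; // like the suggested labeled_counter
  for (const value of sessionValues) {
    lastWriteWins = value;
    counts[String(value)] += 1;
  }
  return { lastWriteWins, counts };
}
```

With sessions [true, false, true] before the next "metrics" ping, the boolean reports only true, while the counter reports two true sessions and one false session.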
To only preserve the session’s values for as long as the session is active, use lifetime: application and apply no_lint: [GIFFT_NON_PING_LIFETIME] to have Glean send the value in every “metrics” ping that session, clearing it after the session completes.
Legacy Telemetry has no concept of metric lifetimes
Be careful when using lifetime: application in combination with GIFFT. Legacy Telemetry has no concept of metric lifetimes.
You would do well to think through exactly what instrumentation operations are happening, and when.
Please do reach out for assistance if you have any questions.