Crash Ping Lifecycle

Crash pings and derived data pass through a number of separate programs and services. To give a better idea of how these components interact, this page presents a breakdown of the lifecycle.

This description applies to Glean crash ping data.

Origin

When a crash occurs, Glean metrics are populated and a Glean crash ping is sent with the data. This is ingested and made available in BigQuery through the usual Glean infrastructure.

Ping Definitions

BigQuery Tables

  • Desktop view: firefox_desktop.crash.

  • Crashreporter client view: firefox_crashreporter.crash. This uses the same metrics/ping definitions as desktop.

  • Combined desktop/crashreporter client view: firefox_desktop.desktop_crashes.

  • Fenix view: fenix.crash. This ping has a few different metrics but is otherwise very similar to the desktop ping. As a result, combining Fenix and desktop pings in a single query is a little verbose; however, most metrics exist in both with the same name (see the sketch after the note below).

NOTE: When querying the source data, you should always use the crash.app_channel, crash.app_display_version, and crash.app_build metrics rather than the similarly named fields of the Glean client_info struct. These values reflect the application information at the time of the crash; moreover, the crash reporter client can’t fully populate client_info.
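As a rough sketch, the query below combines desktop and Fenix crash pings while selecting the crash.* application metrics. The column paths assume the usual Glean BigQuery layout (metrics.string.crash_app_channel and so on), and the date is only illustrative; verify the exact paths against the table schemas before relying on them.

```sql
-- Sketch: combine desktop and Fenix crash pings, using the crash.*
-- application metrics rather than client_info. Column paths assume the
-- standard Glean layout (metrics.string.<metric_name>); confirm against
-- the table schemas.
SELECT
  document_id,
  submission_timestamp,
  metrics.string.crash_app_channel AS channel,
  metrics.string.crash_app_display_version AS display_version,
  metrics.string.crash_app_build AS build_id
FROM `moz-fx-data-shared-prod.firefox_desktop.desktop_crashes`
WHERE DATE(submission_timestamp) = '2024-10-01'  -- illustrative date
UNION ALL
SELECT
  document_id,
  submission_timestamp,
  metrics.string.crash_app_channel,
  metrics.string.crash_app_display_version,
  metrics.string.crash_app_build
FROM `moz-fx-data-shared-prod.fenix.crash`
WHERE DATE(submission_timestamp) = '2024-10-01'
```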

Source

All crash ping metrics are set in bulk at the same time, and they typically come directly from crash annotations:

  • Desktop

  • Crashreporter client

  • Fenix

Post-Processing

The crash-ping-ingest repo is scheduled (using taskcluster) to run ingestion daily. It retrieves crash pings with submissions as recent as the prior UTC day, and by default it regenerates indexed results for the past week (so that outages/hiccups/etc. are covered by later runs). The run starts at 2:00 UTC and takes 1-2 hours, so you can expect data for the prior UTC day to be available around 4:00 UTC. The repo also supplies a taskcluster action to manually generate data for a given date, if necessary.

Data Availability

Data was backfilled to 2024-09-01, so you can expect ping data to be available for any date after that. All nightly and beta pings are processed, while release pings are randomly sampled at about 5000 pings per OS/process-type combination.
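To sanity-check which recent days have post-processed data, you can count rows per day in the ingest_output table described in the next section. This sketch only assumes the submission_timestamp partitioning column mentioned below:

```sql
-- Sketch: count ingested pings per day for the past week. Only the
-- submission_timestamp partitioning column is assumed to exist.
SELECT
  DATE(submission_timestamp) AS day,
  COUNT(*) AS ingested_pings
FROM `moz-fx-data-shared-prod.crash_ping_ingest_external.ingest_output`
WHERE DATE(submission_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY day
ORDER BY day
```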

BigQuery

The ingested output (including symbolicated stacks and crash signatures) is loaded into BigQuery in the moz-fx-data-shared-prod.crash_ping_ingest_external.ingest_output table. It is partitioned on submission_timestamp to match the Glean views/tables, and it can be joined with the fenix/desktop views on document_id (and optionally submission_timestamp).
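For instance, a join attaching the post-processed output to the raw desktop pings might look like the following sketch. Only document_id and submission_timestamp are assumed on the ingest side; the selected metric path and the date follow the assumptions noted earlier.

```sql
-- Sketch: join post-processed output (signatures, symbolicated stacks)
-- onto the raw desktop crash pings. Matching on submission_timestamp as
-- well as document_id keeps the partition filter effective on both sides.
SELECT
  ping.document_id,
  ping.metrics.string.crash_app_channel AS channel,  -- assumed Glean layout
  ingest.* EXCEPT (document_id, submission_timestamp)
FROM `moz-fx-data-shared-prod.firefox_desktop.desktop_crashes` AS ping
JOIN `moz-fx-data-shared-prod.crash_ping_ingest_external.ingest_output` AS ingest
  ON ingest.document_id = ping.document_id
  AND DATE(ingest.submission_timestamp) = DATE(ping.submission_timestamp)
WHERE DATE(ping.submission_timestamp) = '2024-10-01'  -- illustrative date
```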

What if post-processing has a bug?

If there’s a problem with the post-processed output, fix the post-processing bug and regenerate the data by running the ingestion for the affected day(s). The upload script in crash-ping-ingest automatically replaces the data for the date being uploaded. To run the ingestion, navigate to the taskcluster task group for the commit with the fixes (easily found from the taskcluster CI page for the commit on GitHub) and run the “Process Pings (Manual)” action task, where you can choose which dates to run.

Once the data in BigQuery has been fixed, you must also clear the netlify ping-data blobs for the affected dates. This can be done with the netlify-cli (though you will need to authenticate with netlify, of course).

Presentation

The crash-pings repository contains the code for the website hosted on netlify: https://crash-pings.mozilla.org. See the README for details about how it is built and what technologies it uses. It queries BigQuery and caches results, condensing data for efficient loading in the browser.

Adding data to crash pings

  1. Add crash annotations to the definition file and populate the annotations with the generated APIs.

  2. Add corresponding Glean metric definitions to the files listed in Ping Definitions.

  3. Update the code that populates the metrics listed in Source.