JEP-214: Jenkins Telemetry

Table 1. Metadata

JEP	214
Title	Jenkins Telemetry
Sponsor	Daniel Beck
Status	Draft 💬
Type	Standards
Created	2018-08-30
BDFL-Delegate	R Tyler Croy

Table of Contents

Abstract
Specification
Motivation
Reasoning
- Reuse of existing anonymous usage statistics
Backwards Compatibility
Security
Infrastructure Requirements
Testing
Prototype Implementation
References

Abstract

Jenkins is exclusively distributed to users for installation on their own infrastructure. This presents challenges when evolving Jenkins, as we typically only learn about problems after a change has been delivered. This JEP proposes infrastructure for gathering usage telemetry.

Specification

A Telemetry extension point is added to Jenkins (core). It allows implementations in core or plugins ("collectors") to define the following:

ID matching [a-zA-Z0-9_-]+ (i.e. alphanumeric with dash and underscore)
User-friendly (localizable) display name
Time period (start and end date) during which data collection is enabled on the client (end date is mandatory)
A method returning the JSONObject to be submitted to the Jenkins project

The implementation in core has the following characteristics:

Collection and submission is possible only if the instance participates in anonymous usage statistics.
Each collector is run periodically to build its content, and the central extension point will submit it to the Jenkins project infrastructure via HTTPS.
A facility explaining what data is collected, to be shown to administrators.
The endpoint the data is submitted to is not modifiable by extension point implementations.
It submits a correlation ID to the Jenkins project to allow merging and deduplicating submissions from the same Jenkins instance. This correlation ID is independent of instance identity or any other public identifier.

Motivation

We are occasionally in the situation that we need to understand how Jenkins is used or configured, typically to estimate the feasibility or impact of a planned change. As Jenkins is not a hosted SaaS, but distributed for local installation, we have very limited ways to get any information from our users in advance of actually performing the changes. Combined with the complexity of having 1500+ plugins with virtually unrestricted access to internal APIs means the risk of problems due to major changes is rather high. While we have some tooling (mostly related to Java API use across components), that might only be a small subset of what we’d need to know.

Reasoning

Reuse of existing anonymous usage statistics

While there are anonymous usage statistics, the data provided is fairly limited (mostly basic master/agent setup, job types and count, and installed plugins). Additionally, its collection protocol does not allow for it to be easily extended with additional information due to length restrictions. Even worse, the server-side processing code, as of 2018, is complex and very difficult to change.

It is also not designed to support the time limited collection of telemetry that is an important part of this proposal.

Backwards Compatibility

There are no backwards compatibility concerns related to this proposal.

Feedback to the initial proposal indicated a desire to merge this with JEP-308. As each collector is an independent extension implementation only operational within a defined time period, we are able to retire/version the extension point in core aggressively to support future improvements.

Security

Collectors need to be written carefully to not collect user data, such as PII. The server component needs to ensure isolation between different collected information, grant access to the collected data in a fine-grained manner, and enforce the collector start and end dates server-side.

Data submission is done via HTTPS to prevent eavesdropping and man-in-the-middle attacks.

Infrastructure Requirements

A new service needs to be added to the Jenkins project infrastructure.

A URL that accepts POST submissions with a JSON body and stores them. The JSON dictionary has the following top-level items:
- type (string identifying the collector by ID)
- payload (a JSON object)
- correlator (an arbitrary per-instance identifier)
Support for definition of a collection period (start and end date), outside of which submissions are rejected.

Beyond these, the service can remain fairly basic, e.g. writing submission bodies to files, similar to the logging performed in the usage statistics service.

The traffic for each collector depends on several parameters:

The size of data gathered for each collector
The number of instances for each collector

The latter depends on the time span defined for a collector, and the component implementing the collector (popularity and user update behavior). For collectors defined in core, current usage statistics indicate that around 20,000-30,000 installations are on the current LTS line, and around 30,000 installations are on the most recent eight weekly releases.

This indicates a projected upper bound of 600MB of uncompressed data collected per day for a collector defined in core that is active for two months, if we expect a maximum average size of 10KB.

Testing

Automatic tests in Jenkins core need to ensure the constraints defined for this system (administrator control via usage statistics option, collection dates, etc.).

Prototype Implementation

Core PR 3604
"Uplink" service receiving data in Jenkins project infrastructure

References

Initial proposal and request for feedback on jenkinsci-dev

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.adoc

README.adoc

JEP-214: Jenkins Telemetry

Abstract

Specification

Motivation

Reasoning

Reuse of existing anonymous usage statistics

Backwards Compatibility

Security

Infrastructure Requirements

Testing

Prototype Implementation

References

Files

README.adoc

Latest commit

History

README.adoc

File metadata and controls

JEP-214: Jenkins Telemetry

Abstract

Specification

Motivation

Reasoning

Reuse of existing anonymous usage statistics

Backwards Compatibility

Security

Infrastructure Requirements

Testing

Prototype Implementation

References