- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
The operations that a Kubernetes API server supports are reported through a collection of small documents partitioned by group-version. All clients of Kubernetes APIs must send a request to every group-version in order to "discover" the available APIs. This causes a storm of requests for clusters and is a source of latency and throttling. When new types are added to the API, the types need to be fetched again, which causes an additional storm of requests. This KEP proposes centralizing the "discovery" mechanism into two aggregated documents so clients do not need to send a storm of requests to the API server to retrieve all the operations available.
All clients and users of Kubernetes APIs usually first need to “discover” what the available APIs are and how they can be used. These APIs are described through a mechanism called “Discovery”, which is typically queried in order to build requests to the correct APIs. Unfortunately, the “Discovery” API is made of many small objects that need to be queried individually, which can cause considerable delay due to the latency of each individual request (up to 80 requests, with most objects being less than 1,024 bytes). The more numerous the APIs provided by the Kubernetes cluster, the more requests need to be performed.
The most well known Kubernetes client that uses the discovery mechanism is kubectl, and more specifically the CachedDiscoveryClient in client-go. To mitigate some of this latency, kubectl has implemented a 6 hour timer during which the discovery API is not refreshed. The drawback of this approach is that the freshness of the cache is doubtful and the entire discovery API needs to be refreshed after 6 hours, even if nothing has changed. Other clients such as the OpenShift UI have slow loading times due to the browser limit on the number of parallel requests that can be made.
This primarily concerns clients that need a discovery cache and need to frequently poll the apiserver for the latest discovery information. Clients include kubectl, web interfaces, controllers, etc.
- Fix the discovery storm issue that clients face when first loading the discovery document
- On an update to the discovery document, efficiently allow clients to detect new types for appropriate decisions to be made
- Aggregate the discovery documents for all Kubernetes types
Since the current discovery separated by group-version is already GA, removal of the endpoint will not be attempted. There are still use cases for publishing the discovery document per group-version and this KEP will solely focus on introducing the new aggregated endpoint.
Watchable discovery is also outside the scope of this KEP. Polling with ETag support is sufficient for most users.
We are proposing augmenting the current discovery endpoints at /api and /apis with a new content negotiation accept type. These endpoints will serve an aggregated discovery document that contains the resources for all group versions. ETag support will be provided so clients who already have the latest version of the aggregated discovery can avoid redownloading the document.
We will add a new controller responsible for aggregating the discovery documents when a resource on the cluster changes. There will be no conflicts when aggregating since each discovery document is self-contained.
This is an important design note around selecting the group version for the new discovery types to be apidiscovery/v2beta1. Link to the full comment
- Discovery is a non-resource API class
- As a non-resource API class, once the feature gate is "on-by-default" the API is required to be stable (only additive features)
- Non-resource APIs that are "off-by-default" do not promise stability
- A non-resource API that has to change before promotion to "on-by-default" must represent incompatible changes somehow to clients (if the version is "v1" and then we find a bug, we would have to rev to "v2" before "on-by-default", which means "v1" might not ever be exposed to end users)
- Unversioned net new endpoints (/healthz) are effectively v1 even if they are "off-by-default"
- We don't want to have multiple endpoints for discovery because it's confusing for users and defeats the purpose of making discovery more efficient, and we have a way to do that with negotiation
- We think there is value in a new API type (APIGroupDiscovery) which simplifies client logic, but it comes with a small risk of not being correct
- We have a good idea of what the API looks like due to a previous v1, so we are evolving an existing API and are not "completely flying blind" (i.e. implying this is really an alpha api)
- While we aren't exactly like an unversioned new endpoint (v1 from start), we want to deliver the feature (improves clients) without giving the perception that the API is perfect
The current discovery endpoints /api and /apis will accept a new content negotiation type APIGroupDiscoveryList, representing an aggregated discovery document.
Clients requesting the aggregated document will send a request with as (kind), v (version), and g (group) set as part of the Accept header. For example, a client requesting the v2beta1 version will send Accept: application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io.
Clients should send an Accept header with all the acceptable responses in preferred order. This is to avoid sending additional requests to the same endpoint if the initial preferred version is unavailable. The default accept type will not be changed and omitting the content negotiation type will default to the unaggregated APIGroupList type. Requests should have application/json or application/vnd.kubernetes.protobuf as a fallback option in case the server does not support the aggregated type (e.g. different version, feature disabled, etc.). For instance, Accept: application/json;as=APIGroupDiscoveryList;v=v2;g=apidiscovery.k8s.io,application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json will request the aggregated discovery v2 type, the aggregated discovery v2beta1 type, and the unaggregated v1 type, in that order. The server will return the first option that is supported.
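As a rough illustration of the ordering rule above, a client could assemble the header as follows. This is a minimal sketch: the helper names `aggregatedAccept` and `buildAcceptHeader` are hypothetical and are not part of client-go.

```go
package main

import (
	"fmt"
	"strings"
)

// aggregatedAccept returns the media type clause for one aggregated
// discovery version, using the as/v/g parameters described above.
// (Hypothetical helper, for illustration only.)
func aggregatedAccept(version string) string {
	return "application/json;as=APIGroupDiscoveryList;v=" + version + ";g=apidiscovery.k8s.io"
}

// buildAcceptHeader lists the aggregated versions in preference order and
// always ends with plain application/json so that a server without the
// aggregated type can still answer with the unaggregated document.
func buildAcceptHeader(versions ...string) string {
	parts := make([]string, 0, len(versions)+1)
	for _, v := range versions {
		parts = append(parts, aggregatedAccept(v))
	}
	parts = append(parts, "application/json")
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println("Accept: " + buildAcceptHeader("v2", "v2beta1"))
}
```

Keeping the unaggregated fallback as the last clause means a single request is enough regardless of which versions the server supports.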
Refer to the Version Skew Strategy section for more information on how backwards compatibility is maintained by both the client and server when the types are promoted from v2beta1 to v2.
The contents of this endpoint will be an APIGroupDiscoveryList, containing a list of APIGroupDiscovery, with each group including a list of versions (APIVersionDiscovery). Each APIVersionDiscovery will include a list of APIResourceDiscovery. There are a couple of minor changes for the APIResourceDiscovery compared to the current APIResource object, but all states expressible with the current API will be representable in the new API.
The endpoint will also publish an ETag calculated based on a hash of the data for clients.
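The ETag flow can be sketched with the standard library alone. The sketch below assumes a SHA-512 hash rendered as a quoted hex string and a plain `If-None-Match` comparison; the text above only specifies "a hash of the data", so the hash choice, the function names, and the handler shape are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// computeETag derives a quoted ETag from the serialized discovery document.
// Assumption: SHA-512 in hex; any stable content hash would serve the same purpose.
func computeETag(doc []byte) string {
	sum := sha512.Sum512(doc)
	return `"` + hex.EncodeToString(sum[:]) + `"`
}

// serveDiscovery returns a handler that answers 304 Not Modified when the
// client already holds the current document, and the full body otherwise.
func serveDiscovery(doc []byte) http.HandlerFunc {
	etag := computeETag(doc)
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(doc)
	}
}

// fetch issues one in-memory request and reports the status code and ETag.
func fetch(h http.Handler, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/apis", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	h := serveDiscovery([]byte(`{"kind":"APIGroupDiscoveryList","items":[]}`))
	code, etag := fetch(h, "")
	fmt.Println("first fetch:", code)
	code, _ = fetch(h, etag)
	fmt.Println("revalidation:", code)
}
```

A polling client that stores the last ETag therefore pays only for a header round trip while the document is unchanged.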
These types will live in the apidiscovery/v2 group version.
This is what the new API will look like.
```go
// APIGroupDiscoveryList is a resource containing a list of APIGroupDiscovery.
// This is what is returned from the /api and /apis endpoints when the aggregated type is requested,
// and is used to discover the list of API resources (built-ins, Custom Resource Definitions,
// resources from aggregated servers) that a cluster supports.
type APIGroupDiscoveryList struct {
	TypeMeta `json:",inline"`
	// ResourceVersion will not be set, because this does not have a replayable ordering among multiple apiservers.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
	// +optional
	ListMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	// items is the list of groups for discovery.
	Items []APIGroupDiscovery `json:"items" protobuf:"bytes,2,rep,name=items"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// APIGroupDiscovery holds information about which resources are being served for all versions of the API Group.
// It contains a list of APIVersionDiscovery that holds a list of APIResourceDiscovery types served for a version.
// Versions are in descending order of preference, with the first version being the preferred entry.
type APIGroupDiscovery struct {
	TypeMeta `json:",inline"`
	// Standard object's metadata.
	// The only field completed will be name. For instance, resourceVersion will be empty.
	// name is the name of the API group whose discovery information is presented here.
	// name is allowed to be "" to represent the legacy, ungroupified resources.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
	// +optional
	ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	// versions are the versions supported in this group. They are sorted in descending order of preference,
	// with the preferred version being the first entry.
	// +listType=map
	// +listMapKey=version
	Versions []APIVersionDiscovery `json:"versions,omitempty" protobuf:"bytes,2,rep,name=versions"`
}

// APIVersionDiscovery holds a list of APIResourceDiscovery types that are served for a particular version within an API Group.
type APIVersionDiscovery struct {
	// version is the name of the version within a group version.
	Version string `json:"version" protobuf:"bytes,1,opt,name=version"`
	// resources is a list of APIResourceDiscovery objects for the corresponding group version.
	// +listType=map
	// +listMapKey=resource
	Resources []APIResourceDiscovery `json:"resources,omitempty" protobuf:"bytes,2,rep,name=resources"`
	// freshness marks whether a group version's discovery document is up to date.
	// "Current" indicates no problems when fetching the discovery document. "Stale" indicates
	// that there was an error fetching the discovery document, and the current version may not
	// be up to date.
	Freshness DiscoveryFreshness `json:"freshness,omitempty" protobuf:"bytes,3,opt,name=freshness"`
}

// APIResourceDiscovery provides information about an API resource for discovery.
type APIResourceDiscovery struct {
	// resource is the plural name of the resource. This is used in the URL path and is the unique identifier
	// for this resource across all versions in the API group.
	// resources with non-"" groups are located at /apis/<APIGroupDiscovery.objectMeta.name>/<APIVersionDiscovery.version>/<APIResourceDiscovery.Resource>
	// resources with "" groups are located at /api/v1/<APIResourceDiscovery.Resource>
	Resource string `json:"resource" protobuf:"bytes,1,opt,name=resource"`
	// responseKind describes the type of serialization that will typically be returned from this endpoint.
	// APIs may return other object types at their discretion, such as error conditions, requests for alternate representations, or other operation specific behavior.
	ResponseKind GroupVersionKind `json:"responseKind" protobuf:"bytes,2,opt,name=responseKind"`
	// scope indicates the scope of a resource, either Cluster or Namespaced
	Scope ResourceScope `json:"scope" protobuf:"bytes,3,opt,name=scope"`
	// singularResource is the singular name of the resource. This allows clients to handle plural and singular opaquely.
	// For many clients the singular form of the resource will be more understandable to users reading messages and should be used when integrating the name of the resource into a sentence.
	// The command line tool kubectl, for example, allows use of the singular resource name in place of plurals.
	// The singular form of a resource should always be an optional element - when in doubt use the canonical resource name.
	SingularResource string `json:"singularResource" protobuf:"bytes,4,opt,name=singularResource"`
	// verbs is a list of supported API operation types (this includes
	// but is not limited to get, list, watch, create, update, patch,
	// delete, deletecollection, and proxy)
	Verbs Verbs `json:"verbs" protobuf:"bytes,5,opt,name=verbs"`
	// shortNames is a list of suggested short names of the resource.
	// +listType=set
	ShortNames []string `json:"shortNames,omitempty" protobuf:"bytes,6,rep,name=shortNames"`
	// categories is a list of the grouped resources this resource belongs to (e.g. 'all').
	// Clients may use this to simplify acting on multiple resource types at once.
	// +listType=set
	Categories []string `json:"categories,omitempty" protobuf:"bytes,7,rep,name=categories"`
	// subresources is a list of subresources provided by this resource. Subresources are located at /apis/<APIGroupDiscovery.objectMeta.name>/<APIVersionDiscovery.version>/<APIResourceDiscovery.Resource>/name-of-instance/<APIResourceDiscovery.subresources[i].subresource>
	// +listType=map
	// +listMapKey=subresource
	Subresources []APISubresourceDiscovery `json:"subresources,omitempty" protobuf:"bytes,8,rep,name=subresources"`
}

// ResourceScope is an enum defining the different scopes available to a resource.
type ResourceScope string

const (
	ScopeCluster   ResourceScope = "Cluster"
	ScopeNamespace ResourceScope = "Namespaced"
)

// DiscoveryFreshness is an enum defining whether the Discovery document published by an apiservice is up to date (fresh).
type DiscoveryFreshness string

const (
	DiscoveryFreshnessCurrent DiscoveryFreshness = "Current"
	DiscoveryFreshnessStale   DiscoveryFreshness = "Stale"
)

// APISubresourceDiscovery provides information about an API subresource for discovery.
type APISubresourceDiscovery struct {
	// subresource is the name of the subresource. This is used in the URL path and is the unique identifier
	// for this resource across all versions.
	Subresource string `json:"subresource" protobuf:"bytes,1,opt,name=subresource"`
	// responseKind describes the type of serialization that will be returned from this endpoint.
	// Some subresources do not return normal resources; these will have nil return types.
	ResponseKind *GroupVersionKind `json:"responseKind,omitempty" protobuf:"bytes,2,opt,name=responseKind"`
	// acceptedTypes describes the kinds that this endpoint accepts. It is possible for a subresource to accept multiple kinds.
	// It is also possible for an endpoint to accept no standard types. Those will have a zero length list.
	// +listType=set
	AcceptedTypes []GroupVersionKind `json:"acceptedTypes,omitempty" protobuf:"bytes,3,rep,name=acceptedTypes"`
	// verbs is a list of supported kube verbs: get, list, watch, create,
	// update, patch, delete
	Verbs Verbs `json:"verbs" protobuf:"bytes,4,opt,name=verbs"`
}
```
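To make the URL layout described in the field comments concrete, the following sketch decodes an aggregated document and derives the path of each resource. The struct types here are hypothetical, trimmed-down mirrors that carry only the fields needed for the example, and the sample JSON is invented for illustration rather than captured from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down mirrors of the discovery types; only the
// fields needed to compute resource paths are included.
type groupList struct {
	Items []group `json:"items"`
}

type group struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Versions []version `json:"versions"`
}

type version struct {
	Version   string     `json:"version"`
	Resources []resource `json:"resources"`
}

type resource struct {
	Resource string `json:"resource"`
}

// resourcePaths returns one URL path per resource. Groups with a non-empty
// name live under /apis/<group>/<version>; the legacy "" group lives under /api/<version>.
func resourcePaths(doc []byte) ([]string, error) {
	var list groupList
	if err := json.Unmarshal(doc, &list); err != nil {
		return nil, err
	}
	var paths []string
	for _, g := range list.Items {
		for _, v := range g.Versions {
			prefix := "/apis/" + g.Metadata.Name + "/" + v.Version
			if g.Metadata.Name == "" {
				prefix = "/api/" + v.Version
			}
			for _, r := range v.Resources {
				paths = append(paths, prefix+"/"+r.Resource)
			}
		}
	}
	return paths, nil
}

// sample is an invented document with one legacy and one named group.
const sample = `{"kind":"APIGroupDiscoveryList","items":[
 {"metadata":{"name":""},"versions":[{"version":"v1","resources":[{"resource":"pods"}]}]},
 {"metadata":{"name":"apps"},"versions":[{"version":"v1","resources":[{"resource":"deployments"}]}]}]}`

func main() {
	paths, err := resourcePaths([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

A single decode of the aggregated document yields every path that previously required one request per group-version.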
For the aggregation layer on the server, a new controller will be created to aggregate discovery for built-in types, apiextensions types (CRDs), and types from aggregated api servers.
A post start hook will be added and the kube-apiserver health check should only pass if the discovery document is ready. Since aggregated api servers may take longer to respond and we do not want to delay cluster startup, the health check will only block on the local api servers (built-ins and CRDs) to have their discovery ready. For api servers that have not been aggregated, their group-versions will be published with an empty resource list and a Freshness of Stale to indicate that they have not synced yet.
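The publishing rule for unsynced aggregated api servers can be sketched as a simple merge. This is not the controller's actual implementation: the types `groupVersion` and `versionEntry` and the function `aggregate` are hypothetical names used only to show how an unsynced group-version ends up with an empty resource list and a Stale freshness.

```go
package main

import "fmt"

// groupVersion identifies one group-version (hypothetical helper type).
type groupVersion struct{ Group, Version string }

// versionEntry is a simplified stand-in for one published group-version.
type versionEntry struct {
	GroupVersion groupVersion
	Resources    []string
	Freshness    string
}

// aggregate starts from the local entries (built-ins and CRDs), which are
// always ready, and appends one entry per aggregated group-version.
// Group-versions whose discovery has not been fetched yet are still
// published, with no resources and a Stale freshness.
func aggregate(local []versionEntry, apiServices []groupVersion, synced map[groupVersion][]string) []versionEntry {
	out := append([]versionEntry(nil), local...)
	for _, gv := range apiServices {
		entry := versionEntry{GroupVersion: gv, Freshness: "Stale"}
		if res, ok := synced[gv]; ok {
			entry.Resources = res
			entry.Freshness = "Current"
		}
		out = append(out, entry)
	}
	return out
}

func main() {
	local := []versionEntry{{groupVersion{"apps", "v1"}, []string{"deployments"}, "Current"}}
	apiServices := []groupVersion{{"metrics.k8s.io", "v1beta1"}}
	// Nothing has been fetched from the aggregated server yet.
	for _, e := range aggregate(local, apiServices, nil) {
		fmt.Printf("%s/%s %v %s\n", e.GroupVersion.Group, e.GroupVersion.Version, e.Resources, e.Freshness)
	}
}
```

Because each group-version contributes its own self-contained entry, the merge never has to reconcile overlapping data.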
The client-go interface will be modified to add a new method to retrieve the aggregated discovery document and kubectl will be the initial candidate. As a starting point, kubectl api-resources should use the aggregated discovery document instead of sending a storm of requests.
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
- k8s.io/apiserver/pkg/endpoints/discovery/aggregated: 77.4
  - Note that the fake.go file has no unit test coverage as it is a utility designed to be used by integration tests. The rest of the files in the package have 90+ coverage.
- k8s.io/kube-aggregator/pkg/apiserver/handler_discovery.go: 82.2
- k8s.io/client-go/discovery/aggregated_discovery.go: 96.8
Integration tests
e2e tests
- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
- At least one client (kubectl) has an implementation to use the aggregated discovery feature
We want all clients to benefit from this feature, but for alpha our main focus will be on kubectl and golang clients.
- kubectl uses the aggregated discovery feature by default
- Metrics are added
- Existing bugs are fixed:
- New API type apidiscovery.k8s.io/v2 is introduced
- e2e and conformance tests
Note: Generally we also wait at least two releases between beta and GA/stable, because there's no opportunity for user feedback, or even bug reports, in back-to-back releases.
For non-optional features moving to GA, the graduation criteria must include conformance tests.
Once Aggregated Discovery v2 types are GA, v2beta1 types will be deprecated and removed after 3 releases.
Aggregated discovery will be behind a feature gate. It is an in-memory feature and upgrade/downgrade is not a problem.
When moving from beta to GA, we will introduce a new API group version apidiscovery.k8s.io/v2.
All clients v1.26 to v1.29 will only request the beta API group version apidiscovery.k8s.io/v2beta1.
To accommodate skew between the client and server (older client and newer server), the server will serve both v2 and v2beta1 versions based on the client request headers. The API server will continue to support v2beta1 until its removal in Kubernetes v1.33.
To accommodate skew between an older server and newer client, starting in v1.30, client-go will request both v2 and v2beta1 by sending a list of group versions requested (in order from v2, v2beta1, unaggregated) and the server will return the first group version that matches. Concretely, this is done using Accept headers with a single request.
Accept: application/json;as=APIGroupDiscoveryList;v=v2;g=apidiscovery.k8s.io,application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json
In the case of older servers, the server will only be able to match v2beta1. The client will support both v2 and v2beta1. This allows a newer client to communicate with an older server that supports only the beta version. Other clients should follow the same convention to support version skew, though a client that is only capable of processing v2 is sufficient if it only communicates with v1.30+ servers. Otherwise, the client will need to be ready to tolerate a 406 Not Acceptable response and handle the error appropriately.
If there is no skew and both server and client are v1.30+, clients will still request for v2 and v2beta1, and the server will match the first group version and return v2.
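The skew cases above can be walked through with a small model of first-match negotiation. This sketch is not the apiserver's negotiation code: `negotiate` is a hypothetical function, it ignores q-values and wildcards, and it handles only application/json clauses.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// negotiate walks the Accept header in order and returns the first clause
// the server can satisfy: an aggregated version it supports, or
// "unaggregated" for a plain application/json clause. ok is false when
// nothing matches, which corresponds to a 406 Not Acceptable response.
func negotiate(accept string, supported map[string]bool) (string, bool) {
	if strings.TrimSpace(accept) == "" {
		// Omitting the content negotiation type defaults to the unaggregated type.
		return "unaggregated", true
	}
	for _, clause := range strings.Split(accept, ",") {
		mediaType, params, err := mime.ParseMediaType(strings.TrimSpace(clause))
		if err != nil || mediaType != "application/json" {
			continue
		}
		if params["as"] == "" {
			return "unaggregated", true
		}
		if params["as"] == "APIGroupDiscoveryList" && params["g"] == "apidiscovery.k8s.io" && supported[params["v"]] {
			return params["v"], true
		}
	}
	return "", false
}

func main() {
	header := "application/json;as=APIGroupDiscoveryList;v=v2;g=apidiscovery.k8s.io," +
		"application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io," +
		"application/json"

	newServer := map[string]bool{"v2": true, "v2beta1": true}
	oldServer := map[string]bool{"v2beta1": true}

	v, _ := negotiate(header, newServer)
	fmt.Println("v1.30+ server:", v)
	v, _ = negotiate(header, oldServer)
	fmt.Println("older server:", v)

	// A client that only understands v2 gets no match from an older server.
	_, ok := negotiate("application/json;as=APIGroupDiscoveryList;v=v2;g=apidiscovery.k8s.io", oldServer)
	fmt.Println("v2-only client against older server matched:", ok)
}
```

The last case is the one where a v2-only client must be prepared for a 406 Not Acceptable response.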
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: AggregatedDiscoveryEndpoint
  - Components depending on the feature gate: kube-apiserver
Clients using client-go version 1.26 and up will use the aggregated discovery endpoint rather than the unaggregated discovery endpoint. This is handled automatically in client-go and clients should see fewer requests to the api server when fetching discovery information. Client versions older than 1.26 will continue to use the old unaggregated discovery endpoint without any changes.
Yes, the feature may be disabled on the apiserver by reverting the feature flag. This will disable aggregated discovery for all clients. If there is a golang specific client side bug, the feature may also be turned off in client-go via the WithLegacy() toggle and this will require a recompile of the application.
The feature does not depend on state, and can be disabled/enabled at will.
A test will be added to ensure that the RESTMapper functionality works properly both when the feature is enabled and disabled.
During a rollout, some apiservers may support aggregated discovery and some may not. It is recommended that clients request the aggregated discovery document with a fallback to the unaggregated discovery format. This can be achieved by setting the Accept header to have a fallback to the default GVK of the /apis and /api endpoints.
For example, to request the aggregated discovery type and fall back to the unaggregated discovery, the following header can be sent: Accept: application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json
This kind of fallback is already implemented in client-go and this note is intended for non-golang clients.
High latency or failure of a metric in the newly added discovery aggregation controller. If the /api and /apis endpoints return an error or are unreachable with the APIGroupDiscoveryList accept type, that could be a sign that a rollback is needed.
n/a. The API introduced does not store data and state is recalculated on the upgrade, downgrade, upgrade cycle. No state is preserved between versions.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
By enabling aggregated discovery as the default, the new API is slightly different from the unaggregated version. The StorageVersionHash field is removed from resources in the aggregated discovery API. The storage version migrator will have an additional flag when initializing the discovery client to continue using the unaggregated API.
Operators can check whether an aggregated discovery request can be made by sending a request to /apis with application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json as the Accept header and looking at the Content-Type response header. A response header of Content-Type: application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList indicates that aggregated discovery is supported and a Content-Type: application/json header indicates that aggregated discovery is not supported. They can also check for the presence of aggregated discovery related metrics: aggregator_discovery_aggregation_count
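The Content-Type check described above can be automated. The following sketch uses the standard library's `mime.ParseMediaType` to read the response parameters; the function name `aggregatedVersion` is a hypothetical helper, not part of any Kubernetes tooling.

```go
package main

import (
	"fmt"
	"mime"
)

// aggregatedVersion inspects a Content-Type response header and reports
// which aggregated discovery version, if any, the server returned.
// ok is false for a plain application/json (unaggregated) response.
func aggregatedVersion(contentType string) (string, bool) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", false
	}
	if params["as"] == "APIGroupDiscoveryList" && params["g"] == "apidiscovery.k8s.io" {
		return params["v"], true
	}
	return "", false
}

func main() {
	for _, ct := range []string{
		"application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList",
		"application/json",
	} {
		v, ok := aggregatedVersion(ct)
		fmt.Printf("%q -> aggregated=%t version=%q\n", ct, ok, v)
	}
}
```

Parsing the parameters rather than comparing whole strings keeps the check independent of the order in which the server emits g, v, and as.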
/api and /apis endpoints are populated with discovery information when the aggregated content negotiation type accept header is passed, and all expected group-versions are present.
Aggregated Discovery falls under a non-streaming read-only API call which is defined under the Kubernetes API call latency SLI/SLO. The numbers in the SLO are a good bound for Aggregated Discovery (p99 < 1s).
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name: aggregator_discovery_aggregation_duration
  - Components exposing the metric: kube-apiserver
  - This is a metric for exposing the time it took to aggregate all the api resources.
  - Metric name: aggregator_discovery_aggregation_count
  - Components exposing the metric: kube-apiserver
  - This is a metric for the number of times that the discovery document has been aggregated.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No.
No, but if aggregated apiservers are present, the feature will attempt to contact and aggregate the data published from the aggregated apiserver on a set interval. If there is a high error rate, stale data may be returned because the latest data could not be fetched from the aggregated apiserver.
No. Enabling this feature should reduce the total number of API calls for client discovery. Instead of clients sending a discovery request to all group versions (/apis/<group>/<version>), they will only need to send a request to the aggregated endpoint to obtain all resources that the cluster supports.
Yes, but these API types are not persisted.
No.
No.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
The feature is built into the API server, and will not work if the API server is unavailable.
- Aggregated API Server is unavailable:
- Detection: An Aggregated API Server that is unavailable will return Stale as the DiscoveryFreshness. A prolonged period of staleness could indicate that the aggregated apiserver is unavailable.
- Mitigations: If the aggregated apiserver is not reachable, it will not be part of the resources available. Restarting the pod or checking for any misconfigurations could be a valid next step.
- Diagnostics: Missing the (v3) log line: DiscoveryManager: successfully downloaded discovery/legacy discovery for <apiservice>
- Testing: We test for unreachable aggregated apiservers returning Stale, but an aggregated apiserver could be unavailable for a wide variety of reasons that would require further diagnosis.
The feature can be rolled back by setting the AggregatedDiscoveryEndpoint feature flag to false.
- v1.26: Aggregated Discovery KEP is merged and moves to alpha
- v1.27: Aggregated Discovery moves to beta
- v1.30: Aggregated Discovery moves to stable
With aggregation, the size of the aggregated discovery document could be an issue in the future since clients will need to download the entire document on any resource update. At the moment, even with 3000 CRDs (already very unlikely), the total size is still smaller than 1MB.
An alternative that was considered is Discovery Cache Busting. Cache-busting allows clients to know if the files need to be downloaded at all, and in most cases the download can be skipped entirely. This typically works by including a hash of the resource content in its name, while marking the objects as never-expiring using cache control headers. Clients can then recognize if the names have changed or stayed the same, and re-use files that have kept the same name without downloading them again.
Aggregated Discovery was selected because of the number of requests that are saved both on startup and on changes involving multiple group versions. For a full comparison between Discovery Cache Busting and Aggregated Discovery, please refer to the Google Doc.
An additional alternative that we considered is watchable discovery. After diving into the use cases, polling with ETag support is sufficient for most clients and adding support for watch drastically changes the scope of this proposal.
Finally, another alternative that was explored was creating a new URL endpoint /discovery/<version>. The addition of a new URL endpoint per serialization version creates a burden for clients as the API evolves, as they may need to check multiple endpoints to determine the state of the feature.