diff --git a/pkg/scheduler/README.md b/pkg/scheduler/README.md
index 08543f47532..a40828a3ee8 100644
--- a/pkg/scheduler/README.md
+++ b/pkg/scheduler/README.md
@@ -1,147 +1,72 @@
 # Knative Eventing Multi-Tenant Scheduler with High-Availability

-An eventing source instance (for example, [KafkaSource](https://github.com/knative-extensions/eventing-kafka/tree/main/pkg/source), [RedisStreamSource](https://github.com/knative-extensions/eventing-redis/tree/main/source), etc) gets materialized as a virtual pod (**vpod**) and can be scaled up and down by increasing or decreasing the number of virtual pod replicas (**vreplicas**). A vreplica corresponds to a resource in the source that can replicated for maximum distributed processing (for example, number of consumers running in a consumer group).
+An eventing source instance (for example, KafkaSource) gets materialized as a virtual pod
+(**vpod**) and can be scaled up and down by increasing or decreasing the number of virtual pod
+replicas (**vreplicas**). A vreplica corresponds to a resource in the source that can be replicated
+for maximum distributed processing (for example, the number of consumers running in a consumer
+group).

-The vpod multi-tenant [scheduler](#1scheduler) is responsible for placing vreplicas onto real Kubernetes pods. Each pod is limited in capacity and can hold a maximum number of vreplicas. The scheduler takes a list of (source, # of vreplicas) tuples and computes a set of Placements. Placement info are added to the source status.
+The vpod multi-tenant [scheduler](#scheduler) is responsible for placing vreplicas onto real
+Kubernetes pods. Each pod is limited in capacity and can hold a maximum number of vreplicas. The
+scheduler takes a list of (source, # of vreplicas) tuples and computes a set of Placements.
+Placement information is added to the source status.

-Scheduling strategies rely on pods having a sticky identity (StatefulSet replicas) and the current [State](#4state-collector) of the cluster.
-
-When a vreplica cannot be scheduled it is added to the list of pending vreplicas. The [Autoscaler](#3autoscaler) monitors this list and allocates more pods for placing it.
-
-To support high-availability the scheduler distributes vreplicas uniformly across failure domains such as zones/nodes/pods containing replicas from a StatefulSet.
-
-## General Scheduler Requirements
-
-1. High Availability: Vreplicas for a source must be evenly spread across domains to reduce impact of failure when a zone/node/pod goes unavailable for scheduling.*
-
-2. Equal event consumption: Vreplicas for a source must be evenly spread across adapter pods to provide an equal rate of processing events. For example, Kafka broker spreads partitions equally across pods so if vreplicas aren’t equally spread, pods with fewer vreplicas will consume events slower than others.
-
-3. Pod spread not more than available resources: Vreplicas for a source must be evenly spread across pods such that the total number of pods with placements does not exceed the number of resources available from the source (for example, number of Kafka partitions for the topic it's consuming from). Else, the additional pods have no resources (Kafka partitions) to consume events from and could waste Kubernetes resources.
-
-* Note: StatefulSet anti-affinity rules guarantee new pods to be scheduled on a new zone and node.
+Scheduling strategies rely on pods having a sticky identity (StatefulSet replicas) and the
+current [State](#state-collector) of the cluster.
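For reference, a `Placement` is just a (pod name, vreplica count) pair. Below is a minimal sketch of what a computed set of placements looks like, using the `duckv1alpha1.Placement` type and the `GetTotalVReplicas` helper kept in this package (the pod names and counts are illustrative):

```go
package main

import (
	"fmt"

	duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
	"knative.dev/eventing/pkg/scheduler"
)

func main() {
	// Illustrative outcome of scheduling a source that requested 7 vreplicas:
	// the scheduler spread them across two StatefulSet pods.
	placements := []duckv1alpha1.Placement{
		{PodName: "kafkasource-mt-adapter-0", VReplicas: 4},
		{PodName: "kafkasource-mt-adapter-1", VReplicas: 3},
	}

	// GetTotalVReplicas sums the vreplicas across all placements; it prints 7 here,
	// matching what the source asked for once scheduling fully succeeded.
	fmt.Println(scheduler.GetTotalVReplicas(placements))
}
```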
## Components:

-### 1.Scheduler
-The scheduling framework has a pluggable architecture where plugins are registered and compiled into the scheduler. It allows many scheduling features to be implemented as plugins, while keeping the scheduling "core" simple and maintainable.
-
-Scheduling happens in a series of stages:
-
- 1. **Filter**: These plugins (predicates) are used to filter out pods where a vreplica cannot be placed. If any filter plugin marks the pod as infeasible, the remaining plugins will not be called for that pod. A vreplica is marked as unschedulable if no pods pass all the filters.
-
- 2. **Score**: These plugins (priorities) provide a score to each pod that has passed the filtering phase. Scheduler will then select the pod with the highest weighted scores sum.
-
-Scheduler must be Knative generic with its core functionality implemented as core plugins. Anything specific to an eventing source will be implemented as separate plugins (for example, number of Kafka partitions)
-
-It allocates one vreplica at a time by filtering and scoring schedulable pods.
-
-A vreplica can be unschedulable for several reasons such as pods not having enough capacity, constraints cannot be fulfilled, etc.
-
-### 2.Descheduler
-
-Similar to scheduler but has its own set of priorities (no predicates today).
-
-### 3.Autoscaler
-
-The autoscaler scales up pod replicas of the statefulset adapter when there are vreplicas pending to be scheduled, and scales down if there are unused pods. It takes into consideration a scaling factor that is based on number of domains for HA.
-
-### 4.State Collector
-
-Current state information about the cluster is collected after placing each vreplica and during intervals. Cluster information include computing the free capacity for each pod, list of schedulable pods (unschedulable pods are pods that are marked for eviction for compacting, and pods that are on unschedulable nodes (cordoned or unreachable nodes), number of pods (stateful set replicas), number of available nodes, number of zones, a node to zone map, total number of vreplicas in each pod for each vpod (spread), total number of vreplicas in each node for each vpod (spread), total number of vreplicas in each zone for each vpod (spread), etc.
-
-### 5.Reservation
-
-Scheduler also tracks vreplicas that have been placed (ie. scheduled) but haven't been committed yet to its vpod status. These reserved veplicas are taken into consideration when computing cluster's state for scheduling the next vreplica.
-
-### 6.Evictor
-
-Autoscaler periodically attempts to compact veplicas into a smaller number of free replicas with lower ordinals. Vreplicas placed on higher ordinal pods are evicted and rescheduled to pods with a lower ordinal using the same scheduling strategies.
-
-## Scheduler Profile
-
-### Predicates:
-
-1. **PodFitsResources**: check if a pod has enough capacity [CORE]
-
-2. **NoMaxResourceCount**: check if total number of placement pods exceed available resources [KAFKA]. It has an argument `NumPartitions` to configure the plugin with the total number of Kafka partitions.
-
-3. **EvenPodSpread**: check if resources are evenly spread across pods [CORE]. It has an argument `MaxSkew` to configure the plugin with an allowed skew factor.
+### Scheduler
-### Priorities:
+The scheduler allocates as many vreplicas as possible to the lowest possible StatefulSet ordinal
+number before triggering the autoscaler when no more capacity is left to schedule vpods.
-1. **AvailabilityNodePriority**: make sure resources are evenly spread across nodes [CORE]. It has an argument `MaxSkew` to configure the plugin with an allowed skew factor.
+### Autoscaler
-2. **AvailabilityZonePriority**: make sure resources are evenly spread across zones [CORE]. It has an argument `MaxSkew` to configure the plugin with an allowed skew factor.
+The autoscaler scales up pod replicas of the StatefulSet adapter when there are vreplicas pending to
+be scheduled, and scales down if there are unused pods.
-3. **LowestOrdinalPriority**: make sure vreplicas are placed on free smaller ordinal pods to minimize resource usage [CORE]
+### State Collector
-**Example ConfigMap for config-scheduler:**
+Current state information about the cluster is collected after placing each vreplica and at regular
+intervals. Collected information includes the free capacity for each pod, the list of schedulable
+pods (unschedulable pods are those marked for eviction during compaction), the number of pods
+(StatefulSet replicas), and the total number of vreplicas in each pod for each vpod (spread).
-```
-data:
-  predicates: |+
-    [
-      {"Name": "PodFitsResources"},
-      {"Name": "NoMaxResourceCount",
-       "Args": "{\"NumPartitions\": 100}"},
-      {"Name": "EvenPodSpread",
-       "Args": "{\"MaxSkew\": 2}"}
-    ]
-  priorities: |+
-    [
-      {"Name": "AvailabilityZonePriority",
-       "Weight": 10,
-       "Args": "{\"MaxSkew\": 2}"},
-      {"Name": "LowestOrdinalPriority",
-       "Weight": 2}
-    ]
-```
+### Evictor
-## Descheduler Profile:
-
-### Priorities:
-
-1. **RemoveWithAvailabilityNodePriority**: make sure resources are evenly spread across nodes [CORE]
-
-2. **RemoveWithAvailabilityZonePriority**: make sure resources are evenly spread across zones [CORE]
-
-3. **HighestOrdinalPriority**: make sure vreps are removed from higher ordinal pods to minimize resource usage [CORE]
-
-**Example ConfigMap for config-descheduler:**
-
-```
-data:
-  priorities: |+
-    [
-      {"Name": "RemoveWithEvenPodSpreadPriority",
-       "Weight": 10,
-       "Args": "{\"MaxSkew\": 2}"},
-      {"Name": "RemoveWithAvailabilityZonePriority",
-       "Weight": 10,
-       "Args": "{\"MaxSkew\": 2}"},
-      {"Name": "RemoveWithHighestOrdinalPriority",
-       "Weight": 2}
-    ]
-```
+The autoscaler periodically attempts to compact vreplicas into a smaller number of free replicas
+with lower ordinals. Vreplicas placed on higher-ordinal pods are evicted and rescheduled to pods
+with a lower ordinal using the same scheduling strategies.

## Normal Operation

1. **Busy scheduler**:

-Scheduler can be very busy allocating the best placements for multiple eventing sources at a time using the scheduler predicates and priorities configured. During this time, the cluster could see statefulset replicas increasing, as the autoscaler computes how many more pods are needed to complete scheduling successfully. Also, the replicas could be decreasing during idle time, either caused by less events flowing through the system, or the evictor compacting vreplicas placements into a smaller number of pods or the deletion of event sources. The current placements are stored in the eventing source's status field for observability.
+The scheduler can be very busy allocating the best placements for multiple eventing sources at a
+time using the configured scheduler predicates and priorities. During this time, the cluster could
+see StatefulSet replicas increasing, as the autoscaler computes how many more pods are needed to
+complete scheduling successfully. Also, the replicas could decrease during idle time, either caused
+by fewer events flowing through the system, by the evictor compacting vreplica placements into a
+smaller number of pods, or by the deletion of event sources. The current placements are stored in
+the eventing source's status field for observability.

2. **Software upgrades**:

-We can expect periodic software version upgrades or fixes to be performed on the Kubernetes cluster running the scheduler or on the Knative framework installed. Either of these scenarios could involve graceful rebooting of nodes and/or reapplying of controllers, adapters and other resources.
-
-All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler.
-(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.)
+We can expect periodic software version upgrades or fixes to be performed on the Kubernetes cluster
+running the scheduler or on the Knative framework installed. Either of these scenarios could involve
+graceful rebooting of nodes and/or reapplying of controllers, adapters and other resources.
-TODO: Measure latencies in events processing using a performance tool (KPerf eventing).
+All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica
+scheduler.
+(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member
+changes.)

3. **No more cluster resources**:

-When there are no resources available on existing nodes in the cluster to schedule more pods and the autoscaler continues to scale up replicas, the new pods are left in a Pending state till cluster size is increased. Nothing to do for the scheduler until then.
+When there are no resources available on existing nodes in the cluster to schedule more pods and the
+autoscaler continues to scale up replicas, the new pods are left in a Pending state until the
+cluster size is increased. There is nothing for the scheduler to do until then.

## Disaster Recovery

@@ -149,91 +74,14 @@ Some failure scenarios are described below:

1. **Pod failure**:

-When a pod/replica in a StatefulSet goes down due to some reason (but its node and zone are healthy), a new replica is spun up by the StatefulSet with the same pod identity (pod can come up on a different node) almost immediately.
-
-All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler.
-(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.)
-
-TODO: Measure latencies in events processing using a performance tool (KPerf eventing).
-
-2. **Node failure (graceful)**:
-
-When a node is rebooted for upgrades etc, running pods on the node will be evicted (drained), gracefully terminated and rescheduled on a different node. The drained node will be marked as unschedulable by K8 (`node.Spec.Unschedulable` = True) after its cordoning.
-
-```
-k describe node knative-worker4
-Name: knative-worker4
-CreationTimestamp: Mon, 30 Aug 2021 11:13:11 -0400
-Taints: none
-Unschedulable: true
-```
-
-All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler.
-(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.)
-
-TODO: Measure latencies in events processing using a performance tool (KPerf eventing).
+When a pod/replica in a StatefulSet goes down due to some reason (but its node and zone are +healthy), a new replica is spun up by the StatefulSet with the same pod identity (pod can come up on +a different node) almost immediately. -New vreplicas will not be scheduled on pods running on this cordoned node. - -3. **Node failure (abrupt)**: - -When a node goes down unexpectedly due to some physical machine failure (network isolation/ loss, CPU issue, power loss, etc), the node controller does the following few steps - -Pods running on the failed node receives a NodeNotReady Warning event - -``` -k describe pod kafkasource-mt-adapter-5 -n knative-eventing -Name: kafkasource-mt-adapter-5 -Namespace: knative-eventing -Priority: 0 -Node: knative-worker4/172.18.0.3 -Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s - -Events: - Type Reason Age From Message - ---- ------ ---- ---- ------- - Normal Scheduled 11m default-scheduler Successfully assigned knative-eventing/kafkasource-mt-adapter-5 to knative-worker4 - Normal Pulled 11m kubelet Container image - Normal Created 11m kubelet Created container receive-adapter - Normal Started 11m kubelet Started container receive-adapter - Warning NodeNotReady 3m48s node-controller Node is not ready -``` - -Failing node is tainted with the following Key:Condition: by the node controller if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds) - -``` -k describe node knative-worker4 -Name: knative-worker4 -Taints: node.kubernetes.io/unreachable:NoExecute - node.kubernetes.io/unreachable:NoSchedule -Unschedulable: false - Events: - Type Reason Age From Message - ---- ------ ---- ---- ------- - Normal NodeNotSchedulable 5m42s kubelet Node knative-worker4 status is now: NodeNotSchedulable - Normal NodeSchedulable 2m31s kubelet Node knative-worker4 status is now: NodeSchedulable -``` - -``` -k get nodes -NAME STATUS ROLES AGE VERSION -knative-control-plane Ready control-plane,master 7h23m v1.21.1 -knative-worker Ready 7h23m v1.21.1 -knative-worker2 Ready 7h23m v1.21.1 -knative-worker3 Ready 7h23m v1.21.1 -knative-worker4 NotReady 7h23m v1.21.1 -``` - -After a timeout period (`pod-eviction-timeout` == 5 mins (default)), the pods move to the Terminating state. - -Since statefulset now has a `terminationGracePeriodSeconds: 0` setting, the terminating pods are immediately restarted on another functioning Node. A new replica is spun up with the same ordinal. - -During the time period of the failing node being unreachable (~5mins), vreplicas placed on that pod aren’t available to process work from the eventing source. (Theory) Consumption rate goes down and Kafka eventually triggers rebalancing of partitions. Also, KEDA will scale up the number of consumers to resolve the processing lag. A scale up will cause the Eventing scheduler to rebalance the total vreplicas for that source on available running pods. - -4. **Zone failure**: - -All nodes running in the failing zone will be unavailable for scheduling. Nodes will either be tainted with `unreachable` or Spec’ed as `Unschedulable` -See node failure scenarios above for what happens to vreplica placements. +All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica +scheduler. +(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member +changes.) 
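A quick way to verify this behavior is to watch the adapter pods come back under the same ordinals and then confirm that the placements recorded in the source status are unchanged. A hedged sketch (the source name is a placeholder, and the `status.placements` field path assumes a source type that exposes the duck-typed placement list):

```
# Watch the adapter StatefulSet recreate the failed pod with the same name/ordinal.
kubectl get pods -n knative-eventing -w

# Afterwards, the placements recorded in the source status should be unchanged.
kubectl get kafkasource my-source -o jsonpath='{.status.placements}'
```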
## References: @@ -246,7 +94,6 @@ See node failure scenarios above for what happens to vreplica placements. * https://medium.com/tailwinds-navigator/kubernetes-tip-how-statefulsets-behave-differently-than-deployments-when-node-fails-d29e36bca7d5 * https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller - --- To learn more about Knative, please visit the diff --git a/pkg/scheduler/doc.go b/pkg/scheduler/doc.go index b66262a4be9..13cf683a174 100644 --- a/pkg/scheduler/doc.go +++ b/pkg/scheduler/doc.go @@ -14,5 +14,5 @@ See the License for the specific language governing permissions and limitations under the License. */ -// The scheduler is responsible for placing virtual pod (VPod) replicas within real pods. +// Package scheduler is responsible for placing virtual pod (VPod) replicas within real pods. package scheduler diff --git a/pkg/scheduler/factory/registry.go b/pkg/scheduler/factory/registry.go deleted file mode 100644 index dbc814055c6..00000000000 --- a/pkg/scheduler/factory/registry.go +++ /dev/null @@ -1,88 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package factory - -import ( - "fmt" - - state "knative.dev/eventing/pkg/scheduler/state" -) - -// RegistryFP is a collection of all available filter plugins. -type RegistryFP map[string]state.FilterPlugin - -// RegistrySP is a collection of all available scoring plugins. -type RegistrySP map[string]state.ScorePlugin - -var ( - FilterRegistry = make(RegistryFP) - ScoreRegistry = make(RegistrySP) -) - -// Register adds a new plugin to the registry. If a plugin with the same name -// exists, it returns an error. -func RegisterFP(name string, factory state.FilterPlugin) error { - if _, ok := FilterRegistry[name]; ok { - return fmt.Errorf("a filter plugin named %v already exists", name) - } - FilterRegistry[name] = factory - return nil -} - -// Unregister removes an existing plugin from the registry. If no plugin with -// the provided name exists, it returns an error. -func UnregisterFP(name string) error { - if _, ok := FilterRegistry[name]; !ok { - return fmt.Errorf("no filter plugin named %v exists", name) - } - delete(FilterRegistry, name) - return nil -} - -func GetFilterPlugin(name string) (state.FilterPlugin, error) { - if f, exist := FilterRegistry[name]; exist { - return f, nil - } - return nil, fmt.Errorf("no fitler plugin named %v exists", name) -} - -// Register adds a new plugin to the registry. If a plugin with the same name -// exists, it returns an error. -func RegisterSP(name string, factory state.ScorePlugin) error { - if _, ok := ScoreRegistry[name]; ok { - return fmt.Errorf("a score plugin named %v already exists", name) - } - ScoreRegistry[name] = factory - return nil -} - -// Unregister removes an existing plugin from the registry. If no plugin with -// the provided name exists, it returns an error. 
-func UnregisterSP(name string) error { - if _, ok := ScoreRegistry[name]; !ok { - return fmt.Errorf("no score plugin named %v exists", name) - } - delete(ScoreRegistry, name) - return nil -} - -func GetScorePlugin(name string) (state.ScorePlugin, error) { - if f, exist := ScoreRegistry[name]; exist { - return f, nil - } - return nil, fmt.Errorf("no score plugin named %v exists", name) -} diff --git a/pkg/scheduler/placement.go b/pkg/scheduler/placement.go index 36250323541..65ab7897f0e 100644 --- a/pkg/scheduler/placement.go +++ b/pkg/scheduler/placement.go @@ -17,7 +17,6 @@ limitations under the License. package scheduler import ( - "k8s.io/apimachinery/pkg/util/sets" duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" ) @@ -29,24 +28,3 @@ func GetTotalVReplicas(placements []duckv1alpha1.Placement) int32 { } return r } - -// GetPlacementForPod returns the placement corresponding to podName -func GetPlacementForPod(placements []duckv1alpha1.Placement, podName string) *duckv1alpha1.Placement { - for i := 0; i < len(placements); i++ { - if placements[i].PodName == podName { - return &placements[i] - } - } - return nil -} - -// GetPodCount returns the number of pods with the given placements -func GetPodCount(placements []duckv1alpha1.Placement) int { - set := sets.NewString() - for _, p := range placements { - if p.VReplicas > 0 { - set.Insert(p.PodName) - } - } - return set.Len() -} diff --git a/pkg/scheduler/placement_test.go b/pkg/scheduler/placement_test.go index 66314dfccf2..ae1705254dd 100644 --- a/pkg/scheduler/placement_test.go +++ b/pkg/scheduler/placement_test.go @@ -62,95 +62,3 @@ func TestGetTotalVReplicas(t *testing.T) { }) } } - -func TestGetPlacementForPod(t *testing.T) { - ps1 := []duckv1alpha1.Placement{{PodName: "p", VReplicas: 2}} - ps2 := []duckv1alpha1.Placement{{PodName: "p", VReplicas: 2}, {PodName: "p2", VReplicas: 4}} - testCases := []struct { - name string - podName string - placements []duckv1alpha1.Placement - expected *duckv1alpha1.Placement - }{ - { - name: "nil placements", - podName: "p", - placements: nil, - expected: nil, - }, - { - name: "empty placements", - podName: "p", - placements: []duckv1alpha1.Placement{}, - expected: nil, - }, - { - name: "one placement", - placements: ps1, - podName: "p", - expected: &ps1[0], - }, { - name: "mayne placements", - placements: ps2, - podName: "p2", - expected: &ps2[1], - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - got := GetPlacementForPod(tc.placements, tc.podName) - if got != tc.expected { - t.Errorf("got %v, want %v", got, tc.expected) - } - }) - } -} -func TestPodCount(t *testing.T) { - testCases := []struct { - name string - placements []duckv1alpha1.Placement - expected int - }{ - { - name: "nil placements", - placements: nil, - expected: 0, - }, - { - name: "empty placements", - placements: []duckv1alpha1.Placement{}, - expected: 0, - }, - { - name: "one pod", - placements: []duckv1alpha1.Placement{{PodName: "d", VReplicas: 2}}, - expected: 1, - }, - { - name: "two pods", - placements: []duckv1alpha1.Placement{ - {PodName: "p1", VReplicas: 2}, - {PodName: "p2", VReplicas: 6}, - {PodName: "p1", VReplicas: 6}}, - expected: 2, - }, - { - name: "three pods, one with no vreplicas", - placements: []duckv1alpha1.Placement{ - {PodName: "p1", VReplicas: 2}, - {PodName: "p2", VReplicas: 6}, - {PodName: "p1", VReplicas: 0}}, - expected: 2, - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - got := GetPodCount(tc.placements) - if got != 
tc.expected { - t.Errorf("got %v, want %v", got, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/availabilitynodepriority/availability_node_priority.go b/pkg/scheduler/plugins/core/availabilitynodepriority/availability_node_priority.go deleted file mode 100644 index e0e60c8832f..00000000000 --- a/pkg/scheduler/plugins/core/availabilitynodepriority/availability_node_priority.go +++ /dev/null @@ -1,111 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package availabilitynodepriority - -import ( - "context" - "encoding/json" - "math" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// AvailabilityNodePriority is a score plugin that favors pods that create an even spread of resources across nodes for HA -type AvailabilityNodePriority struct { -} - -// Verify AvailabilityNodePriority Implements ScorePlugin Interface -var _ state.ScorePlugin = &AvailabilityNodePriority{} - -// Name of the plugin -const Name = state.AvailabilityNodePriority - -const ( - ErrReasonInvalidArg = "invalid arguments" - ErrReasonNoResource = "node does not exist" -) - -func init() { - factory.RegisterSP(Name, &AvailabilityNodePriority{}) -} - -// Name returns name of the plugin -func (pl *AvailabilityNodePriority) Name() string { - return Name -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for nodes that create an even spread across nodes. 
-func (pl *AvailabilityNodePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - logger := logging.FromContext(ctx).With("Score", pl.Name()) - var score uint64 = 0 - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name()) - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.AvailabilityNodePriorityArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - var skew int32 - - _, nodeName, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID)) - if err != nil { - return score, state.NewStatus(state.Error, ErrReasonNoResource) - } - - currentReps := states.NodeSpread[key][nodeName] //get #vreps on this node - for otherNodeName := range states.NodeToZoneMap { //compare with #vreps on other nodes - if otherNodeName != nodeName { - otherReps := states.NodeSpread[key][otherNodeName] - if skew = (currentReps + 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Node %v with %d and Other Node %v with %d causing skew %d", nodeName, currentReps, otherNodeName, otherReps, skew) - if skew > skewVal.MaxSkew { - logger.Infof("Pod %d in node %v will cause an uneven node spread %v with other node %v", podID, nodeName, states.NodeSpread[key], otherNodeName) - } - score = score + uint64(skew) - } - } - - score = math.MaxUint64 - score //lesser skews get higher score - } - - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *AvailabilityNodePriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. -func (pl *AvailabilityNodePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/availabilitynodepriority/availability_node_priority_test.go b/pkg/scheduler/plugins/core/availabilitynodepriority/availability_node_priority_test.go deleted file mode 100644 index fb822b7947a..00000000000 --- a/pkg/scheduler/plugins/core/availabilitynodepriority/availability_node_priority_test.go +++ /dev/null @@ -1,257 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package availabilitynodepriority - -import ( - "fmt" - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - v1 "k8s.io/api/core/v1" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/runtime" - "k8s.io/apimachinery/pkg/types" - listers "knative.dev/eventing/pkg/reconciler/testing/v1" - "knative.dev/eventing/pkg/scheduler" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" - kubeclient "knative.dev/pkg/client/injection/kube/client/fake" -) - -const ( - testNs = "test-ns" - sfsName = "statefulset-name" - vpodName = "source-name" - vpodNamespace = "source-namespace" - numNodes = 3 -) - -func TestScore(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - replicas int32 - podID int32 - expected *state.Status - expScore uint64 - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - NodeSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - expScore: 0, - args: "{\"MaxSkewness\": 2}", - }, - { - name: "no vpods, no pods, no resource", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 1, - expected: state.NewStatus(state.Error, ErrReasonNoResource), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, one zone, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 18, - args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, one zone, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: { - "node1": 4, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 18, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, two zones, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 2, NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - "node1": 5, - "node2": 3, - }, - }}, - replicas: 2, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 10, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, three zones, same pod filter", - vpod: types.NamespacedName{Name: 
vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 3, NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - "node1": 4, - "node2": 3, - }, - }}, - replicas: 3, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 7, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 5, NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 8, - "node1": 4, - "node2": 3, - }, - }}, - replicas: 5, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 20, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, four pods, unknown zone", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{ - StatefulSetName: sfsName, - Replicas: 4, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 8, - "node1": 4, - "node2": 3, - }, - }, - SchedulablePods: []int32{0, 1, 2}, //Pod 3 in unknown zone not included in schedulable pods - NumNodes: 4, //Includes unknown zone - }, - replicas: 4, - podID: 3, - expected: state.NewStatus(state.Success), //Not failing the plugin - expScore: math.MaxUint64 - 12, - args: "{\"MaxSkew\": 2}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &AvailabilityNodePriority{} - - name := plugin.Name() - assert.Equal(t, name, state.AvailabilityNodePriority) - - nodelist := make([]*v1.Node, 0) - podlist := make([]runtime.Object, 0) - - for i := int32(0); i < numNodes; i++ { - nodeName := "node" + fmt.Sprint(i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - nodeName := "node" + fmt.Sprint(numNodes) //Node in unknown zone - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNodeNoLabel(nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - - for i := int32(0); i < tc.replicas; i++ { - nodeName := "node" + fmt.Sprint(i) - podName := sfsName + "-" + fmt.Sprint(i) - pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - podlist = append(podlist, pod) - } - - nodeToZoneMap := make(map[string]string) - for i := 0; i < len(nodelist); i++ { - node := nodelist[i] - zoneName, ok := node.GetLabels()[scheduler.ZoneLabel] - if ok && zoneName != "" { - nodeToZoneMap[node.Name] = zoneName - } else { - nodeToZoneMap[node.Name] = scheduler.UnknownZone - } - } - - lsp := listers.NewListers(podlist) - tc.state.PodLister = lsp.GetPodLister().Pods(testNs) - tc.state.NodeToZoneMap = nodeToZoneMap - - score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if 
!reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/availabilityzonepriority/availability_zone_priority.go b/pkg/scheduler/plugins/core/availabilityzonepriority/availability_zone_priority.go deleted file mode 100644 index 397ff075fbc..00000000000 --- a/pkg/scheduler/plugins/core/availabilityzonepriority/availability_zone_priority.go +++ /dev/null @@ -1,115 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package availabilityzonepriority - -import ( - "context" - "encoding/json" - "math" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// AvailabilityZonePriority is a score plugin that favors pods that create an even spread of resources across zones for HA -type AvailabilityZonePriority struct { -} - -// Verify AvailabilityZonePriority Implements ScorePlugin Interface -var _ state.ScorePlugin = &AvailabilityZonePriority{} - -// Name of the plugin -const Name = state.AvailabilityZonePriority - -const ( - ErrReasonInvalidArg = "invalid arguments" - ErrReasonNoResource = "zone does not exist" -) - -func init() { - factory.RegisterSP(Name, &AvailabilityZonePriority{}) -} - -// Name returns name of the plugin -func (pl *AvailabilityZonePriority) Name() string { - return Name -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for zones that create an even spread across zones. 
-func (pl *AvailabilityZonePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - logger := logging.FromContext(ctx).With("Score", pl.Name()) - var score uint64 = 0 - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name()) - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.AvailabilityZonePriorityArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - var skew int32 - zoneMap := make(map[string]struct{}) - for _, zoneName := range states.NodeToZoneMap { - zoneMap[zoneName] = struct{}{} - } - - zoneName, _, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID)) - if err != nil { - return score, state.NewStatus(state.Error, ErrReasonNoResource) - } - - currentReps := states.ZoneSpread[key][zoneName] //get #vreps on this zone - for otherZoneName := range zoneMap { //compare with #vreps on other zones - if otherZoneName != zoneName { - otherReps := states.ZoneSpread[key][otherZoneName] - if skew = (currentReps + 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Zone %v with %d and Other Zone %v with %d causing skew %d", zoneName, currentReps, otherZoneName, otherReps, skew) - if skew > skewVal.MaxSkew { //score low - logger.Infof("Pod %d in zone %v will cause an uneven zone spread %v with other zone %v", podID, zoneName, states.ZoneSpread[key], otherZoneName) - } - score = score + uint64(skew) - } - } - - score = math.MaxUint64 - score //lesser skews get higher score - } - - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *AvailabilityZonePriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. -func (pl *AvailabilityZonePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/availabilityzonepriority/availability_zone_priority_test.go b/pkg/scheduler/plugins/core/availabilityzonepriority/availability_zone_priority_test.go deleted file mode 100644 index de58f055f78..00000000000 --- a/pkg/scheduler/plugins/core/availabilityzonepriority/availability_zone_priority_test.go +++ /dev/null @@ -1,260 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package availabilityzonepriority - -import ( - "fmt" - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - v1 "k8s.io/api/core/v1" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/runtime" - "k8s.io/apimachinery/pkg/types" - listers "knative.dev/eventing/pkg/reconciler/testing/v1" - "knative.dev/eventing/pkg/scheduler" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" - kubeclient "knative.dev/pkg/client/injection/kube/client/fake" -) - -const ( - testNs = "test-ns" - sfsName = "statefulset-name" - vpodName = "source-name" - vpodNamespace = "source-namespace" - numZones = 3 - numNodes = 6 -) - -func TestScore(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - replicas int32 - podID int32 - expected *state.Status - expScore uint64 - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - expScore: 0, - args: "{\"MaxSkewness\": 2}", - }, - { - name: "no vpods, no pods, no resource", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 1, - expected: state.NewStatus(state.Error, ErrReasonNoResource), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, one zone, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 18, - args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, one zone, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: { - "zone1": 4, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 18, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, two zones, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 2, ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - "zone1": 5, - "zone2": 3, - }, - }}, - replicas: 2, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 10, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, three zones, same pod filter", - vpod: 
types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 3, ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - "zone1": 4, - "zone2": 3, - }, - }}, - replicas: 3, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 7, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 5, ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 8, - "zone1": 4, - "zone2": 3, - }, - }}, - replicas: 5, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 20, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, seven nodes/pods, unknown zone", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{ - StatefulSetName: sfsName, - Replicas: 7, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 8, - "zone1": 4, - "zone2": 3, - }, - }, - SchedulablePods: []int32{0, 1, 2, 3, 4, 5}, //Pod 6 in unknown zone not included in schedulable pods - NumZones: 4, //Includes unknown zone - }, - replicas: 7, - podID: 6, - expected: state.NewStatus(state.Success), //Not failing the plugin - expScore: math.MaxUint64 - 12, - args: "{\"MaxSkew\": 2}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &AvailabilityZonePriority{} - - name := plugin.Name() - assert.Equal(t, name, state.AvailabilityZonePriority) - - nodelist := make([]*v1.Node, 0) - podlist := make([]runtime.Object, 0) - - for i := int32(0); i < numZones; i++ { - for j := int32(0); j < numNodes/numZones; j++ { - nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - } - nodeName := "node" + fmt.Sprint(numNodes) //Node in unknown zone - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNodeNoLabel(nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - - for i := int32(0); i < tc.replicas; i++ { - nodeName := "node" + fmt.Sprint(i) - podName := sfsName + "-" + fmt.Sprint(i) - pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - podlist = append(podlist, pod) - } - - nodeToZoneMap := make(map[string]string) - for i := 0; i < len(nodelist); i++ { - node := nodelist[i] - zoneName, ok := node.GetLabels()[scheduler.ZoneLabel] - if ok && zoneName != "" { - nodeToZoneMap[node.Name] = zoneName - } else { - nodeToZoneMap[node.Name] = scheduler.UnknownZone - } - } - - lsp := listers.NewListers(podlist) - tc.state.PodLister = lsp.GetPodLister().Pods(testNs) - tc.state.NodeToZoneMap = nodeToZoneMap - - score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, 
tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/evenpodspread/even_pod_spread.go b/pkg/scheduler/plugins/core/evenpodspread/even_pod_spread.go deleted file mode 100644 index 070e47a9957..00000000000 --- a/pkg/scheduler/plugins/core/evenpodspread/even_pod_spread.go +++ /dev/null @@ -1,151 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package evenpodspread - -import ( - "context" - "encoding/json" - "math" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// EvenPodSpread is a filter or score plugin that picks/favors pods that create an equal spread of resources across pods -type EvenPodSpread struct { -} - -// Verify EvenPodSpread Implements FilterPlugin and ScorePlugin Interface -var _ state.FilterPlugin = &EvenPodSpread{} -var _ state.ScorePlugin = &EvenPodSpread{} - -// Name of the plugin -const ( - Name = state.EvenPodSpread - ErrReasonInvalidArg = "invalid arguments" - ErrReasonUnschedulable = "pod will cause an uneven spread" -) - -func init() { - factory.RegisterFP(Name, &EvenPodSpread{}) - factory.RegisterSP(Name, &EvenPodSpread{}) -} - -// Name returns name of the plugin -func (pl *EvenPodSpread) Name() string { - return Name -} - -// Filter invoked at the filter extension point. 
-func (pl *EvenPodSpread) Filter(ctx context.Context, args interface{}, states *state.State, key types.NamespacedName, podID int32) *state.Status { - logger := logging.FromContext(ctx).With("Filter", pl.Name()) - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Filter args %v for predicate %q are not valid", args, pl.Name()) - return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.EvenPodSpreadArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - currentReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, podID)] //get #vreps on this podID - var skew int32 - for _, otherPodID := range states.SchedulablePods { //compare with #vreps on other pods - if otherPodID != podID { - otherReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, otherPodID)] - - if otherReps == 0 && states.Free(otherPodID) <= 0 { //other pod fully occupied by other vpods - so ignore - continue - } - if skew = (currentReps + 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Pod %d with %d and Other Pod %d with %d causing skew %d", podID, currentReps, otherPodID, otherReps, skew) - if skew > skewVal.MaxSkew { - logger.Infof("Unschedulable! Pod %d will cause an uneven spread %v with other pod %v", podID, states.PodSpread[key], otherPodID) - return state.NewStatus(state.Unschedulable, ErrReasonUnschedulable) - } - } - } - } - - return state.NewStatus(state.Success) -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for pods that create an even spread across pods. 
-func (pl *EvenPodSpread) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - logger := logging.FromContext(ctx).With("Score", pl.Name()) - var score uint64 = 0 - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name()) - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.EvenPodSpreadArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - currentReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, podID)] //get #vreps on this podID - var skew int32 - for _, otherPodID := range states.SchedulablePods { //compare with #vreps on other pods - if otherPodID != podID { - otherReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, otherPodID)] - if otherReps == 0 && states.Free(otherPodID) == 0 { //other pod fully occupied by other vpods - so ignore - continue - } - if skew = (currentReps + 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Pod %d with %d and Other Pod %d with %d causing skew %d", podID, currentReps, otherPodID, otherReps, skew) - if skew > skewVal.MaxSkew { - logger.Infof("Pod %d will cause an uneven spread %v with other pod %v", podID, states.PodSpread[key], otherPodID) - } - score = score + uint64(skew) - } - } - score = math.MaxUint64 - score //lesser skews get higher score - } - - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *EvenPodSpread) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. -func (pl *EvenPodSpread) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/evenpodspread/even_pod_spread_test.go b/pkg/scheduler/plugins/core/evenpodspread/even_pod_spread_test.go deleted file mode 100644 index da9f09eb082..00000000000 --- a/pkg/scheduler/plugins/core/evenpodspread/even_pod_spread_test.go +++ /dev/null @@ -1,198 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package evenpodspread - -import ( - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - "k8s.io/apimachinery/pkg/types" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" -) - -func TestFilter(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - podID int32 - expScore uint64 - expected *state.Status - onlyFilter bool - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}}, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}}, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - expScore: 0, - args: "{\"MaxSkewness\": 2}", - }, - { - name: "one vpod, one pod, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 1, - SchedulablePods: []int32{int32(0)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - }, - }, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64, - args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, one pod, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 1, - SchedulablePods: []int32{int32(0)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: { - "pod-name-0": 4, - }, - }, - }, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, two pods,same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 2, - SchedulablePods: []int32{int32(0), int32(1)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - "pod-name-1": 5, - }, - }}, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 1, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 5, - SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - "pod-name-1": 4, - "pod-name-2": 3, - "pod-name-3": 4, - "pod-name-4": 5, - }, - }}, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 3, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter unschedulable", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 5, - SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)}, - 
PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 7, - "pod-name-1": 4, - "pod-name-2": 3, - "pod-name-3": 4, - "pod-name-4": 5, - }, - }}, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable), - onlyFilter: true, - args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, two pods, one pod full", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 2, - SchedulablePods: []int32{int32(0), int32(1)}, - FreeCap: []int32{int32(3), int32(0)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: { - "pod-name-0": 2, - "pod-name-1": 10, - }, - }, - }, - podID: 0, - expected: state.NewStatus(state.Success), - onlyFilter: true, - args: "{\"MaxSkew\": 2}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &EvenPodSpread{} - - name := plugin.Name() - assert.Equal(t, name, state.EvenPodSpread) - - status := plugin.Filter(ctx, tc.args, tc.state, tc.vpod, tc.podID) - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - - if !tc.onlyFilter { - score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID) - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected state, got %v, want %v", status, tc.expected) - } - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/lowestordinalpriority/lowest_ordinal_priority.go b/pkg/scheduler/plugins/core/lowestordinalpriority/lowest_ordinal_priority.go deleted file mode 100644 index a7d84ca390b..00000000000 --- a/pkg/scheduler/plugins/core/lowestordinalpriority/lowest_ordinal_priority.go +++ /dev/null @@ -1,61 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package lowestordinalpriority - -import ( - "context" - "math" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" -) - -// LowestOrdinalPriority is a score plugin that favors pods that have a lower ordinal -type LowestOrdinalPriority struct { -} - -// Verify LowestOrdinalPriority Implements ScorePlugin Interface -var _ state.ScorePlugin = &LowestOrdinalPriority{} - -// Name of the plugin -const Name = state.LowestOrdinalPriority - -func init() { - factory.RegisterSP(Name, &LowestOrdinalPriority{}) -} - -// Name returns name of the plugin -func (pl *LowestOrdinalPriority) Name() string { - return Name -} - -// Score invoked at the score extension point. 
The "score" returned in this function is higher for pods with lower ordinal values. -func (pl *LowestOrdinalPriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - score := math.MaxUint64 - uint64(podID) //lower ordinals get higher score - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *LowestOrdinalPriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. -func (pl *LowestOrdinalPriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/lowestordinalpriority/lowest_ordinal_priority_test.go b/pkg/scheduler/plugins/core/lowestordinalpriority/lowest_ordinal_priority_test.go deleted file mode 100644 index 4ce956ebd49..00000000000 --- a/pkg/scheduler/plugins/core/lowestordinalpriority/lowest_ordinal_priority_test.go +++ /dev/null @@ -1,114 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package lowestordinalpriority - -import ( - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - "k8s.io/apimachinery/pkg/types" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" -) - -func TestScore(t *testing.T) { - testCases := []struct { - name string - state *state.State - podID int32 - expScore uint64 - expected *state.Status - }{ - { - name: "no vpods", - state: &state.State{LastOrdinal: -1}, - podID: 0, - expScore: math.MaxUint64, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods free", - state: &state.State{LastOrdinal: 0}, - podID: 0, - expScore: math.MaxUint64, - expected: state.NewStatus(state.Success), - }, - { - name: "two vpods free", - state: &state.State{LastOrdinal: 0}, - podID: 1, - expScore: math.MaxUint64 - 1, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods not free", - state: &state.State{LastOrdinal: 1}, - podID: 0, - expScore: math.MaxUint64, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods not free", - state: &state.State{LastOrdinal: 1}, - podID: 1, - expScore: math.MaxUint64 - 1, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, no gaps", - state: &state.State{LastOrdinal: 1}, - podID: 2, - expScore: math.MaxUint64 - 2, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, with gaps", - state: &state.State{LastOrdinal: 2}, - podID: 0, - expScore: math.MaxUint64, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, with gaps", - state: &state.State{LastOrdinal: 2}, - podID: 1000, - expScore: math.MaxUint64 - 1000, - expected: state.NewStatus(state.Success), - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &LowestOrdinalPriority{} 
- var args interface{} - - name := plugin.Name() - assert.Equal(t, name, state.LowestOrdinalPriority) - - score, status := plugin.Score(ctx, args, tc.state, tc.state.SchedulablePods, types.NamespacedName{}, tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/podfitsresources/pod_fits_resources.go b/pkg/scheduler/plugins/core/podfitsresources/pod_fits_resources.go deleted file mode 100644 index a4a751e8479..00000000000 --- a/pkg/scheduler/plugins/core/podfitsresources/pod_fits_resources.go +++ /dev/null @@ -1,61 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package podfitsresources - -import ( - "context" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// PodFitsResources is a plugin that filters pods that do not have sufficient free capacity for a vreplica to be placed on it -type PodFitsResources struct { -} - -// Verify PodFitsResources Implements FilterPlugin Interface -var _ state.FilterPlugin = &PodFitsResources{} - -// Name of the plugin -const Name = state.PodFitsResources - -const ( - ErrReasonUnschedulable = "pod at full capacity" -) - -func init() { - factory.RegisterFP(Name, &PodFitsResources{}) -} - -// Name returns name of the plugin -func (pl *PodFitsResources) Name() string { - return Name -} - -// Filter invoked at the filter extension point. -func (pl *PodFitsResources) Filter(ctx context.Context, args interface{}, states *state.State, key types.NamespacedName, podID int32) *state.Status { - logger := logging.FromContext(ctx).With("Filter", pl.Name()) - - if len(states.FreeCap) == 0 || states.Free(podID) > 0 { //vpods with no placements or pods with positive free cap - return state.NewStatus(state.Success) - } - - logger.Infof("Unschedulable! Pod %d has no free capacity %v", podID, states.FreeCap) - return state.NewStatus(state.Unschedulable, ErrReasonUnschedulable) -} diff --git a/pkg/scheduler/plugins/core/podfitsresources/pod_fits_resources_test.go b/pkg/scheduler/plugins/core/podfitsresources/pod_fits_resources_test.go deleted file mode 100644 index c8973be01f1..00000000000 --- a/pkg/scheduler/plugins/core/podfitsresources/pod_fits_resources_test.go +++ /dev/null @@ -1,96 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-See the License for the specific language governing permissions and -limitations under the License. -*/ - -package podfitsresources - -import ( - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - "k8s.io/apimachinery/pkg/types" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" -) - -func TestFilter(t *testing.T) { - testCases := []struct { - name string - state *state.State - podID int32 - expected *state.Status - err error - }{ - { - name: "no vpods", - state: &state.State{Capacity: 10, FreeCap: []int32{}, LastOrdinal: -1}, - podID: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods free", - state: &state.State{Capacity: 10, FreeCap: []int32{int32(9)}, LastOrdinal: 0}, - podID: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods free", - state: &state.State{Capacity: 10, FreeCap: []int32{int32(10)}, LastOrdinal: 0}, - podID: 1, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods not free", - state: &state.State{Capacity: 10, FreeCap: []int32{int32(0)}, LastOrdinal: 0}, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable), - }, - { - name: "many vpods, no gaps", - state: &state.State{Capacity: 10, FreeCap: []int32{int32(0), int32(5), int32(5)}, LastOrdinal: 2}, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable), - }, - { - name: "many vpods, with gaps", - state: &state.State{Capacity: 10, FreeCap: []int32{int32(9), int32(10), int32(5), int32(10)}, LastOrdinal: 2}, - podID: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, with gaps and reserved vreplicas", - state: &state.State{Capacity: 10, FreeCap: []int32{int32(4), int32(10), int32(5), int32(0)}, LastOrdinal: 2}, - podID: 3, - expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable), - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &PodFitsResources{} - var args interface{} - - name := plugin.Name() - assert.Equal(t, name, state.PodFitsResources) - - status := plugin.Filter(ctx, args, tc.state, types.NamespacedName{}, tc.podID) - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected state, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/removewithavailabilitynodepriority/remove_with_availability_node_priority.go b/pkg/scheduler/plugins/core/removewithavailabilitynodepriority/remove_with_availability_node_priority.go deleted file mode 100644 index 62959ee79b1..00000000000 --- a/pkg/scheduler/plugins/core/removewithavailabilitynodepriority/remove_with_availability_node_priority.go +++ /dev/null @@ -1,113 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package removewithavailabilitynodepriority - -import ( - "context" - "encoding/json" - "math" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// RemoveWithAvailabilityNodePriority is a score plugin that favors pods that create an even spread of resources across nodes for HA -type RemoveWithAvailabilityNodePriority struct { -} - -// Verify RemoveWithAvailabilityNodePriority Implements ScorePlugin Interface -var _ state.ScorePlugin = &RemoveWithAvailabilityNodePriority{} - -// Name of the plugin -const Name = state.RemoveWithAvailabilityNodePriority - -const ( - ErrReasonInvalidArg = "invalid arguments" - ErrReasonNoResource = "node does not exist" -) - -func init() { - factory.RegisterSP(Name, &RemoveWithAvailabilityNodePriority{}) -} - -// Name returns name of the plugin -func (pl *RemoveWithAvailabilityNodePriority) Name() string { - return Name -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for nodes that create an even spread across nodes. -func (pl *RemoveWithAvailabilityNodePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - logger := logging.FromContext(ctx).With("Score", pl.Name()) - var score uint64 = 0 - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name()) - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.AvailabilityNodePriorityArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - var skew int32 - _, nodeName, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID)) - if err != nil { - return score, state.NewStatus(state.Error, ErrReasonNoResource) - } - - currentReps := states.NodeSpread[key][nodeName] //get #vreps on this node - for otherNodeName := range states.NodeToZoneMap { //compare with #vreps on other pods - if otherNodeName != nodeName { - otherReps, ok := states.NodeSpread[key][otherNodeName] - if !ok { - continue //node does not exist in current placement, so move on - } - if skew = (currentReps - 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Node %v with %d and Other Node %v with %d causing skew %d", nodeName, currentReps, otherNodeName, otherReps, skew) - if skew > skewVal.MaxSkew { //score low - logger.Infof("Pod %d in node %v will cause an uneven node spread %v with other node %v", podID, nodeName, states.NodeSpread[key], otherNodeName) - } - score = score + uint64(skew) - } - } - - score = math.MaxUint64 - score //lesser skews get higher score - } - - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *RemoveWithAvailabilityNodePriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. 
-func (pl *RemoveWithAvailabilityNodePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/removewithavailabilitynodepriority/remove_with_availability_node_priority_test.go b/pkg/scheduler/plugins/core/removewithavailabilitynodepriority/remove_with_availability_node_priority_test.go deleted file mode 100644 index 2528a131a1a..00000000000 --- a/pkg/scheduler/plugins/core/removewithavailabilitynodepriority/remove_with_availability_node_priority_test.go +++ /dev/null @@ -1,231 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package removewithavailabilitynodepriority - -import ( - "fmt" - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - v1 "k8s.io/api/core/v1" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/runtime" - "k8s.io/apimachinery/pkg/types" - listers "knative.dev/eventing/pkg/reconciler/testing/v1" - "knative.dev/eventing/pkg/scheduler" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" - kubeclient "knative.dev/pkg/client/injection/kube/client/fake" -) - -const ( - testNs = "test-ns" - sfsName = "statefulset-name" - vpodName = "source-name" - vpodNamespace = "source-namespace" - numZones = 3 - numNodes = 6 -) - -func TestScore(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - replicas int32 - podID int32 - expected *state.Status - expScore uint64 - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - NodeSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - NodeSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - expScore: 0, - args: "{\"MaxSkewness\": 2}", - }, - { - name: "no vpods, no pods, no resource", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - NodeSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 1, - expected: state.NewStatus(state.Error, ErrReasonNoResource), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, one node, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64, - 
args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, one node, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: { - "node1": 4, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, two nodes, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 2, NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - "node1": 5, - "node2": 3, - }, - }}, - replicas: 2, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 2, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, three nodes, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 3, NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 5, - "node1": 4, - "node2": 3, - }, - }}, - replicas: 3, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 2, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 5, NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "node0": 8, - "node1": 4, - "node2": 3, - }, - }}, - replicas: 5, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 7, - args: "{\"MaxSkew\": 2}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &RemoveWithAvailabilityNodePriority{} - - name := plugin.Name() - assert.Equal(t, name, state.RemoveWithAvailabilityNodePriority) - - nodelist := make([]*v1.Node, 0) - podlist := make([]runtime.Object, 0) - - for i := int32(0); i < numZones; i++ { - for j := int32(0); j < numNodes/numZones; j++ { - nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - } - - for i := int32(0); i < tc.replicas; i++ { - nodeName := "node" + fmt.Sprint(i) - podName := sfsName + "-" + fmt.Sprint(i) - pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - podlist = append(podlist, pod) - } - - nodeToZoneMap := make(map[string]string) - for i := 0; i < len(nodelist); i++ { - node := nodelist[i] - zoneName, ok := node.GetLabels()[scheduler.ZoneLabel] - if !ok { - continue //ignore node that doesn't have zone info (maybe a test setup or control node) - } - nodeToZoneMap[node.Name] = zoneName - } - - lsp := 
listers.NewListers(podlist) - tc.state.PodLister = lsp.GetPodLister().Pods(testNs) - tc.state.NodeToZoneMap = nodeToZoneMap - - score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/removewithavailabilityzonepriority/remove_with_availability_zone_priority.go b/pkg/scheduler/plugins/core/removewithavailabilityzonepriority/remove_with_availability_zone_priority.go deleted file mode 100644 index f2e3eb23f0c..00000000000 --- a/pkg/scheduler/plugins/core/removewithavailabilityzonepriority/remove_with_availability_zone_priority.go +++ /dev/null @@ -1,118 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package removewithavailabilityzonepriority - -import ( - "context" - "encoding/json" - "math" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// RemoveWithAvailabilityZonePriority is a score plugin that favors pods that create an even spread of resources across zones for HA -type RemoveWithAvailabilityZonePriority struct { -} - -// Verify RemoveWithAvailabilityZonePriority Implements ScorePlugin Interface -var _ state.ScorePlugin = &RemoveWithAvailabilityZonePriority{} - -// Name of the plugin -const Name = state.RemoveWithAvailabilityZonePriority - -const ( - ErrReasonInvalidArg = "invalid arguments" - ErrReasonNoResource = "zone does not exist" -) - -func init() { - factory.RegisterSP(Name, &RemoveWithAvailabilityZonePriority{}) -} - -// Name returns name of the plugin -func (pl *RemoveWithAvailabilityZonePriority) Name() string { - return Name -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for zones that create an even spread across zones. 
-func (pl *RemoveWithAvailabilityZonePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - logger := logging.FromContext(ctx).With("Score", pl.Name()) - var score uint64 = 0 - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name()) - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.AvailabilityZonePriorityArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - var skew int32 - zoneMap := make(map[string]struct{}) - for _, zoneName := range states.NodeToZoneMap { - zoneMap[zoneName] = struct{}{} - } - - zoneName, _, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID)) - if err != nil { - return score, state.NewStatus(state.Error, ErrReasonNoResource) - } - - currentReps := states.ZoneSpread[key][zoneName] //get #vreps on this zone - for otherZoneName := range zoneMap { //compare with #vreps on other pods - if otherZoneName != zoneName { - otherReps, ok := states.ZoneSpread[key][otherZoneName] - if !ok { - continue //zone does not exist in current placement, so move on - } - if skew = (currentReps - 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Zone %v with %d and Other Zone %v with %d causing skew %d", zoneName, currentReps, otherZoneName, otherReps, skew) - if skew > skewVal.MaxSkew { //score low - logger.Infof("Pod %d in zone %v will cause an uneven zone spread %v with other zone %v", podID, zoneName, states.ZoneSpread[key], otherZoneName) - } - score = score + uint64(skew) - } - } - - score = math.MaxUint64 - score //lesser skews get higher score - } - - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *RemoveWithAvailabilityZonePriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. -func (pl *RemoveWithAvailabilityZonePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/removewithavailabilityzonepriority/remove_with_availability_zone_priority_test.go b/pkg/scheduler/plugins/core/removewithavailabilityzonepriority/remove_with_availability_zone_priority_test.go deleted file mode 100644 index f72504e9146..00000000000 --- a/pkg/scheduler/plugins/core/removewithavailabilityzonepriority/remove_with_availability_zone_priority_test.go +++ /dev/null @@ -1,231 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package removewithavailabilityzonepriority - -import ( - "fmt" - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - v1 "k8s.io/api/core/v1" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/runtime" - "k8s.io/apimachinery/pkg/types" - listers "knative.dev/eventing/pkg/reconciler/testing/v1" - "knative.dev/eventing/pkg/scheduler" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" - kubeclient "knative.dev/pkg/client/injection/kube/client/fake" -) - -const ( - testNs = "test-ns" - sfsName = "statefulset-name" - vpodName = "source-name" - vpodNamespace = "source-namespace" - numZones = 3 - numNodes = 6 -) - -func TestScore(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - replicas int32 - podID int32 - expected *state.Status - expScore uint64 - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 0, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - expScore: 0, - args: "{\"MaxSkewness\": 2}", - }, - { - name: "no vpods, no pods, no resource", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{}}, - replicas: 0, - podID: 1, - expected: state.NewStatus(state.Error, ErrReasonNoResource), - expScore: 0, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, one zone, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64, - args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, one zone, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 1, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: { - "zone1": 4, - }, - }, - }, - replicas: 1, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, two zones, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 2, ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - "zone1": 5, - "zone2": 3, - }, - }}, - replicas: 2, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 2, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, three zones, same pod filter", - vpod: 
types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 3, ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 5, - "zone1": 4, - "zone2": 3, - }, - }}, - replicas: 3, - podID: 1, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 2, - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}, - state: &state.State{StatefulSetName: sfsName, Replicas: 5, ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: { - "zone0": 8, - "zone1": 4, - "zone2": 3, - }, - }}, - replicas: 5, - podID: 0, - expected: state.NewStatus(state.Success), - expScore: math.MaxUint64 - 7, - args: "{\"MaxSkew\": 2}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &RemoveWithAvailabilityZonePriority{} - - name := plugin.Name() - assert.Equal(t, name, state.RemoveWithAvailabilityZonePriority) - - nodelist := make([]*v1.Node, 0) - podlist := make([]runtime.Object, 0) - - for i := int32(0); i < numZones; i++ { - for j := int32(0); j < numNodes/numZones; j++ { - nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - } - - for i := int32(0); i < tc.replicas; i++ { - nodeName := "node" + fmt.Sprint(i) - podName := sfsName + "-" + fmt.Sprint(i) - pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - podlist = append(podlist, pod) - } - - nodeToZoneMap := make(map[string]string) - for i := 0; i < len(nodelist); i++ { - node := nodelist[i] - zoneName, ok := node.GetLabels()[scheduler.ZoneLabel] - if !ok { - continue //ignore node that doesn't have zone info (maybe a test setup or control node) - } - nodeToZoneMap[node.Name] = zoneName - } - - lsp := listers.NewListers(podlist) - tc.state.PodLister = lsp.GetPodLister().Pods(testNs) - tc.state.NodeToZoneMap = nodeToZoneMap - - score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/removewithevenpodspreadpriority/remove_with_even_pod_spread_priority.go b/pkg/scheduler/plugins/core/removewithevenpodspreadpriority/remove_with_even_pod_spread_priority.go deleted file mode 100644 index e7b008e0b0a..00000000000 --- a/pkg/scheduler/plugins/core/removewithevenpodspreadpriority/remove_with_even_pod_spread_priority.go +++ /dev/null @@ -1,106 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. 
-You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package removewithevenpodspreadpriority - -import ( - "context" - "encoding/json" - "math" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// RemoveWithEvenPodSpreadPriority is a filter plugin that eliminates pods that do not create an equal spread of resources across pods -type RemoveWithEvenPodSpreadPriority struct { -} - -// Verify RemoveWithEvenPodSpreadPriority Implements FilterPlugin Interface -var _ state.ScorePlugin = &RemoveWithEvenPodSpreadPriority{} - -// Name of the plugin -const ( - Name = state.RemoveWithEvenPodSpreadPriority - ErrReasonInvalidArg = "invalid arguments" - ErrReasonUnschedulable = "pod will cause an uneven spread" -) - -func init() { - factory.RegisterSP(Name, &RemoveWithEvenPodSpreadPriority{}) -} - -// Name returns name of the plugin -func (pl *RemoveWithEvenPodSpreadPriority) Name() string { - return Name -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for pods that create an even spread across pods. -func (pl *RemoveWithEvenPodSpreadPriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - logger := logging.FromContext(ctx).With("Score", pl.Name()) - var score uint64 = 0 - - spreadArgs, ok := args.(string) - if !ok { - logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name()) - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - skewVal := state.EvenPodSpreadArgs{} - decoder := json.NewDecoder(strings.NewReader(spreadArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&skewVal); err != nil { - return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - if states.Replicas > 0 { //need at least a pod to compute spread - currentReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, podID)] //get #vreps on this podID - var skew int32 - for _, otherPodID := range states.SchedulablePods { //compare with #vreps on other pods - if otherPodID != podID { - otherReps, ok := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, otherPodID)] - if !ok { - continue //pod does not exist in current placement, so move on - } - if skew = (currentReps - 1) - otherReps; skew < 0 { - skew = skew * int32(-1) - } - - //logger.Infof("Current Pod %v with %d and Other Pod %v with %d causing skew %d", podID, currentReps, otherPodID, otherReps, skew) - if skew > skewVal.MaxSkew { //score low - logger.Infof("Pod %d will cause an uneven spread %v with other pod %v", podID, states.PodSpread[key], otherPodID) - } - score = score + uint64(skew) - } - } - score = math.MaxUint64 - score //lesser skews get higher score - } - - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *RemoveWithEvenPodSpreadPriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. 
-func (pl *RemoveWithEvenPodSpreadPriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/removewithevenpodspreadpriority/remove_with_even_pod_spread_priority_test.go b/pkg/scheduler/plugins/core/removewithevenpodspreadpriority/remove_with_even_pod_spread_priority_test.go deleted file mode 100644 index fb2234fe05c..00000000000 --- a/pkg/scheduler/plugins/core/removewithevenpodspreadpriority/remove_with_even_pod_spread_priority_test.go +++ /dev/null @@ -1,166 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package removewithevenpodspreadpriority - -import ( - "math" - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - "k8s.io/apimachinery/pkg/types" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" -) - -func TestFilter(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - podID int32 - expected *state.Status - expScore uint64 - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}}, - podID: 0, - expScore: 0, - expected: state.NewStatus(state.Success), - args: "{\"MaxSkew\": 2}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}}, - podID: 0, - expScore: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - args: "{\"MaxSkewness\": 2}", - }, - { - name: "one vpod, one pod, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 1, - SchedulablePods: []int32{int32(0)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - }, - }, - podID: 0, - expScore: math.MaxUint64, - expected: state.NewStatus(state.Success), - args: "{\"MaxSkew\": 2}", - }, - { - name: "two vpods, one pod, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 1, - SchedulablePods: []int32{int32(0)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: { - "pod-name-0": 4, - }, - }, - }, - podID: 0, - expScore: math.MaxUint64, - expected: state.NewStatus(state.Success), - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, two pods,same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 2, - SchedulablePods: []int32{int32(0), int32(1)}, - PodSpread: 
map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - "pod-name-1": 5, - }, - }}, - podID: 1, - expScore: math.MaxUint64 - 1, - expected: state.NewStatus(state.Success), - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 5, - SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - "pod-name-1": 4, - "pod-name-2": 3, - "pod-name-3": 4, - "pod-name-4": 5, - }, - }}, - podID: 1, - expScore: math.MaxUint64 - 5, - expected: state.NewStatus(state.Success), - args: "{\"MaxSkew\": 2}", - }, - { - name: "one vpod, five pods, same pod filter diff pod", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", Replicas: 6, - SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4), int32(5)}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 10, - "pod-name-1": 4, - "pod-name-2": 3, - "pod-name-3": 4, - "pod-name-4": 5, - }, - }}, - podID: 0, - expScore: math.MaxUint64 - 20, - expected: state.NewStatus(state.Success), - args: "{\"MaxSkew\": 2}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &RemoveWithEvenPodSpreadPriority{} - - name := plugin.Name() - assert.Equal(t, name, state.RemoveWithEvenPodSpreadPriority) - - score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/core/removewithhighestordinalpriority/remove_with_highest_ordinal_priority.go b/pkg/scheduler/plugins/core/removewithhighestordinalpriority/remove_with_highest_ordinal_priority.go deleted file mode 100644 index 324454f5e8d..00000000000 --- a/pkg/scheduler/plugins/core/removewithhighestordinalpriority/remove_with_highest_ordinal_priority.go +++ /dev/null @@ -1,60 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package removewithhighestordinalpriority - -import ( - "context" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" -) - -// RemoveWithHighestOrdinalPriority is a score plugin that favors pods that have a higher ordinal -type RemoveWithHighestOrdinalPriority struct { -} - -// Verify RemoveWithHighestOrdinalPriority Implements ScorePlugin Interface -var _ state.ScorePlugin = &RemoveWithHighestOrdinalPriority{} - -// Name of the plugin -const Name = state.RemoveWithHighestOrdinalPriority - -func init() { - factory.RegisterSP(Name, &RemoveWithHighestOrdinalPriority{}) -} - -// Name returns name of the plugin -func (pl *RemoveWithHighestOrdinalPriority) Name() string { - return Name -} - -// Score invoked at the score extension point. The "score" returned in this function is higher for pods with higher ordinal values. -func (pl *RemoveWithHighestOrdinalPriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) { - score := uint64(podID) //higher ordinals get higher score - return score, state.NewStatus(state.Success) -} - -// ScoreExtensions of the Score plugin. -func (pl *RemoveWithHighestOrdinalPriority) ScoreExtensions() state.ScoreExtensions { - return pl -} - -// NormalizeScore invoked after scoring all pods. -func (pl *RemoveWithHighestOrdinalPriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status { - return nil -} diff --git a/pkg/scheduler/plugins/core/removewithhighestordinalpriority/remove_with_highest_ordinal_priority_test.go b/pkg/scheduler/plugins/core/removewithhighestordinalpriority/remove_with_highest_ordinal_priority_test.go deleted file mode 100644 index 37060437254..00000000000 --- a/pkg/scheduler/plugins/core/removewithhighestordinalpriority/remove_with_highest_ordinal_priority_test.go +++ /dev/null @@ -1,113 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package removewithhighestordinalpriority - -import ( - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - "k8s.io/apimachinery/pkg/types" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" -) - -func TestScore(t *testing.T) { - testCases := []struct { - name string - state *state.State - podID int32 - expScore uint64 - expected *state.Status - }{ - { - name: "no vpods", - state: &state.State{LastOrdinal: -1}, - podID: 0, - expScore: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods free", - state: &state.State{LastOrdinal: 0}, - podID: 0, - expScore: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "two vpods free", - state: &state.State{LastOrdinal: 0}, - podID: 1, - expScore: 1, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods not free", - state: &state.State{LastOrdinal: 1}, - podID: 0, - expScore: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "one vpods not free", - state: &state.State{LastOrdinal: 1}, - podID: 1, - expScore: 01, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, no gaps", - state: &state.State{LastOrdinal: 1}, - podID: 2, - expScore: 2, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, with gaps", - state: &state.State{LastOrdinal: 2}, - podID: 0, - expScore: 0, - expected: state.NewStatus(state.Success), - }, - { - name: "many vpods, with gaps", - state: &state.State{LastOrdinal: 2}, - podID: 1000, - expScore: 1000, - expected: state.NewStatus(state.Success), - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - var plugin = &RemoveWithHighestOrdinalPriority{} - var args interface{} - - name := plugin.Name() - assert.Equal(t, name, state.RemoveWithHighestOrdinalPriority) - - score, status := plugin.Score(ctx, args, tc.state, tc.state.SchedulablePods, types.NamespacedName{}, tc.podID) - if score != tc.expScore { - t.Errorf("unexpected score, got %v, want %v", score, tc.expScore) - } - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected status, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/plugins/kafka/nomaxresourcecount/no_max_resource_count.go b/pkg/scheduler/plugins/kafka/nomaxresourcecount/no_max_resource_count.go deleted file mode 100644 index 49975eefb89..00000000000 --- a/pkg/scheduler/plugins/kafka/nomaxresourcecount/no_max_resource_count.go +++ /dev/null @@ -1,78 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package nomaxresourcecount - -import ( - "context" - "encoding/json" - "strings" - - "k8s.io/apimachinery/pkg/types" - "knative.dev/eventing/pkg/scheduler/factory" - state "knative.dev/eventing/pkg/scheduler/state" - "knative.dev/pkg/logging" -) - -// NoMaxResourceCount plugin filters pods that cause total pods with placements to exceed total partitioncount. 
-type NoMaxResourceCount struct { -} - -// Verify NoMaxResourceCount Implements FilterPlugin Interface -var _ state.FilterPlugin = &NoMaxResourceCount{} - -// Name of the plugin -const Name = state.NoMaxResourceCount - -const ( - ErrReasonInvalidArg = "invalid arguments" - ErrReasonUnschedulable = "pod increases total # of pods beyond partition count" -) - -func init() { - factory.RegisterFP(Name, &NoMaxResourceCount{}) -} - -// Name returns name of the plugin -func (pl *NoMaxResourceCount) Name() string { - return Name -} - -// Filter invoked at the filter extension point. -func (pl *NoMaxResourceCount) Filter(ctx context.Context, args interface{}, states *state.State, key types.NamespacedName, podID int32) *state.Status { - logger := logging.FromContext(ctx).With("Filter", pl.Name()) - - resourceCountArgs, ok := args.(string) - if !ok { - logger.Errorf("Filter args %v for predicate %q are not valid", args, pl.Name()) - return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - resVal := state.NoMaxResourceCountArgs{} - decoder := json.NewDecoder(strings.NewReader(resourceCountArgs)) - decoder.DisallowUnknownFields() - if err := decoder.Decode(&resVal); err != nil { - return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg) - } - - podName := state.PodNameFromOrdinal(states.StatefulSetName, podID) - if _, ok := states.PodSpread[key][podName]; !ok && ((len(states.PodSpread[key]) + 1) > resVal.NumPartitions) { //pod not in vrep's partition map and counting this new pod towards total pod count - logger.Infof("Unschedulable! Pod %d filtered due to total pod count %v exceeding partition count", podID, len(states.PodSpread[key])+1) - return state.NewStatus(state.Unschedulable, ErrReasonUnschedulable) - } - - return state.NewStatus(state.Success) -} diff --git a/pkg/scheduler/plugins/kafka/nomaxresourcecount/no_max_resource_count_test.go b/pkg/scheduler/plugins/kafka/nomaxresourcecount/no_max_resource_count_test.go deleted file mode 100644 index e3417934775..00000000000 --- a/pkg/scheduler/plugins/kafka/nomaxresourcecount/no_max_resource_count_test.go +++ /dev/null @@ -1,146 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-*/ - -package nomaxresourcecount - -import ( - "reflect" - "testing" - - "github.com/stretchr/testify/assert" - "k8s.io/apimachinery/pkg/types" - state "knative.dev/eventing/pkg/scheduler/state" - tscheduler "knative.dev/eventing/pkg/scheduler/testing" -) - -func TestFilter(t *testing.T) { - testCases := []struct { - name string - state *state.State - vpod types.NamespacedName - podID int32 - expected *state.Status - args interface{} - }{ - { - name: "no vpods, no pods", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: -1, PodSpread: map[types.NamespacedName]map[string]int32{}}, - podID: 0, - expected: state.NewStatus(state.Success), - args: "{\"NumPartitions\": 5}", - }, - { - name: "no vpods, no pods, bad arg", - vpod: types.NamespacedName{}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: -1, PodSpread: map[types.NamespacedName]map[string]int32{}}, - podID: 0, - expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg), - args: "{\"NumParts\": 5}", - }, - { - name: "one vpod, one pod, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 0, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - }, - }, - podID: 0, - expected: state.NewStatus(state.Success), - args: "{\"NumPartitions\": 5}", - }, - { - name: "two vpods, one pod, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 0, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - }, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: { - "pod-name-0": 4, - }, - }, - }, - podID: 0, - expected: state.NewStatus(state.Success), - args: "{\"NumPartitions\": 5}", - }, - { - name: "one vpod, two pods,same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 1, PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - "pod-name-1": 5, - }, - }}, - podID: 1, - expected: state.NewStatus(state.Success), - args: "{\"NumPartitions\": 5}", - }, - { - name: "one vpod, five pods, same pod filter", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 4, PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 5, - "pod-name-1": 4, - "pod-name-2": 3, - "pod-name-3": 4, - "pod-name-4": 5, - }, - }}, - podID: 1, - expected: state.NewStatus(state.Success), - args: "{\"NumPartitions\": 5}", - }, - { - name: "one vpod, five pods, same pod filter unschedulable", - vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"}, - state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 2, PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "pod-name-0": 7, - "pod-name-1": 4, - "pod-name-2": 3, - "pod-name-3": 4, - "pod-name-4": 5, - }, - }}, - podID: 5, - expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable), - args: "{\"NumPartitions\": 5}", - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := 
tscheduler.SetupFakeContext(t) - var plugin = &NoMaxResourceCount{} - - name := plugin.Name() - assert.Equal(t, name, state.NoMaxResourceCount) - - status := plugin.Filter(ctx, tc.args, tc.state, tc.vpod, tc.podID) - if !reflect.DeepEqual(status, tc.expected) { - t.Errorf("unexpected state, got %v, want %v", status, tc.expected) - } - }) - } -} diff --git a/pkg/scheduler/scheduler.go b/pkg/scheduler/scheduler.go index a9ca7b1d5a7..62dcf163d29 100644 --- a/pkg/scheduler/scheduler.go +++ b/pkg/scheduler/scheduler.go @@ -30,57 +30,12 @@ import ( duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" ) -type SchedulerPolicyType string - const ( - // MAXFILLUP policy type adds vreplicas to existing pods to fill them up before adding to new pods - MAXFILLUP SchedulerPolicyType = "MAXFILLUP" - // PodAnnotationKey is an annotation used by the scheduler to be informed of pods // being evicted and not use it for placing vreplicas PodAnnotationKey = "eventing.knative.dev/unschedulable" ) -const ( - ZoneLabel = "topology.kubernetes.io/zone" - - UnknownZone = "unknown" -) - -const ( - // MaxWeight is the maximum weight that can be assigned for a priority. - MaxWeight uint64 = 10 - // MinWeight is the minimum weight that can be assigned for a priority. - MinWeight uint64 = 0 -) - -// Policy describes a struct of a policy resource. -type SchedulerPolicy struct { - // Holds the information to configure the fit predicate functions. - Predicates []PredicatePolicy `json:"predicates"` - // Holds the information to configure the priority functions. - Priorities []PriorityPolicy `json:"priorities"` -} - -// PredicatePolicy describes a struct of a predicate policy. -type PredicatePolicy struct { - // Identifier of the predicate policy - Name string `json:"name"` - // Holds the parameters to configure the given predicate - Args interface{} `json:"args"` -} - -// PriorityPolicy describes a struct of a priority policy. -type PriorityPolicy struct { - // Identifier of the priority policy - Name string `json:"name"` - // The numeric multiplier for the pod scores that the priority function generates - // The weight should be a positive integer - Weight uint64 `json:"weight"` - // Holds the parameters to configure the given priority function - Args interface{} `json:"args"` -} - // VPodLister is the function signature for returning a list of VPods type VPodLister func() ([]VPod, error) diff --git a/pkg/scheduler/state/helpers.go b/pkg/scheduler/state/helpers.go index ad3a5aaf765..db5d9216b2c 100644 --- a/pkg/scheduler/state/helpers.go +++ b/pkg/scheduler/state/helpers.go @@ -17,14 +17,10 @@ limitations under the License. 
package state import ( - "context" - "math" "strconv" "strings" - "time" "k8s.io/apimachinery/pkg/types" - "k8s.io/apimachinery/pkg/util/wait" "knative.dev/eventing/pkg/scheduler" ) @@ -36,7 +32,7 @@ func PodNameFromOrdinal(name string, ordinal int32) string { func OrdinalFromPodName(podName string) int32 { ordinal, err := strconv.ParseInt(podName[strings.LastIndex(podName, "-")+1:], 10, 32) if err != nil { - return math.MaxInt32 + panic(podName + " is not a valid pod name") } return int32(ordinal) } @@ -50,31 +46,3 @@ func GetVPod(key types.NamespacedName, vpods []scheduler.VPod) scheduler.VPod { } return nil } - -func SatisfyZoneAvailability(feasiblePods []int32, states *State) bool { - zoneMap := make(map[string]struct{}) - var zoneName string - var err error - for _, podID := range feasiblePods { - zoneName, _, err = states.GetPodInfo(PodNameFromOrdinal(states.StatefulSetName, podID)) - if err != nil { - continue - } - zoneMap[zoneName] = struct{}{} - } - return len(zoneMap) == int(states.NumZones) -} - -func SatisfyNodeAvailability(feasiblePods []int32, states *State) bool { - nodeMap := make(map[string]struct{}) - var nodeName string - var err error - for _, podID := range feasiblePods { - wait.PollUntilContextTimeout(context.Background(), 50*time.Millisecond, 5*time.Second, true, func(ctx context.Context) (bool, error) { - _, nodeName, err = states.GetPodInfo(PodNameFromOrdinal(states.StatefulSetName, podID)) - return err == nil, nil - }) - nodeMap[nodeName] = struct{}{} - } - return len(nodeMap) == int(states.NumNodes) -} diff --git a/pkg/scheduler/state/interface.go b/pkg/scheduler/state/interface.go deleted file mode 100644 index 44c7a2d4d4c..00000000000 --- a/pkg/scheduler/state/interface.go +++ /dev/null @@ -1,209 +0,0 @@ -/* -Copyright 2021 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package state - -import ( - "context" - "errors" - "strings" - - "k8s.io/apimachinery/pkg/types" -) - -const ( - PodFitsResources = "PodFitsResources" - NoMaxResourceCount = "NoMaxResourceCount" - EvenPodSpread = "EvenPodSpread" - AvailabilityNodePriority = "AvailabilityNodePriority" - AvailabilityZonePriority = "AvailabilityZonePriority" - LowestOrdinalPriority = "LowestOrdinalPriority" - RemoveWithEvenPodSpreadPriority = "RemoveWithEvenPodSpreadPriority" - RemoveWithAvailabilityNodePriority = "RemoveWithAvailabilityNodePriority" - RemoveWithAvailabilityZonePriority = "RemoveWithAvailabilityZonePriority" - RemoveWithHighestOrdinalPriority = "RemoveWithHighestOrdinalPriority" -) - -// Plugin is the parent type for all the scheduling framework plugins. -type Plugin interface { - Name() string -} - -type FilterPlugin interface { - Plugin - // Filter is called by the scheduler. - // All FilterPlugins should return "Success" to declare that - // the given pod fits the vreplica. - Filter(ctx context.Context, args interface{}, state *State, key types.NamespacedName, podID int32) *Status -} - -// ScoreExtensions is an interface for Score extended functionality. 
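// Minimal sketch of how the pod-name helpers in state/helpers.go fit together, assuming
// the usual StatefulSet naming convention <statefulset-name>-<ordinal> (the body of
// podNameFromOrdinal here is reconstructed from that convention, not copied from the
// package). It also shows the behavioural change in the hunk above: a malformed pod name
// now panics instead of returning math.MaxInt32.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func podNameFromOrdinal(name string, ordinal int32) string {
	return name + "-" + strconv.Itoa(int(ordinal))
}

func ordinalFromPodName(podName string) int32 {
	ordinal, err := strconv.ParseInt(podName[strings.LastIndex(podName, "-")+1:], 10, 32)
	if err != nil {
		panic(podName + " is not a valid pod name")
	}
	return int32(ordinal)
}

func main() {
	name := podNameFromOrdinal("statefulset-name", 3) // "statefulset-name-3"
	fmt.Println(ordinalFromPodName(name))             // 3
}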
-type ScoreExtensions interface { - // NormalizeScore is called for all pod scores produced by the same plugin's "Score" - // method. A successful run of NormalizeScore will update the scores list and return - // a success status. - NormalizeScore(ctx context.Context, state *State, scores PodScoreList) *Status -} - -type ScorePlugin interface { - Plugin - // Score is called by the scheduler. - // All ScorePlugins should return "Success" unless the args are invalid. - Score(ctx context.Context, args interface{}, state *State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *Status) - - // ScoreExtensions returns a ScoreExtensions interface if it implements one, or nil if does not - ScoreExtensions() ScoreExtensions -} - -// NoMaxResourceCountArgs holds arguments used to configure the NoMaxResourceCount plugin. -type NoMaxResourceCountArgs struct { - NumPartitions int -} - -// EvenPodSpreadArgs holds arguments used to configure the EvenPodSpread plugin. -type EvenPodSpreadArgs struct { - MaxSkew int32 -} - -// AvailabilityZonePriorityArgs holds arguments used to configure the AvailabilityZonePriority plugin. -type AvailabilityZonePriorityArgs struct { - MaxSkew int32 -} - -// AvailabilityNodePriorityArgs holds arguments used to configure the AvailabilityNodePriority plugin. -type AvailabilityNodePriorityArgs struct { - MaxSkew int32 -} - -// Code is the Status code/type which is returned from plugins. -type Code int - -// These are predefined codes used in a Status. -const ( - // Success means that plugin ran correctly and found pod schedulable. - Success Code = iota - // Unschedulable is used when a plugin finds a pod unschedulable due to not satisying the predicate. - Unschedulable - // Error is used for internal plugin errors, unexpected input, etc. - Error -) - -// Status indicates the result of running a plugin. -type Status struct { - code Code - reasons []string - err error -} - -// Code returns code of the Status. -func (s *Status) Code() Code { - if s == nil { - return Success - } - return s.code -} - -// Message returns a concatenated message on reasons of the Status. -func (s *Status) Message() string { - if s == nil { - return "" - } - return strings.Join(s.reasons, ", ") -} - -// NewStatus makes a Status out of the given arguments and returns its pointer. -func NewStatus(code Code, reasons ...string) *Status { - s := &Status{ - code: code, - reasons: reasons, - } - if code == Error { - s.err = errors.New(s.Message()) - } - return s -} - -// AsStatus wraps an error in a Status. -func AsStatus(err error) *Status { - return &Status{ - code: Error, - reasons: []string{err.Error()}, - err: err, - } -} - -// AsError returns nil if the status is a success; otherwise returns an "error" object -// with a concatenated message on reasons of the Status. -func (s *Status) AsError() error { - if s.IsSuccess() { - return nil - } - if s.err != nil { - return s.err - } - return errors.New(s.Message()) -} - -// IsSuccess returns true if and only if "Status" is nil or Code is "Success". -func (s *Status) IsSuccess() bool { - return s.Code() == Success -} - -// IsError returns true if and only if "Status" is "Error". 
-func (s *Status) IsError() bool { - return s.Code() == Error -} - -// IsUnschedulable returns true if "Status" is Unschedulable -func (s *Status) IsUnschedulable() bool { - return s.Code() == Unschedulable -} - -type PodScore struct { - ID int32 - Score uint64 -} - -type PodScoreList []PodScore - -// PluginToPodScores declares a map from plugin name to its PodScoreList. -type PluginToPodScores map[string]PodScoreList - -// PluginToStatus maps plugin name to status. Currently used to identify which Filter plugin -// returned which status. -type PluginToStatus map[string]*Status - -// Merge merges the statuses in the map into one. The resulting status code have the following -// precedence: Error, Unschedulable, Success -func (p PluginToStatus) Merge() *Status { - if len(p) == 0 { - return nil - } - - finalStatus := NewStatus(Success) - for _, s := range p { - if s.Code() == Error { - finalStatus.err = s.AsError() - } - if s.Code() > finalStatus.code { - finalStatus.code = s.Code() - } - - finalStatus.reasons = append(finalStatus.reasons, s.reasons...) - } - - return finalStatus -} diff --git a/pkg/scheduler/state/interface_test.go b/pkg/scheduler/state/interface_test.go deleted file mode 100644 index 44c5695fa37..00000000000 --- a/pkg/scheduler/state/interface_test.go +++ /dev/null @@ -1,87 +0,0 @@ -/* -Copyright 2020 The Knative Authors - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
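// Compact sketch of the status-merging semantics that the deleted PluginToStatus.Merge
// implemented: the merged code follows the precedence Error > Unschedulable > Success,
// and reasons from all plugins are collected. The plugin name and reason string used in
// main are illustrative only.
package main

import "fmt"

type code int

const (
	success code = iota
	unschedulable
	errored
)

type status struct {
	code    code
	reasons []string
}

// merge folds per-plugin statuses into one, keeping the highest-precedence code.
func merge(byPlugin map[string]status) status {
	final := status{code: success}
	for _, s := range byPlugin {
		if s.code > final.code { // higher code wins: errored > unschedulable > success
			final.code = s.code
		}
		final.reasons = append(final.reasons, s.reasons...)
	}
	return final
}

func main() {
	merged := merge(map[string]status{
		"PodFitsResources": {code: success},
		"EvenPodSpread":    {code: unschedulable, reasons: []string{"skew too large"}},
	})
	fmt.Println(merged.code == unschedulable, merged.reasons) // true [skew too large]
}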
-*/ - -package state - -import ( - "errors" - "testing" -) - -func TestStatus(t *testing.T) { - testCases := []struct { - name string - status *Status - code Code - err error - }{ - { - name: "success", - status: NewStatus(Success), - }, - { - name: "error", - status: NewStatus(Error), - code: Error, - }, - { - name: "error as status", - status: AsStatus(errors.New("invalid arguments")), - code: Error, - }, - { - name: "unschedulable", - status: NewStatus(Unschedulable, "invalid arguments"), - code: Unschedulable, - err: NewStatus(Unschedulable, "invalid arguments").AsError(), - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - if tc.status.IsSuccess() && tc.status.Code() != tc.code && tc.status.AsError() != tc.err { - t.Errorf("unexpected code, got %v, want %v", tc.status.code, tc.code) - } else if tc.status.IsUnschedulable() && tc.status.Code() != tc.code && tc.status.AsError() != tc.err { - t.Errorf("unexpected code/msg, got %v, want %v, got %v, want %v", tc.status.code, tc.code, tc.status.AsError().Error(), tc.err.Error()) - } else if tc.status.IsError() && tc.status.Code() != tc.code && tc.status.AsError() != tc.err { - t.Errorf("unexpected code/msg, got %v, want %v, got %v, want %v", tc.status.code, tc.code, tc.status.AsError().Error(), tc.err.Error()) - } - }) - } -} - -func TestStatusMerge(t *testing.T) { - ps := PluginToStatus{"A": NewStatus(Success), "B": NewStatus(Success)} - if !ps.Merge().IsSuccess() { - t.Errorf("unexpected status from merge") - } - - ps = PluginToStatus{"A": NewStatus(Success), "B": NewStatus(Error)} - if !ps.Merge().IsError() { - t.Errorf("unexpected status from merge") - } - - ps = PluginToStatus{"A": NewStatus(Unschedulable), "B": NewStatus(Error)} - if !ps.Merge().IsError() { - t.Errorf("unexpected status from merge") - } - - ps = PluginToStatus{"A": NewStatus(Unschedulable), "B": NewStatus(Success)} - if !ps.Merge().IsUnschedulable() { - t.Errorf("unexpected status from merge") - } - -} diff --git a/pkg/scheduler/state/state.go b/pkg/scheduler/state/state.go index 44069babe92..9d4503b9158 100644 --- a/pkg/scheduler/state/state.go +++ b/pkg/scheduler/state/state.go @@ -19,14 +19,12 @@ package state import ( "context" "encoding/json" - "errors" "math" "strconv" "go.uber.org/zap" v1 "k8s.io/api/core/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/labels" "k8s.io/apimachinery/pkg/types" "k8s.io/apimachinery/pkg/util/sets" corev1 "k8s.io/client-go/listers/core/v1" @@ -39,7 +37,7 @@ type StateAccessor interface { // State returns the current state (snapshot) about placed vpods // Take into account reserved vreplicas and update `reserved` to reflect // the current state. - State(ctx context.Context, reserved map[types.NamespacedName]map[string]int32) (*State, error) + State(ctx context.Context) (*State, error) } // state provides information about the current scheduling of all vpods @@ -61,24 +59,6 @@ type State struct { // Replicas is the (cached) number of statefulset replicas. 
Replicas int32 - // Number of available zones in cluster - NumZones int32 - - // Number of available nodes in cluster - NumNodes int32 - - // Scheduling policy type for placing vreplicas on pods - SchedulerPolicy scheduler.SchedulerPolicyType - - // Scheduling policy plugin for placing vreplicas on pods - SchedPolicy *scheduler.SchedulerPolicy - - // De-scheduling policy plugin for removing vreplicas from pods - DeschedPolicy *scheduler.SchedulerPolicy - - // Mapping node names of nodes currently in cluster to their zone info - NodeToZoneMap map[string]string - StatefulSetName string PodLister corev1.PodNamespaceLister @@ -86,12 +66,6 @@ type State struct { // Stores for each vpod, a map of podname to number of vreplicas placed on that pod currently PodSpread map[types.NamespacedName]map[string]int32 - // Stores for each vpod, a map of nodename to total number of vreplicas placed on all pods running on that node currently - NodeSpread map[types.NamespacedName]map[string]int32 - - // Stores for each vpod, a map of zonename to total number of vreplicas placed on all pods located in that zone currently - ZoneSpread map[types.NamespacedName]map[string]int32 - // Pending tracks the number of virtual replicas that haven't been scheduled yet // because there wasn't enough free capacity. Pending map[types.NamespacedName]int32 @@ -114,7 +88,7 @@ func (s *State) SetFree(ordinal int32, value int32) { s.FreeCap[int(ordinal)] = value } -// freeCapacity returns the number of vreplicas that can be used, +// FreeCapacity returns the number of vreplicas that can be used, // up to the last ordinal func (s *State) FreeCapacity() int32 { t := int32(0) @@ -124,20 +98,6 @@ func (s *State) FreeCapacity() int32 { return t } -func (s *State) GetPodInfo(podName string) (zoneName string, nodeName string, err error) { - pod, err := s.PodLister.Get(podName) - if err != nil { - return zoneName, nodeName, err - } - - nodeName = pod.Spec.NodeName - zoneName, ok := s.NodeToZoneMap[nodeName] - if !ok { - return zoneName, nodeName, errors.New("could not find zone") - } - return zoneName, nodeName, nil -} - func (s *State) IsSchedulablePod(ordinal int32) bool { for _, x := range s.SchedulablePods { if x == ordinal { @@ -151,32 +111,24 @@ func (s *State) IsSchedulablePod(ordinal int32) bool { type stateBuilder struct { vpodLister scheduler.VPodLister capacity int32 - schedulerPolicy scheduler.SchedulerPolicyType - nodeLister corev1.NodeLister statefulSetCache *scheduler.ScaleCache statefulSetName string podLister corev1.PodNamespaceLister - schedPolicy *scheduler.SchedulerPolicy - deschedPolicy *scheduler.SchedulerPolicy } // NewStateBuilder returns a StateAccessor recreating the state from scratch each time it is requested -func NewStateBuilder(sfsname string, lister scheduler.VPodLister, podCapacity int32, schedulerPolicy scheduler.SchedulerPolicyType, schedPolicy, deschedPolicy *scheduler.SchedulerPolicy, podlister corev1.PodNamespaceLister, nodeLister corev1.NodeLister, statefulSetCache *scheduler.ScaleCache) StateAccessor { +func NewStateBuilder(sfsname string, lister scheduler.VPodLister, podCapacity int32, podlister corev1.PodNamespaceLister, statefulSetCache *scheduler.ScaleCache) StateAccessor { return &stateBuilder{ vpodLister: lister, capacity: podCapacity, - schedulerPolicy: schedulerPolicy, - nodeLister: nodeLister, statefulSetCache: statefulSetCache, statefulSetName: sfsname, podLister: podlister, - schedPolicy: schedPolicy, - deschedPolicy: deschedPolicy, } } -func (s *stateBuilder) State(ctx context.Context, 
reserved map[types.NamespacedName]map[string]int32) (*State, error) { +func (s *stateBuilder) State(ctx context.Context) (*State, error) { vpods, err := s.vpodLister() if err != nil { return nil, err @@ -201,34 +153,6 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN withPlacement := make(map[types.NamespacedName]map[string]bool) podSpread := make(map[types.NamespacedName]map[string]int32) - nodeSpread := make(map[types.NamespacedName]map[string]int32) - zoneSpread := make(map[types.NamespacedName]map[string]int32) - - //Build the node to zone map - nodes, err := s.nodeLister.List(labels.Everything()) - if err != nil { - return nil, err - } - - nodeToZoneMap := make(map[string]string) - zoneMap := make(map[string]struct{}) - for i := 0; i < len(nodes); i++ { - node := nodes[i] - - if isNodeUnschedulable(node) { - // Ignore node that is currently unschedulable. - continue - } - - zoneName, ok := node.GetLabels()[scheduler.ZoneLabel] - if ok && zoneName != "" { - nodeToZoneMap[node.Name] = zoneName - zoneMap[zoneName] = struct{}{} - } else { - nodeToZoneMap[node.Name] = scheduler.UnknownZone - zoneMap[scheduler.UnknownZone] = struct{}{} - } - } for podId := int32(0); podId < scale.Spec.Replicas && s.podLister != nil; podId++ { pod, err := s.podLister.Get(PodNameFromOrdinal(s.statefulSetName, podId)) @@ -242,17 +166,6 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN continue } - node, err := s.nodeLister.Get(pod.Spec.NodeName) - if err != nil { - return nil, err - } - - if isNodeUnschedulable(node) { - // Node is marked as Unschedulable - CANNOT SCHEDULE VREPS on a pod running on this node. - logger.Debugw("Pod is on an unschedulable node", zap.Any("pod", node)) - continue - } - // Pod has no annotation or not annotated as unschedulable and // not on an unschedulable node, so add to feasible schedulablePods.Insert(podId) @@ -271,16 +184,11 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN withPlacement[vpod.GetKey()] = make(map[string]bool) podSpread[vpod.GetKey()] = make(map[string]int32) - nodeSpread[vpod.GetKey()] = make(map[string]int32) - zoneSpread[vpod.GetKey()] = make(map[string]int32) for i := 0; i < len(ps); i++ { podName := ps[i].PodName vreplicas := ps[i].VReplicas - // Account for reserved vreplicas - vreplicas = withReserved(vpod.GetKey(), podName, vreplicas, reserved) - free, last = s.updateFreeCapacity(logger, free, last, podName, vreplicas) withPlacement[vpod.GetKey()][podName] = true @@ -291,47 +199,15 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN } if pod != nil && schedulablePods.Has(OrdinalFromPodName(pod.GetName())) { - nodeName := pod.Spec.NodeName //node name for this pod - zoneName := nodeToZoneMap[nodeName] //zone name for this pod podSpread[vpod.GetKey()][podName] = podSpread[vpod.GetKey()][podName] + vreplicas - nodeSpread[vpod.GetKey()][nodeName] = nodeSpread[vpod.GetKey()][nodeName] + vreplicas - zoneSpread[vpod.GetKey()][zoneName] = zoneSpread[vpod.GetKey()][zoneName] + vreplicas - } - } - } - - // Account for reserved vreplicas with no prior placements - for key, ps := range reserved { - for podName, rvreplicas := range ps { - if wp, ok := withPlacement[key]; ok { - if _, ok := wp[podName]; ok { - // already accounted for - continue - } - - pod, err := s.podLister.Get(podName) - if err != nil { - logger.Warnw("Failed to get pod", zap.String("podName", podName), zap.Error(err)) - } - - if pod != nil && 
schedulablePods.Has(OrdinalFromPodName(pod.GetName())) { - nodeName := pod.Spec.NodeName //node name for this pod - zoneName := nodeToZoneMap[nodeName] //zone name for this pod - podSpread[key][podName] = podSpread[key][podName] + rvreplicas - nodeSpread[key][nodeName] = nodeSpread[key][nodeName] + rvreplicas - zoneSpread[key][zoneName] = zoneSpread[key][zoneName] + rvreplicas - } } - - free, last = s.updateFreeCapacity(logger, free, last, podName, rvreplicas) } } - state := &State{FreeCap: free, SchedulablePods: schedulablePods.List(), LastOrdinal: last, Capacity: s.capacity, Replicas: scale.Spec.Replicas, NumZones: int32(len(zoneMap)), NumNodes: int32(len(nodeToZoneMap)), - SchedulerPolicy: s.schedulerPolicy, SchedPolicy: s.schedPolicy, DeschedPolicy: s.deschedPolicy, NodeToZoneMap: nodeToZoneMap, StatefulSetName: s.statefulSetName, PodLister: s.podLister, - PodSpread: podSpread, NodeSpread: nodeSpread, ZoneSpread: zoneSpread, Pending: pending, ExpectedVReplicaByVPod: expectedVReplicasByVPod} + state := &State{FreeCap: free, SchedulablePods: schedulablePods.List(), LastOrdinal: last, Capacity: s.capacity, Replicas: scale.Spec.Replicas, StatefulSetName: s.statefulSetName, PodLister: s.podLister, + PodSpread: podSpread, Pending: pending, ExpectedVReplicaByVPod: expectedVReplicasByVPod} - logger.Infow("cluster state info", zap.Any("state", state), zap.Any("reserved", toJSONable(reserved))) + logger.Infow("cluster state info", zap.Any("state", state)) return state, nil } @@ -392,27 +268,6 @@ func grow(slice []int32, ordinal int32, def int32) []int32 { return slice } -func withReserved(key types.NamespacedName, podName string, committed int32, reserved map[types.NamespacedName]map[string]int32) int32 { - if reserved != nil { - if rps, ok := reserved[key]; ok { - if rvreplicas, ok := rps[podName]; ok { - if committed == rvreplicas { - // new placement has been committed. - delete(rps, podName) - if len(rps) == 0 { - delete(reserved, key) - } - } else { - // new placement hasn't been committed yet. 
Adjust locally - // needed for descheduling vreps using policies - return rvreplicas - } - } - } - } - return committed -} - func isPodUnschedulable(pod *v1.Pod) bool { annotVal, ok := pod.ObjectMeta.Annotations[scheduler.PodAnnotationKey] unschedulable, err := strconv.ParseBool(annotVal) @@ -423,50 +278,22 @@ func isPodUnschedulable(pod *v1.Pod) bool { return isMarkedUnschedulable || isPending } -func isNodeUnschedulable(node *v1.Node) bool { - noExec := &v1.Taint{ - Key: "node.kubernetes.io/unreachable", - Effect: v1.TaintEffectNoExecute, - } - - noSched := &v1.Taint{ - Key: "node.kubernetes.io/unreachable", - Effect: v1.TaintEffectNoSchedule, - } - - return node.Spec.Unschedulable || - contains(node.Spec.Taints, noExec) || - contains(node.Spec.Taints, noSched) -} - -func contains(taints []v1.Taint, taint *v1.Taint) bool { - for _, v := range taints { - if v.MatchTaint(taint) { - return true - } - } - return false -} - func (s *State) MarshalJSON() ([]byte, error) { type S struct { - FreeCap []int32 `json:"freeCap"` - SchedulablePods []int32 `json:"schedulablePods"` - LastOrdinal int32 `json:"lastOrdinal"` - Capacity int32 `json:"capacity"` - Replicas int32 `json:"replicas"` - NumZones int32 `json:"numZones"` - NumNodes int32 `json:"numNodes"` - NodeToZoneMap map[string]string `json:"nodeToZoneMap"` - StatefulSetName string `json:"statefulSetName"` - PodSpread map[string]map[string]int32 `json:"podSpread"` - NodeSpread map[string]map[string]int32 `json:"nodeSpread"` - ZoneSpread map[string]map[string]int32 `json:"zoneSpread"` - SchedulerPolicy scheduler.SchedulerPolicyType `json:"schedulerPolicy"` - SchedPolicy *scheduler.SchedulerPolicy `json:"schedPolicy"` - DeschedPolicy *scheduler.SchedulerPolicy `json:"deschedPolicy"` - Pending map[string]int32 `json:"pending"` + FreeCap []int32 `json:"freeCap"` + SchedulablePods []int32 `json:"schedulablePods"` + LastOrdinal int32 `json:"lastOrdinal"` + Capacity int32 `json:"capacity"` + Replicas int32 `json:"replicas"` + NumZones int32 `json:"numZones"` + NumNodes int32 `json:"numNodes"` + NodeToZoneMap map[string]string `json:"nodeToZoneMap"` + StatefulSetName string `json:"statefulSetName"` + PodSpread map[string]map[string]int32 `json:"podSpread"` + NodeSpread map[string]map[string]int32 `json:"nodeSpread"` + ZoneSpread map[string]map[string]int32 `json:"zoneSpread"` + Pending map[string]int32 `json:"pending"` } sj := S{ @@ -475,23 +302,15 @@ func (s *State) MarshalJSON() ([]byte, error) { LastOrdinal: s.LastOrdinal, Capacity: s.Capacity, Replicas: s.Replicas, - NumZones: s.NumZones, - NumNodes: s.NumNodes, - NodeToZoneMap: s.NodeToZoneMap, StatefulSetName: s.StatefulSetName, - PodSpread: toJSONable(s.PodSpread), - NodeSpread: toJSONable(s.NodeSpread), - ZoneSpread: toJSONable(s.ZoneSpread), - SchedulerPolicy: s.SchedulerPolicy, - SchedPolicy: s.SchedPolicy, - DeschedPolicy: s.DeschedPolicy, + PodSpread: ToJSONable(s.PodSpread), Pending: toJSONablePending(s.Pending), } return json.Marshal(sj) } -func toJSONable(ps map[types.NamespacedName]map[string]int32) map[string]map[string]int32 { +func ToJSONable(ps map[types.NamespacedName]map[string]int32) map[string]map[string]int32 { r := make(map[string]map[string]int32, len(ps)) for k, v := range ps { r[k.String()] = v diff --git a/pkg/scheduler/state/state_test.go b/pkg/scheduler/state/state_test.go index 5a0d09b6f6f..b1f518d95df 100644 --- a/pkg/scheduler/state/state_test.go +++ b/pkg/scheduler/state/state_test.go @@ -46,48 +46,31 @@ const ( func TestStateBuilder(t *testing.T) { testCases := 
[]struct { - name string - replicas int32 - pendingReplicas int32 - vpods [][]duckv1alpha1.Placement - expected State - freec int32 - schedulerPolicyType scheduler.SchedulerPolicyType - schedulerPolicy *scheduler.SchedulerPolicy - deschedulerPolicy *scheduler.SchedulerPolicy - reserved map[types.NamespacedName]map[string]int32 - nodes []*v1.Node - err error + name string + replicas int32 + pendingReplicas int32 + vpods [][]duckv1alpha1.Placement + expected State + freec int32 + err error }{ { - name: "no vpods", - replicas: int32(0), - vpods: [][]duckv1alpha1.Placement{}, - expected: State{Capacity: 10, FreeCap: []int32{}, SchedulablePods: []int32{}, LastOrdinal: -1, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, Pending: map[types.NamespacedName]int32{}, ExpectedVReplicaByVPod: map[types.NamespacedName]int32{}}, - freec: int32(0), - schedulerPolicyType: scheduler.MAXFILLUP, + name: "no vpods", + replicas: int32(0), + vpods: [][]duckv1alpha1.Placement{}, + expected: State{Capacity: 10, FreeCap: []int32{}, SchedulablePods: []int32{}, LastOrdinal: -1, StatefulSetName: sfsName, Pending: map[types.NamespacedName]int32{}, ExpectedVReplicaByVPod: map[types.NamespacedName]int32{}}, + freec: int32(0), }, { name: "one vpods", replicas: int32(1), vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}}, - expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, NumNodes: 1, NumZones: 1, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0"}, + expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, StatefulSetName: sfsName, PodSpread: map[types.NamespacedName]map[string]int32{ {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { "statefulset-name-0": 1, }, }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - }, - }, Pending: map[types.NamespacedName]int32{ {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, }, @@ -95,9 +78,7 @@ func TestStateBuilder(t *testing.T) { {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1, }, }, - freec: int32(9), - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0")}, + freec: int32(9), }, { name: "many vpods, no gaps", @@ -107,8 +88,7 @@ func TestStateBuilder(t *testing.T) { {{PodName: "statefulset-name-1", VReplicas: 2}}, {{PodName: "statefulset-name-1", VReplicas: 3}, {PodName: "statefulset-name-0", VReplicas: 1}}, }, - expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(0), int32(1), int32(2)}, LastOrdinal: 2, Replicas: 3, NumNodes: 3, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2"}, + expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(0), int32(1), int32(2)}, LastOrdinal: 2, 
Replicas: 3, StatefulSetName: sfsName, PodSpread: map[types.NamespacedName]map[string]int32{ {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { "statefulset-name-0": 1, @@ -122,32 +102,6 @@ func TestStateBuilder(t *testing.T) { "statefulset-name-1": 3, }, }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - "node-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "node-1": 2, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "node-0": 1, - "node-1": 3, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - "zone-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "zone-1": 2, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "zone-0": 1, - "zone-1": 3, - }, - }, Pending: map[types.NamespacedName]int32{ {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 0, @@ -159,9 +113,7 @@ func TestStateBuilder(t *testing.T) { {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, }, }, - freec: int32(18), - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2")}, + freec: int32(18), }, { name: "many vpods, unschedulable pending pods (statefulset-name-0)", @@ -172,8 +124,7 @@ func TestStateBuilder(t *testing.T) { {{PodName: "statefulset-name-1", VReplicas: 2}}, {{PodName: "statefulset-name-1", VReplicas: 3}, {PodName: "statefulset-name-0", VReplicas: 1}}, }, - expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(1), int32(2)}, LastOrdinal: 2, Replicas: 3, NumNodes: 3, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2"}, + expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(1), int32(2)}, LastOrdinal: 2, Replicas: 3, StatefulSetName: sfsName, PodSpread: map[types.NamespacedName]map[string]int32{ {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { "statefulset-name-2": 5, @@ -185,28 +136,6 @@ func TestStateBuilder(t *testing.T) { "statefulset-name-1": 3, }, }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "node-1": 2, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "node-1": 3, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "zone-1": 2, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "zone-1": 3, - }, - }, Pending: map[types.NamespacedName]int32{ {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 0, @@ -218,9 +147,7 @@ func TestStateBuilder(t *testing.T) { {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, }, }, - freec: int32(10), - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2")}, + freec: int32(10), }, { 
name: "many vpods, with gaps", @@ -230,145 +157,7 @@ func TestStateBuilder(t *testing.T) { {{PodName: "statefulset-name-1", VReplicas: 0}}, {{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}}, }, - expected: State{Capacity: 10, FreeCap: []int32{int32(9), int32(10), int32(5), int32(10)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 3, Replicas: 4, NumNodes: 4, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0"}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "statefulset-name-0": 1, - "statefulset-name-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "statefulset-name-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "statefulset-name-1": 0, - "statefulset-name-3": 0, - }, - }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - "node-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "node-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "node-1": 0, - "node-3": 0, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - "zone-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "zone-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "zone-0": 0, - "zone-1": 0, - }, - }, - Pending: map[types.NamespacedName]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, - {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, - }, - ExpectedVReplicaByVPod: map[types.NamespacedName]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, - {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, - }, - }, - freec: int32(34), - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0")}, - }, - { - name: "many vpods, with gaps and reserved vreplicas", - replicas: int32(4), - vpods: [][]duckv1alpha1.Placement{ - {{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}}, - {{PodName: "statefulset-name-1", VReplicas: 0}}, - {{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}}, - }, - expected: State{Capacity: 10, FreeCap: []int32{int32(3), int32(10), int32(5), int32(10)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 3, Replicas: 4, NumNodes: 4, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0"}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "statefulset-name-0": 2, - "statefulset-name-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "statefulset-name-1": 0, - }, - {Name: vpodName + "-2", 
Namespace: vpodNs + "-2"}: { - "statefulset-name-1": 0, - "statefulset-name-3": 0, - }, - }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 2, - "node-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "node-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "node-1": 0, - "node-3": 0, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 2, - "zone-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "zone-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "zone-0": 0, - "zone-1": 0, - }, - }, - Pending: map[types.NamespacedName]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, - {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, - }, - ExpectedVReplicaByVPod: map[types.NamespacedName]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, - {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, - }, - }, - freec: int32(28), - reserved: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "statefulset-name-0": 2, - "statefulset-name-2": 5, - }, - {Name: vpodName + "-3", Namespace: vpodNs + "-3"}: { - "statefulset-name-0": 5, - }, - }, - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0")}, - }, - { - name: "many vpods, with gaps and reserved vreplicas on existing and new placements, fully committed", - replicas: int32(4), - vpods: [][]duckv1alpha1.Placement{ - {{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}}, - {{PodName: "statefulset-name-1", VReplicas: 0}}, - {{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}}, - }, - expected: State{Capacity: 10, FreeCap: []int32{int32(4), int32(7), int32(5), int32(10), int32(5)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 4, Replicas: 4, NumNodes: 4, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0"}, + expected: State{Capacity: 10, FreeCap: []int32{int32(9), int32(10), int32(5), int32(10)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 3, Replicas: 4, StatefulSetName: sfsName, PodSpread: map[types.NamespacedName]map[string]int32{ {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { "statefulset-name-0": 1, @@ -382,32 +171,6 @@ func TestStateBuilder(t *testing.T) { "statefulset-name-3": 0, }, }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - "node-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "node-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "node-1": 0, - "node-3": 0, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - "zone-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "zone-1": 0, - }, - 
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "zone-0": 0, - "zone-1": 0, - }, - }, Pending: map[types.NamespacedName]int32{ {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, @@ -419,116 +182,18 @@ func TestStateBuilder(t *testing.T) { {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, }, }, - freec: int32(26), - reserved: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-3", Namespace: "vpod-ns-3"}: { - "statefulset-name-4": 5, - }, - {Name: "vpod-name-4", Namespace: "vpod-ns-4"}: { - "statefulset-name-0": 5, - "statefulset-name-1": 3, - }, - }, - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0")}, - }, - { - name: "many vpods, with gaps and reserved vreplicas on existing and new placements, partially committed", - replicas: int32(5), - vpods: [][]duckv1alpha1.Placement{ - {{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}}, - {{PodName: "statefulset-name-1", VReplicas: 0}}, - {{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}}, - }, - expected: State{Capacity: 10, FreeCap: []int32{int32(4), int32(7), int32(5), int32(10), int32(2)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)}, LastOrdinal: 4, Replicas: 5, NumNodes: 5, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0", "node-4": "zone-1"}, - PodSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "statefulset-name-0": 1, - "statefulset-name-2": 5, - "statefulset-name-4": 8, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "statefulset-name-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "statefulset-name-1": 0, - "statefulset-name-3": 0, - }, - }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - "node-2": 5, - "node-4": 8, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "node-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "node-1": 0, - "node-3": 0, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - "zone-1": 8, - "zone-2": 5, - }, - {Name: vpodName + "-1", Namespace: vpodNs + "-1"}: { - "zone-1": 0, - }, - {Name: vpodName + "-2", Namespace: vpodNs + "-2"}: { - "zone-0": 0, - "zone-1": 0, - }, - }, - Pending: map[types.NamespacedName]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, - {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, - }, - ExpectedVReplicaByVPod: map[types.NamespacedName]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1, - {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1, - {Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1, - }, - }, - freec: int32(28), - reserved: map[types.NamespacedName]map[string]int32{ - {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: { - "statefulset-name-4": 8, - }, - {Name: "vpod-name-4", Namespace: "vpod-ns-4"}: { - "statefulset-name-0": 5, - 
"statefulset-name-1": 3, - }, - }, - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0"), tscheduler.MakeNode("node-4", "zone-1")}, + freec: int32(34), }, { name: "three vpods but one tainted and one with no zone label", replicas: int32(1), vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}}, - expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, NumNodes: 2, NumZones: 2, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": scheduler.UnknownZone}, + expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, StatefulSetName: sfsName, PodSpread: map[types.NamespacedName]map[string]int32{ {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { "statefulset-name-0": 1, }, }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - }, - }, Pending: map[types.NamespacedName]int32{ {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, }, @@ -536,31 +201,18 @@ func TestStateBuilder(t *testing.T) { {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1, }, }, - freec: int32(9), - schedulerPolicyType: scheduler.MAXFILLUP, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNodeNoLabel("node-1"), tscheduler.MakeNodeTainted("node-2", "zone-2")}, + freec: int32(9), }, { name: "one vpod (HA)", replicas: int32(1), vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}}, - expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, NumNodes: 1, NumZones: 1, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, - NodeToZoneMap: map[string]string{"node-0": "zone-0"}, + expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, StatefulSetName: sfsName, PodSpread: map[types.NamespacedName]map[string]int32{ {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { "statefulset-name-0": 1, }, }, - NodeSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "node-0": 1, - }, - }, - ZoneSpread: map[types.NamespacedName]map[string]int32{ - {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { - "zone-0": 1, - }, - }, Pending: map[types.NamespacedName]int32{ {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, }, @@ -569,17 +221,6 @@ func TestStateBuilder(t *testing.T) { }, }, freec: int32(9), - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0")}, }, } @@ -587,7 +228,6 @@ func TestStateBuilder(t 
*testing.T) { t.Run(tc.name, func(t *testing.T) { ctx, _ := tscheduler.SetupFakeContext(t) vpodClient := tscheduler.NewVPodClient() - nodelist := make([]runtime.Object, 0, len(tc.nodes)) podlist := make([]runtime.Object, 0, tc.replicas) if tc.pendingReplicas > tc.replicas { @@ -610,14 +250,6 @@ func TestStateBuilder(t *testing.T) { } } - for i := 0; i < len(tc.nodes); i++ { - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tc.nodes[i], metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - for i := tc.replicas - 1; i >= 0; i-- { var pod *v1.Pod var err error @@ -641,12 +273,11 @@ func TestStateBuilder(t *testing.T) { } lsp := listers.NewListers(podlist) - lsn := listers.NewListers(nodelist) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) - stateBuilder := NewStateBuilder(sfsName, vpodClient.List, int32(10), tc.schedulerPolicyType, &scheduler.SchedulerPolicy{}, &scheduler.SchedulerPolicy{}, lsp.GetPodLister().Pods(testNs), lsn.GetNodeLister(), scaleCache) - state, err := stateBuilder.State(ctx, tc.reserved) + stateBuilder := NewStateBuilder(sfsName, vpodClient.List, int32(10), lsp.GetPodLister().Pods(testNs), scaleCache) + state, err := stateBuilder.State(ctx) if err != nil { t.Fatal("unexpected error", err) } @@ -658,15 +289,6 @@ func TestStateBuilder(t *testing.T) { if tc.expected.PodSpread == nil { tc.expected.PodSpread = make(map[types.NamespacedName]map[string]int32) } - if tc.expected.NodeSpread == nil { - tc.expected.NodeSpread = make(map[types.NamespacedName]map[string]int32) - } - if tc.expected.ZoneSpread == nil { - tc.expected.ZoneSpread = make(map[types.NamespacedName]map[string]int32) - } - if tc.expected.NodeToZoneMap == nil { - tc.expected.NodeToZoneMap = make(map[string]string) - } if !reflect.DeepEqual(*state, tc.expected) { diff := cmp.Diff(tc.expected, *state, cmpopts.IgnoreInterfaces(struct{ corev1.PodNamespaceLister }{})) t.Errorf("unexpected state, got %v, want %v\n(-want, +got)\n%s", *state, tc.expected, diff) @@ -675,14 +297,6 @@ func TestStateBuilder(t *testing.T) { if state.FreeCapacity() != tc.freec { t.Errorf("unexpected free capacity, got %d, want %d", state.FreeCapacity(), tc.freec) } - - if tc.schedulerPolicy != nil && !SatisfyZoneAvailability(state.SchedulablePods, state) { - t.Errorf("unexpected state, got %v, want %v", *state, tc.expected) - } - - if tc.schedulerPolicy != nil && !SatisfyNodeAvailability(state.SchedulablePods, state) { - t.Errorf("unexpected state, got %v, want %v", *state, tc.expected) - } }) } } diff --git a/pkg/scheduler/statefulset/autoscaler.go b/pkg/scheduler/statefulset/autoscaler.go index 3245dabc161..8b61ca4a83c 100644 --- a/pkg/scheduler/statefulset/autoscaler.go +++ b/pkg/scheduler/statefulset/autoscaler.go @@ -28,6 +28,7 @@ import ( v1 "k8s.io/api/core/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/types" + "k8s.io/utils/integer" "knative.dev/pkg/logging" "knative.dev/pkg/reconciler" @@ -62,7 +63,8 @@ type autoscaler struct { evictor scheduler.Evictor // capacity is the total number of virtual replicas available per pod. 
- capacity int32 + capacity int32 + minReplicas int32 // refreshPeriod is how often the autoscaler tries to scale down the statefulset refreshPeriod time.Duration @@ -113,6 +115,7 @@ func newAutoscaler(cfg *Config, stateAccessor st.StateAccessor, statefulSetCache evictor: cfg.Evictor, trigger: make(chan context.Context, 1), capacity: cfg.PodCapacity, + minReplicas: cfg.MinReplicas, refreshPeriod: cfg.RefreshPeriod, retryPeriod: cfg.RetryPeriod, lock: new(sync.Mutex), @@ -188,7 +191,7 @@ func (a *autoscaler) doautoscale(ctx context.Context, attemptScaleDown bool) err logger := logging.FromContext(ctx).With("component", "autoscaler") ctx = logging.WithLogger(ctx, logger) - state, err := a.stateAccessor.State(ctx, a.getReserved()) + state, err := a.stateAccessor.State(ctx) if err != nil { logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err)) return err @@ -205,46 +208,15 @@ func (a *autoscaler) doautoscale(ctx context.Context, attemptScaleDown bool) err zap.Int32("replicas", scale.Spec.Replicas), zap.Any("state", state)) - var scaleUpFactor, newreplicas, minNumPods int32 - scaleUpFactor = 1 // Non-HA scaling - if state.SchedPolicy != nil && contains(nil, state.SchedPolicy.Priorities, st.AvailabilityZonePriority) { //HA scaling across zones - scaleUpFactor = state.NumZones - } - if state.SchedPolicy != nil && contains(nil, state.SchedPolicy.Priorities, st.AvailabilityNodePriority) { //HA scaling across nodes - scaleUpFactor = state.NumNodes - } - - newreplicas = state.LastOrdinal + 1 // Ideal number - - if state.SchedulerPolicy == scheduler.MAXFILLUP { - newreplicas = int32(math.Ceil(float64(state.TotalExpectedVReplicas()) / float64(state.Capacity))) - } else { - // Take into account pending replicas and pods that are already filled (for even pod spread) - pending := state.TotalPending() - if pending > 0 { - // Make sure to allocate enough pods for holding all pending replicas. 
- if state.SchedPolicy != nil && contains(state.SchedPolicy.Predicates, nil, st.EvenPodSpread) && len(state.FreeCap) > 0 { //HA scaling across pods - leastNonZeroCapacity := a.minNonZeroInt(state.FreeCap) - minNumPods = int32(math.Ceil(float64(pending) / float64(leastNonZeroCapacity))) - } else { - minNumPods = int32(math.Ceil(float64(pending) / float64(a.capacity))) - } - newreplicas += int32(math.Ceil(float64(minNumPods)/float64(scaleUpFactor)) * float64(scaleUpFactor)) - } - - if newreplicas <= state.LastOrdinal { - // Make sure to never scale down past the last ordinal - newreplicas = state.LastOrdinal + scaleUpFactor - } - } + newReplicas := integer.Int32Max(int32(math.Ceil(float64(state.TotalExpectedVReplicas())/float64(state.Capacity))), a.minReplicas) // Only scale down if permitted - if !attemptScaleDown && newreplicas < scale.Spec.Replicas { - newreplicas = scale.Spec.Replicas + if !attemptScaleDown && newReplicas < scale.Spec.Replicas { + newReplicas = scale.Spec.Replicas } - if newreplicas != scale.Spec.Replicas { - scale.Spec.Replicas = newreplicas + if newReplicas != scale.Spec.Replicas { + scale.Spec.Replicas = newReplicas logger.Infow("updating adapter replicas", zap.Int32("replicas", scale.Spec.Replicas)) _, err = a.statefulSetCache.UpdateScale(ctx, a.statefulSetName, scale, metav1.UpdateOptions{}) @@ -255,12 +227,12 @@ func (a *autoscaler) doautoscale(ctx context.Context, attemptScaleDown bool) err } else if attemptScaleDown { // since the number of replicas hasn't changed and time has approached to scale down, // take the opportunity to compact the vreplicas - return a.mayCompact(logger, state, scaleUpFactor) + return a.mayCompact(logger, state) } return nil } -func (a *autoscaler) mayCompact(logger *zap.SugaredLogger, s *st.State, scaleUpFactor int32) error { +func (a *autoscaler) mayCompact(logger *zap.SugaredLogger, s *st.State) error { // This avoids a too aggressive scale down by adding a "grace period" based on the refresh // period @@ -275,53 +247,33 @@ func (a *autoscaler) mayCompact(logger *zap.SugaredLogger, s *st.State, scaleUpF } logger.Debugw("Trying to compact and scale down", - zap.Int32("scaleUpFactor", scaleUpFactor), zap.Any("state", s), ) // when there is only one pod there is nothing to move or number of pods is just enough! - if s.LastOrdinal < 1 || len(s.SchedulablePods) <= int(scaleUpFactor) { + if s.LastOrdinal < 1 || len(s.SchedulablePods) <= 1 { return nil } - if s.SchedulerPolicy == scheduler.MAXFILLUP { - // Determine if there is enough free capacity to - // move all vreplicas placed in the last pod to pods with a lower ordinal - freeCapacity := s.FreeCapacity() - s.Free(s.LastOrdinal) - usedInLastPod := s.Capacity - s.Free(s.LastOrdinal) - - if freeCapacity >= usedInLastPod { - a.lastCompactAttempt = time.Now() - err := a.compact(s, scaleUpFactor) - if err != nil { - return fmt.Errorf("vreplicas compaction failed (scaleUpFactor %d): %w", scaleUpFactor, err) - } - } - - // only do 1 replica at a time to avoid overloading the scheduler with too many - // rescheduling requests. 
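// Rough sketch of the simplified MAXFILLUP sizing the autoscaler now uses in place of the
// policy-driven branches removed above: enough pods to hold every expected vreplica at the
// configured per-pod capacity, never fewer than minReplicas, and only shrinking when a
// scale-down attempt is permitted. desiredReplicas is a hypothetical standalone helper;
// the real code reads these values from the scheduler state snapshot and the StatefulSet
// scale subresource, and it assumes capacity > 0.
package main

import (
	"fmt"
	"math"
)

func desiredReplicas(totalExpectedVReplicas, capacity, minReplicas, current int32, allowScaleDown bool) int32 {
	n := int32(math.Ceil(float64(totalExpectedVReplicas) / float64(capacity)))
	if n < minReplicas {
		n = minReplicas
	}
	if !allowScaleDown && n < current {
		return current // keep the current size until a scale-down attempt is allowed
	}
	return n
}

func main() {
	fmt.Println(desiredReplicas(45, 10, 0, 3, true))  // 5: ceil(45/10)
	fmt.Println(desiredReplicas(5, 10, 0, 3, false)) // 3: scale-down not permitted yet
	fmt.Println(desiredReplicas(5, 10, 0, 3, true))  // 1
}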
- } else if s.SchedPolicy != nil { - //Below calculation can be optimized to work for recovery scenarios when nodes/zones are lost due to failure - freeCapacity := s.FreeCapacity() - usedInLastXPods := s.Capacity * scaleUpFactor - for i := int32(0); i < scaleUpFactor && s.LastOrdinal-i >= 0; i++ { - freeCapacity = freeCapacity - s.Free(s.LastOrdinal-i) - usedInLastXPods = usedInLastXPods - s.Free(s.LastOrdinal-i) - } + // Determine if there is enough free capacity to + // move all vreplicas placed in the last pod to pods with a lower ordinal + freeCapacity := s.FreeCapacity() - s.Free(s.LastOrdinal) + usedInLastPod := s.Capacity - s.Free(s.LastOrdinal) - if (freeCapacity >= usedInLastXPods) && //remaining pods can hold all vreps from evicted pods - (s.Replicas-scaleUpFactor >= scaleUpFactor) { //remaining # of pods is enough for HA scaling - a.lastCompactAttempt = time.Now() - err := a.compact(s, scaleUpFactor) - if err != nil { - return fmt.Errorf("vreplicas compaction failed (scaleUpFactor %d): %w", scaleUpFactor, err) - } + if freeCapacity >= usedInLastPod { + a.lastCompactAttempt = time.Now() + err := a.compact(s) + if err != nil { + return fmt.Errorf("vreplicas compaction failed: %w", err) } } + + // only do 1 replica at a time to avoid overloading the scheduler with too many + // rescheduling requests. return nil } -func (a *autoscaler) compact(s *st.State, scaleUpFactor int32) error { +func (a *autoscaler) compact(s *st.State) error { var pod *v1.Pod vpods, err := a.vpodLister() if err != nil { @@ -331,47 +283,20 @@ func (a *autoscaler) compact(s *st.State, scaleUpFactor int32) error { for _, vpod := range vpods { placements := vpod.GetPlacements() for i := len(placements) - 1; i >= 0; i-- { //start from the last placement - for j := int32(0); j < scaleUpFactor; j++ { - ordinal := st.OrdinalFromPodName(placements[i].PodName) - - if ordinal == s.LastOrdinal-j { - pod, err = s.PodLister.Get(placements[i].PodName) - if err != nil { - return fmt.Errorf("failed to get pod %s: %w", placements[i].PodName, err) - } - - err = a.evictor(pod, vpod, &placements[i]) - if err != nil { - return fmt.Errorf("failed to evict pod %s: %w", pod.Name, err) - } + ordinal := st.OrdinalFromPodName(placements[i].PodName) + + if ordinal == s.LastOrdinal { + pod, err = s.PodLister.Get(placements[i].PodName) + if err != nil { + return fmt.Errorf("failed to get pod %s: %w", placements[i].PodName, err) + } + + err = a.evictor(pod, vpod, &placements[i]) + if err != nil { + return fmt.Errorf("failed to evict pod %s: %w", pod.Name, err) } } } } return nil } - -func contains(preds []scheduler.PredicatePolicy, priors []scheduler.PriorityPolicy, name string) bool { - for _, v := range preds { - if v.Name == name { - return true - } - } - for _, v := range priors { - if v.Name == name { - return true - } - } - - return false -} - -func (a *autoscaler) minNonZeroInt(slice []int32) int32 { - min := a.capacity - for _, v := range slice { - if v < min && v > 0 { - min = v - } - } - return min -} diff --git a/pkg/scheduler/statefulset/autoscaler_test.go b/pkg/scheduler/statefulset/autoscaler_test.go index d7ff1c02310..441c6cab7a8 100644 --- a/pkg/scheduler/statefulset/autoscaler_test.go +++ b/pkg/scheduler/statefulset/autoscaler_test.go @@ -41,7 +41,6 @@ import ( duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" "knative.dev/eventing/pkg/scheduler" "knative.dev/eventing/pkg/scheduler/state" - st "knative.dev/eventing/pkg/scheduler/state" tscheduler "knative.dev/eventing/pkg/scheduler/testing" ) @@ -51,15 +50,12 @@ 
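// Sketch of the compaction test the autoscaler now applies before evicting, assuming
// free[i] is the free vreplica capacity of pod ordinal i and capacity is the per-pod
// limit: the highest-ordinal pod is drained only when the lower-ordinal pods can absorb
// everything currently placed on it. canCompactLastPod is a hypothetical helper; the
// real code works from the State snapshot and evicts one placement at a time through
// the evictor callback.
package main

import "fmt"

func canCompactLastPod(free []int32, capacity int32) bool {
	last := len(free) - 1
	if last < 1 {
		return false // a single pod has nowhere to move its vreplicas
	}
	var freeBelow int32
	for _, f := range free[:last] {
		freeBelow += f
	}
	usedInLastPod := capacity - free[last]
	return freeBelow >= usedInLastPod
}

func main() {
	fmt.Println(canCompactLastPod([]int32{3, 4, 8}, 10)) // true: 7 free below, 2 used on the last pod
	fmt.Println(canCompactLastPod([]int32{0, 1, 2}, 10)) // false: 1 free below, 8 used on the last pod
}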
const ( func TestAutoscaler(t *testing.T) { testCases := []struct { - name string - replicas int32 - vpods []scheduler.VPod - scaleDown bool - wantReplicas int32 - schedulerPolicyType scheduler.SchedulerPolicyType - schedulerPolicy *scheduler.SchedulerPolicy - deschedulerPolicy *scheduler.SchedulerPolicy - reserved map[types.NamespacedName]map[string]int32 + name string + replicas int32 + vpods []scheduler.VPod + scaleDown bool + wantReplicas int32 + reserved map[types.NamespacedName]map[string]int32 }{ { name: "no replicas, no placements, no pending", @@ -67,8 +63,7 @@ func TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 0, nil), }, - wantReplicas: int32(0), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(0), }, { name: "no replicas, no placements, with pending", @@ -76,8 +71,7 @@ func TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 5, nil), }, - wantReplicas: int32(1), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(1), }, { name: "no replicas, with placements, no pending", @@ -87,8 +81,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(2), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(2), }, { name: "no replicas, with placements, with pending, enough capacity", @@ -98,8 +91,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(2), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(2), }, { name: "no replicas, with placements, with pending, not enough capacity", @@ -109,8 +101,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(3), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(3), }, { name: "with replicas, no placements, no pending, scale down", @@ -118,17 +109,15 @@ func TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 0, nil), }, - scaleDown: true, - wantReplicas: int32(0), - schedulerPolicyType: scheduler.MAXFILLUP, + scaleDown: true, + wantReplicas: int32(0), }, { - name: "with replicas, no placements, no pending, scale down (no vpods)", - replicas: int32(3), - vpods: []scheduler.VPod{}, - scaleDown: true, - wantReplicas: int32(0), - schedulerPolicyType: scheduler.MAXFILLUP, + name: "with replicas, no placements, no pending, scale down (no vpods)", + replicas: int32(3), + vpods: []scheduler.VPod{}, + scaleDown: true, + wantReplicas: int32(0), }, { name: "with replicas, no placements, with pending, scale down", @@ -136,9 +125,8 @@ func TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 5, nil), }, - scaleDown: true, - wantReplicas: int32(1), - schedulerPolicyType: scheduler.MAXFILLUP, + scaleDown: true, + wantReplicas: int32(1), }, { name: "with replicas, no placements, with pending, scale down disabled", @@ -146,9 +134,8 @@ func TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 5, nil), }, - scaleDown: false, - wantReplicas: int32(3), - schedulerPolicyType: scheduler.MAXFILLUP, + scaleDown: false, + wantReplicas: int32(3), }, { name: "with replicas, no placements, with pending, scale up", @@ -156,8 +143,7 @@ func 
TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 45, nil), }, - wantReplicas: int32(5), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(5), }, { name: "with replicas, no placements, with pending, no change", @@ -165,8 +151,7 @@ func TestAutoscaler(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 25, nil), }, - wantReplicas: int32(3), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(3), }, { name: "with replicas, with placements, no pending, no change", @@ -176,8 +161,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(2), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(2), }, { name: "with replicas, with placements, with reserved", @@ -187,8 +171,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(2), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(2), reserved: map[types.NamespacedName]map[string]int32{ {Namespace: testNs, Name: "vpod-1"}: { "statefulset-name-0": 8, @@ -203,8 +186,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(2)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(3), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(3), reserved: map[types.NamespacedName]map[string]int32{ {Namespace: testNs, Name: "vpod-1"}: { "statefulset-name-0": 9, @@ -219,8 +201,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(3), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(3), }, { name: "with replicas, with placements, with pending (scale up)", @@ -233,8 +214,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(4), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(4), }, { name: "with replicas, with placements, with pending (scale up), 1 over capacity", @@ -247,8 +227,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(5), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(5), }, { name: "with replicas, with placements, with pending, attempt scale down", @@ -258,9 +237,8 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(3), - scaleDown: true, - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(3), + scaleDown: true, }, { name: "with replicas, with placements, no pending, scale down", @@ -270,9 +248,8 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - scaleDown: true, - wantReplicas: int32(2), - schedulerPolicyType: scheduler.MAXFILLUP, + scaleDown: true, + wantReplicas: int32(2), }, { name: "with replicas, with placements, with pending, enough capacity", @@ -282,8 +259,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, 
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(2), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(2), }, { name: "with replicas, with placements, with pending, not enough capacity", @@ -293,8 +269,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - wantReplicas: int32(3), - schedulerPolicyType: scheduler.MAXFILLUP, + wantReplicas: int32(3), }, { name: "with replicas, with placements, no pending, round up capacity", @@ -307,105 +282,7 @@ func TestAutoscaler(t *testing.T) { {PodName: "statefulset-name-3", VReplicas: int32(1)}, {PodName: "statefulset-name-4", VReplicas: int32(1)}}), }, - wantReplicas: int32(5), - schedulerPolicyType: scheduler.MAXFILLUP, - }, - { - name: "with replicas, with placements, with pending, enough capacity, with Predicates and Zone Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, wantReplicas: int32(5), - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - }, - { - name: "with replicas, with placements, with pending, enough capacity, with Predicates and Node Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, - wantReplicas: int32(8), - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - }, - { - name: "with replicas, with placements, with pending, enough capacity, with Pod Predicates and Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, - wantReplicas: int32(4), - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - }, - { - name: "with replicas, with placements, with pending, enough capacity, with Pod Predicates and Zone Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, - wantReplicas: int32(5), - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", 
Weight: 5}, - }, - }, - }, - { - name: "with replicas, with placements, with pending, enough capacity, with Pod Predicates and Node Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, - wantReplicas: int32(8), - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, }, } @@ -413,22 +290,9 @@ func TestAutoscaler(t *testing.T) { t.Run(tc.name, func(t *testing.T) { ctx, _ := tscheduler.SetupFakeContext(t) - nodelist := make([]runtime.Object, 0, numZones) podlist := make([]runtime.Object, 0, tc.replicas) vpodClient := tscheduler.NewVPodClient() - for i := int32(0); i < numZones; i++ { - for j := int32(0); j < numNodes/numZones; j++ { - nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - } - for i := int32(0); i < int32(math.Max(float64(tc.wantReplicas), float64(tc.replicas))); i++ { nodeName := "node" + fmt.Sprint(i) podName := sfsName + "-" + fmt.Sprint(i) @@ -440,19 +304,14 @@ func TestAutoscaler(t *testing.T) { } var lspp v1.PodNamespaceLister - var lsnn v1.NodeLister if len(podlist) != 0 { lsp := listers.NewListers(podlist) lspp = lsp.GetPodLister().Pods(testNs) } - if len(nodelist) != 0 { - lsn := listers.NewListers(nodelist) - lsnn = lsn.GetNodeLister() - } scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) - stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, tc.schedulerPolicyType, tc.schedulerPolicy, tc.deschedulerPolicy, lspp, lsnn, scaleCache) + stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, lspp, scaleCache) sfsClient := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs) _, err := sfsClient.Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, tc.replicas), metav1.CreateOptions{}) @@ -511,9 +370,8 @@ func TestAutoscalerScaleDownToZero(t *testing.T) { }) vpodClient := tscheduler.NewVPodClient() - ls := listers.NewListers(nil) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) - stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, scheduler.MAXFILLUP, &scheduler.SchedulerPolicy{}, &scheduler.SchedulerPolicy{}, nil, ls.GetNodeLister(), scaleCache) + stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, nil, scaleCache) sfsClient := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs) _, err := sfsClient.Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, 10), metav1.CreateOptions{}) @@ -571,13 +429,10 @@ func TestAutoscalerScaleDownToZero(t *testing.T) { func TestCompactor(t *testing.T) { testCases := []struct { - name string - replicas int32 - vpods []scheduler.VPod - schedulerPolicyType scheduler.SchedulerPolicyType - wantEvictions 
map[types.NamespacedName][]duckv1alpha1.Placement - schedulerPolicy *scheduler.SchedulerPolicy - deschedulerPolicy *scheduler.SchedulerPolicy + name string + replicas int32 + vpods []scheduler.VPod + wantEvictions map[types.NamespacedName][]duckv1alpha1.Placement }{ { name: "no replicas, no placements, no pending", @@ -585,8 +440,7 @@ func TestCompactor(t *testing.T) { vpods: []scheduler.VPod{ tscheduler.NewVPod(testNs, "vpod-1", 0, nil), }, - schedulerPolicyType: scheduler.MAXFILLUP, - wantEvictions: nil, + wantEvictions: nil, }, { name: "one vpod, with placements in 2 pods, compacted", @@ -596,8 +450,7 @@ func TestCompactor(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(7)}}), }, - schedulerPolicyType: scheduler.MAXFILLUP, - wantEvictions: nil, + wantEvictions: nil, }, { name: "one vpod, with placements in 2 pods, compacted edge", @@ -607,8 +460,7 @@ func TestCompactor(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(3)}}), }, - schedulerPolicyType: scheduler.MAXFILLUP, - wantEvictions: nil, + wantEvictions: nil, }, { name: "one vpod, with placements in 2 pods, not compacted", @@ -618,7 +470,6 @@ func TestCompactor(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-1", VReplicas: int32(2)}}), }, - schedulerPolicyType: scheduler.MAXFILLUP, wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ {Name: "vpod-1", Namespace: testNs}: {{PodName: "statefulset-name-1", VReplicas: int32(2)}}, }, @@ -635,127 +486,10 @@ func TestCompactor(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(2)}, {PodName: "statefulset-name-2", VReplicas: int32(7)}}), }, - schedulerPolicyType: scheduler.MAXFILLUP, - wantEvictions: nil, - }, - { - name: "multiple vpods, with placements in multiple pods, not compacted", - replicas: int32(3), - // pod-0:6, pod-1:7, pod-2:7 - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 6, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(4)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - tscheduler.NewVPod(testNs, "vpod-2", 15, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(2)}, - {PodName: "statefulset-name-2", VReplicas: int32(7)}}), - }, - schedulerPolicyType: scheduler.MAXFILLUP, - wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ - {Name: "vpod-2", Namespace: testNs}: {{PodName: "statefulset-name-2", VReplicas: int32(7)}}, - }, - }, - { - name: "no replicas, no placements, no pending, with Predicates and Priorities", - replicas: int32(0), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 0, nil), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "one vpod, with placements in 2 pods, compacted, with Predicates and Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: 
"EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, wantEvictions: nil, }, { - name: "one vpod, with placements in 2 pods, compacted edge, with Predicates and Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 11, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(3)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "one vpod, with placements in 2 pods, not compacted, with Predicates and Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 10, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(2)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ - {Name: "vpod-1", Namespace: testNs}: {{PodName: "statefulset-name-1", VReplicas: int32(2)}}, - }, - }, - { - name: "multiple vpods, with placements in multiple pods, compacted, with Predicates and Priorities", - replicas: int32(3), - // pod-0:6, pod-1:8, pod-2:7 - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 12, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(4)}, - {PodName: "statefulset-name-1", VReplicas: int32(8)}}), - tscheduler.NewVPod(testNs, "vpod-2", 9, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(2)}, - {PodName: "statefulset-name-2", VReplicas: int32(7)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "multiple vpods, with placements in multiple pods, not compacted, with Predicates and Priorities", + name: "multiple vpods, with placements in multiple pods, not compacted", replicas: int32(3), // pod-0:6, pod-1:7, pod-2:7 vpods: []scheduler.VPod{ @@ -766,173 +500,19 @@ func TestCompactor(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: int32(2)}, {PodName: "statefulset-name-2", VReplicas: int32(7)}}), }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ {Name: "vpod-2", Namespace: testNs}: {{PodName: "statefulset-name-2", VReplicas: int32(7)}}, }, }, - { - name: "no replicas, no placements, no pending, with Predicates and HA Priorities", - replicas: int32(0), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 0, nil), - }, - schedulerPolicy: 
&scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "one vpod, with placements in 2 pods, compacted, with Predicates and HA Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "one vpod, with placements in 2 pods, compacted edge, with Predicates and HA Priorities", - replicas: int32(2), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 11, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(3)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "one vpod, with placements in 3 pods, compacted, with Predicates and HA Priorities", - replicas: int32(3), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 14, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(2)}, - {PodName: "statefulset-name-2", VReplicas: int32(4)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "multiple vpods, with placements in multiple pods, compacted, with Predicates and HA Priorities", - replicas: int32(3), - // pod-0:6, pod-1:8, pod-2:7 - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 12, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(4)}, - {PodName: "statefulset-name-1", VReplicas: int32(8)}}), - tscheduler.NewVPod(testNs, "vpod-2", 9, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(2)}, - {PodName: "statefulset-name-2", VReplicas: int32(7)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: nil, - }, - { - name: "multiple vpods, with placements in 
multiple pods, not compacted, with Predicates and HA Priorities", - replicas: int32(6), - vpods: []scheduler.VPod{ - tscheduler.NewVPod(testNs, "vpod-1", 16, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(4)}, - {PodName: "statefulset-name-1", VReplicas: int32(2)}, - {PodName: "statefulset-name-2", VReplicas: int32(2)}, - {PodName: "statefulset-name-3", VReplicas: int32(2)}, - {PodName: "statefulset-name-4", VReplicas: int32(3)}, - {PodName: "statefulset-name-5", VReplicas: int32(3)}}), - tscheduler.NewVPod(testNs, "vpod-2", 11, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(2)}, - {PodName: "statefulset-name-1", VReplicas: int32(4)}, - {PodName: "statefulset-name-2", VReplicas: int32(5)}}), - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, - }, - }, - wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ - {Name: "vpod-1", Namespace: testNs}: {{PodName: "statefulset-name-5", VReplicas: int32(3)}, {PodName: "statefulset-name-4", VReplicas: int32(3)}, {PodName: "statefulset-name-3", VReplicas: int32(2)}}, - }, - }, } for _, tc := range testCases { t.Run(tc.name, func(t *testing.T) { ctx, _ := tscheduler.SetupFakeContext(t) - nodelist := make([]runtime.Object, 0, numZones) podlist := make([]runtime.Object, 0, tc.replicas) vpodClient := tscheduler.NewVPodClient() - for i := int32(0); i < numZones; i++ { - for j := int32(0); j < numNodes/numZones; j++ { - nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - } for i := int32(0); i < tc.replicas; i++ { nodeName := "node" + fmt.Sprint(i) podName := sfsName + "-" + fmt.Sprint(i) @@ -949,9 +529,8 @@ func TestCompactor(t *testing.T) { } lsp := listers.NewListers(podlist) - lsn := listers.NewListers(nodelist) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) - stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, tc.schedulerPolicyType, tc.schedulerPolicy, tc.deschedulerPolicy, lsp.GetPodLister().Pods(testNs), lsn.GetNodeLister(), scaleCache) + stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, lsp.GetPodLister().Pods(testNs), scaleCache) evictions := make(map[types.NamespacedName][]duckv1alpha1.Placement) recordEviction := func(pod *corev1.Pod, vpod scheduler.VPod, from *duckv1alpha1.Placement) error { @@ -975,21 +554,12 @@ func TestCompactor(t *testing.T) { vpodClient.Append(vpod) } - state, err := stateAccessor.State(ctx, nil) + state, err := stateAccessor.State(ctx) if err != nil { t.Fatalf("unexpected error: %v", err) } - var scaleUpFactor int32 - if tc.schedulerPolicy != nil && contains(nil, tc.schedulerPolicy.Priorities, st.AvailabilityZonePriority) { //HA scaling across zones - scaleUpFactor = state.NumZones - } else if tc.schedulerPolicy != nil && contains(nil, tc.schedulerPolicy.Priorities, st.AvailabilityNodePriority) { //HA scalingacross nodes - scaleUpFactor = 
state.NumNodes - } else { - scaleUpFactor = 1 // Non-HA scaling - } - - if err := autoscaler.mayCompact(logging.FromContext(ctx), state, scaleUpFactor); err != nil { + if err := autoscaler.mayCompact(logging.FromContext(ctx), state); err != nil { t.Fatal(err) } diff --git a/pkg/scheduler/statefulset/scheduler.go b/pkg/scheduler/statefulset/scheduler.go index 6995d6ff452..cf2834e7c34 100644 --- a/pkg/scheduler/statefulset/scheduler.go +++ b/pkg/scheduler/statefulset/scheduler.go @@ -18,9 +18,7 @@ package statefulset import ( "context" - "crypto/rand" "fmt" - "math/big" "sort" "sync" "time" @@ -28,11 +26,11 @@ import ( "go.uber.org/zap" appsv1 "k8s.io/api/apps/v1" "k8s.io/apimachinery/pkg/types" + "k8s.io/apimachinery/pkg/util/sets" "k8s.io/client-go/informers" clientappsv1 "k8s.io/client-go/kubernetes/typed/apps/v1" corev1listers "k8s.io/client-go/listers/core/v1" "k8s.io/client-go/tools/cache" - "k8s.io/utils/integer" "knative.dev/pkg/logging" "knative.dev/pkg/reconciler" @@ -41,19 +39,7 @@ import ( duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" "knative.dev/eventing/pkg/scheduler" - "knative.dev/eventing/pkg/scheduler/factory" st "knative.dev/eventing/pkg/scheduler/state" - - _ "knative.dev/eventing/pkg/scheduler/plugins/core/availabilitynodepriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/availabilityzonepriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/evenpodspread" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/lowestordinalpriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/podfitsresources" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithavailabilitynodepriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithavailabilityzonepriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithevenpodspreadpriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithhighestordinalpriority" - _ "knative.dev/eventing/pkg/scheduler/plugins/kafka/nomaxresourcecount" ) type GetReserved func() map[types.NamespacedName]map[string]int32 @@ -65,19 +51,16 @@ type Config struct { ScaleCacheConfig scheduler.ScaleCacheConfig `json:"scaleCacheConfig"` // PodCapacity max capacity for each StatefulSet's pod. PodCapacity int32 `json:"podCapacity"` + // MinReplicas is the minimum replicas of the statefulset. 
+ MinReplicas int32 `json:"minReplicas"` // Autoscaler refresh period RefreshPeriod time.Duration `json:"refreshPeriod"` // Autoscaler retry period RetryPeriod time.Duration `json:"retryPeriod"` - SchedulerPolicy scheduler.SchedulerPolicyType `json:"schedulerPolicy"` - SchedPolicy *scheduler.SchedulerPolicy `json:"schedPolicy"` - DeschedPolicy *scheduler.SchedulerPolicy `json:"deschedPolicy"` - Evictor scheduler.Evictor `json:"-"` - VPodLister scheduler.VPodLister `json:"-"` - NodeLister corev1listers.NodeLister `json:"-"` + VPodLister scheduler.VPodLister `json:"-"` // Pod lister for statefulset: StatefulSetNamespace / StatefulSetName PodLister corev1listers.PodNamespaceLister `json:"-"` @@ -93,7 +76,7 @@ func New(ctx context.Context, cfg *Config) (scheduler.Scheduler, error) { scaleCache := scheduler.NewScaleCache(ctx, cfg.StatefulSetNamespace, kubeclient.Get(ctx).AppsV1().StatefulSets(cfg.StatefulSetNamespace), cfg.ScaleCacheConfig) - stateAccessor := st.NewStateBuilder(cfg.StatefulSetName, cfg.VPodLister, cfg.PodCapacity, cfg.SchedulerPolicy, cfg.SchedPolicy, cfg.DeschedPolicy, cfg.PodLister, cfg.NodeLister, scaleCache) + stateAccessor := st.NewStateBuilder(cfg.StatefulSetName, cfg.VPodLister, cfg.PodCapacity, cfg.PodLister, scaleCache) var getReserved GetReserved cfg.getReserved = func() map[types.NamespacedName]map[string]int32 { @@ -118,14 +101,6 @@ func New(ctx context.Context, cfg *Config) (scheduler.Scheduler, error) { type Pending map[types.NamespacedName]int32 -func (p Pending) Total() int32 { - t := int32(0) - for _, vr := range p { - t += vr - } - return t -} - // StatefulSetScheduler is a scheduler placing VPod into statefulset-managed set of pods type StatefulSetScheduler struct { statefulSetName string @@ -152,9 +127,35 @@ var ( // Promote implements reconciler.LeaderAware. func (s *StatefulSetScheduler) Promote(b reconciler.Bucket, enq func(reconciler.Bucket, types.NamespacedName)) error { + if !b.Has(ephemeralLeaderElectionObject) { + return nil + } + if v, ok := s.autoscaler.(reconciler.LeaderAware); ok { return v.Promote(b, enq) } + if err := s.initReserved(); err != nil { + return err + } + return nil +} + +func (s *StatefulSetScheduler) initReserved() error { + s.reservedMu.Lock() + defer s.reservedMu.Unlock() + + vPods, err := s.vpodLister() + if err != nil { + return fmt.Errorf("failed to list vPods during init: %w", err) + } + + s.reserved = make(map[types.NamespacedName]map[string]int32, len(vPods)) + for _, vPod := range vPods { + s.reserved[vPod.GetKey()] = make(map[string]int32, len(vPod.GetPlacements())) + for _, placement := range vPod.GetPlacements() { + s.reserved[vPod.GetKey()][placement.PodName] += placement.VReplicas + } + } return nil } @@ -170,7 +171,7 @@ func newStatefulSetScheduler(ctx context.Context, stateAccessor st.StateAccessor, autoscaler Autoscaler) *StatefulSetScheduler { - scheduler := &StatefulSetScheduler{ + s := &StatefulSetScheduler{ statefulSetNamespace: cfg.StatefulSetNamespace, statefulSetName: cfg.StatefulSetName, statefulSetClient: kubeclient.Get(ctx).AppsV1().StatefulSets(cfg.StatefulSetNamespace), @@ -188,13 +189,16 @@ func newStatefulSetScheduler(ctx context.Context, informers.WithNamespace(cfg.StatefulSetNamespace), ) - sif.Apps().V1().StatefulSets().Informer(). + _, err := sif.Apps().V1().StatefulSets().Informer(). 
AddEventHandler(cache.FilteringResourceEventHandler{ FilterFunc: controller.FilterWithNameAndNamespace(cfg.StatefulSetNamespace, cfg.StatefulSetName), Handler: controller.HandleAll(func(i interface{}) { - scheduler.updateStatefulset(ctx, i) + s.updateStatefulset(ctx, i) }), }) + if err != nil { + logging.FromContext(ctx).Fatalw("Failed to register informer", zap.Error(err)) + } sif.Start(ctx.Done()) _ = sif.WaitForCacheSync(ctx.Done()) @@ -204,7 +208,7 @@ func newStatefulSetScheduler(ctx context.Context, sif.Shutdown() }() - return scheduler + return s } func (s *StatefulSetScheduler) Schedule(ctx context.Context, vpod scheduler.VPod) ([]duckv1alpha1.Placement, error) { @@ -214,9 +218,6 @@ func (s *StatefulSetScheduler) Schedule(ctx context.Context, vpod scheduler.VPod defer s.reservedMu.Unlock() placements, err := s.scheduleVPod(ctx, vpod) - if placements == nil { - return placements, err - } sort.SliceStable(placements, func(i int, j int) bool { return st.OrdinalFromPodName(placements[i].PodName) < st.OrdinalFromPodName(placements[j].PodName) @@ -234,30 +235,42 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler. // Get the current placements state // Quite an expensive operation but safe and simple. - state, err := s.stateAccessor.State(ctx, s.reserved) + state, err := s.stateAccessor.State(ctx) if err != nil { logger.Debug("error while refreshing scheduler state (will retry)", zap.Error(err)) return nil, err } - // Clean up reserved from removed resources that don't appear in the vpod list anymore and have - // no pending resources. - reserved := make(map[types.NamespacedName]map[string]int32) - for k, v := range s.reserved { - if pendings, ok := state.Pending[k]; ok { - if pendings == 0 { - reserved[k] = map[string]int32{} - } else { - reserved[k] = v - } + reservedByPodName := make(map[string]int32, 2) + for _, v := range s.reserved { + for podName, vReplicas := range v { + v, _ := reservedByPodName[podName] + reservedByPodName[podName] = vReplicas + v + } + } + + // Use reserved placements as starting point, if we have them. + existingPlacements := make([]duckv1alpha1.Placement, 0) + if placements, ok := s.reserved[vpod.GetKey()]; ok { + existingPlacements = make([]duckv1alpha1.Placement, 0, len(placements)) + for podName, n := range placements { + existingPlacements = append(existingPlacements, duckv1alpha1.Placement{ + PodName: podName, + VReplicas: n, + }) } } - s.reserved = reserved - logger.Debugw("scheduling", zap.Any("state", state)) + sort.SliceStable(existingPlacements, func(i int, j int) bool { + return st.OrdinalFromPodName(existingPlacements[i].PodName) < st.OrdinalFromPodName(existingPlacements[j].PodName) + }) - existingPlacements := vpod.GetPlacements() - var left int32 + logger.Debugw("scheduling state", + zap.Any("state", state), + zap.Any("reservedByPodName", reservedByPodName), + zap.Any("reserved", st.ToJSONable(s.reserved)), + zap.Any("vpod", vpod), + ) // Remove unschedulable or adjust overcommitted pods from placements var placements []duckv1alpha1.Placement @@ -272,23 +285,26 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler. } // Handle overcommitted pods. 
- if state.Free(ordinal) < 0 { + reserved, _ := reservedByPodName[p.PodName] + if state.Capacity-reserved < 0 { // vr > free => vr: 9, overcommit 4 -> free: 0, vr: 5, pending: +4 // vr = free => vr: 4, overcommit 4 -> free: 0, vr: 0, pending: +4 // vr < free => vr: 3, overcommit 4 -> free: -1, vr: 0, pending: +3 - overcommit := -state.FreeCap[ordinal] + overcommit := -(state.Capacity - reserved) logger.Debugw("overcommit", zap.Any("overcommit", overcommit), zap.Any("placement", p)) if p.VReplicas >= overcommit { state.SetFree(ordinal, 0) state.Pending[vpod.GetKey()] += overcommit + reservedByPodName[p.PodName] -= overcommit p.VReplicas = p.VReplicas - overcommit } else { state.SetFree(ordinal, p.VReplicas-overcommit) state.Pending[vpod.GetKey()] += p.VReplicas + reservedByPodName[p.PodName] -= p.VReplicas p.VReplicas = 0 } @@ -314,51 +330,25 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler. return placements, nil } - if state.SchedulerPolicy != "" { - // Need less => scale down - if tr > vpod.GetVReplicas() { - logger.Debugw("scaling down", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()), - zap.Any("placements", placements), - zap.Any("existingPlacements", existingPlacements)) - - placements = s.removeReplicas(tr-vpod.GetVReplicas(), placements) - - // Do not trigger the autoscaler to avoid unnecessary churn - - return placements, nil - } - - // Need more => scale up - logger.Debugw("scaling up", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()), + // Need less => scale down + if tr > vpod.GetVReplicas() { + logger.Debugw("scaling down", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()), zap.Any("placements", placements), zap.Any("existingPlacements", existingPlacements)) - placements, left = s.addReplicas(state, vpod.GetVReplicas()-tr, placements) + placements = s.removeReplicas(tr-vpod.GetVReplicas(), placements) - } else { //Predicates and priorities must be used for scheduling - // Need less => scale down - if tr > vpod.GetVReplicas() && state.DeschedPolicy != nil { - logger.Infow("scaling down", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()), - zap.Any("placements", placements), - zap.Any("existingPlacements", existingPlacements)) - placements = s.removeReplicasWithPolicy(ctx, vpod, tr-vpod.GetVReplicas(), placements) + // Do not trigger the autoscaler to avoid unnecessary churn - // Do not trigger the autoscaler to avoid unnecessary churn - - return placements, nil - } + return placements, nil + } - if state.SchedPolicy != nil { + // Need more => scale up + logger.Debugw("scaling up", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()), + zap.Any("placements", placements), + zap.Any("existingPlacements", existingPlacements)) - // Need more => scale up - // rebalancing needed for all vreps most likely since there are pending vreps from previous reconciliation - // can fall here when vreps scaled up or after eviction - logger.Infow("scaling up with a rebalance (if needed)", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()), - zap.Any("placements", placements), - zap.Any("existingPlacements", existingPlacements)) - placements, left = s.rebalanceReplicasWithPolicy(ctx, vpod, vpod.GetVReplicas(), placements) - } - } + placements, left := s.addReplicas(state, reservedByPodName, vpod, vpod.GetVReplicas()-tr, placements) if left > 0 { // Give time for the autoscaler to do its job @@ -370,12 +360,6 @@ 
func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler. s.autoscaler.Autoscale(ctx) } - if state.SchedulerPolicy == "" && state.SchedPolicy != nil { - logger.Info("reverting to previous placements") - s.reservePlacements(vpod, existingPlacements) // rebalancing doesn't care about new placements since all vreps will be re-placed - return existingPlacements, s.notEnoughPodReplicas(left) // requeue to wait for the autoscaler to do its job - } - return placements, s.notEnoughPodReplicas(left) } @@ -384,408 +368,125 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler. return placements, nil } -func toJSONable(pending map[types.NamespacedName]int32) map[string]int32 { - r := make(map[string]int32, len(pending)) - for k, v := range pending { - r[k.String()] = v - } - return r -} - -func (s *StatefulSetScheduler) rebalanceReplicasWithPolicy(ctx context.Context, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) { - s.makeZeroPlacements(vpod, placements) - placements, diff = s.addReplicasWithPolicy(ctx, vpod, diff, make([]duckv1alpha1.Placement, 0)) //start fresh with a new placements list - - return placements, diff -} - -func (s *StatefulSetScheduler) removeReplicasWithPolicy(ctx context.Context, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement { - logger := logging.FromContext(ctx).Named("remove replicas with policy") - numVreps := diff - - for i := int32(0); i < numVreps; i++ { //deschedule one vreplica at a time - state, err := s.stateAccessor.State(ctx, s.reserved) - if err != nil { - logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err)) - return placements - } - - feasiblePods := s.findFeasiblePods(ctx, state, vpod, state.DeschedPolicy) - feasiblePods = s.removePodsNotInPlacement(vpod, feasiblePods) - if len(feasiblePods) == 1 { //nothing to score, remove vrep from that pod - placementPodID := feasiblePods[0] - logger.Infof("Selected pod #%v to remove vreplica #%v from", placementPodID, i) - placements = s.removeSelectionFromPlacements(placementPodID, placements) - state.SetFree(placementPodID, state.Free(placementPodID)+1) - s.reservePlacements(vpod, placements) - continue - } - - priorityList, err := s.prioritizePods(ctx, state, vpod, feasiblePods, state.DeschedPolicy) - if err != nil { - logger.Info("error while scoring pods using priorities", zap.Error(err)) - s.reservePlacements(vpod, placements) - break - } - - placementPodID, err := s.selectPod(priorityList) - if err != nil { - logger.Info("error while selecting the placement pod", zap.Error(err)) - s.reservePlacements(vpod, placements) - break - } - - logger.Infof("Selected pod #%v to remove vreplica #%v from", placementPodID, i) - placements = s.removeSelectionFromPlacements(placementPodID, placements) - state.SetFree(placementPodID, state.Free(placementPodID)+1) - s.reservePlacements(vpod, placements) - } - return placements -} - -func (s *StatefulSetScheduler) removeSelectionFromPlacements(placementPodID int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement { +func (s *StatefulSetScheduler) removeReplicas(diff int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement { newPlacements := make([]duckv1alpha1.Placement, 0, len(placements)) - - for i := 0; i < len(placements); i++ { - ordinal := st.OrdinalFromPodName(placements[i].PodName) - if placementPodID == ordinal { - if placements[i].VReplicas == 1 { - // remove the entire 
placement - } else { - newPlacements = append(newPlacements, duckv1alpha1.Placement{ - PodName: placements[i].PodName, - VReplicas: placements[i].VReplicas - 1, - }) - } + for i := len(placements) - 1; i > -1; i-- { + if diff >= placements[i].VReplicas { + // remove the entire placement + diff -= placements[i].VReplicas } else { newPlacements = append(newPlacements, duckv1alpha1.Placement{ PodName: placements[i].PodName, - VReplicas: placements[i].VReplicas, + VReplicas: placements[i].VReplicas - diff, }) + diff = 0 } } return newPlacements } -func (s *StatefulSetScheduler) addReplicasWithPolicy(ctx context.Context, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) { - logger := logging.FromContext(ctx).Named("add replicas with policy") - - numVreps := diff - for i := int32(0); i < numVreps; i++ { //schedule one vreplica at a time (find most suitable pod placement satisying predicates with high score) - // Get the current placements state - state, err := s.stateAccessor.State(ctx, s.reserved) - if err != nil { - logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err)) - return placements, diff - } - - if s.replicas == 0 { //no pods to filter - logger.Infow("no pods available in statefulset") - s.reservePlacements(vpod, placements) - diff = numVreps - i //for autoscaling up - break //end the iteration for all vreps since there are not pods - } - - feasiblePods := s.findFeasiblePods(ctx, state, vpod, state.SchedPolicy) - if len(feasiblePods) == 0 { //no pods available to schedule this vreplica - logger.Info("no feasible pods available to schedule this vreplica") - s.reservePlacements(vpod, placements) - diff = numVreps - i //for autoscaling up and possible rebalancing - break - } - - /* if len(feasiblePods) == 1 { //nothing to score, place vrep on that pod (Update: for HA, must run HA scorers) - placementPodID := feasiblePods[0] - logger.Infof("Selected pod #%v for vreplica #%v ", placementPodID, i) - placements = s.addSelectionToPlacements(placementPodID, placements) - //state.SetFree(placementPodID, state.Free(placementPodID)-1) - s.reservePlacements(vpod, placements) - diff-- - continue - } */ - - priorityList, err := s.prioritizePods(ctx, state, vpod, feasiblePods, state.SchedPolicy) - if err != nil { - logger.Info("error while scoring pods using priorities", zap.Error(err)) - s.reservePlacements(vpod, placements) - diff = numVreps - i //for autoscaling up and possible rebalancing - break - } - - placementPodID, err := s.selectPod(priorityList) - if err != nil { - logger.Info("error while selecting the placement pod", zap.Error(err)) - s.reservePlacements(vpod, placements) - diff = numVreps - i //for autoscaling up and possible rebalancing - break - } - - logger.Infof("Selected pod #%v for vreplica #%v", placementPodID, i) - placements = s.addSelectionToPlacements(placementPodID, placements) - state.SetFree(placementPodID, state.Free(placementPodID)-1) - s.reservePlacements(vpod, placements) - diff-- +func (s *StatefulSetScheduler) addReplicas(states *st.State, reservedByPodName map[string]int32, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) { + if states.Replicas <= 0 { + return placements, diff } - return placements, diff -} -func (s *StatefulSetScheduler) addSelectionToPlacements(placementPodID int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement { - seen := false - - for i := 0; i < len(placements); i++ { - ordinal := 
st.OrdinalFromPodName(placements[i].PodName) - if placementPodID == ordinal { - seen = true - placements[i].VReplicas = placements[i].VReplicas + 1 - } - } - if !seen { - placements = append(placements, duckv1alpha1.Placement{ - PodName: st.PodNameFromOrdinal(s.statefulSetName, placementPodID), - VReplicas: 1, - }) - } - return placements -} + newPlacements := make([]duckv1alpha1.Placement, 0, len(placements)) -// findFeasiblePods finds the pods that fit the filter plugins -func (s *StatefulSetScheduler) findFeasiblePods(ctx context.Context, state *st.State, vpod scheduler.VPod, policy *scheduler.SchedulerPolicy) []int32 { - feasiblePods := make([]int32, 0) - for _, podId := range state.SchedulablePods { - statusMap := s.RunFilterPlugins(ctx, state, vpod, podId, policy) - status := statusMap.Merge() - if status.IsSuccess() { - feasiblePods = append(feasiblePods, podId) - } + // Preserve existing placements + for _, p := range placements { + newPlacements = append(newPlacements, *p.DeepCopy()) } - return feasiblePods -} + candidates := s.candidatesOrdered(states, vpod, placements) -// removePodsNotInPlacement removes pods that do not have vreplicas placed -func (s *StatefulSetScheduler) removePodsNotInPlacement(vpod scheduler.VPod, feasiblePods []int32) []int32 { - newFeasiblePods := make([]int32, 0) - for _, e := range vpod.GetPlacements() { - for _, podID := range feasiblePods { - if podID == st.OrdinalFromPodName(e.PodName) { //if pod is in current placement list - newFeasiblePods = append(newFeasiblePods, podID) + // Spread replicas in as many candidates as possible. + foundFreeCandidate := true + for diff > 0 && foundFreeCandidate { + foundFreeCandidate = false + for _, ordinal := range candidates { + if diff <= 0 { + break } - } - } - - return newFeasiblePods -} -// prioritizePods prioritizes the pods by running the score plugins, which return a score for each pod. -// The scores from each plugin are added together to make the score for that pod. -func (s *StatefulSetScheduler) prioritizePods(ctx context.Context, states *st.State, vpod scheduler.VPod, feasiblePods []int32, policy *scheduler.SchedulerPolicy) (st.PodScoreList, error) { - logger := logging.FromContext(ctx).Named("prioritize all feasible pods") - - // If no priority configs are provided, then all pods will have a score of one - result := make(st.PodScoreList, 0, len(feasiblePods)) - if !s.HasScorePlugins(states, policy) { - for _, podID := range feasiblePods { - result = append(result, st.PodScore{ - ID: podID, - Score: 1, - }) - } - return result, nil - } + podName := st.PodNameFromOrdinal(states.StatefulSetName, ordinal) + reserved, _ := reservedByPodName[podName] + // Is there space? + if states.Capacity-reserved > 0 { + foundFreeCandidate = true + allocation := int32(1) - scoresMap, scoreStatus := s.RunScorePlugins(ctx, states, vpod, feasiblePods, policy) - if !scoreStatus.IsSuccess() { - logger.Infof("FAILURE! Cannot score feasible pods due to plugin errors %v", scoreStatus.AsError()) - return nil, scoreStatus.AsError() - } - - // Summarize all scores. 
- for i := range feasiblePods { - result = append(result, st.PodScore{ID: feasiblePods[i], Score: 0}) - for j := range scoresMap { - result[i].Score += scoresMap[j][i].Score - } - } - - return result, nil -} + newPlacements = upsertPlacements(newPlacements, duckv1alpha1.Placement{ + PodName: st.PodNameFromOrdinal(states.StatefulSetName, ordinal), + VReplicas: allocation, + }) -// selectPod takes a prioritized list of pods and then picks one -func (s *StatefulSetScheduler) selectPod(podScoreList st.PodScoreList) (int32, error) { - if len(podScoreList) == 0 { - return -1, fmt.Errorf("empty priority list") //no selected pod - } - - maxScore := podScoreList[0].Score - selected := podScoreList[0].ID - cntOfMaxScore := int64(1) - for _, ps := range podScoreList[1:] { - if ps.Score > maxScore { - maxScore = ps.Score - selected = ps.ID - cntOfMaxScore = 1 - } else if ps.Score == maxScore { //if equal scores, randomly picks one - cntOfMaxScore++ - randNum, err := rand.Int(rand.Reader, big.NewInt(cntOfMaxScore)) - if err != nil { - return -1, fmt.Errorf("failed to generate random number") - } - if randNum.Int64() == int64(0) { - selected = ps.ID + diff -= allocation + reservedByPodName[podName] += allocation } } } - return selected, nil -} -// RunFilterPlugins runs the set of configured Filter plugins for a vrep on the given pod. -// If any of these plugins doesn't return "Success", the pod is not suitable for placing the vrep. -// Meanwhile, the failure message and status are set for the given pod. -func (s *StatefulSetScheduler) RunFilterPlugins(ctx context.Context, states *st.State, vpod scheduler.VPod, podID int32, policy *scheduler.SchedulerPolicy) st.PluginToStatus { - logger := logging.FromContext(ctx).Named("run all filter plugins") - - statuses := make(st.PluginToStatus) - for _, plugin := range policy.Predicates { - pl, err := factory.GetFilterPlugin(plugin.Name) - if err != nil { - logger.Error("Could not find filter plugin in Registry: ", plugin.Name) - continue - } - - //logger.Infof("Going to run filter plugin: %s using state: %v ", pl.Name(), states) - pluginStatus := s.runFilterPlugin(ctx, pl, plugin.Args, states, vpod, podID) - if !pluginStatus.IsSuccess() { - if !pluginStatus.IsUnschedulable() { - errStatus := st.NewStatus(st.Error, fmt.Sprintf("running %q filter plugin for pod %q failed with: %v", pl.Name(), podID, pluginStatus.Message())) - return map[string]*st.Status{pl.Name(): errStatus} //TODO: if one plugin fails, then no more plugins are run - } - statuses[pl.Name()] = pluginStatus - return statuses - } + if len(newPlacements) == 0 { + return nil, diff } - - return statuses + return newPlacements, diff } -func (s *StatefulSetScheduler) runFilterPlugin(ctx context.Context, pl st.FilterPlugin, args interface{}, states *st.State, vpod scheduler.VPod, podID int32) *st.Status { - status := pl.Filter(ctx, args, states, vpod.GetKey(), podID) - return status -} +func (s *StatefulSetScheduler) candidatesOrdered(states *st.State, vpod scheduler.VPod, placements []duckv1alpha1.Placement) []int32 { + existingPlacements := sets.New[string]() + candidates := make([]int32, len(states.SchedulablePods)) -// RunScorePlugins runs the set of configured scoring plugins. It returns a list that stores for each scoring plugin name the corresponding PodScoreList(s). -// It also returns *Status, which is set to non-success if any of the plugins returns a non-success status. 
-func (s *StatefulSetScheduler) RunScorePlugins(ctx context.Context, states *st.State, vpod scheduler.VPod, feasiblePods []int32, policy *scheduler.SchedulerPolicy) (st.PluginToPodScores, *st.Status) { - logger := logging.FromContext(ctx).Named("run all score plugins") + firstIdx := 0 + lastIdx := len(candidates) - 1 - pluginToPodScores := make(st.PluginToPodScores, len(policy.Priorities)) - for _, plugin := range policy.Priorities { - pl, err := factory.GetScorePlugin(plugin.Name) - if err != nil { - logger.Error("Could not find score plugin in registry: ", plugin.Name) + // De-prioritize existing placements pods, add existing placements to the tail of the candidates. + // Start from the last one so that within the "existing replicas" group, we prioritize lower ordinals + // to reduce compaction. + for i := len(placements) - 1; i >= 0; i-- { + placement := placements[i] + ordinal := st.OrdinalFromPodName(placement.PodName) + if !states.IsSchedulablePod(ordinal) { continue } - - //logger.Infof("Going to run score plugin: %s using state: %v ", pl.Name(), states) - pluginToPodScores[pl.Name()] = make(st.PodScoreList, len(feasiblePods)) - for index, podID := range feasiblePods { - score, pluginStatus := s.runScorePlugin(ctx, pl, plugin.Args, states, feasiblePods, vpod, podID) - if !pluginStatus.IsSuccess() { - errStatus := st.NewStatus(st.Error, fmt.Sprintf("running %q scoring plugin for pod %q failed with: %v", pl.Name(), podID, pluginStatus.AsError())) - return pluginToPodScores, errStatus //TODO: if one plugin fails, then no more plugins are run - } - - score = score * plugin.Weight //WEIGHED SCORE VALUE - //logger.Infof("scoring plugin %q produced score %v for pod %q: %v", pl.Name(), score, podID, pluginStatus) - pluginToPodScores[pl.Name()][index] = st.PodScore{ - ID: podID, - Score: score, - } - } - - status := pl.ScoreExtensions().NormalizeScore(ctx, states, pluginToPodScores[pl.Name()]) //NORMALIZE SCORES FOR ALL FEASIBLE PODS - if !status.IsSuccess() { - errStatus := st.NewStatus(st.Error, fmt.Sprintf("running %q scoring plugin failed with: %v", pl.Name(), status.AsError())) - return pluginToPodScores, errStatus + // This should really never happen as placements are de-duped, however, better to handle + // edge cases in case the prerequisite doesn't hold in the future. + if existingPlacements.Has(placement.PodName) { + continue } + candidates[lastIdx] = ordinal + lastIdx-- + existingPlacements.Insert(placement.PodName) } - return pluginToPodScores, st.NewStatus(st.Success) -} - -func (s *StatefulSetScheduler) runScorePlugin(ctx context.Context, pl st.ScorePlugin, args interface{}, states *st.State, feasiblePods []int32, vpod scheduler.VPod, podID int32) (uint64, *st.Status) { - score, status := pl.Score(ctx, args, states, feasiblePods, vpod.GetKey(), podID) - return score, status -} - -// HasScorePlugins returns true if at least one score plugin is defined. 
-func (s *StatefulSetScheduler) HasScorePlugins(state *st.State, policy *scheduler.SchedulerPolicy) bool { - return len(policy.Priorities) > 0 -} - -func (s *StatefulSetScheduler) removeReplicas(diff int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement { - newPlacements := make([]duckv1alpha1.Placement, 0, len(placements)) - for i := len(placements) - 1; i > -1; i-- { - if diff >= placements[i].VReplicas { - // remove the entire placement - diff -= placements[i].VReplicas - } else { - newPlacements = append(newPlacements, duckv1alpha1.Placement{ - PodName: placements[i].PodName, - VReplicas: placements[i].VReplicas - diff, - }) - diff = 0 + // Prioritize reserved placements that don't appear in the committed placements. + if reserved, ok := s.reserved[vpod.GetKey()]; ok { + for podName := range reserved { + if !states.IsSchedulablePod(st.OrdinalFromPodName(podName)) { + continue + } + if existingPlacements.Has(podName) { + continue + } + candidates[firstIdx] = st.OrdinalFromPodName(podName) + firstIdx++ + existingPlacements.Insert(podName) } } - return newPlacements -} - -func (s *StatefulSetScheduler) addReplicas(states *st.State, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) { - // Pod affinity algorithm: prefer adding replicas to existing pods before considering other replicas - newPlacements := make([]duckv1alpha1.Placement, 0, len(placements)) - - // Add to existing - for i := 0; i < len(placements); i++ { - podName := placements[i].PodName - ordinal := st.OrdinalFromPodName(podName) - - // Is there space in PodName? - f := states.Free(ordinal) - if diff >= 0 && f > 0 { - allocation := integer.Int32Min(f, diff) - newPlacements = append(newPlacements, duckv1alpha1.Placement{ - PodName: podName, - VReplicas: placements[i].VReplicas + allocation, - }) - diff -= allocation - states.SetFree(ordinal, f-allocation) - } else { - newPlacements = append(newPlacements, placements[i]) + // Add all the ordinals to the candidates list. + // De-prioritize the last ordinals over lower ordinals so that we reduce the chances for compaction. 
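+	// Together with the two groups above, the resulting order is: reserved-but-uncommitted pods
+	// first, then the remaining free pods from the lowest ordinal up, and finally the pods that
+	// already hold committed placements for this vpod.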
+ for ordinal := s.replicas - 1; ordinal >= 0; ordinal-- { + if !states.IsSchedulablePod(ordinal) { + continue } - } - - if diff > 0 { - // Needs to allocate replicas to additional pods - for ordinal := int32(0); ordinal < s.replicas; ordinal++ { - f := states.Free(ordinal) - if f > 0 { - allocation := integer.Int32Min(f, diff) - newPlacements = append(newPlacements, duckv1alpha1.Placement{ - PodName: st.PodNameFromOrdinal(s.statefulSetName, ordinal), - VReplicas: allocation, - }) - - diff -= allocation - states.SetFree(ordinal, f-allocation) - } - - if diff == 0 { - break - } + podName := st.PodNameFromOrdinal(states.StatefulSetName, ordinal) + if existingPlacements.Has(podName) { + continue } + candidates[lastIdx] = ordinal + lastIdx-- } - - return newPlacements, diff + return candidates } func (s *StatefulSetScheduler) updateStatefulset(ctx context.Context, obj interface{}) { @@ -808,31 +509,17 @@ func (s *StatefulSetScheduler) updateStatefulset(ctx context.Context, obj interf func (s *StatefulSetScheduler) reservePlacements(vpod scheduler.VPod, placements []duckv1alpha1.Placement) { if len(placements) == 0 { // clear our old placements in reserved - s.reserved[vpod.GetKey()] = make(map[string]int32) + delete(s.reserved, vpod.GetKey()) + return } + s.reserved[vpod.GetKey()] = make(map[string]int32, len(placements)) + for _, p := range placements { - // note: track all vreplicas, not only the new ones since - // the next time `state()` is called some vreplicas might - // have been committed. - if _, ok := s.reserved[vpod.GetKey()]; !ok { - s.reserved[vpod.GetKey()] = make(map[string]int32) - } s.reserved[vpod.GetKey()][p.PodName] = p.VReplicas } } -func (s *StatefulSetScheduler) makeZeroPlacements(vpod scheduler.VPod, placements []duckv1alpha1.Placement) { - newPlacements := make([]duckv1alpha1.Placement, len(placements)) - for i := 0; i < len(placements); i++ { - newPlacements[i].PodName = placements[i].PodName - newPlacements[i].VReplicas = 0 - } - // This is necessary to make sure State() zeroes out initial pod/node/zone spread and - // free capacity when there are existing placements for a vpod - s.reservePlacements(vpod, newPlacements) -} - // newNotEnoughPodReplicas returns an error explaining what is the problem, what are the actions we're taking // to try to fix it (retry), wrapping a controller.requeueKeyError which signals to ReconcileKind to requeue the // object after a given delay. 
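// The following is an illustrative, self-contained sketch (not part of this change) of the
// upsert-and-reserve bookkeeping used by the allocation loop above: allocations for the same pod
// are merged into a single placement entry, and the per-pod reserved count is bumped by the same
// amount. It uses a simplified local type instead of duckv1alpha1.Placement; `placement` and
// `upsert` are stand-in names, while reservedByPodName mirrors the variable of the same name in
// the allocation loop and upsert mirrors the upsertPlacements helper introduced in this change.
package main

import "fmt"

type placement struct {
	PodName   string
	VReplicas int32
}

// upsert merges p into placements by pod name, accumulating vreplicas instead of
// appending a duplicate entry for an already-placed pod.
func upsert(placements []placement, p placement) []placement {
	for i := range placements {
		if placements[i].PodName == p.PodName {
			placements[i].VReplicas += p.VReplicas
			return placements
		}
	}
	return append(placements, p)
}

func main() {
	var newPlacements []placement
	reservedByPodName := map[string]int32{}

	// Two separate allocations land on pod 0 and one on pod 1; the second allocation for
	// pod 0 is folded into the existing entry rather than producing a duplicate placement.
	for _, alloc := range []placement{
		{PodName: "statefulset-name-0", VReplicas: 3},
		{PodName: "statefulset-name-0", VReplicas: 2},
		{PodName: "statefulset-name-1", VReplicas: 4},
	} {
		newPlacements = upsert(newPlacements, alloc)
		reservedByPodName[alloc.PodName] += alloc.VReplicas
	}

	fmt.Println(newPlacements)     // [{statefulset-name-0 5} {statefulset-name-1 4}]
	fmt.Println(reservedByPodName) // map[statefulset-name-0:5 statefulset-name-1:4]
}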
@@ -859,3 +546,18 @@ func (s *StatefulSetScheduler) Reserved() map[types.NamespacedName]map[string]in return r } + +func upsertPlacements(placements []duckv1alpha1.Placement, placement duckv1alpha1.Placement) []duckv1alpha1.Placement { + found := false + for i := range placements { + if placements[i].PodName == placement.PodName { + placements[i].VReplicas = placements[i].VReplicas + placement.VReplicas + found = true + break + } + } + if !found { + placements = append(placements, placement) + } + return placements +} diff --git a/pkg/scheduler/statefulset/scheduler_test.go b/pkg/scheduler/statefulset/scheduler_test.go index f6250b11fa2..8bb5c5a035b 100644 --- a/pkg/scheduler/statefulset/scheduler_test.go +++ b/pkg/scheduler/statefulset/scheduler_test.go @@ -19,15 +19,15 @@ package statefulset import ( "context" "fmt" + "math/rand" "reflect" - "sync/atomic" "testing" "time" - "github.com/stretchr/testify/assert" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/runtime" "k8s.io/apimachinery/pkg/types" + "k8s.io/apimachinery/pkg/util/sets" "k8s.io/apimachinery/pkg/util/wait" kubeclient "knative.dev/pkg/client/injection/kube/client/fake" _ "knative.dev/pkg/client/injection/kube/informers/apps/v1/statefulset/fake" @@ -45,51 +45,45 @@ const ( sfsName = "statefulset-name" vpodName = "source-name" vpodNamespace = "source-namespace" - numZones = 3 - numNodes = 6 ) func TestStatefulsetScheduler(t *testing.T) { testCases := []struct { - name string - vreplicas int32 - replicas int32 - placements []duckv1alpha1.Placement - expected []duckv1alpha1.Placement - err error - schedulerPolicyType scheduler.SchedulerPolicyType - schedulerPolicy *scheduler.SchedulerPolicy - deschedulerPolicy *scheduler.SchedulerPolicy - pending map[types.NamespacedName]int32 + name string + vreplicas int32 + replicas int32 + placements []duckv1alpha1.Placement + expected []duckv1alpha1.Placement + err error + pending map[types.NamespacedName]int32 + initialReserved map[types.NamespacedName]map[string]int32 + expectedReserved map[types.NamespacedName]map[string]int32 + unschedulablePods sets.Set[int32] + capacity int32 }{ { - name: "no replicas, no vreplicas", - vreplicas: 0, - replicas: int32(0), - expected: nil, - schedulerPolicyType: scheduler.MAXFILLUP, + name: "no replicas, no vreplicas", + vreplicas: 0, + replicas: int32(0), + expected: nil, }, { - name: "no replicas, 1 vreplicas, fail.", - vreplicas: 1, - replicas: int32(0), - err: controller.NewRequeueAfter(5 * time.Second), - expected: []duckv1alpha1.Placement{}, - schedulerPolicyType: scheduler.MAXFILLUP, + name: "no replicas, 1 vreplicas, fail.", + vreplicas: 1, + replicas: int32(0), + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "one replica, one vreplicas", - vreplicas: 1, - replicas: int32(1), - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 1}}, - schedulerPolicyType: scheduler.MAXFILLUP, + name: "one replica, one vreplicas", + vreplicas: 1, + replicas: int32(1), + expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 1}}, }, { - name: "one replica, 3 vreplicas", - vreplicas: 3, - replicas: int32(1), - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 3}}, - schedulerPolicyType: scheduler.MAXFILLUP, + name: "one replica, 3 vreplicas", + vreplicas: 3, + replicas: int32(1), + expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 3}}, }, { name: "one replica, 8 vreplicas, already scheduled on unschedulable pod, add 
replicas", @@ -102,7 +96,6 @@ func TestStatefulsetScheduler(t *testing.T) { expected: []duckv1alpha1.Placement{ {PodName: "statefulset-name-0", VReplicas: 8}, }, - schedulerPolicyType: scheduler.MAXFILLUP, }, { name: "one replica, 1 vreplicas, already scheduled on unschedulable pod, remove replicas", @@ -115,25 +108,55 @@ func TestStatefulsetScheduler(t *testing.T) { expected: []duckv1alpha1.Placement{ {PodName: "statefulset-name-0", VReplicas: 1}, }, - schedulerPolicyType: scheduler.MAXFILLUP, }, { - name: "one replica, 15 vreplicas, unschedulable", - vreplicas: 15, - replicas: int32(1), - err: controller.NewRequeueAfter(5 * time.Second), - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 10}}, - schedulerPolicyType: scheduler.MAXFILLUP, + name: "one replica, 15 vreplicas, unschedulable", + vreplicas: 15, + replicas: int32(1), + err: controller.NewRequeueAfter(5 * time.Second), + expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 10}}, }, { name: "two replicas, 15 vreplicas, scheduled", vreplicas: 15, replicas: int32(2), expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 5}, + {PodName: "statefulset-name-0", VReplicas: 8}, + {PodName: "statefulset-name-1", VReplicas: 7}, + }, + }, + { + name: "5 replicas, 4 vreplicas spread, scheduled", + vreplicas: 4, + replicas: int32(5), + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 1}, + {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-2", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, + }, + }, + { + name: "2 replicas, 4 vreplicas spread, scheduled", + vreplicas: 4, + replicas: int32(2), + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 2}, + {PodName: "statefulset-name-1", VReplicas: 2}, + }, + }, + { + name: "3 replicas, 2 new vreplicas spread, scheduled", + vreplicas: 5, + replicas: int32(3), + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 1}, + }, + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 2}, + {PodName: "statefulset-name-1", VReplicas: 2}, + {PodName: "statefulset-name-2", VReplicas: 1}, }, - schedulerPolicyType: scheduler.MAXFILLUP, }, { name: "two replicas, 15 vreplicas, already scheduled", @@ -147,7 +170,6 @@ func TestStatefulsetScheduler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: 10}, {PodName: "statefulset-name-1", VReplicas: 5}, }, - schedulerPolicyType: scheduler.MAXFILLUP, }, { name: "two replicas, 20 vreplicas, scheduling", @@ -161,7 +183,6 @@ func TestStatefulsetScheduler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: 10}, {PodName: "statefulset-name-1", VReplicas: 10}, }, - schedulerPolicyType: scheduler.MAXFILLUP, }, { name: "two replicas, 15 vreplicas, too much scheduled (scale down)", @@ -175,621 +196,525 @@ func TestStatefulsetScheduler(t *testing.T) { {PodName: "statefulset-name-0", VReplicas: 10}, {PodName: "statefulset-name-1", VReplicas: 5}, }, - schedulerPolicyType: scheduler.MAXFILLUP, }, { - name: "no replicas, no vreplicas with Predicates and Priorities", - vreplicas: 0, - replicas: int32(0), - expected: nil, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, - }, + name: "two replicas, 12 vreplicas, already 
scheduled on overcommitted pod, remove replicas", + vreplicas: 12, + replicas: int32(2), + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 12}, }, - }, - { - name: "no replicas, 1 vreplicas, fail with Predicates and Priorities", - vreplicas: 1, - replicas: int32(0), - err: controller.NewRequeueAfter(5 * time.Second), - expected: nil, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, - }, + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 10}, + {PodName: "statefulset-name-1", VReplicas: 2}, }, }, { - name: "one replica, one vreplicas with Predicates and Priorities", - vreplicas: 1, + name: "one replica, 12 vreplicas, already scheduled on overcommitted pod, remove replicas", + vreplicas: 12, replicas: int32(1), - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 1}}, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, - }, + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 12}, }, + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 10}, + }, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "one replica, 3 vreplicas with Predicates and Priorities", + name: "with reserved replicas, same vpod", vreplicas: 3, replicas: int32(1), expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 3}}, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 3, }, }, }, { - name: "one replica, 15 vreplicas, unschedulable with Predicates and Priorities", - vreplicas: 15, + name: "with reserved replicas, different vpod, not enough replicas", + vreplicas: 3, replicas: int32(1), - err: controller.NewRequeueAfter(5 * time.Second), - expected: nil, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, }, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "two replicas, 12 vreplicas, scheduled with Predicates and no Priorities", - vreplicas: 12, - replicas: int32(2), - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 6}, - {PodName: "statefulset-name-1", VReplicas: 6}, + name: "with reserved replicas, different vpod, not enough replicas and existing placements", + vreplicas: 3, + replicas: int32(1), + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 1}, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, + initialReserved: map[types.NamespacedName]map[string]int32{ + 
types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, }, - }, - { - name: "two replicas, 15 vreplicas, scheduled with Predicates and Priorities", - vreplicas: 15, - replicas: int32(2), - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 5}, - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, }, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "two replicas, 15 vreplicas, already scheduled with Predicates and Priorities", - vreplicas: 15, - replicas: int32(2), + name: "with reserved replicas, different vpod, not enough replicas and existing overcommitted placements", + vreplicas: 3, + replicas: int32(1), placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 5}, + {PodName: "statefulset-name-0", VReplicas: 1}, }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 5}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, + }, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 5, + }, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 5, }, }, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "two replicas, 20 vreplicas, scheduling with Predicates and Priorities", - vreplicas: 20, - replicas: int32(2), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 5}, - {PodName: "statefulset-name-1", VReplicas: 5}, - }, + name: "with reserved replicas, different vpod, with some space", + vreplicas: 3, + replicas: int32(1), expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 10}, + {PodName: "statefulset-name-0", VReplicas: 1}, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 1}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 9, }, }, - }, - { - name: "no replicas, no vreplicas with two Predicates and two Priorities", - vreplicas: 0, - replicas: int32(0), - expected: nil, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", - Args: "{\"MaxSkew\": 2}"}, + expectedReserved: 
map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 9, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 1, }, }, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "no replicas, 1 vreplicas, fail with two Predicates and two Priorities", - vreplicas: 1, - replicas: int32(0), - err: controller.NewRequeueAfter(5 * time.Second), - expected: nil, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + name: "with reserved replicas, different vpod", + vreplicas: 2, + replicas: int32(4), + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-3", VReplicas: 2}, + }, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-3": 3, }, }, - }, - { - name: "three replicas, one vreplica, with two Predicates and two Priorities (HA)", - vreplicas: 1, - replicas: int32(3), - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 1}}, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-3": 2, }, }, }, { - name: "three replicas, three vreplicas, with two Predicates and two Priorities (HA)", - vreplicas: 3, - replicas: int32(3), + name: "with reserved replicas, same vpod, no op", + vreplicas: 2, + replicas: int32(4), + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, + }, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-1", VReplicas: 1}, - {PodName: "statefulset-name-2", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-3": 1, + "statefulset-name-1": 1, }, }, - }, - { - name: "one replica, 15 vreplicas, with two 
Predicates and two Priorities (HA)", - vreplicas: 15, - replicas: int32(1), - err: controller.NewRequeueAfter(5 * time.Second), - expected: nil, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-3": 1, + "statefulset-name-1": 1, }, }, }, { - name: "three replicas, 15 vreplicas, scheduled, with two Predicates and two Priorities (HA)", - vreplicas: 15, - replicas: int32(3), + name: "with reserved replicas, same vpod, add replica", + vreplicas: 3, + replicas: int32(4), + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, + }, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 5}, - {PodName: "statefulset-name-1", VReplicas: 5}, - {PodName: "statefulset-name-2", VReplicas: 5}, + {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-2", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-1": 1, + "statefulset-name-3": 1, }, }, - }, - { - name: "three replicas, 15 vreplicas, already scheduled, with two Predicates and two Priorities (HA)", - vreplicas: 15, - replicas: int32(3), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 5}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 5}, - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-1": 1, + "statefulset-name-2": 1, + "statefulset-name-3": 1, }, }, }, { - name: "three replicas, 30 vreplicas, with two Predicates and two Priorities (HA)", - vreplicas: 30, - replicas: int32(3), + name: "with reserved replicas, same vpod, add replicas (replica reserved)", + vreplicas: 4, + replicas: int32(4), placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 5}, 
- {PodName: "statefulset-name-1", VReplicas: 5}, - {PodName: "statefulset-name-2", VReplicas: 10}, + {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, }, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 10}, - {PodName: "statefulset-name-2", VReplicas: 10}, + {PodName: "statefulset-name-1", VReplicas: 2}, + {PodName: "statefulset-name-2", VReplicas: 1}, + {PodName: "statefulset-name-3", VReplicas: 1}, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 5}"}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 5}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-1": 1, + "statefulset-name-3": 1, }, }, - }, - { - name: "three replicas, 15 vreplicas, with two Predicates and two Priorities (HA)", - vreplicas: 15, - replicas: int32(3), - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 5}, - {PodName: "statefulset-name-1", VReplicas: 5}, - {PodName: "statefulset-name-2", VReplicas: 5}, - }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-1": 2, + "statefulset-name-2": 1, + "statefulset-name-3": 1, }, }, }, { - name: "three replicas, 20 vreplicas, with two Predicates and two Priorities (HA)", - vreplicas: 20, - replicas: int32(3), + name: "with reserved replicas, same vpod, remove replicas", + vreplicas: 1, + replicas: int32(4), + placements: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 1}, + {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-2", VReplicas: 3}, + {PodName: "statefulset-name-3", VReplicas: 7}, + }, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 7}, - {PodName: "statefulset-name-1", VReplicas: 7}, - {PodName: "statefulset-name-2", VReplicas: 6}, + {PodName: "statefulset-name-1", VReplicas: 1}, }, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 2}"}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "LowestOrdinalPriority", Weight: 2}, - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 1, + "statefulset-name-1": 1, + "statefulset-name-2": 3, + "statefulset-name-3": 7, }, }, - }, - { 
- name: "one replica, 8 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 8, - replicas: int32(1), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 8}, - }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithEvenPodSpreadPriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 10, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-1": 1, }, }, }, { - name: "two replicas, 15 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 15, + name: "Scale one replica up with many existing placements", + vreplicas: 32, replicas: int32(2), placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 10}, + {PodName: "statefulset-name-0", VReplicas: 1}, }, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 8}, - {PodName: "statefulset-name-1", VReplicas: 7}, + {PodName: "statefulset-name-0", VReplicas: 13}, + {PodName: "statefulset-name-1", VReplicas: 19}, }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithEvenPodSpreadPriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-c", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-d", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-e", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-f", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-g", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 1, }, }, - }, - { - name: "three replicas, 15 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 15, - replicas: int32(3), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 10}, - {PodName: "statefulset-name-2", VReplicas: 5}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 5}, - {PodName: "statefulset-name-1", VReplicas: 5}, - {PodName: "statefulset-name-2", VReplicas: 5}, - }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithAvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 1, + }, 
+ types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-c", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-d", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-e", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-f", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace + "-g", Name: vpodName}: { + "statefulset-name-0": 1, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 13, + "statefulset-name-1": 19, }, }, + capacity: 20, }, { - name: "three replicas, 2 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 2, - replicas: int32(3), + name: "Reserved inconsistent with placements", + vreplicas: 32, + replicas: int32(2), placements: []duckv1alpha1.Placement{ {PodName: "statefulset-name-0", VReplicas: 1}, - {PodName: "statefulset-name-1", VReplicas: 1}, - {PodName: "statefulset-name-2", VReplicas: 1}, }, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - {PodName: "statefulset-name-1", VReplicas: 1}, + {PodName: "statefulset-name-0", VReplicas: 13}, + {PodName: "statefulset-name-1", VReplicas: 19}, }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithEvenPodSpreadPriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 7, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 7, }, }, - }, - { - name: "three replicas, 2 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 2, - replicas: int32(3), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - {PodName: "statefulset-name-1", VReplicas: 1}, - {PodName: "statefulset-name-2", VReplicas: 1}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - {PodName: "statefulset-name-1", VReplicas: 1}, - }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithAvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 7, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 13, + "statefulset-name-1": 19, }, }, + capacity: 20, }, { - name: "three replicas, 3 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 3, - replicas: int32(3), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 2}, - {PodName: "statefulset-name-1", VReplicas: 2}, - {PodName: "statefulset-name-2", VReplicas: 2}, - }, + name: "Reserved and overcommit", + vreplicas: 32, + replicas: int32(2), + placements: []duckv1alpha1.Placement{}, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - {PodName: "statefulset-name-1", 
VReplicas: 1}, - {PodName: "statefulset-name-2", VReplicas: 1}, + {PodName: "statefulset-name-0", VReplicas: 9}, + {PodName: "statefulset-name-1", VReplicas: 20}, }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithEvenPodSpreadPriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 11, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 17, }, }, - }, - { - name: "three replicas, 6 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 7, - replicas: int32(3), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 10}, - {PodName: "statefulset-name-2", VReplicas: 5}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 3}, - {PodName: "statefulset-name-1", VReplicas: 2}, - {PodName: "statefulset-name-2", VReplicas: 2}, - }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithEvenPodSpreadPriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 11, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 9, + "statefulset-name-1": 20, }, }, + capacity: 20, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "four replicas, 7 vreplicas, too much scheduled (scale down), with two desched Priorities", - vreplicas: 7, - replicas: int32(4), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 4}, - {PodName: "statefulset-name-1", VReplicas: 3}, - {PodName: "statefulset-name-2", VReplicas: 4}, - {PodName: "statefulset-name-3", VReplicas: 3}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 2}, - {PodName: "statefulset-name-1", VReplicas: 2}, - {PodName: "statefulset-name-2", VReplicas: 2}, - {PodName: "statefulset-name-3", VReplicas: 1}, + name: "Reserved and overcommit, remove the full placement", + vreplicas: 32, + replicas: int32(1), + placements: []duckv1alpha1.Placement{}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, + }, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 2, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 10, + }, }, - deschedulerPolicy: &scheduler.SchedulerPolicy{ - Priorities: []scheduler.PriorityPolicy{ - {Name: "RemoveWithEvenPodSpreadPriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "RemoveWithHighestOrdinalPriority", Weight: 2}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, + }, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 2, }, }, + capacity: 20, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "three replicas, 15 
vreplicas with Predicates and Priorities and non-zero pending for rebalancing", - vreplicas: 15, - replicas: int32(3), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 5}, - {PodName: "statefulset-name-1", VReplicas: 5}, - {PodName: "statefulset-name-2", VReplicas: 5}, - }, - pending: map[types.NamespacedName]int32{{Name: vpodName, Namespace: vpodNamespace}: 5}, - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, - {Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, + name: "Reserved and overcommit, remove the full placement, keep others overcommitted replicas", + vreplicas: 32, + replicas: int32(1), + placements: []duckv1alpha1.Placement{}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, + }, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 3, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 10, }, }, - }, - { - name: "six replicas, one vreplica, with Zone Priority (HA)", - vreplicas: 1, - replicas: int32(6), //Includes pod/node in unknown zone - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 1}}, //Not failing the plugin - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, + types.NamespacedName{Namespace: vpodNamespace + "-b", Name: vpodName}: { + "statefulset-name-0": 3, }, }, + capacity: 20, + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "six replicas, one vreplica, with Node Priority (HA)", - vreplicas: 1, - replicas: int32(6), //Includes pod/node in unknown zone - expected: []duckv1alpha1.Placement{{PodName: "statefulset-name-0", VReplicas: 1}}, //Not failing the plugin - schedulerPolicy: &scheduler.SchedulerPolicy{ - Predicates: []scheduler.PredicatePolicy{ - {Name: "PodFitsResources"}, + name: "Unschedulable pod", + vreplicas: 20, + replicas: int32(1), + capacity: 20, + unschedulablePods: sets.New[int32](0), + err: controller.NewRequeueAfter(5 * time.Second), + }, + { + name: "Unschedulable pod, with reserved and some space", + vreplicas: 20, + replicas: int32(2), + capacity: 20, + expected: []duckv1alpha1.Placement{ + {PodName: "statefulset-name-0", VReplicas: 2}, + }, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, }, - Priorities: []scheduler.PriorityPolicy{ - {Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 2}"}, - {Name: "LowestOrdinalPriority", Weight: 5}, + }, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, + }, + 
types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 2, }, }, + unschedulablePods: sets.New[int32](1), + err: controller.NewRequeueAfter(5 * time.Second), }, { - name: "two replicas, 12 vreplicas, already scheduled on overcommitted pod, remove replicas", - vreplicas: 12, - replicas: int32(2), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 12}, - }, + name: "Unschedulable middle pod, with reserved and some space", + vreplicas: 20, + replicas: int32(3), + capacity: 20, expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, - {PodName: "statefulset-name-1", VReplicas: 2}, + {PodName: "statefulset-name-0", VReplicas: 2}, + {PodName: "statefulset-name-2", VReplicas: 18}, }, - schedulerPolicyType: scheduler.MAXFILLUP, - }, - { - name: "one replica, 12 vreplicas, already scheduled on overcommitted pod, remove replicas", - vreplicas: 12, - replicas: int32(1), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 12}, + initialReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, + }, }, - expected: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 10}, + expectedReserved: map[types.NamespacedName]map[string]int32{ + types.NamespacedName{Namespace: vpodNamespace + "-a", Name: vpodName}: { + "statefulset-name-0": 18, + }, + types.NamespacedName{Namespace: vpodNamespace, Name: vpodName}: { + "statefulset-name-0": 2, + "statefulset-name-2": 18, + }, }, - err: controller.NewRequeueAfter(5 * time.Second), - schedulerPolicyType: scheduler.MAXFILLUP, + unschedulablePods: sets.New[int32](1), }, } for _, tc := range testCases { t.Run(tc.name, func(t *testing.T) { ctx, _ := tscheduler.SetupFakeContext(t) - nodelist := make([]runtime.Object, 0, numZones) podlist := make([]runtime.Object, 0, tc.replicas) - vpodClient := tscheduler.NewVPodClient() - for i := int32(0); i < numZones; i++ { - for j := int32(0); j < numNodes/numZones; j++ { - nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i) - zoneName := "zone" + fmt.Sprint(i) - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) - } - } - nodeName := "node" + fmt.Sprint(numNodes) //Node in unknown zone - node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNodeNoLabel(nodeName), metav1.CreateOptions{}) - if err != nil { - t.Fatal("unexpected error", err) - } - nodelist = append(nodelist, node) + vpodClient := tscheduler.NewVPodClient() + vpod := vpodClient.Create(vpodNamespace, vpodName, tc.vreplicas, tc.placements) for i := int32(0); i < tc.replicas; i++ { nodeName := "node" + fmt.Sprint(i) podName := sfsName + "-" + fmt.Sprint(i) + if tc.unschedulablePods.Has(i) { + nodeName = "" + } pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{}) if err != nil { t.Fatal("unexpected error", err) @@ -797,20 +722,31 @@ func TestStatefulsetScheduler(t *testing.T) { podlist = append(podlist, pod) } - _, err = kubeclient.Get(ctx).AppsV1().StatefulSets(testNs).Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, tc.replicas), metav1.CreateOptions{}) + capacity := int32(10) + if tc.capacity > 0 { + capacity = tc.capacity + } + + _, err := 
kubeclient.Get(ctx).AppsV1().StatefulSets(testNs).Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, tc.replicas), metav1.CreateOptions{}) if err != nil { t.Fatal("unexpected error", err) } lsp := listers.NewListers(podlist) - lsn := listers.NewListers(nodelist) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) - sa := state.NewStateBuilder(sfsName, vpodClient.List, 10, tc.schedulerPolicyType, tc.schedulerPolicy, tc.deschedulerPolicy, lsp.GetPodLister().Pods(testNs), lsn.GetNodeLister(), scaleCache) + sa := state.NewStateBuilder(sfsName, vpodClient.List, capacity, lsp.GetPodLister().Pods(testNs), scaleCache) cfg := &Config{ StatefulSetNamespace: testNs, StatefulSetName: sfsName, VPodLister: vpodClient.List, } s := newStatefulSetScheduler(ctx, cfg, sa, nil) + err = s.Promote(reconciler.UniversalBucket(), func(bucket reconciler.Bucket, name types.NamespacedName) {}) + if err != nil { + t.Fatal("unexpected error", err) + } + if tc.initialReserved != nil { + s.reserved = tc.initialReserved + } // Give some time for the informer to notify the scheduler and set the number of replicas err = wait.PollUntilContextTimeout(ctx, 200*time.Millisecond, time.Second, true, func(ctx context.Context) (bool, error) { @@ -822,7 +758,6 @@ func TestStatefulsetScheduler(t *testing.T) { t.Fatalf("expected number of statefulset replica to be %d (got %d)", tc.replicas, s.replicas) } - vpod := vpodClient.Create(vpodNamespace, vpodName, tc.vreplicas, tc.placements) placements, err := s.Schedule(ctx, vpod) if tc.err == nil && err != nil { @@ -834,120 +769,61 @@ func TestStatefulsetScheduler(t *testing.T) { } if !reflect.DeepEqual(placements, tc.expected) { - t.Errorf("got %v, want %v", placements, tc.expected) + t.Errorf("placements: got %v, want %v", placements, tc.expected) } - }) - } -} - -func TestReservePlacements(t *testing.T) { - testCases := []struct { - name string - vpod scheduler.VPod - placements []duckv1alpha1.Placement - reserved map[string]int32 - }{ - { - name: "no replicas, no placement, no reserved", - vpod: tscheduler.NewVPod(testNs, "vpod-1", 0, nil), - placements: nil, - reserved: make(map[string]int32), - }, - { - name: "one vpod, with placements in 2 pods, no reserved", - vpod: tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - placements: nil, - reserved: make(map[string]int32), - }, - { - name: "no replicas, new placements, with reserved", - vpod: tscheduler.NewVPod(testNs, "vpod-1", 0, nil), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - }, - reserved: map[string]int32{"statefulset-name-0": 1}, - }, - { - name: "one vpod, with placements in 2 pods, with reserved", - vpod: tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - placements: []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - }, - reserved: map[string]int32{"statefulset-name-0": 1, "statefulset-name-1": 7}, - }, - { - name: "one vpod, with placements in 2 pods, with reserved", - vpod: tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: int32(8)}, - {PodName: "statefulset-name-1", VReplicas: int32(7)}}), - placements: 
[]duckv1alpha1.Placement{ - {PodName: "statefulset-name-0", VReplicas: 1}, - {PodName: "statefulset-name-1", VReplicas: 1}, - }, - reserved: map[string]int32{"statefulset-name-0": 1, "statefulset-name-1": 1}, - }, - } - - for _, tc := range testCases { - t.Run(tc.name, func(t *testing.T) { - ctx, _ := tscheduler.SetupFakeContext(t) - - vpodClient := tscheduler.NewVPodClient() - vpodClient.Append(tc.vpod) - - cfg := &Config{ - StatefulSetNamespace: testNs, - StatefulSetName: sfsName, - VPodLister: vpodClient.List, - } - fa := newFakeAutoscaler() - s := newStatefulSetScheduler(ctx, cfg, nil, fa) - _ = s.Promote(reconciler.UniversalBucket(), func(bucket reconciler.Bucket, name types.NamespacedName) {}) - - s.reservePlacements(tc.vpod, tc.vpod.GetPlacements()) //initial reserve - - s.reservePlacements(tc.vpod, tc.placements) //new reserve - if !reflect.DeepEqual(s.reserved[tc.vpod.GetKey()], tc.reserved) { - t.Errorf("got %v, want %v", s.reserved[tc.vpod.GetKey()], tc.reserved) + if tc.expectedReserved != nil { + res := s.Reserved() + if !reflect.DeepEqual(tc.expectedReserved, res) { + t.Errorf("expected reserved: got %v, want %v", placements, tc.expected) + } } - - assert.Equal(t, true, fa.isLeader.Load()) - - s.Demote(reconciler.UniversalBucket()) - assert.Equal(t, false, fa.isLeader.Load()) }) } } -type fakeAutoscaler struct { - isLeader atomic.Bool -} - -func (f *fakeAutoscaler) Start(ctx context.Context) { -} +func BenchmarkSchedule(b *testing.B) { + ctx, _ := tscheduler.SetupFakeContext(b) -func (f *fakeAutoscaler) Autoscale(ctx context.Context) { -} - -func newFakeAutoscaler() *fakeAutoscaler { - return &fakeAutoscaler{ - isLeader: atomic.Bool{}, + _, err := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs).Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, 10000), metav1.CreateOptions{}) + if err != nil { + b.Fatal("unexpected error", err) } -} -func (f *fakeAutoscaler) Promote(b reconciler.Bucket, enq func(reconciler.Bucket, types.NamespacedName)) error { - f.isLeader.Store(true) - return nil -} + vpodClient := tscheduler.NewVPodClient() + for i := 0; i < 1000; i++ { + vpodClient.Create(vpodNamespace, vpodName, rand.Int31n(100), nil) + } + k8s := kubeclient.Get(ctx).CoreV1() + + podlist := make([]runtime.Object, 0) + for i := int32(0); i < 10000; i++ { + nodeName := "node" + fmt.Sprint(i) + podName := sfsName + "-" + fmt.Sprint(i) + pod, err := k8s.Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{}) + if err != nil { + b.Fatal("unexpected error", err) + } + podlist = append(podlist, pod) + } + lsp := listers.NewListers(podlist) + scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) + sa := state.NewStateBuilder(sfsName, vpodClient.List, 20, lsp.GetPodLister().Pods(testNs), scaleCache) + cfg := &Config{ + StatefulSetNamespace: testNs, + StatefulSetName: sfsName, + VPodLister: vpodClient.List, + } + s := newStatefulSetScheduler(ctx, cfg, sa, nil) + err = s.Promote(reconciler.UniversalBucket(), func(bucket reconciler.Bucket, name types.NamespacedName) {}) + if err != nil { + b.Fatal("unexpected error", err) + } -func (f *fakeAutoscaler) Demote(bucket reconciler.Bucket) { - f.isLeader.Store(false) + b.ResetTimer() + for i := 0; i < b.N; i++ { + if _, err := s.Schedule(ctx, vpodClient.Random()); err != nil { + b.Fatal("unexpected error", err) + } + } } - -var _ reconciler.LeaderAware = &fakeAutoscaler{} -var _ Autoscaler = 
&fakeAutoscaler{} diff --git a/pkg/scheduler/testing/client.go b/pkg/scheduler/testing/client.go index 8ecc9398154..6131addfa01 100644 --- a/pkg/scheduler/testing/client.go +++ b/pkg/scheduler/testing/client.go @@ -17,6 +17,8 @@ limitations under the License. package testing import ( + "math/rand" + duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" "knative.dev/eventing/pkg/scheduler" ) @@ -51,3 +53,10 @@ func (s *VPodClient) Append(vpod scheduler.VPod) { func (s *VPodClient) List() ([]scheduler.VPod, error) { return s.lister() } + +func (s *VPodClient) Random() scheduler.VPod { + s.store.lock.Lock() + defer s.store.lock.Unlock() + + return s.store.vpods[rand.Intn(len(s.store.vpods))] +} diff --git a/pkg/scheduler/testing/vpod.go b/pkg/scheduler/testing/vpod.go index f3347173b65..f11dda3e488 100644 --- a/pkg/scheduler/testing/vpod.go +++ b/pkg/scheduler/testing/vpod.go @@ -25,15 +25,14 @@ import ( "k8s.io/apimachinery/pkg/types" "knative.dev/pkg/controller" - duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" - "knative.dev/eventing/pkg/scheduler" - appsv1 "k8s.io/api/apps/v1" autoscalingv1 "k8s.io/api/autoscaling/v1" v1 "k8s.io/api/core/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" gtesting "k8s.io/client-go/testing" + duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" + kubeclient "knative.dev/pkg/client/injection/kube/client/fake" _ "knative.dev/pkg/client/injection/kube/informers/apps/v1/statefulset/fake" rectesting "knative.dev/pkg/reconciler/testing" @@ -74,45 +73,6 @@ func (d *sampleVPod) GetResourceVersion() string { return d.rsrcversion } -func MakeNode(name, zonename string) *v1.Node { - obj := &v1.Node{ - ObjectMeta: metav1.ObjectMeta{ - Name: name, - Labels: map[string]string{ - scheduler.ZoneLabel: zonename, - }, - }, - } - return obj -} - -func MakeNodeNoLabel(name string) *v1.Node { - obj := &v1.Node{ - ObjectMeta: metav1.ObjectMeta{ - Name: name, - }, - } - return obj -} - -func MakeNodeTainted(name, zonename string) *v1.Node { - obj := &v1.Node{ - ObjectMeta: metav1.ObjectMeta{ - Name: name, - Labels: map[string]string{ - scheduler.ZoneLabel: zonename, - }, - }, - Spec: v1.NodeSpec{ - Taints: []v1.Taint{ - {Key: "node.kubernetes.io/unreachable", Effect: v1.TaintEffectNoExecute}, - {Key: "node.kubernetes.io/unreachable", Effect: v1.TaintEffectNoSchedule}, - }, - }, - } - return obj -} - func MakeStatefulset(ns, name string, replicas int32) *appsv1.StatefulSet { obj := &appsv1.StatefulSet{ ObjectMeta: metav1.ObjectMeta{ @@ -143,7 +103,7 @@ func MakePod(ns, name, nodename string) *v1.Pod { return obj } -func SetupFakeContext(t *testing.T) (context.Context, context.CancelFunc) { +func SetupFakeContext(t testing.TB) (context.Context, context.CancelFunc) { ctx, cancel, informers := rectesting.SetupFakeContextWithCancel(t) err := controller.StartInformers(ctx.Done(), informers...) if err != nil {