# Implement workload simulator for autoscaler development #1686
/assign
It's worth reporting on progress so far. I've developed two different "simulators": a prototype and a discrete-event simulator. (The codebase is not currently public; I hope to remedy that soon.)

### The prototype simulator

The prototype explored two aspects:
The prototype works as a fixed-increment simulation. Each iteration of the simulation loop advances time by a fixed increment of 1 second. At each step the simulation decides whether to change the variables observed by the autoscaler system: how many Endpoints are active and how many requests are being concurrently handled. It captures the scaling decisions of the autoscaler. The statistics for each step are gathered and output as a simple graph of the simulation.

### Problems with the prototype simulator

The first problem is that the simulation is unrealistic: parts of it are "faked". The times at which the controlled variables change are all hardcoded.

The second problem is the fixed-increment approach itself. For large parts of the simulation, none of the variables change, and an iteration of the simulation loop where nothing changes is basically wasted effort. This doesn't matter with the small scenario used (1000 increments of 1 second each), but it will become worse as scenarios grow longer or as precision grows finer.

The third and more serious problem is that the timeline of the simulation is only partially under simulator control. The statistics given to the autoscaler are timestamped based on the simulator's scenario, but the autoscaler itself keeps internal time which is not controlled. Results are inconsistent because the autoscaler assumes events are occurring in real clock time and bucketizes statistics on that basis, while the simulator is injecting timestamped statistics at a faster-than-clock-time pace. As the simulator effort proceeds, there will need to be a way for the simulation to directly control the time perceived by the autoscaler's decision system.

### Discrete event simulator harness

Discrete-event simulation is based on the idea that the simulated system proceeds from one state to a successor state at specific points in time; these are the "discrete events". This assumption enables next-event simulation.
In next-event simulation, events are placed into a "Future Event List" (FEL), typically a priority queue, ordered by the timestamp at which they are going to occur. This has two key performance implications:

- Simulation time becomes proportional to the number of events, not the number of increments.
- Time can be modeled with arbitrary precision without increasing running time: a simulation at nanosecond precision takes approximately the same time to run as a simulation at second precision.

My main reference for this work has been Simulation Modeling & Analysis, 5e by Averill M. Law. Its first two chapters give a detailed description of an implementation, including sample C source. Other textbooks in this area appear to have similar contents.

The design outlined there centres on the Future Event List (FEL). The FEL is iterated over by a single function which then reacts based on a single switch statement. In each branch of the switch, various global variables which represent the modeled system are manipulated. I'm not a big fan of having a single switch that manages global variables.

In the book, Law mentions "process-oriented" discrete-event simulation as an alternative. The difference appears to be that a process-oriented approach models individual entities in the system, rather than manipulating global variables. But the description given is confined to a few paragraphs; no example implementation is discussed. My very light searches of the literature did not turn up any other work which describes the implementation of process-oriented systems with the level of detail the textbook devotes to the switch/global implementation. The closest I found was tutorial literature for the SimPy simulation library.

There is an extensive literature on parallel simulation using multiple event lists and what appear to be various kinds of consensus protocols. However, I judged these as not immediately applicable to my work.

My current approach retains a single FEL. As events occur, their name and timestamp are passed to the entities concerned. Most often, this involves scheduling a subsequent action for the same entity.
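The single-FEL, next-event loop described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code; the event name and handler are made up.

```python
import heapq
import itertools

class Simulation:
    def __init__(self):
        self.now = 0.0
        self.fel = []                 # the Future Event List, kept as a heap
        self.seq = itertools.count()  # tie-breaker for events at equal times

    def schedule(self, delay, name, handler):
        heapq.heappush(self.fel, (self.now + delay, next(self.seq), name, handler))

    def run(self, until):
        # Jump directly from event to event; quiet periods cost nothing.
        while self.fel and self.fel[0][0] <= until:
            self.now, _, name, handler = heapq.heappop(self.fel)
            handler(self, name)

def on_request(sim, name):
    # A handler most often schedules a subsequent action for the same entity.
    if sim.now < 5.0:
        sim.schedule(1.0, "request", on_request)

sim = Simulation()
sim.schedule(1.0, "request", on_request)
sim.run(until=10.0)
print(sim.now)  # 5.0 -- the clock stops at the last event, not the horizon
```

Note that running time here depends only on how many events are scheduled; the `delay` values could be nanoseconds or hours without changing the cost of the loop.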
### Problems with the discrete event simulator

#### Event ordering

The main problem is event ordering across multiple interacting entities, each with its own timeline. Events scheduled independently by different entities can be popped in an order that makes no physical sense: it violates the basic causality of the model if an effect is processed before its cause.

#### Slight FSM clunkiness

The representation for FSMs provides the basic features I need (states, events, transitions). But I am uncertain whether it has the precise representation I would like. In particular, most states in the system will have limited lifetimes, but there's no inbuilt notion of "this state will last for X duration" without my doing the event-creation plumbing myself. I'm not sure if this is a real difficulty or just a mild annoyance.

### What's next?

#### The simulator harness

The simulator harness itself needs to solve the ordering problem without coupling entities so closely. I have been considering how.

One option would be to allow the event time to be resolved after scheduling has occurred -- a rule could be provided saying "X will happen after a currently-unknown Y, plus Z additional duration". This still requires entities to know about each other's event types, but it would remove some of the conditional logic currently sprinkled across different parts of the code.

Another option would be for entities to subscribe not just to events being popped from the FEL, but to events being pushed into the list.

#### The autoscaler API surface

As noted above, the autoscaler keeps its own schedule for making scaling decisions. The simulation effort will need some way to control this perception of time so that intermediate time intervals may be skipped.

Currently the simulator harness does not drive the autoscaler directly. It has been necessary to work on the more abstract part of the system before marrying the two elements. I expect that point will lead to discoveries of other impedance mismatches.

#### Statistics

The harness does not collect any statistics. I had deferred this question to focus on its design.
However, without collecting figures, it will not be useful for its original purpose of allowing easier study of behaviour under different conditions.

#### Code quality

The current codebase is consciously a spike: an exploratory effort intended to help me understand the problems better. When I feel confident I have solved the design problems above, it will be time to put the first codebase aside and reimplement the design in a proper test-first manner.
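The second harness option mentioned above -- subscribing to events as they are pushed into the FEL, rather than only as they are popped -- could look roughly like this. All the names here (event names, the callback) are hypothetical, invented for illustration.

```python
import heapq
import itertools

class FEL:
    """Future Event List that notifies subscribers when events are scheduled."""
    def __init__(self):
        self.heap = []
        self.seq = itertools.count()
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def push(self, time, name):
        heapq.heappush(self.heap, (time, next(self.seq), name))
        for cb in self.subscribers:
            cb(time, name)  # fires at scheduling time, not at occurrence time

fel = FEL()
reactions = []

def on_push(time, name):
    # A cooperating entity positions its own event relative to one it did
    # not create, without conditional logic in a central dispatcher.
    if name == "replica_launched":
        reactions.append(("add_to_rotation", time + 1.0))

fel.subscribe(on_push)
fel.push(5.0, "replica_launched")
print(reactions)  # [('add_to_rotation', 6.0)]
```

The appeal of this shape is that the entity which schedules `replica_launched` never needs to know who is listening.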
A quick update from where I was last week. I made two major changes.

### Listening for events being scheduled

I added a concept of listening for an event being scheduled, but not having yet occurred. This allows various entities to schedule their events ahead of and behind other events from cooperating entities.

### Stock Movement

I introduced a concept of "Stock Movement", where an entity moves between stocks. It's currently an extension of the pre-existing event type. This solved one class of problems. At the start it seemed that some things are state-y and some things are movement-y. I'm now kicking around the idea that next week, I may try to push through to reunifying the event types and making everything movement-based. This may also simplify matters.

### It scales!

Here I bury the lede a bit. I now have the Knative Autoscaler receiving statistics from simulated components and making decisions which are acted on by other simulated components. Here's a trace of one of the first runs that worked end-to-end:
Basic question: what does the first column of the trace represent (the "G"s and "M"s)?
Oh right! Those letters mark the type of each line in the trace. The trace output will probably change. I spent some time working up diagrams of how a more completely "stock-and-flow" approach might look (I'll attach some links momentarily). I'm hoping to perform that conversion this week.
For those still following along, there's now an open repo that I'm working out of: https://github.com/pivotal/skenario

The key evolution in the design is that the organising concept is the movement of entities between stocks. The future-scheduling concept is retained. So far I've managed to avoid reintroducing the tight coupling between entities described above.

In terms of implementation I've discarded the previous prototype / spike code and worked my way back with tests. It's lacking fully outside-in testing (and most bugs have only surfaced at the feature level), but there's reasonably good unit coverage.
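The "movement of entities between stocks" organising concept can be sketched as a toy: stocks hold entities, and the only event type is a Movement that transfers an entity from one stock to another at a scheduled time. These class and stock names are illustrative only and do not reflect Skenario's actual API.

```python
import heapq
import itertools

class Stock:
    def __init__(self, name):
        self.name = name
        self.entities = []

class Movement:
    """The single event type: an entity moves between stocks at a set time."""
    def __init__(self, at, entity, source, destination):
        self.at = at
        self.entity = entity
        self.source = source
        self.destination = destination

def run(movements):
    seq = itertools.count()
    fel = [(m.at, next(seq), m) for m in movements]
    heapq.heapify(fel)
    while fel:
        _, _, m = heapq.heappop(fel)
        m.source.entities.remove(m.entity)
        m.destination.entities.append(m.entity)

launching = Stock("launching")
active = Stock("active")
launching.entities.append("replica-1")
run([Movement(3.0, "replica-1", launching, active)])
print(active.entities)  # ['replica-1']
```

Unifying everything as movements means the simulation state at any instant is just "which entities are in which stocks", which is easy to inspect and to collect statistics over.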
I'm now working entirely out of the Skenario repo, so I think this issue can safely be closed.
/area autoscale
/area dev
## Introduction
Autoscaling is a surprisingly difficult problem with a number of nested and overlapping problems to solve. In developing solutions, we can aim to validate or invalidate our design hypotheses in three main ways:

1. Empirical validation
2. Theoretical validation
3. Simulation
This proposal addresses the third validation approach.
## Why simulation?
Because each validation approach has different strengths and weaknesses. Empirical validation is the final word, but is a slow-moving feedback loop (hours to months). Theoretical validation can dramatically shrink the solution search space, but is less accessible to each of our key personae (developers, operators, contributors) and does not yet address problems that remain unsolved in the research literature.
Simulation splits the difference: it is faster than empirical validation, at the risk of inaccuracy, and simpler than theoretical validation, at the cost of implementation. Simulation is intended to provide contributors with the ability to rapidly explore the design space and iterate on solutions. It is also intended to illuminate potential problems in advance of implementation. Simulation will probably provide input into autoscaling, routing, serving and upstream projects.
## Previous and related work
### Project riff
For Project riff, @glyn and I developed a simple workload simulator[0][1] for the original riff autoscaler implementation.
This simulator walks the autoscaler implementation through three basic traffic scenarios: a step function (to test behaviour under traffic spikes), a linear ramp-up and ramp-down (to test smoothness of response under ideal conditions), and a sinusoidal function (to test behaviour in the face of yo-yo attacks).
It then produces simple charts of behaviour to allow for manual inspection and characterisation of autoscaler reaction. We had intended to extend the design to allow for more advanced applications -- for example, statistical characterisations (volatility, P99, etc.) to quickly detect regressions in performance.
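The three traffic scenarios above are easy to generate as request-rate functions of time. A minimal sketch, with made-up parameters (the riff simulator's actual values may differ):

```python
import math

def step(t, low=10, high=100, at=50):
    """Step function: a sudden traffic spike at time `at`."""
    return high if t >= at else low

def ramp(t, peak=100, up_until=50, end=100):
    """Linear ramp-up to `peak`, then ramp-down to zero."""
    if t <= up_until:
        return peak * t / up_until
    return peak * max(0.0, (end - t) / (end - up_until))

def sinusoid(t, base=50, amplitude=40, period=30):
    """Oscillating load, as in a yo-yo attack."""
    return base + amplitude * math.sin(2 * math.pi * t / period)

# One sample per simulated second, ready for charting or analysis.
trace = [(t, step(t), ramp(t), sinusoid(t)) for t in range(0, 100)]
print(trace[0])  # (0, 10, 0.0, 50.0)
```

Emitting the trace as raw rows like this also fits the later suggestion of leaving charting to external tools.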
### Within Knative
Related issues include:
Related documents for which simulation may help test hypotheses:
### In the literature
Autoscaling literature is quite energetic, but mostly focuses on predictive scaling. Using techniques such as fast Fourier transforms, linear regressions, neural nets and the like, researchers have been able to forecast future demand under conditions of predictable variation.
For example:
Less well-treated (based on cursory searching) has been reactive autoscaling of the kind we're working on here. In Project riff we were looking at applications of basic control theory (inspired by Feedback Control for Computer Systems: Introducing Control Theory to Enterprise Programmers by Philip K. Janert). We had begun to investigate applications of queueing theory (for which see Performance Modeling and Design of Computer Systems: Queueing Theory in Action by Mor Harchol-Balter).
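In the simplest control-theoretic framing, reactive autoscaling is a proportional controller: the replica count is driven by the gap between observed and target per-replica concurrency. A minimal sketch, with a made-up target value (not any autoscaler's actual parameters):

```python
import math

def desired_replicas(observed_concurrency, target_per_replica=10):
    # Proportional control in its crudest form: size the fleet so each
    # replica sees roughly the target concurrency, never going below one.
    return max(1, math.ceil(observed_concurrency / target_per_replica))

print(desired_replicas(0))   # 1  (never scale to zero in this sketch)
print(desired_replicas(95))  # 10
```

Real controllers add smoothing, windows, and hysteresis on top of this core, which is exactly the behaviour a simulator would let us study cheaply.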
## Proposed approach
### Internals
It may be easier to reimplement a simulator than to adapt riff's existing simulator.
The existing design is a fixed time step simulation. It updates all parts of the simulation at each time slice.
A reimplementation should instead develop a next-event simulation. These run faster than fixed-increment simulations because they can skip over periods during which nothing interesting has occurred.
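To illustrate what the fixed time step design wastes, here is a sketch of such a loop. The scenario timings and variable names are hypothetical, not taken from the riff simulator.

```python
def run_fixed_increment(duration_s=1000, step_s=1):
    """Advance time one fixed step at a time, checking for changes."""
    concurrency = 0
    # Hardcoded scenario: time -> new observed concurrency.
    scenario = {100: 50, 400: 200, 800: 10}
    wasted_steps = 0
    for t in range(0, duration_s, step_s):
        if t in scenario:
            concurrency = scenario[t]
            # ...here the autoscaler would observe `concurrency` and
            # emit a scaling decision...
        else:
            wasted_steps += 1  # nothing changed; this iteration is wasted
    return wasted_steps

print(run_fixed_increment())  # 997 -- nearly every step does no useful work
```

A next-event design would visit only the three interesting timestamps, and halving `step_s` for finer precision would not double the work.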
### Usage
The current graphs output by the riff simulator are generated by gnuplot. This was useful for rapid iteration, but gnuplot's language can be a bit arcane. It also stands somewhat aside from larger data science and analysis ecosystems. It may be easier to focus on spitting out raw data and then leaving graphing and analysis to other tools.
In particular, we should aim to have a uniform approach for analysing both simulated and empirical cases.
## Further out
In the longer run, we may be able to continuously search the design and configuration space with Monte Carlo methods. We also ought to test for simulated regressions during local tests as a fast first feedback loop to contributors.
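A Monte Carlo search over the configuration space could be as simple as sampling random configurations, scoring each with a simulation run, and keeping the best. In this sketch the parameter names, ranges, and scoring function are all invented stand-ins, not real autoscaler settings.

```python
import random

random.seed(42)  # deterministic sampling for reproducibility

def simulate_score(panic_threshold, stable_window):
    # Stand-in for a full simulation run returning a cost (lower is
    # better), e.g. SLO violations plus overprovisioned pod-seconds.
    return abs(panic_threshold - 2.0) + abs(stable_window - 60.0) / 60.0

best = min(
    (
        {"panic_threshold": random.uniform(1.1, 10.0),
         "stable_window": random.uniform(6.0, 600.0)}
        for _ in range(1000)
    ),
    key=lambda cfg: simulate_score(**cfg),
)
print(best)  # the sampled configuration closest to the (toy) optimum
```

Because each sample is independent, this search parallelises trivially, which matters once each score requires a full simulation run rather than a toy formula.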