
Revamp APIcast configuration loading #230

Closed
2 tasks done
mikz opened this issue Jan 17, 2017 · 21 comments · Fixed by #270


@mikz
Contributor

mikz commented Jan 17, 2017

There are several ways we want to load, store, and expire configuration in different scenarios.

The main use cases are:

  1. load configuration on boot, crash otherwise
    Known as the production mode. Configuration is loaded only on boot and not reloaded.

  2. load configuration on demand and reload on every request
    Known as the staging mode. Every request downloads a new configuration that is used
    only for that request. It does not load configuration on boot because it would be immediately reloaded on the first request. It can either load the whole configuration or just the configuration for one host. (Per request configuration store #224)

  3. load configuration on boot, reload every N seconds
    Known as the production mode. Every N seconds after boot, new configuration should be downloaded and applied.

  4. load configuration on demand, expire it after N seconds
    Known as the cloud mode. Does not get configuration on boot, only on demand for one host.
    Every configuration expires after N seconds and can be loaded on demand again later. (Store configuration with TTL #229)


Points 4 and possibly 2 need to pass the host to the API call that gets the configuration.
Point 3 needs to define what to do when the configuration can't be downloaded.

Right now there are 3 flags that control this behaviour:

  • APICAST_RELOAD_CONFIG that reloads config on every request
  • APICAST_MISSING_CONFIGURATION that controls what to do on missing configuration (exit, log)
  • AUTO_UPDATE_INTERVAL that controls how often to reload config in the background

I'd like to unify this into one, at most two, configuration options. There are some incompatibilities between some of the methods, like:

  • loading config per host and automatic refresh (too heavy with many services)
  • loading config per host on boot

Proposal

Change AUTO_UPDATE_INTERVAL to work in the following way:

  • -1 - no auto reload, all configuration is persisted until restart
  • 0 - no cache, configuration is reloaded on every request
  • N - expire/reload record in N seconds

This variable would be used in all the use cases to control how often configuration is reloaded.
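
To make the semantics concrete, here is a minimal sketch (in Python, purely for illustration) of how the three value ranges could be interpreted; the helper name `configuration_ttl` and the `-1` default are assumptions, not APIcast code.

```python
import os

def configuration_ttl(raw=None):
    """Interpret AUTO_UPDATE_INTERVAL as proposed above.

    Returns None for "cache forever" (-1 or any negative value), 0 for
    "no cache, reload on every request", or a positive number of seconds
    to use as the record TTL.
    """
    value = int(raw if raw is not None else os.getenv("AUTO_UPDATE_INTERVAL", "-1"))
    if value < 0:
        return None   # no auto reload, configuration persists until restart
    return value      # 0: reload on every request, N: expire after N seconds
```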

Introduce APICAST_CONFIGURATION (proposals for a different name welcome) with the following options:

  • boot - requires configuration on boot or crashes
  • lazy - download configuration on demand for each host

Please note that boot should crash on missing, not empty, configuration.
Missing configuration is when it fails to connect to the MT server;
empty is when it can connect but fetches 0 services.
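
For illustration only, a small Python sketch of the boot behaviour described above, distinguishing a failed connection (crash) from an empty result (0 services, accepted); the `fetch` callable and the exception name are hypothetical.

```python
class ConfigurationMissingError(Exception):
    """Raised when the MT server cannot be reached at all."""

def load_on_boot(fetch):
    """Boot mode as proposed: crash on missing configuration, accept empty.

    `fetch` is a hypothetical callable that returns a list of services or
    raises ConfigurationMissingError when it fails to connect.
    """
    try:
        services = fetch()
    except ConfigurationMissingError:
        # Missing configuration: could not connect, so crash on boot.
        raise SystemExit("could not download configuration on boot")
    # Empty configuration (0 services) is accepted; the gateway still boots.
    return services
```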

Verification

  1. -1 + boot - load on boot or crash, cache forever
  2. 0 + lazy - load on demand, cache for one request
  3. N + boot - load on boot or crash, cache for N (undefined behaviour on config reload)
  4. N + lazy - load on demand, cache for N (undefined behaviour after expiration)
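
A hedged sketch of how the two variables could be parsed and validated together; the function name and the defaults shown ("boot", -1) are assumptions following the stable-production-default idea discussed later in this thread.

```python
VALID_MODES = {"boot", "lazy"}

def parse_settings(env):
    """Validate the two proposed options together, crashing on unknown values.

    `env` is any mapping of environment variables, e.g. os.environ.
    """
    mode = env.get("APICAST_CONFIGURATION", "boot")
    if mode not in VALID_MODES:
        raise SystemExit("unknown APICAST_CONFIGURATION: %s" % mode)
    interval = int(env.get("AUTO_UPDATE_INTERVAL", "-1"))
    return mode, interval
```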

TODO

  • define what to do if reload fails in lazy mode (put the stale record back, just remove the configuration, retry next time, ...)
  • define what to do if reload fails in boot mode (keep the stale record, remove, retry, ...)

Concurrency and cache miss

In lazy loading we need to take care of concurrent loading of the same configuration for several requests in parallel.

There should be only one request to get configuration for one host at the same time.
There can be several requests for distinct hosts at the same time.

We should probably enforce this across workers via a shared memory lock, not just within one worker via a semaphore.

When there is a cache miss and configuration has to be loaded, every request arriving in the interval between the first request and the configuration being loaded would trigger loading.

To limit the number of concurrently loading configurations we have to wait for one inside the worker by using a semaphore, and across workers by using a shared memory lock.
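
An illustrative Python sketch of the per-host single-flight idea; in the gateway itself this would need a lock in shared memory visible to all nginx workers, so the in-process lock here only stands in for the per-worker semaphore half.

```python
import threading

# One lock per host: only one request downloads configuration for a given
# host at a time, while distinct hosts can load in parallel.
_host_locks = {}
_registry_lock = threading.Lock()
_cache = {}

def configuration_for(host, download):
    with _registry_lock:
        lock = _host_locks.setdefault(host, threading.Lock())
    with lock:                      # single flight per host
        if host not in _cache:      # re-check after acquiring the lock
            _cache[host] = download(host)
        return _cache[host]
```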

Changes

This changes several behaviours from 2.0:

  • the staging environment would boot without access to the configuration; now it crashes
  • staging could load just the configuration needed for one host, not everything
  • the production environment would boot with 0 services (but would still require the MT connection)
mikz changed the title from Lazy load configuration to Revamp APIcast configuration loading on Jan 19, 2017
mikz self-assigned this on Jan 19, 2017
@MarkCheshire
Contributor

Can't it be simplified by eliminating the difference between boot and lazy?
Under what circumstances would you not want to load on boot?

@mikz
Contributor Author

mikz commented Jan 19, 2017

@MarkCheshire for example when you have a cloud environment with 10000 services.
You want to load just the ones being used and keep them in memory just for the necessary time.

Also, the configuration loaded on boot is discarded when the first request comes and needs new configuration (like staging). That just creates a dependency on a service that has to be running, but is not used until later, when the actual request comes.

@thomasmaas
Member

thomasmaas commented Jan 19, 2017

@mikz what we could do is have 1 of the 2 behaviours as the default and have a 0/1 switch to enable the non-default behaviour.

i.e. the default could be load on boot or crash, but for SaaS we would set LAZY_LOAD to 1

@thomasmaas
Member

Naming-wise: in general it would be good if all APIcast configuration options were in the same family of names: either all prefixed with APICAST or none. I'd say none, as it's pretty clear you are within the APIcast context.

@mikz
Contributor Author

mikz commented Jan 19, 2017

@thomasmaas the true default would probably be the production setting:
AUTO_UPDATE_INTERVAL=-1 and APICAST_CONFIGURATION=boot
Non-reloading configuration that is loaded on boot, or crashes.

Then there are other combinations like:

  • AUTO_UPDATE_INTERVAL=300 and APICAST_CONFIGURATION=boot = apicast hosted production
  • AUTO_UPDATE_INTERVAL=0 and APICAST_CONFIGURATION=lazy = apicast hosted staging
  • AUTO_UPDATE_INTERVAL=300 and APICAST_CONFIGURATION=lazy = apicast cloud hosted production
  • AUTO_UPDATE_INTERVAL=0 and APICAST_CONFIGURATION=lazy = apicast cloud hosted staging

Re the naming. I agree. #207

@MarkCheshire
Contributor

I assume the caching is done at the per-service level. Can't you just eliminate option 3? Then the only load on boot is option 1, while in option 3 the config for each service is loaded when the first request arrives. It would simplify things.

@mikz
Contributor Author

mikz commented Jan 19, 2017

@MarkCheshire that is how APIcast hosted works on-premises. It keeps all the configuration in memory so the first request is fast; it does not lazy load configuration, to keep it production ready.
But it also has to reload every N seconds when the configuration is updated.
It also must crash when it can't get configuration (RHAMP-10), which has to happen only on boot.

@thomasmaas
Member

thomasmaas commented Jan 19, 2017

@mikz so I would say we get rid of APICAST_CONFIGURATION and add LAZY_LOAD which defaults to 0. Then all we really need to talk about with customers is AUTO_UPDATE_INTERVAL (if I understand correctly)

@mikz
Contributor Author

mikz commented Jan 19, 2017

@thomasmaas and when we need to introduce a 3rd state, just change it again to be an enum?

If boot is the default, then we are just talking about LAZY_LOAD=1 vs APICAST_CONFIGURATION=lazy to turn the lazy mode on.
And I don't really see the value in LAZY_LOAD.
It is less flexible and can't be extended in the future. It is more prone to typos.
If there is just one way to configure the value, like APICAST_CONFIGURATION, we can validate the value and crash when it is unknown. When you mistype LAZY_LAOD you are screwed and we won't notice.

@MarkCheshire
Contributor

How much of a benefit is it for APIcast hosted on-prem to be able to load on boot? The penalty for lazy load is not that great if it is only the first call to each service that has to grab the config.

@hallelujah
Collaborator

hallelujah commented Jan 19, 2017

AUTO_UPDATE_INTERVAL

I would change AUTO_UPDATE_INTERVAL to work in the following way:

  • "" (no value) - no auto reload, all configuration is persisted until restart
  • 0 - no cache, configuration is reloaded on every request
  • N - expire/reload record in N seconds

But reading the comment I think you want it to be set explicitly to -1 so we know the configuration; in that case I agree to let any negative number work (but -1 would be documented), and empty would then be equivalent to 0. Maybe I am making it worse ... just suggesting 😄

Lazy loading

About lazy loading, I think it should be the standard; I do not see the value of booting and then crashing if there is no configuration. I agree with @MarkCheshire:
log it, then it will be up to a monitoring service to shut down the server. The decision can be made externally, and there are fewer variables to use.

Errors

If loading a configuration leads to an error, then again log it and do not expire the old config.
It would work like Varnish does.

Summary

  • Remove APICAST_CONFIGURATION; it will be only lazy loading. The first request would certainly be slower because of the config download.
  • Every N seconds, re-download the config file, keeping the old one. If there is no error in the new config, switch to it and serve with it. Or, if the config has not changed, just do not download it. If there are errors in the new config file, log them and keep the old one.

My 2 💵

@mikz
Contributor Author

mikz commented Jan 19, 2017

@MarkCheshire @hallelujah RHAMP-10 describes that APIcast should crash when it can't get config. That is an initial requirement that came from debugging hard-to-reproduce network errors.

I definitely want my service to crash when it gets an invalid config. I think @3scale/operations could have an opinion on this too.
Waiting for a request to see that there is no route to the server that provides the configuration is not good.

Loading the configuration on boot from the remote API is quite consistent with giving it configuration in a config file. It would also be loaded on boot and crash when invalid or missing.

Imo the semantics of lazy loading are defined as: start with an empty configuration, load configuration on demand by host. Whereas the common scenario for any stable environment is: start with this configuration.

I think preloading the configuration is maybe a better term than boot. Would preload and lazy be clearer?

@andrewdavidmackenzie
Member

Yes, a server that is started with a missing or invalid config should fail health checks made to it, whether any client requests are made to it or not.

@mikz
Contributor Author

mikz commented Jan 20, 2017

I guess that settles that preloading configuration is a requirement.

Here is my proposal for configuration reloading failures (both modes):

When AUTO_UPDATE_INTERVAL is set to:

  • 0 - a request would give a 404 as there is no service for that request; the next request would try to load new configuration
  • -1 - no problem, as the configuration is already loaded and does not expire, so there is no need for reloading

For boot mode:

  • N - take the stale value and put it back into the cache with the same N TTL, and immediately start a retry loop with exponential back-off that tries to load new configuration until the next N

For lazy mode:

  • N - take the stale value and put it back into the cache with a small TTL (the timeout of the auto update), and trigger one auto update in the background to replace the value

Will be updating the Concurrency section in the description.
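
For illustration, a Python sketch of the boot-mode failure handling above (stale value re-inserted with the same TTL, then retries with exponential back-off until the next N); the `cache.set` interface and the parameter names are hypothetical, not the gateway's actual cache API.

```python
import time

def reload_with_backoff(cache, key, stale, download, ttl, base=1):
    """Keep the stale record with the same TTL and retry the download
    with exponential back-off until the next refresh window."""
    cache.set(key, stale, ttl)          # put the stale value back
    delay, waited = base, 0
    while waited < ttl:
        try:
            cache.set(key, download(), ttl)
            return True                 # fresh configuration stored
        except Exception:
            time.sleep(delay)           # back off before the next attempt
            waited += delay
            delay = min(delay * 2, ttl)
    return False                        # give up until the next N interval
```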

@mikz
Contributor Author

mikz commented Jan 20, 2017

@hallelujah re the values of auto update. The default should be the most stable production mode. That would mean cache forever. I guess the empty value could mean -1. Would that be ok?

@mayorova
Contributor

mayorova commented Jan 20, 2017

Some questions/comments:

  • Is the auto-reload every N actually downloading and trying to apply the conf? I think we first need to check what the latest available version is, and if it's the same one that is already loaded and used by the gateway, do nothing.

  • In lazy load, how will apicast know what configuration to download on an incoming request? Based on what? Hostname? What if path-based routing is used?

  • (maybe out of scope for this discussion, but I think it's important) In case there are multiple instances of apicast, how does the configuration get synchronized between the instances? Maybe we need to have an option for where the cache can be stored (in-memory vs. Redis etc.)

Also, I'd prefer to have a way to "force" a configuration reload rather than making it all automatic (and waiting for the TTL until the conf is applied to test if all is OK). For example, extend the /boot endpoint of the management API to reload only a specific service. Although I understand that in the case of multiple apicasts this may also be a problem :/

@mikz
Contributor Author

mikz commented Jan 20, 2017

Yes, the auto-reload every N is going to download the configuration and apply it. The idea is that the http_ng client will act as an HTTP cache, so it will just return the cached configuration in case it did not change (which is going to be ensured by the Last-Modified and ETag headers). That whole part would be pushed to the HTTP layer where it belongs.
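
A sketch of the HTTP-layer caching idea using conditional GETs; `requests` here only stands in for the http_ng client, and the tuple shape of `cached` is an assumption for illustration.

```python
import requests  # stand-in for the http_ng client mentioned above

def fetch_configuration(url, cached=None):
    """Conditional GET using ETag / Last-Modified.

    `cached` is a hypothetical (etag, last_modified, body) tuple from the
    previous download; a 304 response means the configuration did not
    change, so the cached body is reused.
    """
    headers = {}
    if cached:
        etag, last_modified, _body = cached
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
    response = requests.get(url, headers=headers)
    if response.status_code == 304 and cached:
        return cached                  # unchanged: reuse the old configuration
    return (response.headers.get("ETag"),
            response.headers.get("Last-Modified"),
            response.json())
```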

I guess you are worried about what happens between the time when the configuration expires and when the new one is downloaded. Let's go through the scenarios:

  • boot + N - the stale configuration is going to be used until the new one is reloaded; no performance hit should be visible
  • lazy + N - the stale configuration is going to be used until the new one is reloaded; no performance hit on reload, a small hit on the first call or after N*2 when configuration can't be loaded

To answer your points:

  • Yes, by hostname. In case path-based routing is used, it loads all matching hosts.

  • That is described in the Concurrency and cache miss section of the description. For clusters running under the same master process we should use shared memory. For cross-machine clusters, orchestrate it yourself for now.

You can force a configuration reload by signalling nginx to reload, for example. Why would you need to reload the configuration? There are ways to do all of these right now:

  • disable reloading and reload it yourself (HUP signal IIRC)
  • reload on every request (for development)
  • reload every N seconds (same as now and in apicast hosted)

For self-managed apicast you can totally go with reloading it yourself, or force the reload via /boot.
I'm sure in the future we will improve the management API, but right now there is not much use for it other than inspecting the running configuration.

@hallelujah
Collaborator

OK I understand! Thanks.
@mikz About the value of auto update, yes that is OK; it is unintuitive, but if it is documented that's fine!

@mikz
Contributor Author

mikz commented Jan 20, 2017

@hallelujah what is unintuitive? The default configuration should be as stable and production-ready as possible; that means no reloading, and loading the configuration on start (or crashing). That is the default.

@hallelujah
Collaborator

Well, it is just a dev thought, nothing really bad. I meant that empty == -1 is kind of unusual to me :)
Anyway, I am pretty OK with everything!

mikz mentioned this issue on Feb 15, 2017
@thomasmaas
Member

Update: we decided that for on-prem we want the following behaviour:

  • crash when no connection to system
  • don't crash when no configuration

(this will allow us not to have to save config to staging env and promote to production env but let the customer save the first config to )
