This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

fleetctl: check systemd active state via client API #1667

Merged

Conversation


@dongsupark dongsupark commented Aug 19, 2016

It's not appropriate to check systemd active states over a D-Bus connection, as that returns correct states only for units on the same machine. So we should also introduce a check using cAPI.UnitStates() to cover units on other machines.

@dongsupark dongsupark self-assigned this Aug 19, 2016
@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch 3 times, most recently from 839d97f to 0cb4369 Compare August 19, 2016 15:22
@dongsupark dongsupark force-pushed the master branch 7 times, most recently from 3b60e93 to 875d938 Compare August 23, 2016 11:00
@jonboulle
Contributor

I don't think it's ever really sane to fall back to D-Bus, is it?

@dongsupark
Contributor Author

@jonboulle Good question.
In the beginning I also tried to get it working via cAPI only, but it didn't work as expected: some units were reported as not active, which made the functional tests fail. So I had to fall back to checking via D-Bus as well to get it working. I'd also like to know the reason.

@dongsupark dongsupark force-pushed the master branch 2 times, most recently from bdc94e8 to fa5aa3a Compare August 30, 2016 12:26
@jonboulle
Contributor

@dongsupark could you link to an example of test failures in that case?

@dongsupark
Contributor Author

@jonboulle
Reverted the fallback and printed out more debug messages:
https://gist.github.com/dongsupark/c444e10ad843090cc45750b85db51d7e

@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch from ffecd1e to ee5585d Compare August 31, 2016 13:05
@dongsupark dongsupark changed the title from "fleetctl: check systemd active state also with client API" to "fleetctl: check systemd active state via client API" Aug 31, 2016
@dongsupark
Contributor Author

Finally I fixed a bug, and force-pushed.

It turns out that checking states via the client API was already working correctly, and "fleetctl list-units" actually showed the correct output. The only broken part was the loop around waitForSystemdActiveState(), which didn't actually retry retrieving the states. I fixed the logic so that it keeps fetching the states for up to 15 seconds, and removed the fallback to checking via the D-Bus connection.

Please have a look. Thanks.

@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch 2 times, most recently from 4712b14 to 88a0a30 Compare August 31, 2016 14:03
@@ -1067,78 +1072,87 @@ func assertUnitState(name string, js job.JobState, out io.Writer) (ret bool) {
// tryWaitForSystemdActiveState tries to wait for systemd units to reach an
// active state, making use of the systemd's DBUS interface. First it takes one
Contributor

This docstring needs updating now

Contributor Author

@jonboulle Done. Thanks for spotting it.

@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch from 88a0a30 to 6a64a4e Compare September 6, 2016 13:43

// getSingleUnitState returns a single uState of type suState, which consists
// of necessary systemd states, only for a given unit name.
func getSingleUnitState(unitName string) (suState, error) {
Contributor

I'm a bit concerned about the efficiency of this - UnitStates() is a relatively expensive call, and we're only concerned about a single item. It's actually a little worse because we're potentially doing this in multiple goroutines simultaneously, per waitForUnitStates.

Two suggestions:

  1. introduce a UnitState.Get call (similar to how, for units themselves, we have both Unit.Get and Unit.List)
    or,
  2. use a single UnitState.List call and share the results across all goroutines (in waitForUnitStates)

thoughts?

Contributor Author

@jonboulle
First of all, getSingleUnitState() is called only by waitForSystemdActiveState(), not by waitForUnitStates(). So it's not true that getSingleUnitState() runs in multiple goroutines: waitForSystemdActiveState() simply iterates over the units sequentially.

On the other hand, your concern is still valid, because it's inherently inefficient to call cAPI.UnitStates() for each unit. I think I'll go with the 2nd option, calling UnitState.List once and sharing its results across all units. Let me see.

Thanks for pointing it out! 👍

Contributor

@dongsupark ah! good point, sorry about that. But now you've pointed that out - wouldn't it make more sense to unify the behaviour of the two so that they're both parallelised? :-)

Contributor Author

@jonboulle
I pushed a new commit to call cAPI.UnitStates() only once. Please have a look.

I actually tried to also introduce cAPI.UnitState() to allow this code to fetch the status of a single unit. However, that would mean a lot of work all over fleet (schema, etcdRegistry, rpcRegistry, etc.), so I'll do that in the future, in another PR.

As for using goroutines also for waitForSystemdActiveState(), I'll try to do that.

Contributor Author

@jonboulle Force-pushed the commit, after changing the logic from sequential iteration to goroutines.

@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch 2 times, most recently from d7da0fc to d919ee8 Compare September 7, 2016 15:26
func checkSystemdActiveState(apiStates []*schema.UnitState, name string, maxAttempts int, wg *sync.WaitGroup, errchan chan error) {
defer wg.Done()

if maxAttempts < 1 {
for {
Contributor

Not particularly fond of the duplication of these loop bodies - can we instead do something more clever with the loop conditional to collapse them into a single for?

Contributor Author

@jonboulle Good point. I split the common part out into assertFetchSystemdActiveState() and reduced checkSystemdActiveState() to a single loop. The duplication was just a copy-and-paste leftover.

@jonboulle
Contributor

Looks pretty good overall

@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch from 32a88e2 to 7ce7460 Compare September 19, 2016 11:50
Dongsu Park added 3 commits September 19, 2016 13:50
To be as idiomatic as possible, do so:

  * make tryWaitForSystemdActiveState() return error instead of int
  * pass getBlockAttempts(cCmd) to tryWaitForSystemdActiveState()
    to handle the --no-block and --block-attempts cases
  * make assertSystemdActiveState() and waitForSystemdActiveState()
    return just an error, dropping the unnecessary 'found' value
  * remove the unnecessary conn == nil check
  * compare strings with '!=' instead of strings.Compare() != 0
Now that tryWaitForSystemdActiveState() returns error instead of int,
tryWaitForUnitStates() should also return error instead of int, to be
as idiomatic as possible.
It's not appropriate to check systemd active states over a D-Bus
connection, as that returns correct states only for units on the same
machine. So we should introduce a check using cAPI.UnitStates() to
cover units on other machines.
@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch from 7ce7460 to 413f6e3 Compare September 19, 2016 12:19
@dongsupark
Contributor Author

@jonboulle I believe everything has been addressed now. I'll merge it soon.

@dongsupark dongsupark force-pushed the dongsu/fleetctl-capi-systemd-states branch from 413f6e3 to 988fd84 Compare September 19, 2016 13:38
As it's inherently inefficient to call cAPI.UnitStates() for each unit,
let's call it only once, outside the per-unit goroutines, i.e. in
waitForSystemdActiveState() rather than in getSingleUnitState().
To do that, we need to share the result apiStates across all units.

Also make use of goroutines in waitForSystemdActiveState(), instead of
sequential iteration over each unit.

Note that we still need to call UnitStates() again when
assertSystemdActiveState() fails, because on the next attempt the old
apiStates are not necessarily still valid. Ideally we would fetch the
state of just the single unit in that case, but we cannot do that for
now, because cAPI.UnitState() is not available. In the future we should
implement cAPI.UnitState() and all dependent parts across the fleet
tree, e.g. schema, etcdRegistry, rpcRegistry, etc., and replace
UnitStates() here with the new method. In practice, calling
UnitStates() here is not as inefficient as it looks, because it is
called rarely, only when the assertion fails.
@dongsupark dongsupark merged commit 49420c5 into coreos:master Sep 19, 2016
@dongsupark dongsupark deleted the dongsu/fleetctl-capi-systemd-states branch September 19, 2016 14:04