Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logging: Remove linker 'magic' and just use statics for logging callsites #322

Merged
merged 3 commits into from
Sep 27, 2018

Conversation

chrissie-c
Copy link
Contributor

It is my (and several others') opinion that the linker 'magic' used to maintain callsite data in libqb is hugely over complicated and unnecessarily fragile. It's main purpose seems to have been to improve performance but empirical testing shows this to be tiny at best. The overhead of sprintf makes minor optimisations in this code pointless.

With this code removed, libqb allocates callsites using a C static variable at run-time. This sounds bad but in actuality it merely moves the allocation from program load time to the first few milliseconds of program run-time. Applications like corosync and pacemaker spend most of their time in small loops doing the same work over and over again so the overhead doesn't apply and jitter does not occur.

We've tested this with corosync and pacemaker under valgrind and massif and the differences are minimal and even then only show up under artificial stress testing.

For this change I've bumped the soname up to 20 to indicate this is an incompatible change. I'm open to suggestions as to a release number but am currently thinking of 1.2.0

@jnpkrn
Copy link
Contributor

jnpkrn commented Sep 12, 2018

Once more, opinion != hard data.

The justification is unsound to say the least.

it merely moves the allocation from program load time to the first few
milliseconds of program run-time

I beg your pardon?
/me still can't believe this doubly misled statement


Are there actual persistent issues around this feature? Let's fix them.

Is any client program too worry to run into problems?
Ok, libqb.pc explains how to compatibly drop that feature for that
very program, and not universally, which would break the compatibility
for everyone. Do you want a specific libqb-nostaticlogdata.pc to
ease that?


Come on, point me at any project and I am confident I can find a feature
that can be dropped while retaining the core. That's easy, very little
effort is needed (as opposed to pushing the limits smartly, which is
exactly how I'd identify the callsite section utilization).

But it always comes at cost of something else, here:

@fabbione
Copy link
Member

ack from me.

@jfriesse
Copy link
Member

count me in the queue of ackers, so ACK

@chrissie-c chrissie-c merged commit 633f262 into master Sep 27, 2018
@jnpkrn
Copy link
Contributor

jnpkrn commented Oct 5, 2018

Add one more disruption to already a long list why this is basically
reliability alienation:

  • when some call fails for out-of-memory condition and proceeds to
    at least log about this possibly recoverable problem, it will
    now exteremely likely fail doing it that as well

Congrats!

@gao-yan
Copy link
Member

gao-yan commented Oct 30, 2018

Thanks for the solution. Is there any plan when to release a new version with this?

It appears dlm_stonith is basically broken with libqb-1.0.3:

# dlm_stonith -n 172204701
dlm_stonith: utils.c:55: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
Aborted (core dumped)

@chrissie-c
Copy link
Contributor Author

Rebuilding dlm might fix that.

I'm going to start on a 2.0rc series shortly. Once we get the hi-res timestamps merged I think.

@gao-yan
Copy link
Member

gao-yan commented Oct 31, 2018

Thanks for the information.

It was rebuilt but didn't work. But now I realized dlm_stonith calls stonith_api_kick_helper() but it's not explicitly linked with libstonithd according to the Makefile. No idea why, but it's probably relevant...

@gao-yan
Copy link
Member

gao-yan commented Oct 31, 2018

Oh, well, stonith_api_kick_helper() is an inline function from pacemaker's stonith-ng.h header file which manipulates stuff with dlopen() and dlsym(). This doesn't seem to work with libqb-1.0.3 though.

@GitSoftwareNow
Copy link

@jfriesse which version of libqb is correct?

@GitSoftwareNow
Copy link

@jfriesse which version of libqb is the corosync 3.0.2 relayed?

@jfriesse
Copy link
Member

@GitSoftwareNow Corosync works with most of the versions, but you must keep in sync the installed version and version which is corosync linked with. Currently I wouldn't be to hesitate to recommend current git master.

Also please consider to use corosync issue you've opened for other questions, because commenting under already merged libqb PR is not optimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants