Add a message-id parameter for messages #433

aleksei-burlakov · 2021-01-07T15:03:31Z

DON'T MERGE

The message-ids are used by systemd catalogs. (see ClusterLabs/sbd#117)

To enable this feature the libqb should be configured with the
--enable-systemd-journal option.

IMPORTANT:

The libqb exposes the

struct qb_log_callsite
qb_log_callsite_get(...
qb_log_from_external_source_va(..

To make it completely backward compatible, they all three should come with
and without the message_id parameter. This would however require repeating a
good half of the libqb. We did only a half-measure: we provided the
qb_log_from_external_source_va2
to make it working with the corosync. It's not however backward compatible with the pacemaker.

@beekhof , @jfriesse , @jnpkrn , please give your feedback. (The PR done in the continuation of the conversation started here ClusterLabs/sbd#117 thought a bit later than planned). It's mostly interesting if it should be backward compatible and if so, then how to make it.

knet-ci-bot · 2021-01-07T15:03:34Z

Can one of the admins verify this patch?

kgaillot

I'm conflicted about the idea. The capability is nice but the code gets messy. I'm hesitant about adding target-specific arguments to otherwise generic logging functions (I'm sure there's extra info we could add to syslog or other targets).

include/qb/qblog.h

kgaillot · 2021-01-08T17:19:52Z

lib/log_dcs.c

-				cs = csl->cs;
-				break;
+			    strcmp(safe_filename, csl->cs->filename) == 0 &&
+			    strcmp(safe_message_id, csl->cs->message_id) == 0)


I'm not clear on how this code works -- is checking message ID necessary here? Maybe only if we allow filtering by ID?

lib/log_syslog.c

chrissie-c · 2021-01-13T15:13:39Z

I'm not against the idea in principle (I've worked in mainframe environments where Message-IDs have been very useful to me), but we need to be sure that a) people using libqb do actually want this and will use it and b) it is 100% backwards compatible.

aleksei-burlakov · 2021-01-21T15:30:01Z

Unfortunately that still breaks backwards compatibility. The signature of qb_log_callsite_get() can't be changed because it's already embedded in binaries that have linked with the existing qblog.h file - it's in the qb_logt (and hence also qb_log) macro. So there would have to be a new call that had the message ID in it which is called by new binaries and the exiting one left untouched that, probably, calls the new one with a NULL message_id.

I thought I was smart to simply use the conditional compilation, but unfortunately, it's not a solution. I will just do as #433 (comment) here (assuming struct qb_log_callsite can be altered backward compatibly).

kgaillot · 2021-01-21T15:39:36Z

I thought I was smart to simply use the conditional compilation, but unfortunately, it's not a solution. I will just do as #433 (comment) here (assumung #433 (comment) is correct).

Sadly ABI backward compatibility is very unforgiving. In Pacemaker we have tons of APIs I would love to change ...

The basic idea is that an application linked against a shared library with a particular version can continue to be used without requiring code changes, recompilation, or relinking if the library is upgraded to a newer version that is ABI compatible. New APIs can be added, new members can be added to the end of structs (if the struct can only be allocated by library functions), new enum values can be added to the end, etc., without breaking compatibility. But anything that changes a function signature, or changes an existing struct member type or order, etc., requires a shared object version bump to force applications to at least be re-linked (otherwise they could have undefined behavior if used with the newer library).

aleksei-burlakov · 2021-01-21T15:42:10Z

I thought I was smart to simply use the conditional compilation, but unfortunately, it's not a solution. I will just do as #433 (comment) here (assumung #433 (comment) is correct).

Sadly ABI backward compatibility is very unforgiving. In Pacemaker we have tons of APIs I would love to change ...

The basic idea is that an application linked against a shared library with a particular version can continue to be used without requiring code changes, recompilation, or relinking if the library is upgraded to a newer version that is ABI compatible. New APIs can be added, new members can be added to the end of structs (if the struct can only be allocated by library functions), new enum values can be added to the end, etc., without breaking compatibility. But anything that changes a function signature, or changes an existing struct member type or order, etc., requires a shared object version bump to force applications to at least be re-linked (otherwise they could have undefined behavior if used with the newer library).

I really missed the point that the structure is only allocated in the library. It should be backward compatible then=)

kgaillot · 2021-01-21T15:45:46Z

Also, it would be nice if we could just bump the shared object version anytime we wanted, so we could change the APIs, but that requires all clients to be rebuilt and possibly even require code changes, which is a burden to their developers and packagers. Many OSes/distributions avoid breaking ABI compatibility within a major release, so it also delays when they can adopt the new version. For Pacemaker I'm aiming for roughly at least 6 years between ABI breaks (the last one was 2.0.0 in 2018).

Of course APIs that aren't exposed in installed headers, and any that are but are explicitly documented as internal, aren't subject to any limitations and can be changed however needed.

aleksei-burlakov · 2021-01-22T15:18:12Z

@kgaillot @chrissie-c please have a loot at the latest change. I've made it backward compatible. It's only left to add new tests.

chrissie-c · 2021-01-28T09:21:54Z

This is starting to look good now, thanks!
One small thing while you're working on the unit tests, can you also update the doxygen tags in qblog.h please? I'm getting a warning that message_id is not documented.

chrissie-c · 2021-01-28T16:21:24Z

test this please

aleksei-burlakov · 2021-02-09T20:29:44Z

I'm getting a warning that message_id is not documented.

Fixed.

test this please.

I added a tiny change to the test_journal, but it should be enough.

chrissie-c · 2021-02-10T07:42:19Z

test this please

chrissie-c · 2021-02-10T07:43:30Z

@aleksei-burlakov sorry, when I say "test this please" it's an instruction to the CI to test the new code - not to you :)

aleksei-burlakov · 2021-02-12T14:33:27Z

@aleksei-burlakov sorry, when I say "test this please" it's an instruction to the CI to test the new code - not to you :)

So it's a magic spell). Could you please give your feedback about the latest changes?

kgaillot · 2021-02-16T19:48:48Z

I've sent a message to users@clusterlabs.org asking for user feedback on the idea.

chrissie-c · 2021-02-17T08:24:56Z

I've sent a message to users@clusterlabs.org asking for user feedback on the idea.

Thanks Ken

chrissie-c · 2021-02-19T10:15:34Z

The latest patch looks good to me, thanks. One tiny thing: I think it would be good to test the message ID that comes back in the check_log.c:1049 matches the one we sent at line 1029:

diff --git a/tests/check_log.c b/tests/check_log.c
index 1c0d942..039a4bb 100644
--- a/tests/check_log.c
+++ b/tests/check_log.c
@@ -1021,12 +1021,13 @@ START_TEST(test_journal)
        pid_t log_pid;
        sd_journal *jnl;
        int count = 0;
+       const char *msgid="f77379a8490b408bbe5f6940505a777b";
 
        qb_log_init("check_log", LOG_USER, LOG_DEBUG);
        qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_ENABLED, QB_TRUE);
        rc = qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_USE_JOURNAL, 1);
        ck_assert_int_eq(rc, 0);
-       qb_log2("f77379a8490b408bbe5f6940505a777b", LOG_ERR, "Test message 1 from libqb");
+       qb_log2(msgid, LOG_ERR, "Test message 1 from libqb");
 
        qb_log_ctl(QB_LOG_BLACKBOX, QB_LOG_CONF_ENABLED, QB_TRUE);
        rc = qb_log_ctl(QB_LOG_BLACKBOX, QB_LOG_CONF_USE_JOURNAL, 1);
@@ -1048,6 +1049,7 @@ START_TEST(test_journal)
                ck_assert_int_eq(rc, 0);
                rc = sd_journal_get_data(jnl, "MESSAGE_ID", (const void **)&msg, &len);
                ck_assert_int_eq(rc, 0);
+               ck_assert_str_eq(msg+11, msgid);
                break;
            }
            if (++count > 20) {

aleksei-burlakov · 2021-02-19T15:17:38Z

The latest patch looks good to me, thanks. One tiny thing: I think it would be good to test the message ID that comes back in the check_log.c:1049 matches the one we sent at line 1029:

diff --git a/tests/check_log.c b/tests/check_log.c
index 1c0d942..039a4bb 100644
--- a/tests/check_log.c
+++ b/tests/check_log.c
@@ -1021,12 +1021,13 @@ START_TEST(test_journal)
        pid_t log_pid;
        sd_journal *jnl;
        int count = 0;
+       const char *msgid="f77379a8490b408bbe5f6940505a777b";
 
        qb_log_init("check_log", LOG_USER, LOG_DEBUG);
        qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_ENABLED, QB_TRUE);
        rc = qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_USE_JOURNAL, 1);
        ck_assert_int_eq(rc, 0);
-       qb_log2("f77379a8490b408bbe5f6940505a777b", LOG_ERR, "Test message 1 from libqb");
+       qb_log2(msgid, LOG_ERR, "Test message 1 from libqb");
 
        qb_log_ctl(QB_LOG_BLACKBOX, QB_LOG_CONF_ENABLED, QB_TRUE);
        rc = qb_log_ctl(QB_LOG_BLACKBOX, QB_LOG_CONF_USE_JOURNAL, 1);
@@ -1048,6 +1049,7 @@ START_TEST(test_journal)
                ck_assert_int_eq(rc, 0);
                rc = sd_journal_get_data(jnl, "MESSAGE_ID", (const void **)&msg, &len);
                ck_assert_int_eq(rc, 0);
+               ck_assert_str_eq(msg+11, msgid);
                break;
            }
            if (++count > 20) {

Thanks! I have amended the PR.

aleksei-burlakov · 2021-02-24T08:47:14Z

@chrissie-c @kgaillot do you think it's mergeable?

chrissie-c · 2021-02-24T10:16:27Z

I'm happy to merge it, but I would like Ken's approval too as pacemaker is the major user of the logging subsystem.

kgaillot

Sorry, I had started a review and got distracted and forgot to send it. It looks good to me, just some minor comments.
I'm still not too sure how well this will work in practice, but we won't know until we try ...

kgaillot · 2021-02-16T19:05:44Z

include/qb/qblog.h

@@ -258,6 +259,10 @@ struct qb_log_callsite {
 	uint32_t lineno;
 	uint32_t targets;
 	uint32_t tags;
+	/* message_id must be the last argument to
+	 * make it compatible with the external
+	 * programs that don't use it */


I would drop this comment, since any new members in the future must be added after this one. It's always true that once a member is added, its order cannot be changed without breaking compatibility, so no special comment is needed. (If we did want to make that more explicit, it could be added to any developer documentation libqb has or should have, or as a comment at the top of each public header.)

kgaillot · 2021-02-16T19:06:13Z

include/qb/qblog.h

@@ -319,6 +324,39 @@ struct qb_log_callsite* qb_log_callsite_get(const char *function,
 					    uint32_t lineno,
 					    uint32_t tags);

+/**
+ * Get or create a callsite at the given position.
+ * The same that qb_log_callsite_ge but with the


missing a "t"

Thanks! Fixed.

kgaillot · 2021-02-26T16:11:54Z

lib/log_dcs.c

+{
+	return qb_log_dcs_get2(newly_created, NULL, function,
+				filename, format, priority, lineno, tags);
+}


qb_log_dcs_get() is an internal function and was only used in the one spot, so we can just drop it now (or use its name for the new function)

Indeed. I have added the message-id to the qb_log_dcs_get and removed the qb_log_dcs_get2.

jnpkrn · 2021-02-28T13:44:08Z

Note that this is just a stranded instance of generalized structured
logging I had in mind with #286.

I don't see much sense in bloating API piece-wise whenever new field
seems useful to have in journal (or whatever else backend suitable for
such structured format).

Btw. this is the origin of the process that added structured logging
into Linux kernel, for inspiration:

https://lwn.net/Articles/492125/
https://lore.kernel.org/lkml/1333569554.864.3.camel@mop/

Last I had a look, this provision appeared to be utilized heavily,
and for good reason, I think (the more what happened data you have,
the less likely you'll be patient enough to go through them all on
you own, to the point even plain greping becomes unscalable if
you are only after some patterns of surrounding circumstances,
for instance, you really need reliable data sifting device to move
forward than what mere textual pattern matching can provide).

(Of course, another inspiration can be sd_journal_send itself.)

In addition, there needs to be some set of rules that the keys need to
abide so as not to collide with precious ones that libqb passes on
by itself, and what precious means with systemd-journald, e.g.
prefixed with an underscore.

Also, qb-blackbox would rather gain a possibility to read and
present these structured logging data, following the extension of the
respective ring buffer binary format to contain them, indeed.

Another remark: when you open the ability to push MESSAGE_ID to
journal via libqqb, it can be tempting to log static-only
(unparameterized) messages just per that, with empty MESSAGE.

(EDIT: it is not as I thought, i.e., that MESSAGE_ID would be enough;
apparently it would not tell the observer anything unless augmented
output, journalctl -x, is requested, so it cannot be [ab]used to
"encode message just by its ID" this easily; nonetheless, leaving
the below idea, slightly adjusted, for posterity ...)

There is also a possibility to provide some postprocessing fix-ups
(qb-blackbox to parse message catalogs and present resolved
messages [just Subjects unless verbose mode of operation requested]
accordingly, newly added qb-logfile-msg-resolve to perform the same
resolution on specially marked message IDs within the log message itself,
and so on).

Yet another remark: if nobody is to bite the structured logging bullet
and the MESSAGE_ID is deemed top-most urgent (I can't see why it'd be
the case, since it can only augment the insight into the log, only with
one particular sink and for hardly any SW in practice[*]), please, at
least consider space efficiency you'll get with carrying that ID as
a byte array rather then exploded string that is meant to represent
these bytes, which should be easy when structs and macros from
systemd/sd-id128.h are used.
The string form may be expanded only on-demand when actually used,
assuming the augmenting is concerning only events of particular interest
(i.e. rather rare ones), but either way, it's still a relatively fast operation
(SD_ID128_CONST_STR/sd_id128_to_string),

[*] I can see catalogs only from 2 packages/0.05% of all packages here:

$ rpm -qf /usr/lib/systemd/catalog/* | sort -u
dbus-broker-22-1.fc33.x86_64
systemd-247.3-1.fc34.x86_64
$ rpm -qa | wc -l
3691

aleksei-burlakov · 2021-03-01T09:54:32Z

Note that this is just a stranded instance of generalized structured
logging I had in mind with #286.

That's good! Now we start implementing it.

please, at
least consider space efficiency you'll get with carrying that ID as
a byte array rather then exploded string that is meant to represent
these bytes

Are you sure this is a good idea? The string could be greped and they are easier to read.

chrissie-c · 2021-03-01T09:57:00Z

@aleksei-burlakov Don't worry about poki's comment.

The message-id parameter will enable systemd catalogs. To enable message-id's the libqb should be configured with the --enable-systemd-journal option.

jnpkrn · 2021-03-01T10:23:48Z

@aleksei-burlakov

Are you sure this is a good idea? The string could be greped
and they are easier to read.

You can grep it in the accompanying catalog files, that should
be fairly enough, shouldn't it? To connect the dots with code,
shall it be ever needed, you just find the occurrences of where
you set the identifier (either by MESSAGE_ID or by the particular
dedicated API function), chances are you will discern the ID in
question just per the first byte.

Do you have any more concerns?

@chrissie-c

Please assume good intentions, toxicity doesn't help.

aleksei-burlakov · 2021-03-01T10:35:12Z

@aleksei-burlakov

Are you sure this is a good idea? The string could be greped
and they are easier to read.

You can grep it in the accompanying catalog files, that should
be fairly enough, shouldn't it? To connect the dots with code,
shall it be ever needed, you just find the occurrences of where
you set the identifier (either by MESSAGE_ID or by the particular
dedicated API function), chances are you will discern the ID in
question just per the first byte.

Do you have any more concerns?

I don't think it would bring any space efficiency. Those message-ids are aimed to simplify debugging. I would be glad to see the message-id immediately together with the message itself in one line or on the next one, so that I could do like $ grep "f77379a84" -A1 and this would show me all info I want. Rather than grep it twice.

chrissie-c · 2021-03-01T12:50:07Z

test this please

aleksei-burlakov · 2021-03-01T13:52:14Z

The coverity scan doesn't seem to find new vulnerabilities.

chrissie-c · 2021-03-01T14:38:48Z

It's finding old one because covscan was updated last week I think :)

chrissie-c · 2021-03-01T15:59:20Z

Merged thanks. I'll do a release a little later this week to get the other new stuff out to people

aleksei-burlakov · 2021-03-01T18:12:27Z

Thanks to everyone who participated!

jnpkrn · 2021-03-11T00:26:21Z

I believe a wider context is owed since otherwise it may look like just
a stern break-in to just about to be finished PR -- sorry about the
timing, didn't check earlier.

I admit being too terse about my main concern of a premature single-use
API extension that is superseded on arrival with a more general concept
that shall be seriously considered if libqb doesn't want to become
itself obsolete for its unwillingness to follow the present world's
requirements on _manageable logging. That's why I've opened #286
back then, to collect feedback for what I still see is rather important
once the basic goals for logging (correctness, maxed out efficiency
slash predictable run-time [linker assisted static callsites was all
about this], reasonable basic flexibility) were practically reached
(long before any of my contributions).

It is specifically this striking disproportion between the gain with
this particular single-use (single purpose) change and the one from
enabling libqb users to deal with the whole class of structured
logging at once, which I think was accepted too carelessly [*]
{rather recklessly towards the future maintenance and cognitive load
imposed on library users, especially if/when the generalization is
eventually put in place, then this green flag would prove
short-sighted).

Why? Simply because it seems (to me) that MESSAGE_ID signaling in
the journal is just a minutia to help mostly inexpert users in rather
a narrow spectrum of cases when you want to provide some guidance,
especially to a non-English speaking person. AFAIK, the respective
effect is nowhere to be found in a non-customized execution of
systemd/journal management tools, so if you insist on user getting to
know of the recommended steps, you'd better emit that message as part
of the standard log, anyway. As I demonstrated, hardly any SW beyond
systemd suite itself uses it, but of course, this may be just because
of a lack of marketing or something other beyond the usability concern.

[*] EDIT:
: wasn't until now I've noticed Ken's spotted a possible
: conflict (hence counter-balancing the (lack of) prudence here, thanks),
: cannot hence but to suggest topmost discretion to the rest when
: there are any hesitations on his side, there's a merit into this
: (beside, there's a distinction between what and how)

Also note how the current support of MESSAGE_ID is self-limited,
without documenting this fact ahead of time (it will be realized
shortly, but nonetheless) -- catalog files to be combined with the
respective emitted IDs cannot currently source arbitrary interpolation
parameters (fields) since libqb offers no way to propagate them
(of course, if it could, it certainly wouldn't need to separate it as
an antisystemic standalone variable to be passed, but a universal
bag-of-parameters solution would be used, obviating the need for that
special-cased API, to start with).

Such a disaster, you may cynically say, but the problem will get real
and messy if libqb eventually gains that blessed ability to pass full
parameter vectors in the logging, since libqb's client code will start
to rely on that incl. in catalogs where various custom parameters will
started to be planted for interpolation ... and then it won't work as
nicely when such a piece of SW is not used in conjuction with new enough
libqb (perhaps the one in its current shape if there is no SONAME bump
otherwise).

This adds into the future burden that may be ahead, and could be
avoided with a careful and responsible functionality planning.

I think that libqb is already systemic and internally structured well
enough, e.g., with its tag behind-the-scenes annotations, so to kick off
at least something debatable as a realistic action plan, a very "easy"
route towards the goal of structured logging, which admittedly neglects
a lot from the whole picture, notably non-journal logging backends
(I've meant to point "ringbuffer" out specifically before, since it's
a nice fit here for being binary, as opposed to a possible textual
lines clutter), might be to offer the library user a way to register
"structured logging fields fetcher" callbacks per the particular bit
in tag (similar to what qb_log_tags_stringify_fn_set allows, but
enforcing the decision table being visible by libqb directly for
a definedness test efficiency, low error-proneness regarding memory
management, and overall ease of use [see also further example]).

/* -1 denotes error, -2 and less denotes the fetcher asks libqb's
   runtime to extend the buffer resulting in 2+ cell pairs available,
   and subsequently to call it again; positive denotes how many cell
   pairs have actually been used out of it; libqb's runtime will always
   call it with at least one cell pair available */
typedef size_t (qb_log_tag_struct_fetcher*)(struct qb_log_callsite *cs, va_list msg_interpolation_vars,
                                            void *passaround_data,
                                            char *inprep_structured_log_buf[], size_t cell_pairs_available);

/* also return previously registered fetcher */
qb_log_tag_struct_fetcher * qb_log_tag_struct_fetcher_register(unsigned char which_tag_bit,
                                                               qb_log_tag_struct_fetcher *fn,
                                                               void *passaround_data);

User would then dedicate a particular bit in tag to register
a callback that will simply prepare a string vector (designated
cell pair above) consisting of MESSAGE_ID and
<128 bit as hex string> when an eligible backend (journal)
is selected.

Yes, I don't like this design too much either :-)
But not to say it's totally the worst. It resembles this style of use:
https://github.com/ClusterLabs/libqb/blob/v2.0.3/examples/simplelog.c#L98-L112

It'd be preferred if a wider design sweep was made, considering
interoperability with remaining backends (and also amongst versions),
level of customizations at the libqb's client side, yet again the
mentioned efficiency (the above is weak here), etc.

Anyway, if you want to add a special treatment for MESSAGE_ID with
this provision already in place, instead of new bloating API,
you need to do something roughly like this:

#define MY_DEDICATED_MSGID_BIT 1 << 4;

static const char my_msgid_field = "MESSAGE_ID";

size_t my_msgid_fetcher(struct qb_log_callsite *cs, va_list msg_interpolation_vars,
                        void *passaround_data,
                        char *inprep_structured_log_buf[], size_t cell_pairs_available)
{
	/* msg_interpolation_vars can inform further metadata assignments */
	size_t consumed = 0;
	const char *msgid = get_msgid_for_cs(passaround_data, cs);
	if (msgid) {
		assert(cell_pairs_available >= 2);
		inprep_structured_log_buf[consumed++] = my_msgid_field;
		inprep_structured_log_buf[consumed++] = msgid;
	} else {
		fprintf(stderr, "MY_DEDICATED_MSGID_BIT tag contract violated/n");
	}
	return consumed;
}

qb_log_tag_struct_fetcher_register(MY_DEDICATED_MSGID_BIT, my_msgid_fetcher, msgid_table);

Finally, I hope to cast another perspective into richer logging
machine-friendly structures that will hopefully be motivating, since
it demonstrates how to elegantly and once for all resolve the
never precize second-guessing and especially user facing problem.

It leaves a lot to be expected/makes for an inferior user experience
when the sole responsibility for not leaking secrets while retaining
meaningful data for various post-mortem analyses is left as an exercise
to the user ... go figure what might be of secrecy for you, we may,
poorly, over-approximate it with regexps like .*pass.* (as for the
case at some point in the past) for you but that's about all we have
to offer, rest is up to you.

This was (and perhaps still is) a topic for pacemaker, e.g.
ClusterLabs/pacemaker#962

Now, with, e.g., a qualified help of proper metadata annotations
from particular agents, pacemaker could have a clear view what's
considered secret, and hence mark respective possibly tainted log
items as also sensitive/requiring proper care.

There is a standard for such marking being considered in systemd:
systemd/systemd#6432 (comment)
(under SECRECY_LEVEL field) so this would not be a complete offshoot.

(Of course, there could also be a more dynamic behaviour of marking
what constitutes some form of secrecy directly by the user,
borrowing the mechanics from a similar PCMK_trace* scheme
from the pacemaker's config.)

The point is, the sensitive lines will be marked as such on the go,
nothing will get suppressed from the eyes of the administrators, yet
the process of sharing collected data will be less prone to unintended
leaks. All thanks to embedding out-of-band metadata, i.e.,
structured logging.

Try to think about opportunities like this.

Sincerely hope we are on the same page on my best wishes to the project.

jnpkrn · 2021-03-11T12:00:40Z

@aleksei-burlakov

I don't think it would bring any space efficiency.

Literally speaking (but my intention was to rather touch efficiency
in general, see below, this is just to respond in a similar nit-picking
fashion): depending on the implementation details, very likely the
storage overhead is around 60% compared to compact byte representation.

But has practical aspects as well.

You need to realize that while HEX_STRING -> BYTES is indeed
a bijection (one domain perfectly mapped to the other and vice versa),
STRING -> BYTES is not (since STRING -> BYTES is not a surjection).

So if you start with STRING, you don't get a cheap (possibly
compile-time staticly bound) conversion to BYTES, since you need to
validate STRING along, indicate errors to be later checked for, etc.
Unlike with BYTES to STRING, and I've pointed to a macro that will
do exactly this conversion for you at compile-time if needed.

Hence, the conclusion here shall be that the BYTES representation is
the most universal ("source") from which it makes sense to start with,
even with space efficiency put aside.

It looks to me that dev's convenience aspect is at least mildly
overexaggerated (as I think was said single-use change driver here)
and possibly anecdotal as long as we talk about C codebases, something
that's down to bare bones, which leaves you on your own with everything.
Oh, and there's as a well little difference between grep af12
and grep af,12 when macros tuned for this legibility are used.

One needs to properly weigh in all aspects of data handling ergonomics.
If, for instance, there is a new format modifier for a log file sink
that instructs libqb to emit a full catalog message in there, it can
easily call sd_journal_get_catalog_for_message_id with the data
representation it immediately has when starting from BYTES.
Starting from STRING means calling a conversion routine first
(e.g., sd_id128_from_string), checking for the possible conversion
errors -- since one cannot be sure there's no accidental problem
in the original static string of origin in the code -- and only then
to pass to the stated function).

Moreover, you can be reasonably sure that you won't introduce such
an encoding issue with the structured bytes approach as type-checked
by the compiler during the build process.

Hopefully it clarifies why I think it'd be optimal to select BYTES
as the main/source representation, void of a convincing argument
(not refuting they do exist). Yep, systemd itself also does so.

aleksei-burlakov · 2021-03-11T12:14:32Z

@aleksei-burlakov

I don't think it would bring any space efficiency.

Literally speaking (but my intention was to rather touch efficiency
in general, see below, this is just to respond in a similar nit-picking
fashion): depending on the implementation details, very likely the
storage overhead is around 60% compared to compact byte representation.

But has practical aspects as well.

You need to realize that while HEX_STRING -> BYTES is indeed
a bijection (one domain perfectly mapped to the other and vice versa),
STRING -> BYTES is not (since STRING -> BYTES is not a surjection).

So if you start with STRING, you don't get a cheap (possibly
compile-time staticly bound) conversion to BYTES, since you need to
validate STRING along, indicate errors to be later checked for, etc.
Unlike with BYTES to STRING, and I've pointed to a macro that will
do exactly this conversion for you at compile-time if needed.

Hence, the conclusion here shall be that the BYTES representation is
the most universal ("source") from it makes sense to always start with,
even with space efficiency put aside.

It looks to me that dev's convenience aspect is at least mildly
overexaggerated (as I think was said single-use change driver here)
and possibly anecdotal as long as we talk about C codebases, something
that's down to bare bones and which leaves on your own with everything.
Oh, and there's as a well little difference between "grep af12
and grep af,12" when macros tuned for this legibility are used.

One needs to properly weigh in all aspects of data handling ergonomics.
If, for instance, there is a new format modifier for a log file sink
that instructs libqb to emit a full catalog message in there, it can
easily call sd_journal_get_catalog_for_message_id with the data
representation it immediately has when starting from BYTES.
Starting from STRING means calling a conversion routine first
(e.g., sd_id128_from_string), checking for the possible conversion
errors -- since one cannot be sure there's no accidental problem
in the original static string of origin in the code -- and only then
to pass to the stated function).

Moreover, you can be reasonably sure that you won't introduce such
an encoding issue with the structured bytes approach as type-checked
by the compiler during the build process.

Hopefully it clarifies why I think it'd be optimal to select BYTES
as the main/source representation, void of a convincing argument
(not refuting they do exist). Yep, systemd itself also does so.

Please consider this.

When you pass a string as a parameter, you pass only an address to the string, that is 64 bits only. (time efficiency)
If the string is a constant it would usually be stored in the heap and not litter the stack. (memory is ok)
If you pass the bytes-id you would have to convert it to the string anyway, just shortly before sending the message. (time-consumption).
String is grepable. (usability) (Though, one could add the corresponding string in comments)

jnpkrn · 2021-03-11T12:47:41Z

in this regard, why a byte array couldn't be treated by-ref as well
if needed? this is orthogonal
constant strings are usually in constant data section, not heap,
actually, but why do you bring this up?
yes, in case it is actually needed, it indeed has to be converted,
but that's really cheap as I mentioned in my first comment, and
various tricks exist if that would appear to be a bottleneck
(remember advice about premature optimization); one such can be
using a hash table with already evaluated strings behind the
get_msgid_for_cs function sketched above
did you read my mention of grep above? just a few bytes is needed
if ID generator in systemd is not broken, this is straw in my eyes

Nonetheless, I see no point in following up on this conversation.
Tried to sum up some reasonings and explanations, it is up here if
anyone is interested. I certainly don't steer this boat, though
I'd like to see it floating steadily in about right direction.

https://build.opensuse.org/request/show/924180 by user yan_gao + dimstar_suse - Update to version 2.0.3+20210303.404adbc (v2.0.3): - syslog: Add a message-id parameter for messages (gh#ClusterLabs/libqb#433) - timers: Add some locking (gh#ClusterLabs/libqb#436) - ipcc: Have a few goes at tidying up after a dead server (gh#ClusterLabs/libqb#434) - strlcpy: Check for maxlen underflow (gh#ClusterLabs/libqb#432) - doxygen2man: fix printing of lines starting with '.' (gh#ClusterLabs/libqb#431) - doxygen2man: ignore all-whitespace brief descriptions (gh#ClusterLabs/libqb#430) (forwarded request 924179 from yan_gao)

kgaillot reviewed Jan 8, 2021

View reviewed changes

aleksei-burlakov force-pushed the message-id branch from c775613 to 48b9bcf Compare January 19, 2021 17:06

aleksei-burlakov force-pushed the message-id branch from 48b9bcf to b76721f Compare January 22, 2021 15:14

aleksei-burlakov force-pushed the message-id branch from b76721f to df4236d Compare February 9, 2021 20:26

aleksei-burlakov force-pushed the message-id branch from df4236d to 94f352e Compare February 19, 2021 15:16

aleksei-burlakov force-pushed the message-id branch from 94f352e to 44cc52f Compare February 24, 2021 08:46

aleksei-burlakov force-pushed the message-id branch from 44cc52f to bf7e87f Compare February 24, 2021 11:26

kgaillot reviewed Feb 26, 2021

View reviewed changes

syslog: Add a message-id parameter for messages

a407a31

The message-id parameter will enable systemd catalogs. To enable message-id's the libqb should be configured with the --enable-systemd-journal option.

aleksei-burlakov force-pushed the message-id branch from bf7e87f to a407a31 Compare March 1, 2021 10:21

chrissie-c merged commit aae7a0a into ClusterLabs:master Mar 1, 2021

jnpkrn mentioned this pull request Mar 11, 2021

[RFE] Add support for structured logging to reflect the shift towards machine log processing #286

Closed

Add a message-id parameter for messages #433

Add a message-id parameter for messages #433

Conversation

aleksei-burlakov commented Jan 7, 2021

knet-ci-bot commented Jan 7, 2021

kgaillot left a comment

Choose a reason for hiding this comment

kgaillot Jan 8, 2021

Choose a reason for hiding this comment

chrissie-c commented Jan 13, 2021

aleksei-burlakov commented Jan 21, 2021 • edited Loading

kgaillot commented Jan 21, 2021

aleksei-burlakov commented Jan 21, 2021

kgaillot commented Jan 21, 2021

aleksei-burlakov commented Jan 22, 2021

chrissie-c commented Jan 28, 2021

chrissie-c commented Jan 28, 2021

aleksei-burlakov commented Feb 9, 2021

chrissie-c commented Feb 10, 2021

chrissie-c commented Feb 10, 2021

aleksei-burlakov commented Feb 12, 2021

kgaillot commented Feb 16, 2021

chrissie-c commented Feb 17, 2021 • edited Loading

chrissie-c commented Feb 19, 2021

aleksei-burlakov commented Feb 19, 2021

aleksei-burlakov commented Feb 24, 2021

chrissie-c commented Feb 24, 2021

kgaillot left a comment

Choose a reason for hiding this comment

kgaillot Feb 16, 2021

Choose a reason for hiding this comment

aleksei-burlakov Mar 1, 2021

Choose a reason for hiding this comment

kgaillot Feb 16, 2021

Choose a reason for hiding this comment

aleksei-burlakov Mar 1, 2021

Choose a reason for hiding this comment

kgaillot Feb 26, 2021

Choose a reason for hiding this comment

aleksei-burlakov Mar 1, 2021

Choose a reason for hiding this comment

jnpkrn commented Feb 28, 2021 • edited Loading

aleksei-burlakov commented Mar 1, 2021

chrissie-c commented Mar 1, 2021

jnpkrn commented Mar 1, 2021 • edited Loading

aleksei-burlakov commented Mar 1, 2021

chrissie-c commented Mar 1, 2021

aleksei-burlakov commented Mar 1, 2021

chrissie-c commented Mar 1, 2021

chrissie-c commented Mar 1, 2021

aleksei-burlakov commented Mar 1, 2021

jnpkrn commented Mar 11, 2021 • edited Loading

jnpkrn commented Mar 11, 2021 • edited Loading

aleksei-burlakov commented Mar 11, 2021

jnpkrn commented Mar 11, 2021 • edited Loading

aleksei-burlakov commented Jan 21, 2021 •

edited

Loading

chrissie-c commented Feb 17, 2021 •

edited

Loading

jnpkrn commented Feb 28, 2021 •

edited

Loading

jnpkrn commented Mar 1, 2021 •

edited

Loading

jnpkrn commented Mar 11, 2021 •

edited

Loading

jnpkrn commented Mar 11, 2021 •

edited

Loading

jnpkrn commented Mar 11, 2021 •

edited

Loading