fix(trace): use scope in logger #345
Conversation
A lot of this stuff isn't really scoped well -- e.g., gossip has output like:
[gossip.service.GossipService] time=2024-11-01T19:20:08.072Z level=debug from_endpoint=139.178.94.143:8001 from_pubkey=5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on gossip: recv ping
time=2024-11-01T19:20:18.085Z level=info gossip: recv 4494: 0 ping, 1 pong, 0 push, 0 pull request, 4493 pull response, 0 prune
accountsdb has things like:
time=2024-11-01T19:44:57.135Z level=info n_threads_snapshot_load: 12
time=2024-11-01T19:44:57.135Z level=info using ram memory for account index
[accountsdb.db.AccountsDB] time=2024-11-01T19:44:57.297Z level=info loading from snapshot...
[accountsdb.db.AccountsDB] time=2024-11-01T19:44:57.297Z level=info found 405721 account files
As an aside, I'm thinking the typed approach might be safer, but it restricts the usage and isn't very easy to use/change -- wdyt about changing it to be just ScopedLogger (with no type safety)? That would also save us from having .unscoped() everywhere for general functions.
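For comparison, Zig's standard library already ships an untyped version of this idea: std.log.scoped returns a namespaced logger from a comptime enum literal, with no scope type parameter to thread through struct fields. A minimal sketch of that pattern (this is the stdlib mechanism, not Sig's actual trace API):

```zig
const std = @import("std");

// std.log.scoped takes a comptime enum literal and returns a namespaced
// logger type; nothing needs to be "unscoped" for general-purpose code,
// and no type parameter propagates through struct definitions.
const log = std.log.scoped(.gossip);

pub fn main() void {
    // With the default std.log format this renders roughly as:
    //   debug(gossip): recv ping
    log.debug("recv ping", .{});
}
```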
Yeah, this is due to the approach I took of only scoping the loggers that are stored on struct state, since for those it is clear what the scope is. Some of the places you mentioned are either functions without state, or init functions. I'll go over them again and add the scope where it makes sense.
I did notice having to call
Force-pushed 43f6eda to 4d4ed66
This is now
My view is that the only imposed restriction is that we're forced to do things that we already want to do. It only annoys you to let you know that you made a mistake.
The only time you need to call
@dnut - not really, one thing we do often is print progress results (using
@0xNineteen @dnut for this (and also for
I think anytype makes sense (that seems to be the only way to do it, unfortunately) -- hopefully we don't run into too many of these methods and end up with anytypes everywhere.
The case where you want to inherit scope from another context is something I had not considered, but it seems unusual. Why wouldn't you want the scope to indicate where the actual log message comes from? The only case I can think of is if the code calling the logger is actually intended to be an extension of the logger itself.

I still think it would be preferable for the default behavior to be each scope having its own distinct logging scope, and inheriting scope should be explicit.
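To make the trade-off concrete, here is a sketch of the two options for a stateless helper like printTimeEstimate, using a hypothetical minimal ScopedLogger (illustrative only, not Sig's actual trace API):

```zig
const std = @import("std");

// Hypothetical minimal scoped logger, for illustration only.
fn ScopedLogger(comptime scope: []const u8) type {
    return struct {
        pub fn info(comptime fmt: []const u8, args: anytype) void {
            std.debug.print("level=info [" ++ scope ++ "] " ++ fmt ++ "\n", args);
        }
    };
}

// Option A: the helper pins its own scope, so every caller's progress
// lines are tagged "printTimeEstimate" regardless of where they came from.
fn printTimeEstimatePinned(done: usize, total: usize) void {
    ScopedLogger("printTimeEstimate").info("{d}/{d}", .{ done, total });
}

// Option B: the helper takes the logger as anytype and inherits the
// caller's scope (e.g. "accountsdb.db.AccountsDB"), which is the
// approach the commits in this PR take for runService/printTimeEstimate.
fn printTimeEstimate(logger: anytype, done: usize, total: usize) void {
    logger.info("{d}/{d}", .{ done, total });
}

pub fn main() void {
    printTimeEstimatePinned(1, 4);
    printTimeEstimate(ScopedLogger("accountsdb.db.AccountsDB"), 2, 4);
}
```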
This is great to see. The logs are so much better with scopes.

I might just make a few tweaks here and there. I think some of the scopes can be broader, which would simplify things and reduce the amount of unscoped, withScope, and anytype usages. I'd be cautious about using function-scoped loggers, since the names can be confusingly specific/vague, especially if the scope is just @src().fn_name.
overall lgtm

something I noticed yew do is use this format: -- showing the scope after the info and with round brackets -- I kinda prefer it like that, but curious what others think too.
To make it more concrete, in our case, instead of
we would then have
Both actually work for me, so no strong preference for one.
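Based on the gossip line quoted earlier in the thread, the two formats under discussion would look roughly like this (hypothetical rendering, not actual output):

```
[gossip.service.GossipService] time=2024-11-01T19:20:08.072Z level=debug gossip: recv ping
time=2024-11-01T19:20:08.072Z level=debug (gossip.service.GossipService) gossip: recv ping
```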
Looks good. Nice work.

Regarding the log format, I personally prefer the time to be the first thing in the log line. So I would be happy if you want to move the scope to the right. I'm indifferent about parentheses vs brackets. These changes can come in a separate PR.

I think you need to sign some of your commits before GitHub will let this merge. It'll require a force push and re-approval.
Commits in this push:
- fix(snapshot-download): url (#355)
- test(ledger): cleanup database testing (#353)
- fix(shred-collector): shred receiver metrics not being counted (#356)
- Fix after merge
- Scope MerkleRootValidator
- Remove unused import
- Scoping functions not attached to structs
- Update runService to take logger as anytype
- Update printTimeEstimate to take logger as anytype
- Remove unused imports
- Scope init function of AppBase to AppBase
- Scope RocksDB
- refactor(ledger): rename roots and orphans to rooted_slots and orphan_slots (#357)
- improve(benchmarks): support multiple values + visualizing runtimes (#316)
- Remove unused import
- feat(ci) add line size checking to the CI (#360)
- feat(ledger): bench getting code shreds (#354)
- feat(accountsdb,gossip): Initial snapshot propagation work (#333)
- fix(scripts): unused import regex and misc refactoring (#362)
- fix(gossip): send pull requests with ContactInfo instead of LegacyContactInfo (#374)
- perf(shred-collector): cache verified merkle roots (#376)
- add fix (#377)
- feat(ledger): user-selectable db, hashmapdb fixes, add tests, safer memory management (#372)
- feat(bincode): config default values (#380)
- refactor(merkle_tree): general purpose nested list (#378)
- feat(shred-collector): add metrics and grafana dashboard (#375)
- refactor(bincode): file_id config (#379)
- feat(ci) start correcting files to fit the new style guide (#361)
- fix(accountsdb): inf loop (#383)
- Remove unused import
- scope runService logger to ServiceManager
- Un-scope logger on ShredCollectorDependencies
- docs(research): add lmdb/accountsdb writing to docs/research (#390)
- feat(dashboards): add hardware metrics (#392)
- fix(rocksdb): zig build fails on linux intel systems (#394)
- feat(allocator): add recycle buffer (#381)
- feat(accountsdb): integrate improved reference allocator (#382)
- improve(benchmarks): better cli parsing (#391)
- Use same scope identifier in logger in cleanup_service.zig
- Scope to rocksdb in rocksdb.zig
- remove usage of @src().fn_name in log scope
Force-pushed 45affc0 to ffe7495
This PR adds scope to the logger in appropriate places.

The kinds of scoping that were added are:
- @src().fn_name
- anytype -- this allows the scope to be set by the caller (instead of it being set within the function); without it, scoping to @src().fn_name tags the output with the callee's function name rather than the caller's context.

TODO
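A rough sketch of the difference described above, with a hypothetical scoped print helper (illustrative only, not Sig's actual trace API):

```zig
const std = @import("std");

// Hypothetical helper: prefixes a message with a comptime scope string.
fn logScoped(comptime scope: []const u8, comptime fmt: []const u8, args: anytype) void {
    std.debug.print("[" ++ scope ++ "] " ++ fmt ++ "\n", args);
}

// Scoping inside the function via @src().fn_name tags every line with
// just the function's name, losing the module/struct context:
//   [load] found N account files
fn load() void {
    logScoped(@src().fn_name, "found {d} account files", .{405721});
}

// Letting the caller supply the scope keeps the line under the owning
// struct, e.g. [accountsdb.db.AccountsDB] found N account files
fn loadWithCallerScope(comptime scope: []const u8) void {
    logScoped(scope, "found {d} account files", .{405721});
}

pub fn main() void {
    load();
    loadWithCallerScope("accountsdb.db.AccountsDB");
}
```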