Mas i370 patch e (#385)
Improvement to monitoring for efficiency and improved readability of logs and stats.

As part of this, where possible, tried to avoid updating loop state on READ messages in leveled processes (as was the case when tracking stats within each process). 

No performance benefit was found from this change, but the improved stats have helped to discover other potential gains.
martinsumner authored Dec 16, 2022
1 parent 7c9904b commit 0c337b8
Showing 21 changed files with 1,896 additions and 1,597 deletions.
6 changes: 6 additions & 0 deletions docs/STARTUP_OPTIONS.md
@@ -120,3 +120,9 @@ There are two snapshot timeouts that can be configured:
These set the period, in seconds, before a snapshot which has not shut down is declared to have been released, so that any file deletions which are awaiting the snapshot's completion can go ahead.

This covers only silently failing snapshots. Snapshots that shut down neatly will release their lock on deleted files when they shut down. The 'short' timeout is used for snapshots which support index queries and bucket listing. The 'long' timeout is used for all other folds (e.g. key lists, head folds and object folds).

## Statistic gathering

Leveled will gather monitoring statistics on HEAD/GET/PUT requests, with timing points taken throughout the store. These timings are gathered by the `leveled_monitor`, and there are three configuration options. The two primary options are: `stats_percentage`, an integer between 0 and 100 which informs the store of the proportion of requests which should be timed at each part; and `stats_logfrequency`, which sets the interval (in seconds) between logs written by the `leveled_monitor`, each log covering one of the stats types in its queue.
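To illustrate what "the proportion of requests which should be timed" means in practice, here is a minimal sketch of percentage-based sampling. It assumes the monitor is carried as a `{MonitorPid, StatsPercentage}` tuple, as suggested by the `{no_monitor, 0}` default added to the option records; the module `stats_sample_sketch` and its functions `maybe_time/1` and `elapsed/1` are hypothetical and are not the `leveled_monitor` API.

```erlang
-module(stats_sample_sketch).
-export([maybe_time/1, elapsed/1]).

%% Hypothetical monitor shape, inferred from the {no_monitor, 0} record
%% default: either {no_monitor, 0} or {MonitorPid, StatsPercentage}.
-type monitor() :: {no_monitor, 0} | {pid(), 0..100}.
-export_type([monitor/0]).

%% Decide whether this request should be timed. On average StatsPercentage
%% out of every 100 requests get a start timestamp; the rest are skipped.
-spec maybe_time(monitor()) -> erlang:timestamp() | no_timing.
maybe_time({no_monitor, _}) ->
    no_timing;
maybe_time({_MonitorPid, StatsPercentage}) ->
    %% rand:uniform(100) returns an integer in 1..100, so a percentage of 0
    %% never samples and a percentage of 100 always samples.
    case rand:uniform(100) =< StatsPercentage of
        true -> os:timestamp();
        false -> no_timing
    end.

%% Elapsed microseconds since a sampled start, or no_timing if unsampled.
-spec elapsed(erlang:timestamp() | no_timing) -> integer() | no_timing.
elapsed(no_timing) ->
    no_timing;
elapsed(Start) ->
    timer:now_diff(os:timestamp(), Start).
```

Unsampled requests pay only for one random draw. Since the schema notes there is no flow control on the monitor, a lower percentage directly bounds how many timing messages the monitor has to absorb.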

The specific stats types logged can be found in the `?LOG_LIST` macro within the `leveled_monitor`. If only a subset is of interest, then this list can be modified by setting `monitor_loglist`. This can also be used to increase the frequency of individual log types by adding them to the list multiple times.
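As a minimal sketch, assuming the settings are supplied as application environment for the `leveled` application (which is what the `leveled.stats_percentage` and `leveled.stats_logfrequency` cuttlefish mappings target), a `sys.config` fragment might look like the following. The values are illustrative only; the schema defaults are 10 and 30. `monitor_loglist` is left out because the valid log type names live in `?LOG_LIST` and are not listed here.

```erlang
%% Illustrative sys.config fragment: time roughly 5% of requests, and
%% write one stats-type log from the monitor's queue every 60 seconds.
[
 {leveled, [
     {stats_percentage, 5},
     {stats_logfrequency, 60}
 ]}
].
```

Through cuttlefish, the same settings would be `leveled.stats_percentage = 5` and `leveled.stats_logfrequency = 60`, per the mappings in `priv/leveled.schema`.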
12 changes: 8 additions & 4 deletions include/leveled.hrl
@@ -48,15 +48,17 @@
binary_mode = false :: boolean(),
sync_strategy = sync,
log_options = leveled_log:get_opts()
- :: leveled_log:log_options()}).
+ :: leveled_log:log_options(),
+ monitor = {no_monitor, 0} :: leveled_monitor:monitor()}).

-record(sst_options,
{press_method = native
:: leveled_sst:press_method(),
log_options = leveled_log:get_opts()
:: leveled_log:log_options(),
max_sstslots = 256 :: pos_integer(),
- pagecache_level = 1 :: pos_integer()}).
+ pagecache_level = 1 :: pos_integer(),
+ monitor = {no_monitor, 0} :: leveled_monitor:monitor()}).

-record(inker_options,
{cdb_max_size :: integer() | undefined,
@@ -73,7 +75,8 @@
singlefile_compactionperc :: float()|undefined,
maxrunlength_compactionperc :: float()|undefined,
score_onein = 1 :: pos_integer(),
- snaptimeout_long :: pos_integer() | undefined}).
+ snaptimeout_long :: pos_integer() | undefined,
+ monitor = {no_monitor, 0} :: leveled_monitor:monitor()}).

-record(penciller_options,
{root_path :: string() | undefined,
@@ -88,7 +91,8 @@
compression_method = native :: lz4|native,
levelzero_cointoss = false :: boolean(),
snaptimeout_short :: pos_integer() | undefined,
- snaptimeout_long :: pos_integer() | undefined}).
+ snaptimeout_long :: pos_integer() | undefined,
+ monitor = {no_monitor, 0} :: leveled_monitor:monitor()}).

-record(iclerk_options,
{inker :: pid() | undefined,
19 changes: 16 additions & 3 deletions priv/leveled.schema
@@ -188,7 +188,20 @@
hidden
]}.

%% @doc Statistic monitoring proportion
%% The proportion of requests to be covered by stats, an integer between
%% 0 and 100. There is no flow control, so setting this too high could
%% possibly overflow the leveled_monitor mailbox.
{mapping, "leveled.stats_percentage", "leveled.stats_percentage", [
{default, 10},
{datatype, integer},
{validators, ["range:0-100"]}
]}.




%% @doc Statistic log frequency (seconds)
%% The wait in seconds between logs from each leveled_monitor (there is one
%% monitor per vnode)
{mapping, "leveled.stats_logfrequency", "leveled.stats_logfrequency", [
{default, 30},
{datatype, integer}
]}.
18 changes: 18 additions & 0 deletions priv/leveled_multi.schema
@@ -178,7 +178,25 @@
hidden
]}.

%% @doc Statistic monitoring proportion
%% The proportion of requests to be covered by stats, an integer between
%% 0 and 100. There is no flow control, so setting this too high could
%% possibly overflow the leveled_monitor mailbox.
{mapping, "multi_backend.$name.leveled.stats_percentage", "riak_kv.multi_backend", [
{default, 10},
{datatype, integer},
{validators, ["range:0-100"]},
hidden
]}.

%% @doc Statistic log frequency (seconds)
%% The wait in seconds between logs from each leveled_monitor (there is one
%% monitor per vnode)
{mapping, "multi_backend.$name.leveled.stats_logfrequency", "riak_kv.multi_backend", [
{default, 30},
{datatype, integer},
hidden
]}.


