VAST 2020.09.30
We’re happy to announce the monthly release 2020.09.30 of VAST.
YAML Config
The VAST configuration file received a makeover: it now uses YAML syntax, the ops-friendly industry standard. We ensured that the configuration file and the command line behave exactly the same by aligning the CLI hierarchy with the config file structure. VAST now looks for a vast.yaml configuration file instead of vast.conf. Every installation of VAST ships with a vast.yaml.example file that illustrates the new layout and serves as reference documentation for the options.
During startup, VAST looks for configuration files in the following places, and merges their content, with the more specific files taking a higher precedence:
- <sysconfdir>/vast/vast.yaml for system-wide configuration, where <sysconfdir> is the platform-specific directory for configuration files, e.g., /etc/vast.
- ~/.config/vast/vast.yaml for user-specific configuration. VAST respects the XDG base directory specification and its environment variables.
- The command line option --config=path/to/vast.yaml.
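The lookup and merge order described above can be sketched in a few lines of Python. This is a minimal illustration, not VAST's actual implementation: the function names, the `/usr/local/etc` value for `<sysconfdir>`, and the option names in the usage example are assumptions chosen for the demonstration.

```python
from pathlib import PurePosixPath


def candidate_config_paths(sysconfdir, env, cli_config=None):
    """Return config file candidates, ordered from least to most specific."""
    # XDG rule: fall back to $HOME/.config when XDG_CONFIG_HOME is unset or empty.
    xdg_config_home = env.get("XDG_CONFIG_HOME") or str(
        PurePosixPath(env.get("HOME", "")) / ".config"
    )
    paths = [
        PurePosixPath(sysconfdir) / "vast" / "vast.yaml",       # system-wide
        PurePosixPath(xdg_config_home) / "vast" / "vast.yaml",  # user-specific
    ]
    if cli_config is not None:
        paths.append(PurePosixPath(cli_config))  # --config wins over everything
    return paths


def merge(base, override):
    """Recursively merge two mappings; values from `override` win on conflict."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_all(configs):
    """Merge already-parsed configs given in least-to-most-specific order."""
    merged = {}
    for config in configs:
        merged = merge(merged, config)
    return merged


if __name__ == "__main__":
    # Hypothetical parsed file contents; the option names are illustrative only.
    system = {"vast": {"endpoint": "localhost:42000", "import": {"batch-size": 1000}}}
    user = {"vast": {"import": {"batch-size": 5000}}}
    print(candidate_config_paths("/usr/local/etc", {"HOME": "/home/alice"}))
    print(merge_all([system, user]))
```

The user-specific file overrides only the keys it sets, so the system-wide `endpoint` survives the merge while `batch-size` takes the user's value.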
The top-level configuration file section vast bundles all options affecting VAST. Similarly, the top-level section caf contains all options that affect the underlying actor system framework CAF directly, allowing for more complex and sophisticated configurations.
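To make the relationship between nested sections and dotted option names concrete, here is a small Python sketch. It assumes that each level of nesting in the config file corresponds to one dot-separated component of an option name; the helper `flatten` and the example values are hypothetical and only serve the illustration.

```python
def flatten(config, prefix=""):
    """Flatten a nested mapping into dotted option names."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # descend into the subsection
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Illustrative values only; consult vast.yaml.example for the real options.
    config = {
        "vast": {"import": {"batch-size": 1000}},
        "caf": {"scheduler": {"max-threads": 4}},
    }
    for name, value in flatten(config).items():
        print(f"{name} = {value}")
```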
Adding YAML support resulted in a new dependency for VAST: yaml-cpp (≥0.6.2). This robust library provides a YAML 1.2 spec-compliant parser and printer, plus it enjoys wide availability on most platforms and package managers.
Index Optimizations
The layout of the on-disk data structures used for the index has changed. VAST divides the index state into horizontal partitions (aka. shards). Instead of creating one file per record field per partition, the index now creates only a single file per partition and dynamically maps the required parts into memory. Additionally, VAST no longer relies on the binary serialization protocol of CAF. Instead, a new FlatBuffers framing with better state versioning enables a reliable upgrade path when the on-disk format changes.
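The idea of one file per partition with on-demand memory mapping can be illustrated with a short Python sketch. It is a simplified stand-in, not the FlatBuffers layout VAST uses: the offset-table format, the 16-byte name limit, and the function names are assumptions made for this example.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<I")     # number of sections in the file
ENTRY = struct.Struct("<16sQQ")  # field name (max 16 bytes), offset, length


def write_partition(path, sections):
    """Write all per-field blobs into one file, preceded by an offset table."""
    offset = HEADER.size + ENTRY.size * len(sections)
    entries = []
    for name, blob in sections.items():
        entries.append(ENTRY.pack(name.encode(), offset, len(blob)))
        offset += len(blob)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(sections)))
        for entry in entries:
            f.write(entry)
        for blob in sections.values():
            f.write(blob)


def read_section(path, wanted):
    """Map the file and copy out only the bytes of the requested section."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            (count,) = HEADER.unpack_from(view, 0)
            for i in range(count):
                raw_name, offset, length = ENTRY.unpack_from(
                    view, HEADER.size + i * ENTRY.size
                )
                if raw_name.rstrip(b"\0").decode() == wanted:
                    return bytes(view[offset:offset + length])
    raise KeyError(wanted)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "partition.bin")
        write_partition(path, {"src_ip": b"index-a", "dst_port": b"index-b"})
        print(read_section(path, "dst_port"))
```

Because the operating system pages in only the regions that are touched, a query that needs one field's index does not pay for loading the others.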
Moreover, VAST used to periodically re-write the whole state of the meta index to disk into a separate file. The rationale was that the contents of the meta index are much smaller than the contents of the index. However, for large databases even the much smaller meta index can grow to a size where this can disrupt disk I/O and slow down the indexing process. To prevent that, we’ve split up the information contained in the meta index and distributed it over all partitions, so every write is now limited to the incremental state since the previous partition.
Because I/O is such a delicate topic in data-intensive applications that must keep up with high-volume data sources, we also added a new asynchronous I/O abstraction to avoid blocking threads when they don’t have to. We’ve added a new filesystem actor that centralizes I/O operations, such as reads and writes. A nice side effect is that it makes it dead simple to support new filesystems in the future, e.g., HDFS or S3, by merely adding a new actor implementation that adheres to the same type-safe messaging API.
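The pattern behind a centralized filesystem actor can be sketched with plain Python threads. This is not CAF and not VAST's messaging API: the classes `FilesystemActor` and `MemoryBackend` and their methods are hypothetical, and a queue plus a worker thread stands in for an actor mailbox.

```python
import queue
import threading
from concurrent.futures import Future


class MemoryBackend:
    """Hypothetical backend; a local-disk, HDFS, or S3 backend would offer the same methods."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = bytes(data)
        return len(data)

    def read(self, path):
        return self._files[path]


class FilesystemActor:
    """Funnels all I/O requests through one mailbox and one worker thread."""

    def __init__(self, backend):
        self._backend = backend
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:  # shutdown signal
                break
            future, operation, args = message
            try:
                future.set_result(getattr(self._backend, operation)(*args))
            except Exception as error:  # hand the failure back to the requester
                future.set_exception(error)

    def _send(self, operation, *args):
        future = Future()
        self._mailbox.put((future, operation, args))
        return future  # the caller keeps running and collects the result later

    def write(self, path, data):
        return self._send("write", path, data)

    def read(self, path):
        return self._send("read", path)

    def stop(self):
        self._mailbox.put(None)
        self._worker.join()


if __name__ == "__main__":
    fs = FilesystemActor(MemoryBackend())
    fs.write("index/partition-1", b"state")
    print(fs.read("index/partition-1").result())
    fs.stop()
```

Requesters only see the message interface, so swapping `MemoryBackend` for another backend requires no change on their side.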
Better Introspection
We re-designed the output of the vast status command in a push for a better user experience. vast status now shows information about the system, grouped by its major components. By adding more flags, the command shows more details: vast status --detailed offers slightly more context, and --debug exposes a lot of internal state that is well-suited for developers.
Smaller Things
- The new vast get <id> [ids...] command enables direct queries to the archive.
- The JSON export format now renders the VAST duration and port as strings instead of numbers.
- A new utility lsvast now ships with every VAST installation. It allows for inspecting the contents of the VAST database without running VAST.
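To illustrate why strings suit these two types in JSON, consider the following Python sketch. The textual forms used here (a unit suffix for durations, `number/protocol` for ports) and the helper names are assumptions for the example and may not match VAST's exact output.

```python
import json


def render_port(number, protocol):
    """Render a port together with its transport protocol (assumed form)."""
    return f"{number}/{protocol}"


def render_duration(nanoseconds):
    """Render a duration using the largest unit that divides it evenly."""
    units = (("s", 1_000_000_000), ("ms", 1_000_000), ("us", 1_000), ("ns", 1))
    for unit, factor in units:
        if nanoseconds % factor == 0:
            return f"{nanoseconds // factor}{unit}"


if __name__ == "__main__":
    event = {
        "duration": render_duration(1_500_000),
        "port": render_port(443, "tcp"),
    }
    print(json.dumps(event))
```

A bare number would drop the unit of the duration and any protocol attached to the port, whereas the string keeps both.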
Changelog Highlights
As always, you can find the full technical scoop of what changed in our changelog.
🎁 Features
- The output of the status command was restructured with a strong focus on usability. The new flags --detailed and --debug add additional content to the output. #995
- VAST now merges the contents of all used configuration files instead of using only the most user-specific file. The file specified using --config takes the highest precedence, followed by the user-specific path ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.yaml, and the compile-time path <sysconfdir>/vast/vast.yaml. #1040
- VAST now ships with a new tool lsvast to display information about the contents of a VAST database directory. See lsvast --help for usage instructions. #863
- VAST now supports the XDG base directory specification: The vast.yaml is now found at ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.yaml, and schema files at ${XDG_DATA_HOME:-${HOME}/.local/share}/vast/schema/. The user-specific configuration file takes precedence over the global configuration file in <sysconfdir>/vast/vast.yaml. #1036
🧬 Experimental Features
- The vast get command has been added. It retrieves events from the database directly by their IDs. #938
⚠️ Changes
- All configuration options are now grouped into vast and caf sections, depending on whether they affect VAST itself or are handed through to the underlying actor framework CAF directly. Take a look at the bundled vast.yaml.example file for an explanation of the new layout. #1073
- Data exported in the Apache Arrow format now contains the name of the payload record type in the metadata section of the schema. #1072
- The JSON export format now renders duration and port fields using strings as opposed to numbers. This avoids a possible loss of information and enables users to re-use the output in follow-up queries directly. #1034
- The delay between the periodic log messages for reporting the current event rates has been increased to 10 seconds. #1035
- The global VAST configuration now always resides in <sysconfdir>/vast/vast.yaml, and bundled schemas always in <datadir>/vast/schema/. VAST no longer supports reading a configuration file in the current working directory. #1036
- The options that affect batches in the import command received new, more user-facing names: import.table-slice-type, import.table-slice-size, and import.read-timeout are now called import.batch-encoding, import.batch-size, and import.batch-timeout respectively. #1058
- The persistent storage format of the index now uses FlatBuffers. #863
- The proprietary VAST configuration file format has changed to the more ops-friendly industry standard YAML. This change also introduced a new dependency: yaml-cpp version 0.6.2 or greater. The top-level vast.yaml.example illustrates what the new YAML config looks like. Please rename existing configuration files from vast.conf to vast.yaml. VAST still reads vast.conf but will soon only look for vast.yaml or vast.yml files in available configuration file paths. #1045 #1055 #1059 #1062
- We refactored the index architecture to improve stability and responsiveness. This includes fixes for several shutdown issues. #863
🐞 Bug Fixes
- Stalled sources that were unable to generate new events no longer stop import processes from shutting down under rare circumstances. #1058