
Releases: tenzir/tenzir

VAST 2020.04.29

29 Apr 19:49
df8e4c5

We are happy to announce our release 2020.04.29. Like last month’s release, this one mainly includes bug and robustness fixes, along with more documentation over at docs.tenzir.com.

  • Data Sets. We have begun to add experimental support for data sets, a deterministic abstraction for pinning a query result as a working set. Because they are deterministic, they support paging, sorting, and other blocking operations. For now, we prototype the API in Python. We may move the implementation back to C++ at some later point if we encounter performance issues. Data Sets will be fully implemented on top of Apache Arrow so that they can benefit from zero-copy data sharing across multiple processes. VAST will structurally share its data with the Data Set Manager in a copy-on-write fashion. For example, if 80% of the data is shared among 10 queries, the working sets occupy roughly 2.8 copies’ worth of memory instead of 10. Data Sets will also be the intermediate data representation for our upcoming web UI. Stay tuned.
  • Archive performance. Requests to the archive are now interruptible, which especially reduces export latency for large databases. First results now appear more quickly for queries matching many events, and overall export performance has improved.
  • IoC Matching. We reworked the user interface to the intelligence matching feature: you can use the new vast matcher start subcommand to start a new matcher and get a stream of matches on the standard output. Multiple matchers can run in parallel this way, each of which operates on a subset of the ingested data. See the documentation for more details.

Improvements

  • 🎁 Bash autocompletion for vast is now available via the autocomplete script located at scripts/vast-completions.bash in the VAST source tree. #833

  • 🎁 Packet drop and discard statistics are now reported to the accountant for PCAP import, and are available using the keys pcap-reader.recv, pcap-reader.drop, pcap-reader.ifdrop, pcap-reader.discard, and pcap-reader.discard-rate in the vast.statistics event. If the number of dropped packets exceeds a configurable threshold, VAST additionally warns about packet drops on the command line. #827 #844

Improvements (Pro Only)

  • 🎁 Added the new subcommand vast matcher start. It allows for running multiple matchers in parallel, each of which can be configured to match a subset of the input stream. The old --enable-matcher flag was removed.

  • 🎁 Matchers can now load existing IoCs from VAST on startup using the --ioc-query flag.

  • 🎁 Matchers now support live removal of single IoCs using the vast matcher remove-ioc subcommand.

Changes

  • 🔄 The option --skip-candidate-checks / -s for the count command was renamed to --estimate / -e. #843

  • 🔄 The index-specific options --max-partition-size, --max-resident-partitions, --max-taste-partitions, and --max-queries can now be specified on the command line when starting a node. #728

  • 🔄 The default bind address has been changed from :: to localhost. #828

Bug Fixes

  • 🪲 For some queries, the index evaluated only a subset of all relevant partitions in a non-deterministic manner. Fixing a violated evaluation invariant now guarantees deterministic execution. #842

  • 🪲 The stop command always returned immediately, regardless of whether it succeeded. It now blocks until the remote node has shut down properly, or returns an error exit code upon failure. #849

  • 🪲 Fixed a crash when importing data while a continuous export was running for unrelated events. #830

  • 🪲 Fixed a bug that could cause stalled input streams not to forward events to the index and archive components for the JSON, CSV, and Syslog readers, when the input stopped arriving but no EOF was sent. This is a follow-up to #750. A timeout now ensures that the readers continue when some events were already handled, but the input appears to be stalled. #835

  • 🪲 Queries of the form x != 80/tcp were falsely evaluated as x != 80/? && x != ?/tcp. (The syntax in the second predicate does not yet exist; it only illustrates the bug.) Port inequality queries are now correctly evaluated as x != 80/? || x != ?/tcp. E.g., the result now contains values like 80/udp and 80/?, but also 8080/tcp. #834
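To make the De Morgan step concrete, here is a small illustrative sketch (not VAST's implementation; the tuple representation of ports is made up):

```python
# Illustrative sketch of the corrected semantics (not VAST's code).
# A port literal like 80/tcp has a number and a protocol component, so
# by De Morgan, NOT (port == 80 AND proto == "tcp") is equivalent to
# (port != 80) OR (proto != "tcp").
def matches_neq(port, proto, query=(80, "tcp")):
    query_port, query_proto = query
    return port != query_port or proto != query_proto

values = [(80, "udp"), (80, "?"), (8080, "tcp"), (80, "tcp")]
result = [f"{p}/{q}" for p, q in values if matches_neq(p, q)]
# everything except 80/tcp matches the inequality query
```

The buggy conjunction would have required both components to differ, wrongly excluding 80/udp and 8080/tcp.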

  • 🪲 Archive lookups are now interruptible. This change fixes an issue that caused consecutive exports to slow down the node, which improves the overall performance for larger databases considerably. #825

As always, see the CHANGELOG for a full list of changes.

VAST 2020.03.26

26 Mar 16:35
0532d2d

We are happy to announce VAST 2020.03.26. In this release we mainly worked on bug fixes and UI improvements.

  • Syslog Import. The vast import subcommand can now natively import Syslog messages as defined in RFC 5424, as produced by popular logging tools such as journald. Thanks to Maximilian Knapperzbusch for the contribution! Max added this new feature as part of a master’s project on advanced topics in IT security. We have an ongoing collaboration with the security group at the University of Hamburg, led by Prof. Mathias Fischer, and are excited for more contributions of this kind.

  • Documentation Page. We rebuilt our documentation with Docusaurus and relaunched it at docs.tenzir.com. Docusaurus gives us the flexibility to easily add entire documentation sites for our different projects like Threat Bus. The page is updated daily to reflect the latest state of development.

  • User Interface. We introduced a new user-facing log level called verbose, which sits between the existing user-facing info and developer-facing debug log levels. Additionally, we reworked VAST’s behavior so that it no longer creates files in the current working directory every time a command is invoked.

Improvements

  • 🎁 The new vast import syslog command allows importing Syslog messages as defined in RFC 5424. #770
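As a rough illustration of the RFC 5424 header layout that this importer understands, here is a minimal Python sketch (a simplified regex, not VAST's actual parser; structured data and the message body are omitted):

```python
import re

# Minimal sketch of the RFC 5424 header layout (simplified; a real
# parser also handles structured data and the message body):
#   <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID ...
SYSLOG_5424 = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+)"
)

line = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed"
fields = SYSLOG_5424.match(line).groupdict()
# PRI encodes facility and severity: facility = pri // 8, severity = pri % 8
facility, severity = divmod(int(fields["pri"]), 8)
```

The example line is the canonical one from RFC 5424 itself; a "-" marks a nil value for a header field.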

  • 🎁 The hash index has been re-enabled after it was outfitted with a new high-performance hash map implementation that brings its performance on par with the regular index while delivering up to a 3x reduction in disk usage. #796

  • 🎁 The option --disable-community-id has been added to the vast import pcap and vast import netflow commands for disabling the automatic computation of Community IDs. #777

  • 🎁 The verbose log level has been added between info and debug. This level is enabled at build time for all build types, making it possible to get more detailed logging output from release builds. #787

  • 🎁 The config option system.log-directory was deprecated and replaced by the new option system.log-file. All logs will now be written to a single file, and by the node only. #803 #806

Changes

  • 🔄 The MRT/bgpdump integrations were temporarily disabled and will be fixed at a later point in time. #808

  • 🔄 The short option -c for setting the configuration file has been removed. The long option --config= must now be used instead. #781

Bug Fixes

  • 🪲 An under-the-hood change to our parser-combinator framework makes sure that we do not discard possibly invalid input data up to the end of input. #791 #808

  • 🪲 The short option -c now works as expected for continuous exports and imports, and for setting the cutoff for PCAP. #781

  • 🪲 Continuous export processes can now be stopped correctly. #779

As always, see the CHANGELOG for a full list of changes.

VAST 2020.02.27

27 Feb 16:10
bffaead

We're happy to announce VAST 2020.02.27, a maintenance release focusing on robustness and an improved UX.

The most notable changes concern the user interface and adjust the command line interface to match the expectations of users familiar with common UNIX command line tools.

Breaking Changes

  • 🚧 VAST now requires Apache Arrow 0.16.0, and no longer supports previous versions of Apache Arrow.

Improvements

  • 🎁 The filesystem layout was changed so that log and database files are no longer written to the same directory. Log files are now stored in vast.log/, a path that can be independently specified with the --log-directory option. The long option for the database directory has changed to --db-directory. #758

  • 🎁 VAST now works with CAF 0.17.4.

Bug Fixes

  • 🐞 Due to a newly discovered performance regression in the hash index, all fields that defaulted to this index were changed to use their type-specific regular index type. #765

  • 🐞 A continuous import of Zeek logs will now always report all incoming events. Previously, the import could stall indefinitely if the volume of imported events was too low. #750

For a complete list of changes, please consult the CHANGELOG.

VAST 2020.01.31

04 Feb 09:35
879c41a

TL;DR

  • Apache Arrow export (+ experimental Python shim)
  • Optimized index implementation with 3x space reduction
  • Switch to CalVer and monthly release schedule

Notes

Dear community, we are pleased to announce the release of VAST 2020.01.31! As you can see, we switched to CalVer versioning. Moving forward, we plan to cut a release at the end of every month. In the last week before the release, we will focus on testing.

On the feature side, we have added support for exporting data in the Apache Arrow format. This effectively connects VAST with the data science world, where the focus is on in-memory columnar analytics. We also wrote a small Python shim to demonstrate this; see the /examples directory for a notebook.

Additionally, we implemented a new optimized index type based on hashing that yields a 3x space reduction compared to normal string indexes. This index type can be selected by adding #index=hash to any type in the schema. The index supports (in)equality comparison only and does not work with substring queries. Internally, it computes a list of fingerprints over the data, plus a small satellite data structure that resolves false positives and allows for building the complement.
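The fingerprint-plus-satellite idea can be sketched as follows (a toy Python model under assumed simplifications, not VAST's C++ implementation):

```python
import hashlib

# Toy model of a hash index: one fixed-size fingerprint per row, plus a
# small satellite map from fingerprint to original value that resolves
# false positives and makes building the complement possible.
class HashIndex:
    def __init__(self):
        self.fingerprints = []  # one fingerprint per row
        self.satellite = {}     # fingerprint -> original value

    @staticmethod
    def _fp(value):
        # truncated digest as a cheap fixed-size fingerprint
        return hashlib.blake2b(value.encode(), digest_size=4).digest()

    def add(self, value):
        fp = self._fp(value)
        self.satellite[fp] = value
        self.fingerprints.append(fp)

    def lookup(self, value, negate=False):
        fp = self._fp(value)
        # consult the satellite structure to rule out false positives
        hit = self.satellite.get(fp) == value
        return [i for i, f in enumerate(self.fingerprints)
                if (f == fp and hit) != negate]
```

Only (in)equality works here by construction: fingerprints carry no ordering or substring information, which matches the restriction stated above.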

Changelog

The following items reflect notable changes in reverse-chronological order:

  • 🔄 VAST is switching to a calendar-based versioning scheme starting with this release. #739

  • 🎁 When a record field has the #index=hash attribute, VAST will choose an optimized index implementation. This new index type only supports (in)equality queries and is therefore intended to be used with opaque types, such as unique identifiers or random strings. #632, #726
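For illustration, a hypothetical schema sketch (the type and field names are made up) that opts an opaque identifier into the hash index could look like this:

```
type transaction = record{
  // opaque identifier, only ever compared for (in)equality
  txn_id: string #index=hash,
  amount: real
}
```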

  • 🎁 An experimental new Python module enables querying VAST and processing results as pyarrow tables. #685

  • 🐞 A bug in the quoted string parser caused a parsing failure if an escape character occurred in the last position. #685

  • 🔄 Record field names can now be entered as quoted strings in the schema and expression languages. This lifts a restriction where JSON fields with whitespace or special characters could not be ingested. #685

  • 🔄 Two minor modifications were made to the parsing framework: (i) the parsers for enums and records now allow trailing separators, and (ii) the dash (-) was removed from the allowed characters of schema type names. #706

  • 🐞 The example configuration file contained an invalid section vast. This has been changed to the correct name system. #705

  • 🐞 A race condition in the index logic could lead to incomplete or empty result sets for vast export. #703

  • 🔄 Build configuration defaults have been adapted for a better user experience. Installations are now relocatable by default, which can be reverted by configuring with --without-relocatable. Additionally, new sets of defaults named --release and --debug (renamed from --dev-mode) have been added. #695

  • 🎁 On FreeBSD, a VAST installation now includes an rc.d script that simplifies spinning up a VAST node. CMake installs the script at PREFIX/etc/rc.d/vast. #693

  • 🎁 The long option --config, which sets an explicit path to the VAST configuration file, now also has the short option -c. #689

  • 🎁 Added Apache Arrow as new export format. This allows users to export query results as Apache Arrow record batches for processing the results downstream, e.g., in Python or Spark. #633

  • 🐞 The import process did not print statistics when importing events over UDP. Additionally, warnings about dropped UDP packets are no longer shown per packet, but rather periodically reported in a readable format. #662

  • 🐞 Importing events over UDP with vast import <format> --listen :<port>/udp failed to register the accountant component. This caused an unexpected warning to be printed on startup and resulted in losing import statistics. VAST now correctly registers the accountant. #655

  • 🐞 PCAP ingestion failed for traces containing VLAN tags. VAST now strips IEEE 802.1Q headers instead of skipping VLAN-tagged packets. #650

  • 🐞 In some cases it was possible that a source would connect to a node before it was fully initialized, resulting in a hanging vast import process. #647

  • 🎁 The import pcap command now takes an optional snapshot length via --snaplen. If a captured packet is larger than the snapshot length, only the first snaplen bytes of that packet are captured and provided as packet data. #642

  • 🔄 The import pcap command no longer takes interface names via --read,-r, but instead from a separate option named --interface,-i. This change has been made for consistency with other tools. #641

VAST 0.2 - 2019-10-30

30 Oct 12:38

This release contains numerous bug fixes, new features, and a few enhancements of existing functionality. The highlights include (1) a new pivot command that makes it possible to correlate data of different types, (2) native Suricata support, (3) a new infer command to deduce VAST types from less-typed formats, (4) Argus import, and (5) JSON import.

  • 🎁 The default schema for Suricata has been updated to support the new
    suricata.smtp event type in Suricata 5.

  • 🎁 The export null command retrieves data, but never prints anything. Its
    main purpose is to make benchmarking VAST easier and faster.

  • 🔄 The query language has been extended to support expressions of the form
    X == /pattern/, where X is a compatible LHS extractor. Previously,
    patterns only supported the match operator ~. The two operators have the
    same semantics when one operand is a pattern.

  • 🎁 The new pivot command retrieves data of a related type. It inspects each
    event in a query result to find an event of the requested type. If a common
    field exists in the schema definition of the requested type, VAST will
    dynamically create a new query to fetch the contextual data according to the
    type relationship. For example, if two records T and U share the same
    field x, and the user requests to pivot via T.x == 42, then VAST will
    fetch all data for U.x == 42. An example use case would be to pivot from a
    Zeek or Suricata log entry to the corresponding PCAP packets. VAST uses the
    field community_id to pivot between the logs and the packets. Pivoting is
    currently implemented for Suricata, Zeek (with community ID computation
    enabled), and PCAP.
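The pivot mechanism described above boils down to collecting the shared field's values and re-querying with them. A minimal Python sketch with hypothetical in-memory events:

```python
# Sketch of pivoting via a shared field (hypothetical data, not VAST's
# implementation). Step 1: run the user's query; step 2: collect the
# shared field's values; step 3: select related-type events with them.
suricata_alerts = [
    {"community_id": "1:abc", "alert": "trojan"},
    {"community_id": "1:def", "alert": "scan"},
]
pcap_packets = [
    {"community_id": "1:abc", "payload": b"\x01"},
    {"community_id": "1:zzz", "payload": b"\x02"},
]

# values of the shared field in the first result set
shared = {event["community_id"] for event in suricata_alerts}
# the derived query: community_id in shared, over the related type
pivoted = [pkt for pkt in pcap_packets if pkt["community_id"] in shared]
# only the packet belonging to flow "1:abc" pivots over
```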

  • 🎁 The new infer command performs schema inference on input data. The
    command can deduce the input format and creates a schema definition that is
    suitable for use with the supplied data. Supported input types include Zeek
    TSV and LDJSON.

  • 🐞 The user environment’s LDFLAGS were erroneously passed to ar. Instead,
    the user environment’s ARFLAGS are now used.

  • 🐞 Exporting data with export -n <count> crashed when count was a
    multiple of the table slice size. The command now works as expected.

  • 🎁 The newly added count command allows counting hits for a query without
    exporting data.

  • 🎁 Commands now support a --documentation option, which returns
    Markdown-formatted documentation text.

  • 🔄 CAF and Broker are no longer required to be installed prior to building
    VAST. These dependencies are now tracked as git submodules to ensure version
    compatibility. Specifying a custom build is still possible via the CMake
    variables CAF_ROOT_DIR and BROKER_ROOT_DIR.

  • 🔄 When exporting data in pcap format, it is no longer necessary to
    manually restrict the query by adding the predicate #type == "pcap.packet"
    to the expression. This now happens automatically because only this type
    contains the raw packet data.

  • 🐞 Queries of the form #type ~ /pattern/ used to be rejected erroneously.
    The validation code has been corrected and such queries are now working
    as expected.

  • 🐞 When specifying enum types in the schema, ingestion failed because there
    was no implementation for such types. It is now possible to define
    enumerations in the schema as expected and query them as strings.

  • 🐞 Queries with the less-than (<) or greater-than (>) operators produced
    off-by-one results for duration values when the query used a finer
    resolution than the index. The operators now work as expected.

  • 🎁 A new schema for Argus CSV output has been added. It parses the output of
    ra(1), which produces CSV output when invoked with -L 0 -c ,.

  • 🔄 When defining schema attributes in key-value pair form, the value no
    longer requires double-quotes. For example, #foo=x is now the same as
    #foo="x". The form without double-quotes consumes the input until the next
    space and does not support escaping. In case an attribute value contains
    whitespace, double-quotes must be provided, e.g., #foo="x y z".

  • 🎁 The schema language now supports comments. A double-slash (//) begins a
    comment. Comments last until the end of the line, i.e., until a newline
    character (\n).

  • 🔄 The PCAP packet type gained the additional field community_id that
    contains the Community ID flow hash. This identifier facilitates pivoting
    to a specific flow from data sources with connection-level information,
    such as Zeek or Suricata logs.

  • 🐞 Timestamps were always printed in millisecond resolution, which led to
    loss of precision when the internal representation had a higher resolution.
    Timestamps are now rendered up to nanosecond resolution, the maximum
    resolution supported.

  • 🎁 The import command now supports CSV formatted data. The type for each
    column is automatically derived by matching the column names from the CSV
    header in the input with the available types from the schema definitions.

  • 🐞 All query expressions in the form #type != X were falsely evaluated as
    #type == X and consequently produced wrong results. These expressions now
    behave as expected.

  • 🐞 Parsers for reading log input that relied on recursive rules leaked
    memory by creating cyclic references. All recursive parsers have been
    updated to break such cycles and thus no longer leak memory.

  • 🔄 Log files generally have some notion of timestamp for recorded events. To
    make the query language more intuitive, the syntax for querying time points
    thus changed from #time to #timestamp. For example,
    #time > 2019-07-02+12:00:00 now reads #timestamp > 2019-07-02+12:00:00.

  • 🎁 Configuring how much status information gets printed to STDERR previously
    required obscure config settings. From now on, users can simply use
    --verbosity=<level>,-v <level>, where <level> is one of quiet, error,
    warn, info, debug, or trace. However, debug and trace are only
    available for debug builds (otherwise they fall back to log level info).

  • 🎁 The query expression language now supports data predicates, which are a
    shorthand for a type extractor in combination with an equality operator. For
    example, the data predicate 6.6.6.6 is the same as :addr == 6.6.6.6.

  • 🐞 The Zeek reader failed upon encountering logs with a column of type
    double, as occurs in capture_loss.log. The Zeek parser generator has been
    fixed to handle such types correctly.

  • 🐞 Some queries returned duplicate events because the archive did not filter
    the result set properly. This no longer occurs after fixing the table slice
    filtering logic.

  • 🎁 The index object in the output from vast status has a new field
    statistics for a high-level summary of the indexed data. Currently, there
    exists a nested layouts object with per-layout statistics about the number
    of events indexed.

  • 🎁 The accountant object in the output from vast status has a new field
    log-file that points to the filesystem path of the accountant log file.

  • 🔄 Default schema definitions for certain import formats changed from
    hard-coded to runtime-evaluated. The default location of the schema
    definition files is $(dirname vast-executable)/../share/vast/schema.
    Currently this is used for the Suricata JSON log reader.

  • 🔄 The default directory name for persistent state changed from vast to
    vast.db. This makes it possible to run ./vast in the current directory
    without having to specify a different state directory on the command line.

  • 🔄 Nested types are now accessed with the .-syntax. This means VAST now
    has a unified syntax to select nested types and fields. For example, what
    used to be zeek::http is now just zeek.http.

  • 🎁 Data extractors in the query language can now contain a type prefix.
    This enables an easier way to extract data from a specific type. For example,
    a query to look for Zeek conn log entries with responder IP address 1.2.3.4
    had to be written with two terms, #type == zeek.conn && id.resp_h == 1.2.3.4,
    because the nested id record can occur in other types as well. Such queries
    can now be written more tersely as zeek.conn.id.resp_h == 1.2.3.4.

  • 🎁 VAST gained support for importing Suricata JSON logs. The import command
    has a new suricata format that can ingest EVE JSON output.

  • 🎁 The data parser now supports count and integer values according to the
    International System of Units (SI). For example, 1k is equal to 1000 and
    1Ki equal to 1024.
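The suffix handling can be sketched as follows (an illustrative Python model, not VAST's parser; note that 1k uses a decimal SI prefix while 1Ki uses a binary IEC prefix):

```python
import re

# Illustrative sketch of the suffix parsing (not VAST's parser).
# Decimal prefixes follow SI (1k = 1000), binary ones IEC (1Ki = 1024).
SI = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
IEC = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def parse_count(text):
    m = re.fullmatch(r"(\d+)(k|Ki|M|Mi|G|Gi|T|Ti)?", text)
    if m is None:
        raise ValueError(f"not a count: {text!r}")
    number, suffix = int(m.group(1)), m.group(2)
    if suffix is None:
        return number
    return number * (IEC[suffix] if suffix.endswith("i") else SI[suffix])
```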

  • 🐞 The map data parser did not parse negative values correctly. It was not
    possible to parse strings of the form "{-42 -> T}" because the parser
    attempted to parse the token for the empty map "{-}" instead.

  • 🎁 VAST can now ingest JSON data. The import command gained the json
    format, which allows for parsing line-delimited JSON (LDJSON) according to a
    user-selected type with --type. The --schema or --schema-file options
    can be used in conjunction to supply custom types. The JSON objects in
    the input must match the selected type, that is, the keys of the JSON object
    must be equal to the record field names and the object values must be
    convertible to the record field types.
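The matching rule can be sketched in Python (hypothetical helper and field names, not VAST code): the object's keys must equal the record's field names, and each value must be convertible to the corresponding field type:

```python
import json

# Sketch of the JSON-to-record matching rule described above
# (hypothetical record type; field names are made up).
record_type = {"ts": float, "src": str, "bytes": int}

def matches(obj, rec):
    # keys of the JSON object must equal the record field names
    if set(obj) != set(rec):
        return False
    # each value must be convertible to the record field type
    try:
        for field, ty in rec.items():
            ty(obj[field])  # raises if not convertible
    except (TypeError, ValueError):
        return False
    return True

ok = matches(json.loads('{"ts": 1.5, "src": "a", "bytes": "10"}'), record_type)
bad = matches(json.loads('{"ts": 1.5, "src": "a"}'), record_type)
```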

  • 🐞 The CSV printer of the export command used to insert two superfluous
    fields when formatting an event: the internal event ID and a deprecated
    internal timestamp value. Both fields have been removed from the output,
    bringing it into line with the other output formats.

  • 🔄 The (internal) option --node for the import and export commands
    has been renamed from -n to -N, to allow usage of -n for
    --max-events.

  • 🎁 For symmetry to the export...


VAST 0.1 - 2019-02-28

28 Feb 09:10

This release has been tested to work with CAF rev abcf0df4df9bd313e82b3d641fe6c947d2cc1c86.