Skip to content

v0.47.0

Compare
Choose a tag to compare
@neilotoole neilotoole released this 29 Jan 19:33
· 92 commits to master since this release

This is a significant release, focused on improving i/o, responsiveness, and performance. The headline features are caching of ingested data for document sources such as CSV or Excel, and download caching for remote document sources. There are a lot of under-the-hood changes, so please open an issue if you encounter any weirdness.

Added

  • Long-running operations (such as data ingestion, or file download) now result in a progress bar being displayed. Display of the progress bar is controlled by the new config options progress and progress.delay. You can also use the --no-progress flag to disable the progress bar.
    • 👉 The progress bar is rendered on stderr and is always zapped from the terminal when command output begins. It won't corrupt the output.
  • #307: Ingested document sources (such as CSV or Excel) now make use of an ingest cache DB. Previously, ingestion of document source data occurred on each sq command. It is now a one-time cost; subsequent use of the document source utilizes the cache DB. Until, that is, the source document changes: then the ingest cache DB is invalidated and ingested again. This is a significantly improved experience for large document sources.
  • There are several new commands to interact with the cache (although you shouldn't need to):
  • #24: The download mechanism for remote document sources (e.g. a CSV file at https://sq.io/testdata/actor.csv) has been completely overhauled. Previously, sq would re-download the remote file on every command. Now, the remote file is downloaded and cached locally. Subsequent sq invocations check for staleness of the cached download, and re-download if necessary.
  • As part of the download revamp, new config options have been introduced:
    • http.request.timeout is the timeout for the initial response from the server, and http.response.timeout is the timeout for reading the entire response body. We separate these two timeouts because it's possible that the server responds quickly, but then for a large file, the download takes too long.
    • https.insecure-skip-verify controls whether HTTPS connections verify the server's certificate. This is useful for remote files served with a self-signed certificate.
    • download.cache controls whether remote files are cached locally.
    • download.refresh.ok-on-err controls whether sq should continue with a stale cached download if an error occurred while trying to refresh the download. This is a sort of "Airplane Mode" for remote document sources: sq continues with the cached download when the network is unavailable.
  • There are two more new config options introduced as part of the above work.
    • cache.lock.timeout controls the time that sq will wait for a lock on the cache DB. The cache lock is introduced for when you have multiple sq commands running concurrently, and you want to avoid them stepping on each other.
    • Similarly, config.lock.timeout controls the timeout for acquiring the (newly-introduced) lock on sq's config file. This helps prevent issues with multiple sq processes mutating the config concurrently.
  • sq's own logs previously outputted in JSON format. Now there's a new log.format config option that permits setting the log format to json or text. The text format is more human-friendly, and is now the default.

Changed

  • Ingestion performance for json and jsonl sources has been significantly improved.

Fixed