This repository has been archived by the owner on Jun 30, 2022. It is now read-only.

Version 0.2.1

@silviulica silviulica released this 21 Mar 18:39
· 122 commits to master since this release

The 0.2.1 release includes the following changes:

  • Improved performance in the following areas:
    • Logging
    • Shuffle writing
    • Coders
    • Worker modules, some of which are now compiled with Cython
  • Changed the default behavior for Cloud execution: instead of downloading the SDK from a Cloud Storage bucket, you now download the SDK as a tarball from GitHub. When you run jobs using the Dataflow service, the SDK version used matches the version you've downloaded to your local environment. To override this behavior, use the --sdk_location pipeline option to provide an explicit tarball location (a Cloud Storage path or a URL).
  • Fixed several pickling issues related to how Dataflow serializes user functions and data.
  • Fixed several worker lease expiration issues experienced when processing large datasets.
  • Improved validation to detect common errors, such as access issues and invalid parameter combinations, much earlier.
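
As a rough illustration of the --sdk_location override described above, the sketch below appends the option to a list of pipeline arguments before they are handed to a pipeline. Only the --sdk_location option name comes from these notes. The helper with_sdk_location, the --project value, and the tarball path are hypothetical placeholders, and the sketch does not call the SDK itself.

```python
def with_sdk_location(pipeline_args, sdk_location=None):
    """Return a copy of pipeline_args, adding --sdk_location if one is given.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    args = list(pipeline_args)  # copy so the caller's list is not modified
    if sdk_location:
        # Explicit tarball location: a Cloud Storage path or a URL.
        args.append("--sdk_location=" + sdk_location)
    return args


# Placeholder arguments; replace them with the options your job really uses.
base_args = ["--project=my-project"]

# Default behavior: no override, so no --sdk_location option is added.
default_args = with_sdk_location(base_args)

# Override: point the service at an explicit tarball (placeholder path).
override_args = with_sdk_location(
    base_args, sdk_location="gs://my-bucket/sdk/sdk.tar.gz"
)
```

Leaving the option out keeps the default behavior, so the override is only needed when the job should use a specific tarball.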