Releases: tarantool/cartridge-spark

[0.7.0] - 2023-08-01


Features

  • Added support for Spark's Timestamp field type. Note: this type maps to the datetime field type in Tarantool, which is available in Tarantool server versions 2.11+. With older Tarantool server versions, use fields of number or string type for storing dates.
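The fallback recommended above for pre-2.11 servers can be sketched as follows. `TimestampFallback` is a hypothetical helper, not part of cartridge-spark: it shows how a Spark `Timestamp` value could be encoded as an epoch-millis number or an ISO-8601 string before writing to a number or string field.

```scala
import java.sql.Timestamp

// Hypothetical helper (not part of the connector): encodes a Spark
// Timestamp for Tarantool servers older than 2.11, which lack the
// native datetime type.
object TimestampFallback {
  // Store in a Tarantool number field: milliseconds since the Unix epoch.
  def toEpochMillis(ts: Timestamp): Long = ts.getTime

  // Store in a Tarantool string field: ISO-8601 in UTC.
  def toIsoString(ts: Timestamp): String = ts.toInstant.toString
}
```

The number form is compact and cheap to compare; the string form is human-readable. Either way, the schema decision has to be made on the Tarantool side before writing.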

Full Changelog: v0.6.0...v0.7.0

[0.6.0] - 2023-04-15


Features

  • Added support for Spark 3.x and Scala 2.13
  • Builds for Scala 2.12 and 2.13 now target Spark 3.x; Spark 2.x is supported only with Scala 2.11.

What's Changed

  • Add support for Apache Spark 3.x by @akudiyar in #48
  • Fix local test pipeline for all Scala versions by @akudiyar in #51

Full Changelog: v0.5.3...v0.6.0

[0.5.3] - 2023-03-13


Bugfixes

  • Fixed a race condition in which different threads modified the tuple mapper while it was being serialized, corrupting the serialized task closure data - #45

What's Changed

  • Fix race condition when serializing the write task closure by @akudiyar in #46
  • Update driver to v0.10.1 by @akudiyar in #47

Full Changelog: v0.5.2...v0.5.3

[0.5.2] - 2023-02-20


Full Changelog: v0.5.1...v0.5.2

[0.5.1] - 2022-12-29


Bugfixes

  • Updated the driver to version 0.10.0, which adds error handling for batch operations
  • Switched all batch write operations to use "replace" calls instead of "insert". This does not change the write semantics, but avoids spurious duplicate-key errors when Spark automatically retries tasks.
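The retry-safety argument above can be illustrated with a toy model (none of these names are connector or driver APIs): re-running the same batch with insert semantics fails on the duplicate primary key, while replace is idempotent and can be repeated safely.

```scala
import scala.collection.mutable

// Toy model of a Tarantool space keyed by an integer primary key
// (illustrative only, not the connector internals).
final case class DuplicateKeyError(key: Int) extends Exception(s"duplicate key: $key")

class ToySpace {
  private val tuples = mutable.Map.empty[Int, String]

  // "insert" semantics: a second attempt with the same key fails.
  def insert(key: Int, value: String): Unit =
    if (tuples.contains(key)) throw DuplicateKeyError(key)
    else tuples(key) = value

  // "replace" semantics: re-running the same write is a no-op.
  def replace(key: Int, value: String): Unit = tuples(key) = value

  def get(key: Int): Option[String] = tuples.get(key)
}
```

When Spark re-executes a partially completed write task, every tuple in the batch is written again; with replace semantics the second pass simply overwrites identical data instead of aborting.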

Full Changelog: v0.5.0...v0.5.1

[0.5.0] - 2022-11-03


Features

  • Added support for batch insert and replace operations (see tarantool/crud README for more information).
  • Added an option for using automatic request retries for specific network errors (see tarantool/cartridge-java README for more information).
  • Added field name transformation options for dataset write requests.
  • Added an option for specifying the number of cartridge-java connections per Tarantool host.
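The retry feature above is provided by the cartridge-java driver; as a generic sketch of the mechanism (this function is not the driver's API), a bounded retry loop repeats an operation only a limited number of times, and only for errors classified as retryable, which is what keeps it from amplifying outages:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Generic bounded-retry sketch (illustrative, not the cartridge-java API).
// `attempts` caps the total number of tries; `isRetryable` classifies
// which errors (e.g. specific network errors) are worth repeating.
def retry[A](attempts: Int)(isRetryable: Throwable => Boolean)(op: () => A): Try[A] = {
  @tailrec def loop(left: Int): Try[A] = Try(op()) match {
    case Success(a)                               => Success(a)
    case Failure(e) if left > 1 && isRetryable(e) => loop(left - 1)
    case f @ Failure(_)                           => f
  }
  loop(attempts)
}
```

A production version would also add a backoff delay between attempts; the Limitations section below explains why an unbounded or aggressive retry policy can cause load storms.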

Limitations

  • Batch operations are not fully consistent, because the cartridge-java driver does not yet support interactive transactions and MVCC. If a batch write fails, restoring a consistent cluster state may require either tracking the identifiers of already-written data so the process can be restarted, or a full cleanup.
  • The cartridge-java driver currently reports little detail when errors occur during batch write operations. Better error handling will be added in upcoming driver versions.
  • Automatic request retries can easily become a source of load storms on the target Tarantool cluster. They are not a means of mitigating significant network outages or cluster overload; rather, they are intended to let normal business operations continue despite small discrepancies in a well-monitored production system.

Bugfixes

  • Fixed conversion of decimals from Tarantool type to Spark DecimalType.

Misc

  • Improved integration tests speed and stability.

This release is available in Maven Central. See README for more information about installation and supported Scala versions.

What's Changed

  • Improve integration tests time and stability by @akudiyar in #27
  • Fix BigDecimal conversion to Spark DecimalType by @akudiyar in #28
  • Use different clusters for parallel pipelines by @akudiyar in #30
  • Fix network interfaces and wait for all nodes to come up by @akudiyar in #32
  • Allow passing the connections option by @akudiyar in #34
  • Support batch dataset writing by @akudiyar in #37
  • Add options for configuring automatic network error retries by @akudiyar in #39
  • Add an option for specifying the field names transformation by @akudiyar in #40

Full Changelog: v0.4.0...v0.5.0

[0.4.0] - 2022-05-30


Features

  • Added support for inserting fields by matching the specified Dataset schema of each row against the Tarantool space schema. This allows setting field values by name, reordering fields within a row, and skipping optional fields.
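The name-based matching described above can be illustrated with a toy function (not the connector's implementation): values are looked up in a row by field name, so the column order in the Dataset is irrelevant, and optional fields absent from the row come out as null.

```scala
// Toy sketch of name-based field matching (illustrative only):
// `spaceFields` is the Tarantool space's field order, `row` is one
// Dataset row as a name -> value map.
def matchRow(spaceFields: Seq[String], row: Map[String, Any]): Seq[Any] =
  spaceFields.map(field => row.getOrElse(field, null))
```

Here a row providing only `name` and `id` (in any order) still produces a tuple in the space's field order, with the skipped optional field left as null.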

Misc

  • Added tests for saving Spark fields of decimal type into Tarantool.

This release is available in Maven Central. See README for more information about installation and supported Scala versions.
Full Changelog: v0.3.0...v0.4.0

[0.3.0] - 2022-04-04


Features

  • Revamped TarantoolConnection to support a pool of Tarantool clients, which makes it possible to work with several Tarantool clusters in one Spark application.
  • Changed the semantics of Dataset write modes. See the README for the actual meaning of each mode.

Bugfixes

  • Fixed several cases of client connections remaining open after a Spark application finished.
  • Removed excess log messages in some cases.

This release is available in Maven Central. See README for more information about installation and supported Scala versions.

Full Changelog: v0.2.0...v0.3.0