Changelog

rust-v0.16.0 (2023-09-27)

Full Changelog

Implemented enhancements:

  • Expose Optimize option min_commit_interval in Python #1640
  • Expose create_checkpoint_for #1513
  • integration tests regularly fail for HDFS #1428
  • Add Support for Microsoft OneLake #1418
  • add support for atomic rename in R2 #1356

Fixed bugs:

  • Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
  • [python] Different stringification of partition values in reader and writer #1653
  • Unable to interface with data written from Spark Databricks #1651
  • get_last_checkpoint does some unnecessary listing #1643
  • PartitionWriter's buffer_len doesn't include incomplete row groups #1637
  • Slack community invite link has expired #1636
  • delta-rs does not appear to support tables with liquid clustering #1626
  • Internal Parquet panic when using a Map type. #1619
  • partition_by with "$" on local filesystem #1591
  • ProtocolChanged error when performing append write #1585
  • Unable to cargo update using git tag or rev on Rust 1.70 #1580
  • NoMetadata error when reading deltatable #1562
  • Cannot read delta table: Delta protocol violation #1557
  • Update the CODEOWNERS to capture the current reviewers and contributors #1553
  • [Python] Incorrect file URIs when partition values contain escape character #1533
  • add documentation how to Query Delta natively from datafusion #1485
  • Python: write_deltalake to ADLS Gen2 issue #1456
  • Partition values that have been url encoded cannot be read when using deltalake #1446
  • Error optimizing large table #1419
  • Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
  • ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
  • Invalid JSON in log record missing field schemaString for DLT tables #1302
  • Special characters in partition path not handled locally #1299

Merged pull requests:

  • chore: bump rust crate version #1675 (rtyler)
  • fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
  • feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
  • docs: small consistency update in guide and readme #1666 (ion-elgreco)
  • fix: exception string in writer.py #1665 (sebdiem)
  • chore: increment python library version #1664 (wjones127)
  • docs: fix some typos #1662 (ion-elgreco)
  • fix: more consistent handling of partition values and file paths #1661 (roeap)
  • docs: add docstring to protocol method #1660 (MrPowers)
  • docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
  • fix: enable offset listing for s3 #1654 (eeroel)
  • chore: fix the incorrect Slack link in our readme #1649 (rtyler)
  • fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
  • chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
  • feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
  • fix: avoid excess listing of log files #1644 (eeroel)
  • fix: introduce support for Microsoft OneLake #1642 (rtyler)
  • fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
  • fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
  • chore: relax chrono pin to 0.4 #1635 (houqp)
  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
  • docs: update Readme #1633 (dennyglee)
  • chore: pin the chrono dependency #1631 (rtyler)
  • feat: pass known file sizes to filesystem in Python #1630 (eeroel)
  • feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
  • ci: fix python release #1624 (wjones127)
  • ci: extend azure timeout #1622 (wjones127)
  • feat: allow multiple incremental commits in optimize #1621 (kvap)
  • fix: change map nullable value to false #1620 (cmackenzie1)
  • Introduce the changelog for the last couple releases #1617 (rtyler)
  • chore: bump python version to 0.10.2 #1616 (wjones127)
  • perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
  • fix: don't re-encode paths #1613 (wjones127)
  • feat: use url parsing from object store #1592 (roeap)
  • feat: buffered reading of transaction logs #1549 (eeroel)
  • feat: merge operation #1522 (Blajda)
  • feat: expose create_checkpoint_for to the public #1514 (haruband)
  • docs: update Readme #1440 (roeap)
  • refactor: re-organize top level modules #1434 (roeap)
  • feat: integrate unity catalog with datafusion #1338 (roeap)

rust-v0.15.0 (2023-09-06)

Full Changelog

Implemented enhancements:

  • Configurable number of retries for transaction commit loop #1595

Fixed bugs:

  • Unable to read table using VM Managed Identity on Azure #1462
  • Unable to query by partition column #1445

Merged pull requests:

rust-v0.14.0 (2023-08-01)

Full Changelog

Implemented enhancements:

  • Define common dependencies in Cargo Workspace #1572
  • Make delta_datafusion::find_files public #1559

Fixed bugs:

  • Excessive integration test sizes causing builds to fail #1550
  • Slack invite link is not working #1530

Merged pull requests:

rust-v0.13.1 (2023-07-18)

Fixed bugs:

  • Revert premature merge of an attempted fix for binary column statistics #1544

rust-v0.13.0 (2023-07-15)

Full Changelog

Implemented enhancements:

  • Add nested struct support #1518
  • Support FixedLenByteArray UUID statistics as a logical scalar #1483
  • Exposing create_add in the API #1458
  • Update features table on README #1404
  • docs(python): show data catalog options in Python API reference #1347
  • Add optimization to only list log files starting at a certain name #1252
  • Support configuring parquet compression #1235
  • parallel processing in Optimize command #1171

Fixed bugs:

  • get_add_actions() MAX is not showing complete value #1534
  • Can't get stats's minValues in add actions #1515
  • Pyarrow is_null filter not working as expected after loading using deltalake #1496
  • Can't write to table that uses generated columns #1495
  • Json error: Binary is not supported by JSON when writing checkpoint files #1493
  • _last_checkpoint size field is incorrect #1468
  • Error when Z Ordering a larger dataset #1459
  • Timestamp parsing issue #1455
  • File options are ignored when writing delta #1444
  • Slack Invite Link No Longer Valid #1425
  • cleanup_metadata doesn't remove .checkpoint.parquet files #1420
  • The test of reading the data from the blob storage located in Azurite container failed #1415
  • The test of reading the data from the bucket located in Minio container failed #1408
  • Datafusion: unreachable code reached when parsing statistics with missing columns #1374
  • vacuum is very slow on Cloudflare R2 #1366

Closed issues:

  • Expose Compression Options or WriterProperties for writing to Delta #1469
  • Support out-of-core Z-order using DataFusion #1460
  • Expose Z-order in Python #1442

Merged pull requests:

rust-v0.12.0 (2023-05-30)

Full Changelog

Implemented enhancements:

  • Release delta-rs 0.11.0 (next release after 0.10.0) #1362
  • Support writing statistics for date columns in Rust #1209

Fixed bugs:

  • Rust writer in operations makes a lot of data copies #1394
  • Unable to read timestamp fields from column statistics #1372
  • Unable to write custom metadata via configuration since version 0.9.0 #1353
  • .get_add_actions() returns wrong column statistics when dataSkippingNumIndexedCols property of the table was changed #1223
  • Ensure decimal statistics are written correctly in Rust #1208

Merged pull requests:

  • feat: add list_with_offset to DeltaObjectStore #1410 (ognis1205)
  • chore: type-check friendlier exports #1407 (roeap)
  • chore: remove ancillary crates from the git tree #1406 (rtyler)
  • chore: bump the version for the next release #1405 (rtyler)
  • feat: more efficient parquet writer and more statistics #1397 (wjones127)
  • perf: improve record batch partitioning #1396 (roeap)
  • chore: bump datafusion to 25 #1389 (roeap)
  • refactor!: remove DeltaDataType aliases #1388 (cmackenzie1)
  • feat: vacuum with concurrent requests #1382 (wjones127)
  • feat: add datafusion storage catalog #1381 (roeap)
  • docs: updated schema.rs to use the right signature for decimal data type in documentation #1377 (rahulj51)
  • fix: delete operation when partition and non partition columns are used #1375 (Blajda)
  • fix: add conversion for string for Field::TimestampMicros (#1372) #1373 (cmackenzie1)
  • fix: allow user defined config keys #1365 (roeap)
  • ci: disable full debug symbol generation #1364 (roeap)
  • fix: include stats for all columns (#1223) #1342 (mrjoe7)

rust-v0.11.0 (2023-05-12)

Full Changelog

Implemented enhancements:

  • Implement simple delete case #832

Merged pull requests:

  • chore: update Rust package version #1346 (rtyler)
  • fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
  • feat: delete operation #1176 (Blajda)
  • feat: add wasbs to known schemes #1345 (iajoiner)
  • test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
  • feat: write command improvements #1267 (roeap)
  • feat: added support for Databricks Unity Catalog #1331 (nohajc)
  • fix: double url encode of partition key #1324 (mrjoe7)

rust-v0.10.0 (2023-05-02)

Full Changelog

Implemented enhancements:

  • Support Optimize on non-append-only tables #1125

Fixed bugs:

  • DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
  • Datafusion: SQL projection returns wrong column for partitioned data #1292
  • Unable to query partitioned tables #1291

Merged pull requests:

  • chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
  • fix: handle local paths on windows #1322 (roeap)
  • fix: scan partitioned tables with datafusion #1303 (roeap)
  • fix: allow special characters in storage prefix #1311 (wjones127)
  • feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
  • Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
  • Enable the json feature for the parquet crate #1300 (rtyler)

rust-v0.9.0 (2023-04-14)

Full Changelog

Implemented enhancements:

  • hdfs support #300
  • Add decimal primitive type to document #1280
  • Improve error message when filtering on non-existent partition columns #1218

Fixed bugs:

  • Datafusion table provider: issues with timestamp types #441
  • Not matching column names when creating a RecordBatch from MapArray #1257
  • All stores created using DeltaObjectStore::new have an identical object_store_url #1188

Merged pull requests:

  • Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
  • chore: df / arrow changes after update #1288 (roeap)
  • feat: read schema from parquet files in datafusion scans #1266 (roeap)
  • HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
  • Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
  • Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
  • Simplify the Store Backend Configuration code #1265 (mrjoe7)
  • feat: optimistic transaction protocol #632 (roeap)
  • Write support for additional Arrow datatypes #1044 (chitralverma)
  • Unique delta object store url #1212 (gruuya)
  • improve err msg on use of non-partitioned column #1221 (marijncv)

rust-v0.8.0 (2023-03-10)

Full Changelog

Implemented enhancements:

  • feat(rust): support additional types for partition values #1170

Fixed bugs:

  • File pruning does not occur on partition columns #1175
  • Bug: Error loading Delta table locally #1157
  • Deltalake 0.7.0 with s3 feature compilation error due to rusoto_dynamodb version conflict #1191
  • Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186

Merged pull requests:

rust-v0.7.0 (2023-02-11)

Full Changelog

Implemented enhancements:

  • Support FSCK REPAIR TABLE Operation #1092
  • Expose the Delta Log in a DataFrame that's easy for analysis #1031
  • Provide case-insensitive storage options in backend #999
  • Support local file path in CreateBuilder::with_location() #998
  • Save operational params in the same way with delta io #1054 (ismoshkov)

Fixed bugs:

  • DeltaTable DataFusion TableProvider does not support filter pushdown #1064
  • DeltaTable DataFusion scan does not prune files properly #1063
  • deltalake.DeltaTable constructor hangs in Jupyter #1093
  • Transaction log JSON formatting issue when writing data via Python bindings #1017
  • crates.io entry is missing link to rustdoc documentation #1076
  • URL Registered with ObjectStore registry is different from url in DeltaScan #1018
  • Not able to connect to Azure Storage with client id/secret #977
  • Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
  • Overwrite mode does not work with Azure #939
  • Use Chrono without default features #914
  • cargo test does not run due to tls conflict #985
  • Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910

Merged pull requests:

  • Make rustls default across all packages #1097 (wjones127)
  • Implement filesystem check #1103 (Blajda)
  • refactor: move vacuum command to operations module #1045 (roeap)
  • feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
  • feat: improve storage location handling #1065 (roeap)
  • Fix to support UTC timezone #1022 (andrei-ionescu)
  • feat: harmonize and simplify storage configuration #1052 (roeap)
  • feat: expose function to get table of add actions #1033 (wjones127)
  • fix: change unexpected field logging level to debug #1112 (houqp)
  • fix: datafusion predicate pushdown and dependencies #1071 (roeap)
  • fix: azure sas key url encoding #1036 (roeap)
  • Add provisional workaround to support CDC #1039 #1042 (Fazzani)
  • improve debuggability of json ser/de errors #1119 (houqp)
  • Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
  • minor: optimize partition lookup for vacuum loop #1120 (houqp)
  • Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
  • add test for null_count_schema_for_fields #1135 (marijncv)
  • add test for min_max_schema_for_fields #1122 (marijncv)
  • add test for get_boolean_from_metadata #1121 (marijncv)
  • add test for left_larger_than_right #1110 (marijncv)
  • Add test for: to_scalar_value #1086 (marijncv)
  • Fix typo in delta-inspect #1072 (byteink)
  • chore: update datafusion #1114 (roeap)

rust-v0.6.0 (2022-12-16)

Full Changelog

Implemented enhancements:

  • Support Apache Arrow DataFusion 15 #1020
  • Python package: Loosen version requirements for maturin #1004
  • Remove Cargo.lock from library crates and add Cargo.lock to binary ones #1000
  • More frequent Rust releases #969
  • Thoughts on adding read_delta to pandas #869
  • Add the support of the AWS_PROFILE environment variable for S3 #986 (fvaleye)

Fixed bugs:

  • Azure SAS signatures ending in "=" don't work #1003
  • Fail to compile deltalake crate, need to update dynamodb_lock in crates.io #1002
  • error reading delta table to pandas: runtime dropped the dispatch task #975
  • MacOS arm64 wheels are generated incorrectly #972
  • Overwrite creates new file #960
  • The written delta file has corrupted structure #956
  • Write mode doesn't work with Azure storage #955
  • Python: We don't error on reader protocol v2 #886
  • Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855

Merged pull requests:

* This Changelog was automatically generated by github_changelog_generator