Commit

Merge branch 'delta-io:main' into main
ryanaston authored Oct 9, 2023
2 parents 2302bfb + ab6b0cf commit 1ec7c78
Showing 33 changed files with 1,037 additions and 236 deletions.
4 changes: 2 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[workspace]
members = ["rust", "python"]
exclude = ["proofs", "delta-inspect"]
members = ["delta-inspect", "rust", "python"]
exclude = ["proofs"]
resolver = "2"

[profile.release-with-debug]
28 changes: 14 additions & 14 deletions README.md
@@ -57,7 +57,7 @@ API that lets you query, inspect, and operate your Delta Lake with ease.

- [Quick Start](#quick-start)
- [Get Involved](#get-involved)
- [Integartions](#integrations)
- [Integrations](#integrations)
- [Features](#features)

## Quick Start
@@ -138,22 +138,22 @@ of features outlined in the Delta [protocol][protocol] is also [tracked](#protoc
| S3 - R2 | ![done] | ![done] | requires lock for concurrent writes |
| Azure Blob | ![done] | ![done] | |
| Azure ADLS Gen2 | ![done] | ![done] | |
| Microsoft OneLake | ![done] | ![done] | |
| Microsoft OneLake | ![done] | ![done] | |
| Google Cloud Storage | ![done] | ![done] | |

### Supported Operations

| Operation | Rust | Python | Description |
| --------------------- | :-----------------: | :-----------------: | ------------------------------------- |
| Create | ![done] | ![done] | Create a new table |
| Read | ![done] | ![done] | Read data from a table |
| Vacuum | ![done] | ![done] | Remove unused files and log entries |
| Delete - partitions | | ![done] | Delete a table partition |
| Delete - predicates | ![done] | | Delete data based on a predicate |
| Optimize - compaction | ![done] | ![done] | Harmonize the size of data file |
| Optimize - Z-order | ![done] | ![done] | Place similar data into the same file |
| Merge | [![semi-done]][merge-rs]| [![open]][merge-py] | Merge two tables (limited to full re-write) |
| FS check | ![done] | | Remove corrupted files from table |
| Operation | Rust | Python | Description |
| --------------------- | :----------------------: | :-----------------: | ------------------------------------------- |
| Create | ![done] | ![done] | Create a new table |
| Read | ![done] | ![done] | Read data from a table |
| Vacuum | ![done] | ![done] | Remove unused files and log entries |
| Delete - partitions | | ![done] | Delete a table partition |
| Delete - predicates | ![done] | ![done] | Delete data based on a predicate |
| Optimize - compaction | ![done] | ![done] | Harmonize the size of data file |
| Optimize - Z-order | ![done] | ![done] | Place similar data into the same file |
| Merge | [![semi-done]][merge-rs] | [![open]][merge-py] | Merge two tables (limited to full re-write) |
| FS check | ![done] | | Remove corrupted files from table |

### Protocol Support Level

@@ -172,7 +172,7 @@ of features outlined in the Delta [protocol][protocol] is also [tracked](#protoc

| Reader Version | Requirement | Status |
| -------------- | ----------------------------------- | ------ |
| Version 2 | Column Mapping | |
| Version 2 | Column Mapping | |
| Version 3 | Table Features (requires reader V7) | |

[datafusion]: https://github.com/apache/arrow-datafusion
4 changes: 2 additions & 2 deletions delta-inspect/Cargo.toml
@@ -10,6 +10,7 @@ edition = "2021"

[dependencies]
anyhow = "1"
chrono = { workspace = true, default-features = false, features = ["clock"] }
clap = { version = "3", features = ["color"] }
tokio = { version = "1", features = ["fs", "macros", "rt", "io-util"] }
env_logger = "0"
@@ -19,8 +20,7 @@ path = "../rust"
version = "0"
features = ["azure", "gcs"]


[features]
default = ["rustls"]
native-tls = ["deltalake/s3-native-tls", "deltalake/glue-native-tls"]
rustls = ["deltalake/s3", "deltalake/glue"]
rustls = ["deltalake/s3", "deltalake/glue"]
25 changes: 13 additions & 12 deletions delta-inspect/src/main.rs
@@ -1,3 +1,4 @@
use chrono::Duration;
use clap::{App, AppSettings, Arg};

#[tokio::main(flavor = "current_thread")]
@@ -79,21 +80,21 @@ async fn main() -> anyhow::Result<()> {
Some(("vacuum", vacuum_matches)) => {
let dry_run = !vacuum_matches.is_present("no_dry_run");
let table_uri = vacuum_matches.value_of("uri").unwrap();
let mut table = deltalake::open_table(table_uri).await?;
let files = table
.vacuum(
vacuum_matches.value_of("retention_hours").map(|s| {
s.parse::<u64>()
.expect("retention hour should be an unsigned integer")
}),
dry_run,
true
)
let table = deltalake::open_table(table_uri).await?;
let retention = vacuum_matches
.value_of("retention_hours")
.map(|s| s.parse::<i64>().unwrap())
.unwrap();
let (_table, metrics) = deltalake::operations::DeltaOps(table)
.vacuum()
.with_retention_period(Duration::hours(retention))
.with_dry_run(dry_run)
.await?;

if dry_run {
println!("Files to deleted: {files:#?}");
println!("Files to deleted: {metrics:#?}");
} else {
println!("Files deleted: {files:#?}");
println!("Files deleted: {metrics:#?}");
}
}
_ => unreachable!(),
33 changes: 33 additions & 0 deletions docs/python_api.md
@@ -0,0 +1,33 @@
# Python API Reference

## DeltaTable

::: deltalake.table

## Writing Delta Tables

::: deltalake.write_deltalake

## Delta Lake Schemas

Schemas, fields, and data types are provided in the ``deltalake.schema`` submodule.

::: deltalake.schema.Schema

::: deltalake.schema.PrimitiveType

::: deltalake.schema.ArrayType

::: deltalake.schema.MapType

::: deltalake.schema.Field

::: deltalake.schema.StructType

## Data Catalog

::: deltalake.data_catalog

## Delta Storage Handler

::: deltalake.fs
3 changes: 3 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,3 @@
mkdocs
mkdocstrings[python]
mkdocs-autorefs
23 changes: 10 additions & 13 deletions docs/usage/examining-table.md
@@ -14,7 +14,7 @@ The delta log maintains basic metadata about a table, including:
to have data deleted from it.

Get metadata from a table with the
`DeltaTable.metadata` method:
[DeltaTable.metadata()][] method:

``` python
>>> from deltalake import DeltaTable
@@ -27,12 +27,12 @@ Metadata(id: 5fba94ed-9794-4965-ba6e-6ee3c0d22af9, name: None, description: None

The schema for the table is also saved in the transaction log. It can
either be retrieved in the Delta Lake form as
`deltalake.schema.Schema` or as a
[deltalake.schema.Schema][] or as a
PyArrow schema. The first allows you to introspect any column-level
metadata stored in the schema, while the latter represents the schema
the table will be loaded into.

Use `DeltaTable.schema` to retrieve the delta lake schema:
Use [DeltaTable.schema][] to retrieve the delta lake schema:

``` python
>>> from deltalake import DeltaTable
@@ -43,14 +43,14 @@ Schema([Field(id, PrimitiveType("long"), nullable=True)])

These schemas have a JSON representation that can be retrieved. To
reconstruct from json, use
`deltalake.schema.Schema.from_json()`.
[deltalake.schema.Schema.from_json()][].

``` python
>>> dt.schema().json()
'{"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}}]}'
```

Use `deltalake.schema.Schema.to_pyarrow()` to retrieve the PyArrow schema:
Use [deltalake.schema.Schema.to_pyarrow()][] to retrieve the PyArrow schema:

``` python
>>> dt.schema().to_pyarrow()
@@ -65,15 +65,12 @@ table, when, and by whom. This information is retained for 30 days by
default, unless otherwise specified by the table configuration
`delta.logRetentionDuration`.

::: note
::: title
Note
:::
!!! note

This information is not written by all writers and different writers may
use different schemas to encode the actions. For Spark\'s format, see:
<https://docs.delta.io/latest/delta-utility.html#history-schema>

This information is not written by all writers and different writers may
use different schemas to encode the actions. For Spark\'s format, see:
<https://docs.delta.io/latest/delta-utility.html#history-schema>
:::

To view the available history, use `DeltaTable.history`:

2 changes: 1 addition & 1 deletion docs/usage/index.md
@@ -1,6 +1,6 @@
# Usage

A `DeltaTable` represents the state of a
A [DeltaTable][] represents the state of a
delta table at a particular version. This includes which files are
currently part of the table, the schema of the table, and other metadata
such as creation time.
14 changes: 5 additions & 9 deletions docs/usage/loading-table.md
@@ -109,12 +109,8 @@ version number or datetime string:
>>> dt.load_with_datetime("2021-11-04 00:05:23.283+00:00")
```

::: warning
::: title
Warning
:::

Previous table versions may not exist if they have been vacuumed, in
which case an exception will be thrown. See [Vacuuming
tables](#vacuuming-tables) for more information.
:::
!!! warning

Previous table versions may not exist if they have been vacuumed, in
which case an exception will be thrown. See [Vacuuming
tables](#vacuuming-tables) for more information.
24 changes: 23 additions & 1 deletion mkdocs.yml
@@ -10,4 +10,26 @@ nav:
- Examining a Delta Table: usage/examining-table.md
- Querying a Delta Table: usage/querying-delta-tables.md
- Managing a Delta Table: usage/managing-tables.md
- Writing Delta Tables: usage/writing-delta-tables.md
- Writing Delta Tables: usage/writing-delta-tables.md
- API Reference: python_api.md

plugins:
- autorefs
- mkdocstrings:
handlers:
python:
path: [../python]
rendering:
heading_level: 4
show_source: false
show_symbol_type_in_heading: true
show_signature_annotations: true
show_root_heading: true
members_order: source
import:
# for cross references
- https://arrow.apache.org/docs/objects.inv
- https://pandas.pydata.org/docs/objects.inv

markdown_extensions:
- admonition
2 changes: 1 addition & 1 deletion python/Makefile
@@ -66,7 +66,7 @@ check-rust: ## Run check on Rust
.PHONY: check-python
check-python: ## Run check on Python
$(info Check Python black)
black --check .
black --check --diff .
$(info Check Python ruff)
ruff check .
$(info Check Python mypy)