Drop table semantics #1203

Open
gruuya opened this issue Mar 1, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@gruuya
Contributor

gruuya commented Mar 1, 2023

Description

Use Case

A reasonably high-level API that lets Rust library users erase the root table directory completely, perhaps a DropTableBuilder/action/DeltaOperation/drop method on DeltaTable. In effect, an interface that can be used to execute the SQL statement DROP TABLE my_table.

Alternatively, do a lazy drop by delaying physically deleting anything and instead flag the table somehow, so that CreateTableBuilder doesn't throw CreateError::TableAlreadyExists even in SaveMode::ErrorIfExists mode.

Related Issue(s)

Perhaps worthy of including in #1128

@gruuya gruuya added the enhancement New feature or request label Mar 1, 2023
@gruuya
Contributor Author

gruuya commented Mar 2, 2023

Just to give a bit more context: while doing list on a given prefix and then delete for each path returned works, I suspect it will be sub-optimal for tables with a lot of files.
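To illustrate the list-then-delete approach described above, here is a minimal sketch. It assumes a hypothetical in-memory `MemStore` (not the delta-rs or `object_store` API) whose `requests` counter stands in for network round trips; `drop_table_naive` is likewise an illustrative name.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an object store; keys are full object paths.
/// `requests` counts simulated round trips to the store.
#[derive(Default)]
pub struct MemStore {
    objects: BTreeMap<String, Vec<u8>>,
    pub requests: usize,
}

impl MemStore {
    /// Store an object (not counted as a request in this sketch).
    pub fn put(&mut self, path: &str, data: &[u8]) {
        self.objects.insert(path.to_string(), data.to_vec());
    }

    /// List every object whose path starts with `prefix` (one request).
    pub fn list(&mut self, prefix: &str) -> Vec<String> {
        self.requests += 1;
        self.objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect()
    }

    /// Delete a single object (one request per call).
    pub fn delete(&mut self, path: &str) -> bool {
        self.requests += 1;
        self.objects.remove(path).is_some()
    }

    /// Number of objects currently stored.
    pub fn len(&self) -> usize {
        self.objects.len()
    }
}

/// Naive drop: list the table root, then delete each path individually.
/// Returns the number of objects removed.
pub fn drop_table_naive(store: &mut MemStore, table_root: &str) -> usize {
    // Add a trailing slash so that "db/t1" does not also match "db/t10/...".
    let prefix = format!("{}/", table_root.trim_end_matches('/'));
    let paths = store.list(&prefix);
    paths.iter().filter(|p| store.delete(p.as_str())).count()
}
```

In this sketch a table with N files costs N + 1 requests (one list plus one delete per file), which is the scaling concern raised above.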

@roeap
Collaborator

roeap commented Mar 5, 2023

As you mentioned, in Delta we can delete tables by simply removing all files from the log, but this will not allow for the create scenario you mentioned. I guess listing and deleting the files is the way to go here... this can be made quite efficient once we get to implementing the batch delete APIs offered by the major object stores.
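As a rough illustration of why batch deletes help, the sketch below groups listed paths into bulk-delete requests. The 1000-key limit matches S3's DeleteObjects; other stores may differ, and `plan_batches` and `request_counts` are hypothetical helper names, not part of delta-rs.

```rust
/// Assumed per-request key limit; S3's DeleteObjects accepts up to 1000 keys.
pub const MAX_KEYS_PER_BATCH: usize = 1000;

/// Split the listed paths into batches, one batch per bulk-delete request.
pub fn plan_batches(paths: &[String], max_keys: usize) -> Vec<&[String]> {
    assert!(max_keys > 0, "batch size must be positive");
    paths.chunks(max_keys).collect()
}

/// Delete requests needed for `num_files` objects:
/// (one request per object, one request per batch).
pub fn request_counts(num_files: usize, max_keys: usize) -> (usize, usize) {
    assert!(max_keys > 0, "batch size must be positive");
    // Ceiling division: a partially filled last batch still costs a request.
    let batched = (num_files + max_keys - 1) / max_keys;
    (num_files, batched)
}
```

Under these assumptions, deleting 2500 files takes 3 bulk requests instead of 2500 individual ones.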

Having a special flag to ignore existing data could work, but there I would be quite hesitant, as it would be specific to delta-rs and not honored by other writers.

@wjones127
Collaborator

This would be good for cleaning up tables. But we should be clear that using DROP TABLE isn't recommended if the intention is to replace the table. Eventual consistency in S3 and other object stores will cause problems when trying to create and read the transaction log. We should support overwriting and schema evolution for those cases.

@gruuya
Contributor Author

gruuya commented Mar 6, 2023

> this can be made quite efficient once we get to implementing the batch delete apis offered by the major object stores

Nice, looking forward to this!

> Eventual consistency in S3 and other object stores will cause problems when trying to create and read the transaction log.

Makes sense, thanks.

For the record I'll circumvent this problem by changing the intended storage layout. Instead of the table uri being dependent on database/schema/table names, it'll be dependent solely on a uuid uniquely tied to that table. This solution may not be general enough for other people, because it relies on us keeping a separate metadata store (e.g. SQLite) to track all the table uuids (in our case, we need one anyway to track the databases/schemas/functions etc.).

However, this guarantees that e.g. renaming a table is just a minor op, and that dropping a schema/table can be done lazily (since subsequent re-use of the schema/table names will result in a different uuid, so there won't be conflicts). I also intend to GC the dropped (stale) schema/table directories via some variant of the VACUUM command, ideally using the batch delete API.
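The layout described above could be sketched roughly as follows. Everything here is hypothetical: `Catalog` stands in for the separate metadata store, a counter stands in for real UUID generation to keep the sketch std-only, and the `tables/{id}` URI shape is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical metadata catalog mapping table names to storage ids.
/// A counter stands in for real UUID generation.
#[derive(Default)]
pub struct Catalog {
    tables: HashMap<String, u64>,
    stale: Vec<u64>,
    next_id: u64,
}

impl Catalog {
    /// Register a table under a fresh id; fails if the name is taken.
    pub fn create(&mut self, name: &str) -> Result<u64, String> {
        if self.tables.contains_key(name) {
            return Err(format!("table {name} already exists"));
        }
        let id = self.next_id;
        self.next_id += 1;
        self.tables.insert(name.to_string(), id);
        Ok(id)
    }

    /// The storage location depends only on the id, never on the name.
    pub fn uri(&self, name: &str) -> Option<String> {
        self.tables.get(name).map(|id| format!("tables/{id}"))
    }

    /// Rename touches only the catalog; no objects are moved.
    pub fn rename(&mut self, old: &str, new: &str) -> bool {
        if self.tables.contains_key(new) {
            return false;
        }
        match self.tables.remove(old) {
            Some(id) => {
                self.tables.insert(new.to_string(), id);
                true
            }
            None => false,
        }
    }

    /// Lazy drop: forget the name now, remember the id for later cleanup.
    pub fn drop_table(&mut self, name: &str) -> bool {
        match self.tables.remove(name) {
            Some(id) => {
                self.stale.push(id);
                true
            }
            None => false,
        }
    }

    /// Hand the stale ids to a garbage collector and clear the backlog.
    pub fn take_stale(&mut self) -> Vec<u64> {
        std::mem::take(&mut self.stale)
    }
}
```

Because a re-created table always receives a new id, it never collides with the directory of the dropped one, so physical deletion can be deferred to a later cleanup pass over `take_stale()`.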

Therefore this is no longer a (major) issue for me, AFAIC you can close it. I am still very much blocked by #1188 though.
