Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: copy to support for lance #2342

Merged
merged 26 commits into from
Jan 18, 2024
Merged

feat: copy to support for lance #2342

merged 26 commits into from
Jan 18, 2024

Conversation

tychoish
Copy link
Collaborator

@tychoish tychoish commented Jan 4, 2024

closes #2066

@universalmind303
Copy link
Contributor

universalmind303 commented Jan 4, 2024

can we add some tests. Ideally, we should test the round trip to/from lance in a .slt

@tychoish tychoish marked this pull request as draft January 8, 2024 20:21
@tychoish tychoish marked this pull request as ready for review January 13, 2024 16:02
@tychoish
Copy link
Collaborator Author

This is ready and has tests now (both SLT and python).

I've apparently had the code for this correct in every case, except that the local write path pre-creates the directory for us, which the lance writer doesn't like.

The lance API is kinda wack: and in the end doesn't really use our object store at all (and I think can't) not for technical reasons but for API reasons, so eventually we should be able to get that ironed out but requires sort of major surgery the lance types.

crates/datasources/src/common/sink/lance.rs Show resolved Hide resolved
crates/datasources/src/common/sink/lance.rs Show resolved Hide resolved
Comment on lines +22 to +29
copy (select * from lance_tbl) to 'file://${TMP}' format lance;

query IT
select * from lance_scan('file://${TMP}') order by point.lat;
----
0.2,1.8 {lat:42.1,long:-74.1}
1.1,1.2 {lat:45.5,long:-122.7}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do relative paths work too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add extra tests. no clue, lance's file parsing is a bit picky relative to other methods, but it should.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took out the file::// and it works fine, which is what we test with every data format; I believe these are still absolute paths (/tmp/etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relative paths are very important from a usability perspective. I'd hate having to write an absolute path every time.

If both these don't work, I think we should open a follow up issue to get it working:

copy (select 1) to './path/to/table' format lance;
select * from lance_scan('./path/to/table');

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ship it

Copy link
Contributor

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly suggestions, but a few important things

  • opts is never used inside stream_into_inner
  • no docs/tests explaining what the copy options do.

tests/tests/test_lance.py Outdated Show resolved Hide resolved
crates/protogen/src/metastore/types/options.rs Outdated Show resolved Hide resolved
crates/datasources/src/common/sink/lance.rs Outdated Show resolved Hide resolved
crates/datasources/src/common/sink/lance.rs Outdated Show resolved Hide resolved
@universalmind303
Copy link
Contributor

sorry, one more note. Let's update the README support matrix.

@tychoish tychoish enabled auto-merge (squash) January 18, 2024 22:20
@tychoish tychoish merged commit 9323041 into main Jan 18, 2024
20 checks passed
@tychoish tychoish deleted the tycho/lance-copy-to branch January 18, 2024 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support writing to lance tables
3 participants