-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Fixed-list DataType to System #1298
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but @ray6080 should take another look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The high level suggestion is we should not introduce TENSOR as a logical data type to users. Instead, LIST is a more generic data type that covers what we need already.
From the users' perspective, only LIST is exposed. We support defining fixed-sized and variable-sized LIST in the front-end. Internally, we have two separate physical types to hand this, which are FIXED_LIST and VAR_LIST.
We can discuss the changes in more details.
Currently, what's missing from this pr are two things:
- we need to check if the defined LIST size is larger than a default page size. In that case, we don't support it for now.
- we need to disable all functions over fixed sized LIST. maybe in the binder or somewhere else, @andyfengHKU can help with this.
@@ -320,6 +320,62 @@ std::unique_ptr<Value> CopyStructuresArrow::getArrowList(std::string& l, int64_t | |||
DataType(LIST, std::make_unique<DataType>(childDataType)), std::move(values)); | |||
} | |||
|
|||
std::unique_ptr<uint8_t[]> CopyStructuresArrow::getArrowTensor(std::string& l, int64_t from, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not reuse getArrowList?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some differences between fixed/var list for now:
- getArrowVarList returns a vector of values, and then convert that vector to ku_list_t, and store the elements in the overflow.
- getArrowFixedList should return a pointer to a block of memory which stores the elements of the list. We should copy that block of memory to inMemColumn directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Let's try to reuse the part of parsing lists.
@@ -320,6 +320,62 @@ std::unique_ptr<Value> CopyStructuresArrow::getArrowList(std::string& l, int64_t | |||
DataType(LIST, std::make_unique<DataType>(childDataType)), std::move(values)); | |||
} | |||
|
|||
std::unique_ptr<uint8_t[]> CopyStructuresArrow::getArrowTensor(std::string& l, int64_t from, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Let's try to reuse the part of parsing lists.
FIXED-LIST DataType
1. Grammar Rule:
We can define a colum with FIXED-LIST type using the following grammar:
E.g. INT64[3] => This means the list size is 3 and type is INT64.
2. Note:
a. The number of elements in the FIXED-LIST must be a positive number.
b. Supported element types: INT64, FLOAT (numeric types)