This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Support Struct data type #346

Open
jtaylor-sfdc opened this issue Jul 27, 2013 · 6 comments

Comments

@jtaylor-sfdc
Contributor

Although supporting an ARRAY type helps, there are some use cases in which the data is not homogeneous. We should support a Struct data type for these cases.

@anoopsjohn
Contributor

Thanks James for filing this.

ghost assigned anoopsjohn on Jul 30, 2013
@elilevine
Contributor

+1
This would extend Phoenix's support for semi-structured data.

@nmaillard
Contributor

+1
Structs would be great.
How would this work with #19 and #239?

@apurtell
Contributor

> how would this work with #19 and #239

I have the same question.

To what extent could this be similar to (or borrow from?) kiji-schema?

@jtaylor-sfdc
Contributor Author

This one is a dup of #239. I think it'd be similar in concept to kiji-schema, as it would define the structure of a single KeyValue column in your schema (i.e. an instance of the struct would be stored in a single KeyValue), but there could be other sibling KeyValue columns that aren't structs.
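A minimal sketch of the "one struct instance per KeyValue" idea, assuming a simplified row model (a dict from `(family, qualifier)` to value bytes) and a made-up encoding where each field is a 2-byte big-endian length followed by its bytes. The helper names `pack_struct` / `unpack_struct` and the `ADDRESS` / `NAME` columns are hypothetical and are not Phoenix APIs or Phoenix's actual serialization format.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical cell model: a row maps (family, qualifier) -> value bytes,
# loosely mirroring how HBase stores one KeyValue per column.
Row = Dict[Tuple[bytes, bytes], bytes]


def pack_struct(values: List[bytes]) -> bytes:
    """Pack ordered field values into one byte string (assumed encoding).

    Each field is written as a 2-byte big-endian length followed by its bytes.
    """
    out = bytearray()
    for v in values:
        out += struct.pack(">H", len(v))
        out += v
    return bytes(out)


def unpack_struct(data: bytes) -> List[bytes]:
    """Inverse of pack_struct: split a packed value back into its fields."""
    fields = []
    pos = 0
    while pos < len(data):
        (n,) = struct.unpack_from(">H", data, pos)
        pos += 2
        fields.append(data[pos:pos + n])
        pos += n
    return fields


# One row: a plain sibling column (NAME) next to a struct column (ADDRESS)
# whose entire instance lives in a single cell.
row: Row = {
    (b"f", b"NAME"): b"Alice",
    (b"f", b"ADDRESS"): pack_struct([b"1 Market St", b"SF", b"94105"]),
}
```

Reading `unpack_struct(row[(b"f", b"ADDRESS")])` yields the three field values back, while `NAME` stays an ordinary, non-struct column in the same row.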

I think the schema of the struct would be defined in the Phoenix metadata table (SYSTEM.TABLE) using a new struct type to differentiate it. We'd need to allow references to struct fields in queries using dotted notation. At upsert/insert time, you'd need to provide the struct in its entirety.
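The three requirements above (struct schema kept in metadata, dotted field references, whole-struct upserts) can be sketched as follows. This is illustrative only: `CATALOG` is a hypothetical in-memory stand-in for rows in SYSTEM.TABLE, and `StructType`, `resolve`, and `upsert_struct` are invented names, not part of Phoenix.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StructType:
    """Hypothetical struct definition, as it might be recorded in metadata."""
    name: str
    fields: Tuple[str, ...]  # ordered field names


# Hypothetical catalog: table column -> struct type (stand-in for SYSTEM.TABLE).
CATALOG: Dict[str, StructType] = {
    "ADDRESS": StructType("ADDRESS_T", ("STREET", "CITY", "ZIP")),
}


def resolve(ref: str) -> Tuple[str, int]:
    """Resolve a dotted reference such as 'ADDRESS.CITY' to (column, field index)."""
    column, _, field = ref.partition(".")
    struct_type = CATALOG[column]
    if field not in struct_type.fields:
        raise KeyError(f"{struct_type.name} has no field {field!r}")
    return column, struct_type.fields.index(field)


def upsert_struct(column: str, value: Dict[str, str]) -> List[str]:
    """Accept a struct value only if every field is supplied (no partial updates).

    Returns the field values in declared order, ready to be packed into one cell.
    """
    struct_type = CATALOG[column]
    missing = [f for f in struct_type.fields if f not in value]
    extra = [f for f in value if f not in struct_type.fields]
    if missing or extra:
        raise ValueError(f"struct {column} requires exactly {struct_type.fields}; "
                         f"missing={missing}, unexpected={extra}")
    return [value[f] for f in struct_type.fields]
```

For example, `resolve("ADDRESS.CITY")` maps to field index 1 of `ADDRESS`, and `upsert_struct("ADDRESS", {"CITY": "SF"})` is rejected because the struct is not provided in its entirety.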

Other than using less space (you don't have the overhead of an entire KeyValue for each value), I'm not convinced this adds a whole lot of value; you can essentially model the same thing with multiple columns. I'd rather see HBase come up with better, more condensed block encodings and a condensed memory model that can better leverage these encodings.
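To put a rough number on the "overhead of an entire KeyValue for each value", the arithmetic below uses the classic HBase KeyValue layout: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, value. It ignores tags, block encoding, and compression, and the 2-byte per-field length prefix in the struct cell is an assumption, so treat the results as an approximation rather than a measurement.

```python
from typing import Dict

# Fixed per-KeyValue overhead in the classic HBase KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1) = 20 bytes, plus row, family, qualifier, value.
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def kv_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of one KeyValue (no tags, encoding, or compression)."""
    return KV_FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def separate_columns_size(row: bytes, family: bytes, fields: Dict[bytes, bytes]) -> int:
    """Size when each field is stored as its own KeyValue column."""
    return sum(kv_size(row, family, q, v) for q, v in fields.items())


def struct_cell_size(row: bytes, family: bytes, qualifier: bytes,
                     fields: Dict[bytes, bytes]) -> int:
    """Size when all fields share one KeyValue, each value with an assumed 2-byte length prefix."""
    packed_len = sum(2 + len(v) for v in fields.values())
    return KV_FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + packed_len


fields = {b"STREET": b"1 Market St", b"CITY": b"SF", b"ZIP": b"94105"}
separate = separate_columns_size(b"user123", b"f", fields)       # 115 bytes
packed = struct_cell_size(b"user123", b"f", b"ADDRESS", fields)  # 59 bytes
```

Most of the difference comes from repeating the row key, family, timestamp, and length fields in every cell, which is exactly the redundancy that prefix- or diff-style block encodings aim to remove at the storage layer.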

As for #19, that one is different. It's for cases where you'd want very wide rows in which value information is encoded in the column qualifier. In that case, you'd define this set of columns as a "nested table" that you could join against the row containing them, so to Phoenix that set of column qualifiers would look like rows of another table.
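A sketch of the "nested table" view described above, assuming a wide row whose column qualifiers are dates and whose values are ASCII-encoded counts. The functions `as_nested_table` and `join_with_parent` and the sample data are hypothetical; they only illustrate how each qualifier/value pair could surface as a row that is joined back to its parent row.

```python
from typing import Dict, List, Tuple

# Hypothetical wide row: the column qualifier itself carries data (a date),
# and the cell value carries a measurement for that date.
WideRow = Dict[bytes, bytes]
NestedRow = Tuple[bytes, str, int]


def as_nested_table(row_key: bytes, wide_row: WideRow) -> List[NestedRow]:
    """Expose each (qualifier, value) pair of one wide row as a nested-table row.

    Output columns: (parent row key, qualifier decoded as text, value decoded as int).
    """
    return [(row_key, q.decode("utf-8"), int(v.decode("utf-8")))
            for q, v in sorted(wide_row.items())]


def join_with_parent(parent: Dict[bytes, str],
                     nested: List[NestedRow]) -> List[Tuple[str, str, int]]:
    """Join nested rows back to the parent row that contains them, on the row key."""
    return [(parent[rk], day, count) for rk, day, count in nested if rk in parent]


parents = {b"user1": "Alice"}
events = {b"2013-07-27": b"3", b"2013-07-28": b"5"}
nested = as_nested_table(b"user1", events)
joined = join_with_parent(parents, nested)
```

Here `nested` holds one row per qualifier, `(b"user1", "2013-07-27", 3)` and `(b"user1", "2013-07-28", 5)`, and `joined` attaches the parent's name to each.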

@jtaylor-sfdc
Contributor Author

We should investigate using Parquet as the underlying storage format for these structs (and potentially for JSON as well, #497).
