This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Provide multiple versions support in external HBase tables #459

Open
VladRodionov opened this issue Oct 4, 2013 · 3 comments

Comments

@VladRodionov

This came from the user group mailing list:

I have a table, which has MAX_VERSIONS > 1. Is it possible in Phoenix to get all the versions of a particular cell?
Example (HBase table):

rowkey:cf:col:ts1 -> value1
rowkey:cf:col:ts2 -> value2
rowkey:cf:col:ts3 -> value2

I want to get all values for: rowkey:cf:col?

It's a mapping:
rowkey -> ID
cf:col -> PROFILE

I want to execute:
select PROFILE from TABLE where ID= x and get all 3 profiles

James's response:

That'd be a good contribution. The simplest way I can see that done would be:

  • support a new MAX_VERSIONS connection property where you can specify how many versions of a row you want to get back. In the PhoenixConnection constructor, you'd grab this in the same way we do for CURRENT_SCN and store it in a member variable. Then from BasicQueryPlan.newScanner, you'd set scan.setMaxVersions() just like we're setting scan.setTimeRange().
  • add a built-in function like ROW_TIME() that returns the DATE of the first KeyValue in the Tuple (see my blog here for an example on how to add a new built-in function). Slightly fancier would be a ROW_TIME(column) that returns the DATE representing the timestamp of the KeyValue in the Tuple for the column passed in.

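The two suggestions above can be modelled without any Phoenix or HBase dependency. The sketch below is only an illustration under stated assumptions: the property name `MaxVersions`, the class `MaxVersionsSketch`, and its helper methods are all hypothetical and are not Phoenix code. It reads a version limit from connection properties, returns that many versions of a cell newest first (which is the order HBase returns versions in), and models a no-argument ROW_TIME() as the timestamp of the first returned version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Properties;

// Stand-alone model of the suggestion; no Phoenix or HBase classes are used.
class MaxVersionsSketch {
    // Hypothetical property name; the real name would be chosen by the implementer.
    static final String MAX_VERSIONS_ATTRIB = "MaxVersions";

    // One stored version of a cell: a timestamp in milliseconds and a value.
    static final class CellVersion {
        final long timestamp;
        final String value;

        CellVersion(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Reads the version limit from the connection properties.
    // Defaults to 1, i.e. only the latest version is returned.
    static int maxVersions(Properties props) {
        String raw = props.getProperty(MAX_VERSIONS_ATTRIB);
        if (raw == null) {
            return 1;
        }
        int n = Integer.parseInt(raw.trim());
        if (n < 1) {
            throw new IllegalArgumentException(MAX_VERSIONS_ATTRIB + " must be >= 1");
        }
        return n;
    }

    // Returns at most `limit` versions, newest first, the way a scan with a
    // version limit would.
    static List<CellVersion> newestVersions(List<CellVersion> versions, int limit) {
        List<CellVersion> sorted = new ArrayList<>(versions);
        sorted.sort((a, b) -> Long.compare(b.timestamp, a.timestamp));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Models ROW_TIME(): the timestamp of the first returned version as a Date.
    static Date rowTime(List<CellVersion> returned) {
        return new Date(returned.get(0).timestamp);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_VERSIONS_ATTRIB, "3");

        List<CellVersion> stored = Arrays.asList(
                new CellVersion(1000L, "value1"),
                new CellVersion(2000L, "value2"),
                new CellVersion(3000L, "value2"));

        List<CellVersion> returned = newestVersions(stored, maxVersions(props));
        for (CellVersion v : returned) {
            System.out.println(v.timestamp + " -> " + v.value);
        }
        System.out.println("ROW_TIME() = " + rowTime(returned).getTime());
    }
}
```

In a real implementation the limit would presumably be passed to `scan.setMaxVersions()` as described above rather than applied in memory.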
@VladRodionov
Author

Having thought about it a little, I am afraid that mapping a versioned HBase table to the Phoenix SQL relational model is not that straightforward, since an HTable is 3-dimensional (ROWKEY, CF:COL, version) and a relational table is 2-dimensional.

@jtaylor-sfdc
Contributor

What? Don't like a challenge? :-)

We have some code lying around that will help by materializing a row for each unique timestamp. That makes it much more feasible.
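To make the idea of one row per unique timestamp concrete, here is a minimal stand-alone sketch. It is not the code referred to above, and it makes an assumption the thread does not state: each materialized row carries, for every column, the newest value written at or before that row's timestamp, or null if the column had no value yet. The class `VersionFlattenSketch`, the method `materialize`, and the `ROW_TIME` output column name are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Flattens the 3-dimensional view (row key, column, version) into 2-D rows.
class VersionFlattenSketch {
    // columns: column name -> (timestamp -> value) for a single HBase row.
    static List<Map<String, Object>> materialize(
            String rowKey, Map<String, TreeMap<Long, String>> columns) {
        // Collect every timestamp that appears in any column, in ascending order.
        TreeSet<Long> timestamps = new TreeSet<>();
        for (TreeMap<Long, String> versions : columns.values()) {
            timestamps.addAll(versions.keySet());
        }

        List<Map<String, Object>> rows = new ArrayList<>();
        for (Long ts : timestamps) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("ID", rowKey);
            row.put("ROW_TIME", ts);
            for (Map.Entry<String, TreeMap<Long, String>> col : columns.entrySet()) {
                // Assumption: use the newest value at or before this timestamp.
                Map.Entry<Long, String> asOf = col.getValue().floorEntry(ts);
                row.put(col.getKey(), asOf == null ? null : asOf.getValue());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> profile = new TreeMap<>();
        profile.put(1L, "value1");
        profile.put(2L, "value2");
        profile.put(3L, "value2");

        Map<String, TreeMap<Long, String>> columns = new LinkedHashMap<>();
        columns.put("PROFILE", profile);

        // One output row per unique timestamp of the row "rowkey".
        for (Map<String, Object> row : materialize("rowkey", columns)) {
            System.out.println(row);
        }
    }
}
```

With this shape, the query from the original question would return three rows for the same ID, and ROW_TIME tells them apart.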

@vijayarajanm

This would be very helpful for loading multiple versions of the data in a cell and using them in aggregate functions. May I know the status of this requirement?
