
Figure out some integration with Cloud Bigtable? #707

Closed
jgeewax opened this issue Jul 7, 2015 · 29 comments
Labels
api: bigtable Issues related to the Bigtable API.

@jgeewax
Contributor

jgeewax commented Jul 7, 2015

@stephenplusplus: Any ideas? Are there any popular existing HBase libraries for Node.js?

https://www.npmjs.com/package/hbase-client (https://github.com/alibaba/node-hbase-client) seems to be the most popular.

https://www.npmjs.com/package/node-hbase (https://github.com/wdavidw/node-hbase) is also up to date, but not as popular on npm.

@jgeewax jgeewax mentioned this issue Jul 15, 2015
@callmehiphop
Contributor

Can we get some more clarification on how we communicate with the remote server? Most of the documentation I'm seeing talks about setting up a local server.

@jgeewax
Contributor Author

jgeewax commented Jul 15, 2015

/cc @dhermes @maxluebbe @carterpage

@lesv

lesv commented Jul 15, 2015

Do you mean bigtable.googleapis.com? Or something else? I think all you should need is the server name and the gRPC package (HTTP/2). Am I misunderstanding the question?

@callmehiphop
Contributor

@lesv yeah, I'm referring to bigtable.googleapis.com. The Node clients we're looking at have some default settings that I'm not sure apply to us, and I didn't see any documentation specific to communicating with the remote server outside of command-line usage.

@lesv

lesv commented Jul 15, 2015

Take a look at https://io2015codelabs.appspot.com/codelabs/gRPC. It's a local example, but I think it shows how to query another server from it. If not, I can ask for you.

@callmehiphop
Contributor

@lesv thanks, will do!

@lesv

lesv commented Jul 15, 2015

@callmehiphop
Contributor

@lesv that definitely helps, thanks for linking me to that.

@jgeewax I don't think either of the Node libraries mentioned supports gRPC; however, one of them linked to a CoffeeScript library that has protobuf support. Do you know if the .proto files we need already exist, and if so, where?

@jgeewax
Contributor Author

jgeewax commented Jul 15, 2015

Those libraries definitely won't support gRPC -- we'd need to swap out the transport layer to be gRPC.

Protos are here: https://github.com/GoogleCloudPlatform/cloud-bigtable-client/tree/master/bigtable-protos (I think those are the right ones). You'll need protoc v3 :-/ ... aka, get ready to type make and such...

@dhermes
Contributor

dhermes commented Jul 15, 2015

FYI

I've done a bunch of work around getting the gRPC C/C++ core installed, as well as making the calls to Bigtable. I'm happy to have a chat if needed.

@callmehiphop
Contributor

@dhermes awesome, thanks! I may just take you up on that 😄

@callmehiphop
Contributor

I'm running into an issue (could be user error) where I try to load bigtable_service.proto, but the grpc module does not know how to interpret the rpc options.

JS

var grpc = require('grpc');

var bigtable = grpc.load({
  root: __dirname,
  file: '/google/bigtable/v1/bigtable_service.proto'
});

Output

Error: Illegal option value in message undefined, option (google.api.http) at line 37: {
    at Error (native)
    at ProtoBuf.DotProto.ParserPrototype._parseOption 

If I remove the rpc options everything works properly.

I'm sort of at a loss on how to proceed -- I was able to build the proto files for Python, etc. via protoc, but there's no JavaScript option, I'm assuming because the standard is to pass the .proto files directly to the module.

@dhermes
Contributor

dhermes commented Jul 16, 2015

@nathanielmanistaatgoogle @tbetbetbe Can you refer us to the folks working on the grpc Node library?

@stephenplusplus
Contributor

Thanks for all of the help, everyone!

Looks like protobuf.js doesn't understand this syntax:

option (google.api.http) = { post: "/v1/{table_name=projects/*/zones/*/clusters/*/tables/*}/rows:read" body: "*" };

But it does understand this syntax:

option (google.api.http).post = "/v1/{table_name=projects/*/zones/*/clusters/*/tables/*}/rows:read";

I had to change that in the proto files just to get grpc not to crash. I haven't made it further than that, so I don't know if it's going to cause problems down the line. Put it up in a repo here: https://github.com/stephenplusplus/bigtable-grpc-playground so we can keep charging forward.
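The manual edit described above could also be automated before the file is handed to grpc. The sketch below is a hypothetical helper (flattenHttpOptions is not part of grpc or protobuf.js) that rewrites each aggregate option (google.api.http) = { ... }; into the single-field form shown above. It assumes each aggregate contains one HTTP verb field and, like the manual edit, drops the other fields such as body; that should not affect gRPC calls, which do not use the HTTP mapping.

```javascript
'use strict';

// Hypothetical helper (not part of grpc or protobuf.js). Matches the aggregate
// form `option (google.api.http) = { ... };`, allowing quoted strings inside
// the braces to contain their own braces (e.g. "{table_name=...}").
const AGGREGATE_HTTP_OPTION =
  /option\s*\(google\.api\.http\)\s*=\s*\{((?:[^{}"]|"[^"]*")*)\}\s*;/g;

// Finds the HTTP verb field and its quoted path inside the aggregate body.
const VERB_FIELD = /\b(get|put|post|delete|patch)\s*:\s*"([^"]*)"/;

function flattenHttpOptions(protoSource) {
  return protoSource.replace(AGGREGATE_HTTP_OPTION, (match, body) => {
    const verb = VERB_FIELD.exec(body);
    // Leave the option untouched if no verb/path pair was found.
    if (!verb) return match;
    // Keep only the verb/path pair; other fields (e.g. body) are dropped,
    // mirroring the manual workaround.
    return `option (google.api.http).${verb[1]} = "${verb[2]}";`;
  });
}

module.exports = { flattenHttpOptions };
```

In practice this would sit in a small build step: read bigtable_service.proto with fs.readFileSync, pass the text through flattenHttpOptions, and write the result under the directory given as root to grpc.load.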

@jgeewax
Contributor Author

jgeewax commented Jul 16, 2015 via email

@callmehiphop
Contributor

The latest version of protobuf.js (which grpc uses under the hood) is supposed to support proto3:
https://github.com/dcodeIO/ProtoBuf.js/wiki/Changes-in-ProtoBuf.js-4.0

@stephenplusplus
Contributor

grpc depends on protobufjs 4.0, which is supposed to support proto3. @callmehiphop, can you link the issues you found that might be related?

On Thursday, July 16, 2015, JJ Geewax notifications@github.com wrote:

This might be an issue with proto2 v proto3...

Do we have JavaScript proto3?



@callmehiphop
Contributor

@callmehiphop
Contributor

@jgeewax going back to those HBase libraries, neither one really exposes an elegant way of swapping out transports. Is this something you'd want to maintain a fork of, or should we try to send a PR upstream?

@nathanielmanistaatgoogle

@dhermes: @murgatroid99 is leading the Node effort with @tbetbetbe a good secondary contact.

@dhermes
Contributor

dhermes commented Jul 17, 2015

Thanks. @callmehiphop you can probably report the issue with the parser to them.

@murgatroid99

It looks like that issue with the parser was resolved, at least at the head of the repo. If you want to use that immediately, you have a couple of options: after installing the grpc npm package, you can cd into its directory and run npm install https://github.com/dcodeIO/ProtoBuf.js.git. Alternatively, if grpc is a dependency of a larger package, you can make that a dependency of your package (using dcodeIO/Protobuf.js as a dependency). Then you can load the proto file using that package's loadProtoFile function and then pass the result of that into grpc's loadObject function.

@stephenplusplus
Contributor

Thanks for the insight, @murgatroid99!

Going back to the original topic of the issue (sorry to everyone who will now be pinged who might not be interested!), the HBase libraries linked above don't seem to offer a great solution in terms of their APIs. Neither seems to be used widely enough to be called the standard way to integrate with HBase from Node. Additionally, to integrate with our library, we would have to maintain a fork with our transport swapped in and use that as gcloud-node's dependency.

I think a better solution is designing a gcloud-node-friendly API like @callmehiphop worked on (#722).

Also, very importantly, grpc has system dependencies; if they are not available, npm install will fail. (See https://github.com/grpc/homebrew-grpc/blob/master/scripts/install#L97.) Because of this, we will have to develop a separate module (i.e. gcloud-node-bigtable), since we shouldn't force these steps on an app that only needs the Storage API, for example.

@jgeewax
Contributor Author

jgeewax commented Jul 27, 2015

npm install will fail.

There's no way to have an optional dependency?

I think a better solution is designing a gcloud-node-friendly API

A lot of the potential users of this service have their own HBase deployment, and want to just "point their code at Cloud Bigtable"... Are we sure there's no good way to do this at the transport level?

As an FYI, Cloud Bigtable is API-level compatible with the HBase Java API (in your code, you can change an import statement and your existing code works), so I'm trying to get my head around how our situation differs.

@stephenplusplus
Contributor

There's no way to have an optional dependency?

We actually can (https://docs.npmjs.com/files/package.json#optionaldependencies), but it's not great. npm logs a warning that makes the install look like it failed. We could go this route, though, and wait to see if anyone is bothered.
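On the consuming side, an optional dependency also means the code has to cope with the module being absent when its native build failed. A minimal sketch, assuming grpc is listed under "optionalDependencies" in package.json; loadOptional is a hypothetical helper name, and the Bigtable wiring around it is only indicated in comments.

```javascript
'use strict';

// Hypothetical helper: require() a module listed under "optionalDependencies",
// returning null instead of throwing when npm skipped it (for example, because
// grpc's native build failed during `npm install`).
function loadOptional(moduleName) {
  try {
    return require(moduleName);
  } catch (err) {
    // Only treat "this exact module is missing" as optional. Errors thrown
    // while the module itself loads, or a missing sub-dependency, still surface.
    const missing = err.code === 'MODULE_NOT_FOUND' &&
      err.message.includes(`'${moduleName}'`);
    if (missing) return null;
    throw err;
  }
}

// Example wiring (illustrative): only expose Bigtable when grpc is present.
// const grpc = loadOptional('grpc');
// if (!grpc) { /* Bigtable unavailable; Storage etc. still work */ }

module.exports = { loadOptional };
```

This keeps an app that only needs Storage working even when grpc could not be built, at the cost of the install-time warning mentioned above.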

A lot of the potential users of this service have their own HBase deployment, and want to just "point their code at Cloud Bigtable"

As an FYI, Cloud Bigtable is API-level compatible with the HBase Java API (in your code, you can change an import statement and your existing code works), so I'm trying to get my head around how our situation differs.

I hope I'm not missing something terribly obvious here, but maybe they're not pointing to it from Node? Just using my own research as a base, it doesn't seem like Node + HBase is a common thing. As an example, the libraries you linked have 59 downloads per week combined. I think if it was common, frankly, we would see better options at the library level, in which case I would absolutely support the modularity of letting users continue to use their library without forcing changes to their code.

Are we sure there's no good way to do this at the transport level?

If we have to choose a library to fork, we're really looking at https://github.com/falsecz/hbase-rpc-client, which supports HBase >= 0.96 and protocol buffers, or https://github.com/wdavidw/node-hbase (both written in CoffeeScript).

I'm happy to hear other thoughts on the best approach. Do we know anyone who our solution would help who could provide some insight?

@jgeewax
Contributor Author

jgeewax commented Jul 27, 2015

but it's not great

Understood -- a separate artifact is fine given the overall goal of "awesome user experience".

it doesn't seem like Node + HBase is a common thing

I think we need to look at this sliiiightly differently. In our other APIs we're primarily thinking about the number of users, and the amount of data per user is relatively small. For HBase it's the opposite: not a lot of people use it, but those who do use it heavily (several TB of data).

So from a business perspective, we're dealing with a handful of people who will have TB-PB of data and may want to get some of it out using Node. And we want those people to switch to Cloud Bigtable so they don't have to manage their own HBase cluster, and so they get better performance per dollar...

So to ease that transition, it would be nice to build on an already existing API, either by monkey-patching its connection or by matching its API on purpose.

Maybe this is really:

Make gcloud-node-bigtable a library and then have different APIs that match the existing libraries so they are drop-in (swap require() statement) replacements?

@stephenplusplus
Contributor

Make gcloud-node-bigtable a library and then have different APIs that match the existing libraries so they are drop-in (swap require() statement) replacements?

I understand the desire to make conversion as easy as possible. I'd agree that "making Bigtable easy to use from Node" and "making it easy for existing Node+HBase users to switch to Cloud Bigtable" are two different goals with, unfortunately, different solutions.

Given the current state of Node and HBase options, the answer to the first problem is to design our own API (#722).

The answer to the second is "do whatever it takes to make the existing ones work". If we want to focus on solving this problem, using monkey patches/forks in gcloud-node-bigtable sounds like a good option to me:

- var hbase = require('hbase');
+ var hbase = require('gcloud-bigtable/hbase')({ /* credentials */ });

If we can solve the first problem at the same time, though, maybe we can still develop #722 before/during/after the monkey patching:

var bigtable = require('gcloud-bigtable')({ /* credentials */ });
// console.log(Object.keys(bigtable)); => Dave's API.

@jgeewax
Contributor Author

jgeewax commented Jul 27, 2015

I'm all for the idea of having our own, and then having adapters that match the API methods.

However, HBase has been around for a super long time, so if we're deviating from existing APIs, it makes me wonder why: the underlying product (HBase / Bigtable) hasn't changed, so why would we need a new set of API methods?

Anyway -- fine with doing our own thing, and then having adapters so that existing users of other libraries can just swap imports....
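The "own API plus adapters" plan can be sketched as follows. Everything here is hypothetical: BigtableClient is an in-memory stand-in for the native API discussed in #722 (a real one would make gRPC calls), and hbaseAdapter exposes an illustrative chained, callback-style surface that is not copied from hbase-client or node-hbase. The point is only the shape: the adapter translates calls and holds no storage logic of its own.

```javascript
'use strict';

// Hypothetical stand-in for a native Bigtable client (#722). A real client
// would issue gRPC calls; this one keeps cells in memory so the adapter can
// be exercised offline.
class BigtableClient {
  constructor() {
    // tableName -> Map(rowKey -> Map(column -> value))
    this.tables = new Map();
  }

  mutateRow(tableName, rowKey, column, value) {
    if (!this.tables.has(tableName)) this.tables.set(tableName, new Map());
    const rows = this.tables.get(tableName);
    if (!rows.has(rowKey)) rows.set(rowKey, new Map());
    rows.get(rowKey).set(column, value);
  }

  readCell(tableName, rowKey, column) {
    const rows = this.tables.get(tableName);
    const row = rows && rows.get(rowKey);
    return row && row.has(column) ? row.get(column) : null;
  }
}

// Hypothetical HBase-style adapter: a chained, Node-callback surface that
// only translates calls into BigtableClient methods. Method names are
// illustrative, not taken from any existing HBase library.
function hbaseAdapter(client) {
  return {
    table(tableName) {
      return {
        row(rowKey) {
          return {
            put(column, value, callback) {
              // Defer the callback so the adapter behaves asynchronously,
              // like a network-backed client would.
              process.nextTick(() => {
                client.mutateRow(tableName, rowKey, column, value);
                callback(null, true);
              });
            },
            get(column, callback) {
              process.nextTick(() => {
                callback(null, client.readCell(tableName, rowKey, column));
              });
            },
          };
        },
      };
    },
  };
}

module.exports = { BigtableClient, hbaseAdapter };
```

Keeping each adapter this thin means several existing library APIs could be mirrored over the same core client without duplicating any Bigtable-specific logic.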

@stephenplusplus stephenplusplus added the api: bigtable Issues related to the Bigtable API. label Nov 30, 2015
@callmehiphop
Contributor

It would appear that HBase is not very popular in the Node world, so we've decided that we'll roll our own API instead of leveraging existing libraries.

sofisl added a commit that referenced this issue Sep 13, 2023

7 participants