
[GH Issue Summarization] Create a model server #11

Closed
texasmichelle opened this issue Feb 21, 2018 · 8 comments
@texasmichelle
Member

texasmichelle commented Feb 21, 2018

Create a model server using TFServing.

Component of #14.

@jlewi
Contributor

jlewi commented Mar 6, 2018

/assign @ankushagarwal

Ankush, can you describe the problems you ran into converting the model into one that can be served with TF Serving?

Would it be easier to serve the model using Seldon?

@ankushagarwal

ankushagarwal commented Mar 6, 2018

The model used for issue summarization is very different from the examples that we've been using. For our image models, the model prediction looks something like this: output = model(input)

But for the issue summarization model, it looks something like this:

def summarize(input_seq):
    output = '<START>'
    # Encode the whole input sequence once.
    intermediate_result = encoder_model(input_seq)
    # Then decode one character at a time until the stop token is produced.
    while True:
        intermediate_result, next_char = decoder_model(intermediate_result, output)
        if next_char == '<STOP>':
            return output
        output += next_char

The first issue I had was exporting Keras models as TensorFlow models that can be used by TF Serving; this is mostly done.
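For reference, a minimal sketch of what that export step can look like, assuming TF 1.x-era Keras where tf.saved_model.simple_save is available; the function name, signature keys, and export path below are placeholders rather than the exact code used here:

import tensorflow as tf
from keras import backend as K

def export_for_tf_serving(model, export_dir):
    # Write the Keras model's input/output tensors as a SavedModel signature
    # that TF Serving can load. The encoder and decoder would each be
    # exported separately (or under separate signatures).
    tf.saved_model.simple_save(
        K.get_session(),
        export_dir,
        inputs={"input": model.input},
        outputs={"output": model.output},
    )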

The second challenge I have is understanding how TF Serving works with:

  1. multiple models (encoder_model and decoder_model)
  2. models with multiple inputs and multiple outputs (decoder_model)

Would it be easier to serve the model using Seldon?
I am not familiar enough with Seldon...

@ankushagarwal

I am having issues importing the model exported from Keras into TF Serving.

I get this error when I send a Prediction request to the TFServing server

AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Expected multiples argument to be a vector of length 3 but got length 2
[[Node: Encoder-Last-GRU_1/Tile = Tile[T=DT_FLOAT, Tmultiples=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Encoder-Last-GRU_1/ExpandDims, Encoder-Last-GRU_1/Tile/multiples)]]")

Could not find a workaround for this. Will give Seldon or Tornado a shot at serving this Keras model.

We can probably illustrate serving a model with TF Serving in another example that trains a TensorFlow model directly.
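As a rough illustration of the Tornado fallback mentioned above (not the actual server added later), a minimal sketch; the /predict route, the request field, and the load_models helper are hypothetical:

import json
import tornado.ioloop
import tornado.web

class PredictHandler(tornado.web.RequestHandler):
    def initialize(self, summarize):
        self.summarize = summarize

    def post(self):
        body = json.loads(self.request.body)
        issue_text = body["issue_body"]      # hypothetical request field
        self.write({"summary": self.summarize(issue_text)})

if __name__ == "__main__":
    summarize = load_models()  # hypothetical: loads encoder/decoder and returns the decode loop
    app = tornado.web.Application([(r"/predict", PredictHandler, {"summarize": summarize})])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()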

jlewi added a commit that referenced this issue Mar 8, 2018
Create a simple tornado server to serve the model

TODO: Create a docker image for the server and deploy on kubeflow

Related to #11
@jlewi
Contributor

jlewi commented Mar 8, 2018

@cliveseldon @gsunner Do you think we should try to use Seldon here?

Could we use the existing Seldon model server rather than creating our own Tornado stub?

Should we deploy the model using Seldon Core rather than deploying it directly with K8s resources?

@ukclivecox
Contributor

@jlewi For a sklearn model, seldon-core would seem to be a good choice.

@ankushagarwal You specify a seq-to-seq model, but does the external business app send the whole sequence of characters in a single request to get a sequence back? If so, that should fit fine into the seldon-core prediction payload using NDArray. Your prediction component would need to split the request and then do as you specify in the pseudo-code above.

Suggest you look at https://github.com/kubeflow/example-seldon which contains a sklearn model in the example code.

@jlewi Not sure I follow your last two questions. It would seem preferable to use the most appropriate existing serving solution (TF Serving or seldon-core) rather than building a new one.
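To make the wrapping step above concrete, a rough sketch of a seldon-core Python wrapper class, assuming the convention of a class exposing predict(self, X, feature_names); the class name, the load_seq2seq_models helper, and the decode loop are assumptions based on the pseudo-code earlier in this thread:

class IssueSummarization(object):
    def __init__(self):
        # Hypothetical helper that loads the exported Keras encoder and decoder models.
        self.encoder_model, self.decoder_model = load_seq2seq_models()

    def predict(self, X, feature_names):
        # X arrives as the "ndarray" payload, e.g. a tokenized issue body.
        intermediate_result = self.encoder_model(X)
        output = '<START>'
        while True:
            intermediate_result, next_char = self.decoder_model(intermediate_result, output)
            if next_char == '<STOP>':
                return [output]
            output += next_char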

@ankushagarwal

Hi @cliveseldon, I have followed the instructions at https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python.md and wrapped my model into a docker image. I am able to run the image locally, and it serves a REST API on port 5000.

My question is: what is the API for sending a prediction request to the server? I could not find docs on that.

@ukclivecox
Contributor

Hi @ankushagarwal

  • See here for the definition. You can send a Tensor, NDArray, custom string, or binary. NDArray would seem to make sense for your case.
  • For example, see some of the notebooks, e.g. in the kubeflow-seldon example, where you send something like:

payload = {"data": {"ndarray": ["the", "cat", "sat", "on", "the", "mat"]}}
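A hypothetical way to send that payload to the locally running wrapper image, assuming it exposes POST /predict on port 5000 and accepts the payload as a JSON string in a form field named json; check the python wrapper docs linked above for the exact contract:

import json
import requests

payload = {"data": {"ndarray": ["the", "cat", "sat", "on", "the", "mat"]}}
# Assumed endpoint and request encoding; verify against the seldon-core wrapper docs.
response = requests.post("http://localhost:5000/predict", data={"json": json.dumps(payload)})
print(response.json())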

k8s-ci-robot pushed a commit that referenced this issue Mar 9, 2018
… the model (#36)

* Create a end-to-end kubeflow example using seq2seq model (4/n)

* Move from a custom tornado server to a seldon-core model

Related to #11

* Update to use gcr.io registry for serving image
k8s-ci-robot pushed a commit that referenced this issue Mar 15, 2018
Update the issue summarization end to end tutorial
to deploy the seldon core model to the k8s cluster

Update the sample request and response

Related to #11
@ankushagarwal

Closing since we have a seldon model server.
