
[code_search] Unittest for similarity and loss computation using TF.Eager #258

Closed
jlewi opened this issue Sep 30, 2018 · 2 comments
jlewi commented Sep 30, 2018

Here:

string_embedding_norm = tf.nn.l2_normalize(string_embedding, axis=1)

We construct a simple network to compute the loss for the query and code embeddings.

We can pull that code out into a class method and then write a unit test for it using TF Eager.
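The step being discussed (l2-normalize both embeddings, then score them) can be checked without a full graph. Below is a NumPy stand-in that mirrors the semantics of `tf.nn.l2_normalize(x, axis=1)` followed by a row-wise dot product; the function names and test values are hypothetical, not the actual model code, but they show the kind of assertion a TF Eager unit test would make:

```python
import numpy as np

def l2_normalize(x, axis=1, epsilon=1e-12):
    # Mirrors tf.nn.l2_normalize: x / sqrt(max(sum(x**2), epsilon)) along axis.
    norm = np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))
    return x / norm

def cosine_similarity(query_embedding, code_embedding):
    # Row-wise cosine similarity between query and code embeddings:
    # normalize each row, then take the elementwise dot product.
    q = l2_normalize(query_embedding, axis=1)
    c = l2_normalize(code_embedding, axis=1)
    return np.sum(q * c, axis=1)

# Unit-test style checks: identical embeddings score 1.0,
# orthogonal embeddings score 0.0.
queries = np.array([[3.0, 4.0], [1.0, 0.0]])
codes = np.array([[3.0, 4.0], [0.0, 1.0]])
sims = cosine_similarity(queries, codes)
assert np.allclose(sims, [1.0, 0.0])
```

With the computation pulled out into a method like this, the eager test reduces to feeding small hand-computed tensors and asserting on the result, with no session or graph setup.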

@jlewi jlewi added priority/p1 area/example/code_search The code search example labels Sep 30, 2018
cwbeitel commented Oct 3, 2018

jlewi pushed a commit to jlewi/examples that referenced this issue Nov 2, 2018
Fix Model export to support computing code embeddings: Fix kubeflow#260

* The previous exported model was always using the embeddings trained for
  the search query.

* But we need to be able to compute embedding vectors for both the query
  and code.

* To support this we add a new input feature "embed_code" and conditional
  ops. The exported model uses the value of the embed_code feature to determine
  whether to treat the inputs as a query string or code and computes
  the embeddings appropriately.

* Originally based on kubeflow#233 by @activatedgeek

Loss function improvements

* See kubeflow#259 for a long discussion about different loss functions.

* @activatedgeek was experimenting with different loss functions in kubeflow#233
  and this pulls in some of those changes.

Add manual tests

* Related to kubeflow#258

* We add a smoke test for T2T steps so we can catch bugs in the code.
* We also add a smoke test for serving the model with TFServing.
* We add a sanity check to ensure we get different values for the same
  input based on which embeddings we are computing.

Change Problem/Model name

* Register the problem github_function_docstring with a different name
  to distinguish it from the version inside the Tensor2Tensor library.
jlewi added a commit to jlewi/examples that referenced this issue Nov 2, 2018 (same commit message as above)
k8s-ci-robot pushed a commit that referenced this issue Nov 2, 2018
* Fix model export, loss function, and add some manual tests.

Fix Model export to support computing code embeddings: Fix #260

* The previous exported model was always using the embeddings trained for
  the search query.

* But we need to be able to compute embedding vectors for both the query
  and code.

* To support this we add a new input feature "embed_code" and conditional
  ops. The exported model uses the value of the embed_code feature to determine
  whether to treat the inputs as a query string or code and computes
  the embeddings appropriately.

* Originally based on #233 by @activatedgeek

Loss function improvements

* See #259 for a long discussion about different loss functions.

* @activatedgeek was experimenting with different loss functions in #233
  and this pulls in some of those changes.

Add manual tests

* Related to #258

* We add a smoke test for T2T steps so we can catch bugs in the code.
* We also add a smoke test for serving the model with TFServing.
* We add a sanity check to ensure we get different values for the same
  input based on which embeddings we are computing.

Change Problem/Model name

* Register the problem github_function_docstring with a different name
  to distinguish it from the version inside the Tensor2Tensor library.

* Skip the test when running under prow because it's a manual test.
* Fix some lint errors.

* Fix lint and skip tests.

* Fix lint.

* Revert loss function changes; we can do that in a follow-on PR.

* Run generate_data as part of the test rather than reusing a cached
  vocab and processed input file.

* Modify SimilarityTransformer so we can override the number of shards
  used easily to facilitate testing.

* Comment out py-test for now.
yixinshi pushed a commit to yixinshi/examples that referenced this issue Nov 30, 2018 (same commit message as above)
Svendegroote91 pushed a commit to Svendegroote91/examples that referenced this issue Dec 6, 2018 (same commit message as above)
Svendegroote91 pushed a commit to Svendegroote91/examples that referenced this issue Apr 1, 2019 (same commit message as above)
stale bot commented Jun 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot closed this as completed Jul 4, 2019