Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject delete model request if model is in Training #424

Conversation

naveentatikonda
Copy link
Member

Description

This PR includes the following changes:

  • As we cannot safely cancel a job(when delete model request is sent) in which the model is training. We are failing that delete model request until the model completes training.
  • When we receive a new Train model request, we are validating if the same modelId is already used by checking the model metadata. But, there is an edge case while performing a delete model request, the model is removed from system index after clearing the model metadata and there is a possibility that the train model request validation succeeds in this short duration between the above mentioned steps. To fix this, we are adding a locking mechanism to add that modelId into a blocked list (which is stored in cluster metadata) before deleting the model metadata. We will remove it from blocked list after deleting the model or even when an exception occurs during the process of delete model request.
  • We are adding a new validation in Train Model Request to validate against this blocked list.

Issues Resolved

#365

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
…tion

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
@naveentatikonda naveentatikonda requested a review from a team June 17, 2022 01:16
@naveentatikonda naveentatikonda self-assigned this Jun 17, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jun 17, 2022

Codecov Report

Merging #424 (4633db8) into main (714ddf3) will increase coverage by 0.29%.
The diff coverage is 89.39%.

@@             Coverage Diff              @@
##               main     #424      +/-   ##
============================================
+ Coverage     84.02%   84.31%   +0.29%     
- Complexity      911      945      +34     
============================================
  Files           130      134       +4     
  Lines          3880     4062     +182     
  Branches        359      370      +11     
============================================
+ Hits           3260     3425     +165     
- Misses          458      473      +15     
- Partials        162      164       +2     
Impacted Files Coverage Δ
...main/java/org/opensearch/knn/indices/ModelDao.java 81.94% <78.26%> (-1.71%) ⬇️
...ava/org/opensearch/knn/indices/ModelGraveyard.java 93.33% <93.33%> (ø)
...n/transport/UpdateBlockedModelTransportAction.java 93.93% <93.93%> (ø)
...main/java/org/opensearch/knn/plugin/KNNPlugin.java 100.00% <100.00%> (ø)
.../knn/plugin/transport/GetModelTransportAction.java 100.00% <100.00%> (+20.00%) ⬆️
...rch/knn/plugin/transport/TrainingModelRequest.java 84.53% <100.00%> (+0.84%) ⬆️
...knn/plugin/transport/UpdateBlockedModelAction.java 100.00% <100.00%> (ø)
...nn/plugin/transport/UpdateBlockedModelRequest.java 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 714ddf3...4633db8. Read the comment docs.

Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few general comments:

  • did you check why test coverage went down? seems your changes are testable, maybe spend some time on adding/extending tests to at least keep it at the same level.
  • I suggest make smaller PR in future, it makes review easier/quicker and with higher quality, for example here you could brake something like this: change the model in one PR and API changes in another.

@naveentatikonda naveentatikonda force-pushed the reject_delete_model_request branch 2 times, most recently from ce0a791 to 53d1a95 Compare June 20, 2022 23:54
@naveentatikonda naveentatikonda force-pushed the reject_delete_model_request branch from 53d1a95 to 55e2496 Compare June 21, 2022 17:32
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
src/main/java/org/opensearch/knn/indices/ModelDao.java Outdated Show resolved Hide resolved
@naveentatikonda naveentatikonda force-pushed the reject_delete_model_request branch from 55e2496 to 84c1402 Compare June 27, 2022 20:59
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
@naveentatikonda naveentatikonda force-pushed the reject_delete_model_request branch from 84c1402 to e2061d6 Compare June 27, 2022 22:15
@naveentatikonda naveentatikonda force-pushed the reject_delete_model_request branch 2 times, most recently from a5e5056 to de8a5bb Compare July 12, 2022 23:00
@naveentatikonda naveentatikonda requested a review from VijayanB July 12, 2022 23:01
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
@naveentatikonda naveentatikonda force-pushed the reject_delete_model_request branch from de8a5bb to a03f0a3 Compare July 14, 2022 04:20
jmazanec15
jmazanec15 previously approved these changes Jul 14, 2022
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @naveentatikonda great work!

@jmazanec15 jmazanec15 added Bug Fixes Changes to a system or product designed to handle a programming bug/glitch backport 2.x labels Jul 14, 2022
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, great work Naveen

@naveentatikonda naveentatikonda merged commit 46f85a0 into opensearch-project:main Jul 14, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 14, 2022
* Reject Delete Model Request if Model is in Training

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Locking mechanism to block modelId if model is in the process of Deletion

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Blocked modelIds Validation for new Train Model Request

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* spotless fix and other changes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Bug fix for copying ModelGraveyard reference

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Integration Tests for ModelGraveyardDiff

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Refactoring and Addressing other review comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Remove Model from Model Graveyard even if model does not exist

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit 46f85a0)
naveentatikonda added a commit that referenced this pull request Jul 14, 2022
* Reject Delete Model Request if Model is in Training

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Locking mechanism to block modelId if model is in the process of Deletion

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Blocked modelIds Validation for new Train Model Request

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* spotless fix and other changes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Bug fix for copying ModelGraveyard reference

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Integration Tests for ModelGraveyardDiff

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Refactoring and Addressing other review comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Remove Model from Model Graveyard even if model does not exist

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit 46f85a0)
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
naveentatikonda added a commit that referenced this pull request Jul 15, 2022
* Reject Delete Model Request if Model is in Training

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Locking mechanism to block modelId if model is in the process of Deletion

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Blocked modelIds Validation for new Train Model Request

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* spotless fix and other changes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Bug fix for copying ModelGraveyard reference

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Integration Tests for ModelGraveyardDiff

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Refactoring and Addressing other review comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Remove Model from Model Graveyard even if model does not exist

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit 46f85a0)
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

Co-authored-by: Naveen Tatikonda <navtat@amazon.com>
@heemin32 heemin32 added v2.2.0 and removed 2.2.0 labels Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Bug Fixes Changes to a system or product designed to handle a programming bug/glitch v2.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants