Inf2 example #2399
* fix INF2 example handler
* Add logging for padding in inf2 handler
* update response timeout and model
* Update documentation to show opt-6.7b as the example model
* Update model batch log
---------
Co-authored-by: Naman Nandan <namannan@amazon.com>
Codecov Report
@@           Coverage Diff           @@
##           master    #2399   +/-   ##
=======================================
  Coverage   72.01%   72.01%
=======================================
  Files          78       78
  Lines        3648     3648
  Branches       58       58
=======================================
  Hits         2627     2627
  Misses       1017     1017
  Partials        4        4
Thanks very much @namannandan LGTM
model_name = ctx.model_yaml_config["handler"]["model_name"]

# allocate "tp_degree" number of neuron cores to the worker process
os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
How do you make sure Neuron has enough cores available to support tp_degree?
I believe torch-neuronx currently does not have an API that provides the number of available (unallocated) neuron cores. Here, if the required number of neuron cores, i.e. tp_degree, is not available, then model loading will fail with an error of the form:
ERROR TDRV:db_vtpb_get_mla_and_tpb Could not find VNC id 1
Turns out that torch-neuronx does have a method to query the number of available unallocated cores: torch_neuronx.xla_impl.data_parallel.device_count(). Updated the handler to verify that the necessary number of cores is available before proceeding with model loading.
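The check described above could look roughly like the following. This is a minimal sketch, not the handler code from this PR: the helper name reserve_neuron_cores, its device_count parameter, and the error message wording are hypothetical. In the handler, the callable passed in would presumably be torch_neuronx.xla_impl.data_parallel.device_count, as named in the comment above; it is injected here so the sketch runs without Neuron hardware.

```python
import os
from typing import Callable


def reserve_neuron_cores(tp_degree: int, device_count: Callable[[], int]) -> None:
    """Fail fast if fewer than tp_degree cores are free, otherwise reserve them.

    device_count is any zero-argument callable returning the number of
    available (unallocated) neuron cores. Passing it in is an assumption
    made for this sketch so it can be exercised without torch-neuronx.
    """
    available = device_count()
    if available < tp_degree:
        # Hypothetical message; raising here avoids the opaque runtime
        # failure that would otherwise surface during model loading.
        raise RuntimeError(
            f"Required number of neuron cores (tp_degree={tp_degree}) "
            f"exceeds the number available ({available})"
        )
    # Same setting as in the handler: allocate tp_degree cores to this worker.
    os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
```

Raising before the environment variable is set means a worker that cannot get its cores never starts loading the model, so the failure is reported with the requested and available counts rather than a runtime-level error.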
ecc5e02 to 50668c5
Successfully tested the example:
Description
Inferentia2 example based on opt-6.7b model
Type of change
Feature/Issue validation/testing
* 125m parameter variant of the opt model
* 6.7b parameter variant of the opt model