-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce TritonService and workflows #32576
Conversation
…p dir, model handling, dry run
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32576/20562
|
A new Pull Request was created by @kpedro88 (Kevin Pedro) for master. It involves the following packages: Configuration/ProcessModifiers @jordan-martins, @chayanit, @wajidalikhan, @kpedro88, @cmsbuild, @makortel, @franzoni, @silviodonato, @fwyzard, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b0561a/11844/summary.html Comparison SummarySummary:
|
def _addTritonService(process): | ||
process.load("HeterogeneousCore.SonicTriton.TritonService_cff") | ||
from Configuration.ProcessModifiers.enableSonicTriton_cff import enableSonicTriton | ||
modifyConfigurationStandardSequencesServicesAddTritonService_ = enableSonicTriton.makeProcessModifier(_addTritonService) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle loading of TrigonService_cff
could be done in the customize function, but to me doing it here with a Modifier does not make a big difference. (and now it demonstrates how "adding more Services like these" would look like)
(this is my last comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep it here if that's acceptable.
@makortel any further comments? |
+heterogeneous |
@jordan-martins, @chayanit, @wajidalikhan, @srimanob, @silviodonato, @qliphy please take a look |
@cms-sw/pdmv-l2 @cms-sw/upgrade-l2 the only relevant changes for your signing categories are the addition of a placeholder special workflow. Please review and sign ASAP (the PR has already been open for more than a month). |
+1 |
+Upgrade |
@silviodonato @qliphy @smuzaffar this should be merged along with cms-data/HeterogeneousCore-SonicTriton#1 (and any associated cmsdist update) |
cms-data/HeterogeneousCore-SonicTriton#1 has been merged and should be available in next IB. |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged. |
PR description:
This PR introduces a new
edm::Service
,TritonService
, as a central point for the management of Triton inference servers. The service does the following:TritonService
parameter set includes a list of known servers. For each server, a name, address, and port should be provided.SonicTriton
modules, the service is informed of the model requested by each module. Any requested model not provided by a known server is kept in a separate list.beginJob
, if the "fallback" option is enabled and the list of unserved models is non-empty, a local server is launched using Singularity. This fallback server can use a local GPU if one is available (currently detected vianvidia-smi
); otherwise, it uses the local CPU.TritonClient
construction is moved to occur duringbeginJob
; at this time, the service provides a server address that serves the model for the client. (The client can request a specific server by name.) If the CPU fallback server is being used, the client is forced intoSync
mode to prevent contention.This behavior is now automatically tested in a unit test. The actual unit test only runs if the necessary ingredients (AVX instructions, model file, and Singularity runtime) are available; otherwise it trivially returns true. The "graph" test modules are chosen for the unit test because they a) use more features (multiple named inputs, variable dimensions) and b) use a smaller model file.
The creation of a
cms-data
area forHeterogeneousCore-SonicTriton
is requested, in order to host the model files necessary for the unit test. (Once this is provided, those model files will be removed from thefetch_model.sh
script.)Other miscellaneous changes in this PR:
ProcessModifier
s forSonicTriton
are introduced. This workflow will enable more testing ofSonicTriton
modules.enableSonicTriton
adds theTritonService
to the process and enables the local fallback server.allSonicTriton
is intended to aggregate any other modifiers needed to enable SonicTriton modules.SonicTriton
module base classes, following the types defined inSonicCore
:TritonEDProducer
,TritonEDFilter
,TritonOneEDAnalyzer
. These types are all tested in the new unit test.TritonGraphProducer
is refactored to create instances of all module types with minimal duplication, with the amount of artificially-generated input data reduced for the case of the unit test.GlobalCache
withSonicTriton
modules, which are discussed in the documentation.triton
to start and stop the local servers is moved toHeterogeneousCore/SonicTriton/scripts
, in order to make it available in the path. A number of new options are added and described in the documentation, notably:The
TritonClient
API is modified:batchSize
,address
,port
modelConfigPath
,preferredServer
Caveats and future work:
setuid
should be disabled and unprivileged user namespaces should be enabled (as far as I understand, this is the current recommended best practice for CMS sites)TritonService
could be more efficiently implemented usingboost::bimap
. However, the necessary libraries to use some of the support classes (such asboost::bimaps::unordered_multiset_of
) are not distributed with CMSSW. If theboost
external is updated, this option could be pursued again in a followup PR.triton
shell script could be reimplemented in Python to be more maintainable and robust. However, Python's built-in shell interaction capabilities are limited and cumbersome. As a future development item, thesh
Python library could be added as a CMS external and then used in this script.The noted development items above are intended for future PRs because they don't impact functionality or APIs, and therefore further delaying this PR (on which several other PRs and working branches will need to be rebased) is not warranted.
The initial presentation of this idea in the core software meeting can be found here: https://indico.cern.ch/event/983377/#16-design-of-the-tritonservice.
PR validation:
Ran the unit test successfully (many times), using both CPU and GPU for the fallback server.
For reference, Singularity-within-Singularity instructions (tested on cmslpc at FNAL and lxplus at CERN) are provided.
Setup:
Running test in Singularity†‡:
† The fake eos directory is needed because
cmssw-cc7
binds/eos
and the Triton server'snvidia_entrypoint.sh
does afind /
, which can take forever especially on CERN EOS.‡ In the course of this PR, the
cmssw-cc7
Singularity container was updated to include the Singularity runtime natively. For future reference, if it did not contain the Singularity runtime, the following bind arguments should be added to the invocation, in order to include the runtime + dependencies from the host: