Commit

Update doc strings (#290)
Update doc strings to correct "colocated".

[ committed by @mellis13 ]
[ reviewed by @MattToast ]
mellis13 authored May 30, 2023
1 parent 8a5e940 commit 99958a0
Showing 8 changed files with 44 additions and 26 deletions.
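
The spelling change made across these files can be sketched as a case-preserving substitution. This is only an illustration of the rewrite rule visible in the diff below, not the tooling used for the commit (the page does not say how the edits were made); `normalize_colocated` is a hypothetical helper name.

```python
import re

# Matches "co-located" in any capitalization, e.g. "Co-located" or "Co-Located".
_HYPHENATED = re.compile(r"\bco-located\b", flags=re.IGNORECASE)


def normalize_colocated(text: str) -> str:
    """Replace hyphenated spellings with "colocated", keeping a leading capital."""

    def _swap(match: re.Match) -> str:
        # Keep the capital only when the original word started with one.
        return "Colocated" if match.group(0)[0].isupper() else "colocated"

    return _HYPHENATED.sub(_swap, text)


if __name__ == "__main__":
    for line in ("# Co-Located Deployment", "Cleaning up co-located database"):
        print(normalize_colocated(line))
```

Text that already uses the unhyphenated spelling passes through unchanged, so the helper can be run repeatedly over the same file.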
4 changes: 4 additions & 0 deletions doc/changelog.rst
@@ -25,6 +25,8 @@ Description

A full list of changes and detailed notes can be found below:

- Update full test suite to not require a TF wheel at test time
- Update doc strings
- Remove deprecated code
- Relax the coloredlogs version
- Update Fortran tutorials for SmartRedis
@@ -34,6 +36,7 @@ Detailed notes

- Update full test suite to no longer require a tensorflow wheel to be available
at test time. (PR291_)
- Correct spelling of colocated in doc strings (PR290_)
- Deprecated launcher-specific orchestrators, constants, and ML utilities
were removed. (PR289_)
- Relax the coloredlogs version to be greater than 10.0 (PR288_)
@@ -45,6 +48,7 @@ codes. These have now all been updated. (PR284_)
argument name is still `interface` for backward compatibility reasons. (PR281_)

.. _PR291: https://github.com/CrayLabs/SmartSim/pull/291
.. _PR290: https://github.com/CrayLabs/SmartSim/pull/290
.. _PR289: https://github.com/CrayLabs/SmartSim/pull/289
.. _PR288: https://github.com/CrayLabs/SmartSim/pull/288
.. _PR285: https://github.com/CrayLabs/SmartSim/pull/285
8 changes: 4 additions & 4 deletions doc/orchestrator.rst
@@ -47,10 +47,10 @@ The cluster deployment is optimal for high data throughput scenarios such as
online analysis, training and processing.


Co-located Orchestrator
Colocated Orchestrator
=======================

A co-located Orchestrator is a special type of Orchestrator that is deployed on
A colocated Orchestrator is a special type of Orchestrator that is deployed on
the same compute hosts an a ``Model`` instance defined by the user. In this
deployment, the database is *not* connected together in a cluster and each
shard of the database is addressed individually by the processes running
@@ -72,7 +72,7 @@ process and the ``Orchestrator`` is deployed locally on each compute host where
the distributed application is running.


To create a co-located model, first, create a ``Model`` instance and then call
To create a colocated model, first, create a ``Model`` instance and then call
the ``Model.colocate_db_tcp`` or ``Model.colocate_db_uds`` function.

.. currentmodule:: smartsim.entity.model
@@ -83,7 +83,7 @@ the ``Model.colocate_db_tcp`` or ``Model.colocate_db_uds`` function.
.. automethod:: Model.colocate_db_uds
:noindex:

Here is an example of creating a simple model that is co-located with an
Here is an example of creating a simple model that is colocated with an
``Orchestrator`` deployment

.. code-block:: python
14 changes: 7 additions & 7 deletions smartsim/_core/entrypoints/colocated.py
@@ -193,7 +193,7 @@ def main(
except Exception as e:
cleanup()
logger.error(f"Failed to start database process: {str(e)}")
raise SSInternalError("Co-located process failed to start") from e
raise SSInternalError("Colocated process failed to start") from e

try:
if sys.platform != "darwin":
@@ -206,7 +206,7 @@ def main(
cpus_to_use = "CPU pinning disabled on MacOS"

logger.debug(
"\n\nCo-located database information\n"
"\n\nColocated database information\n"
+ "\n".join(
(
f"\tIP Address(es): {' '.join(ip_addresses + [lo_address])}",
@@ -242,15 +242,15 @@ def main(

except Exception as e:
cleanup()
logger.error(f"Co-located database process failed: {str(e)}")
raise SSInternalError("Co-located entrypoint raised an error") from e
logger.error(f"Colocated database process failed: {str(e)}")
raise SSInternalError("Colocated entrypoint raised an error") from e


def cleanup():
global DBPID
global LOCK
try:
logger.debug("Cleaning up co-located database")
logger.debug("Cleaning up colocated database")
# attempt to stop the database process
db_proc = psutil.Process(DBPID)
db_proc.terminate()
@@ -259,7 +259,7 @@ def cleanup():
logger.warning("Couldn't find database process to kill.")

except OSError as e:
logger.warning(f"Failed to clean up co-located database gracefully: {str(e)}")
logger.warning(f"Failed to clean up colocated database gracefully: {str(e)}")
finally:
if LOCK.is_locked:
LOCK.release()
@@ -303,7 +303,7 @@ def cleanup():

LOCK = filelock.FileLock(tmp_lockfile)
LOCK.acquire(timeout=0.1)
logger.debug(f"Starting co-located database on host: {socket.gethostname()}")
logger.debug(f"Starting colocated database on host: {socket.gethostname()}")

os.environ["PYTHONUNBUFFERED"] = "1"

8 changes: 4 additions & 4 deletions smartsim/entity/model.py
@@ -186,7 +186,7 @@ def colocate_db_uds(
:type db_cpus: int, optional
:param limit_app_cpus: whether to limit the number of cpus used by the app, defaults to True
:type limit_app_cpus: bool, optional
:param debug: launch Model with extra debug information about the co-located db
:param debug: launch Model with extra debug information about the colocated db
:type debug: bool, optional
:param kwargs: additional keyword arguments to pass to the orchestrator database
:type kwargs: dict, optional
@@ -243,7 +243,7 @@ def colocate_db_tcp(
:type db_cpus: int, optional
:param limit_app_cpus: whether to limit the number of cpus used by the app, defaults to True
:type limit_app_cpus: bool, optional
:param debug: launch Model with extra debug information about the co-located db
:param debug: launch Model with extra debug information about the colocated db
:type debug: bool, optional
:param kwargs: additional keyword arguments to pass to the orchestrator database
:type kwargs: dict, optional
@@ -261,12 +261,12 @@ def colocate_db_tcp(
def _set_colocated_db_settings(self, connection_options, common_options, **kwargs):
"""
Ingest the connection-specific options (UDS/TCP) and set the final settings
for the co-located database
for the colocated database
"""

if hasattr(self.run_settings, "mpmd") and len(self.run_settings.mpmd) > 0:
raise SSUnsupportedError(
"Models co-located with databases cannot be run as a mpmd workload"
"Models colocated with databases cannot be run as a mpmd workload"
)

if hasattr(self.run_settings, "_prep_colocated_db"):
4 changes: 2 additions & 2 deletions smartsim/experiment.py
@@ -487,14 +487,14 @@ def create_model(
model.attach_generator_files(to_configure="./train.cfg")
exp.generate(model)
New in 0.4.0, ``Model`` instances can be co-located with an
New in 0.4.0, ``Model`` instances can be colocated with an
Orchestrator database shard through ``Model.colocate_db``. This
will launch a single ``Orchestrator`` instance on each compute
host used by the (possibly distributed) application. This is
useful for performant online inference or processing
at runtime.
New in 0.4.2, ``Model`` instances can now be co-located with
New in 0.4.2, ``Model`` instances can now be colocated with
an Orchestrator database over either TCP or UDS using the
``Model.colocate_db_tcp`` or ``Model.colocate_db_uds`` method
respectively. The original ``Model.colocate_db`` method is now
2 changes: 1 addition & 1 deletion tests/test_model.py
@@ -63,7 +63,7 @@ def test_catch_colo_mpmd_model():

model = exp.create_model("bad_colo_model", rs)

# make it co-located which should raise and error
# make it colocated which should raise and error
with pytest.raises(SSUnsupportedError):
model.colocate_db()

28 changes: 21 additions & 7 deletions tutorials/ml_inference/Inference-in-SmartSim.ipynb
@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "487eb264-3c79-434f-842f-a11a8601ae7b",
"metadata": {},
@@ -13,6 +14,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f3604189-d438-4702-9aba-89161ebc4554",
"metadata": {},
@@ -47,6 +49,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e4b21b41-bd2d-412f-b8a7-8011b154d23b",
"metadata": {},
@@ -121,6 +124,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7df6c3dd-6cc5-46e6-9c58-6ba2333d7045",
"metadata": {},
@@ -135,17 +139,17 @@
"Therefore, to perform inference, you must first create an Orchestrator database and\n",
"launch it. There are two methods to couple the database to your application in\n",
"order to add inference capability to your application.\n",
" - standard (not co-located)\n",
" - co-located\n",
" - standard (not colocated)\n",
" - colocated\n",
" \n",
"`standard` mode launches an optionally clustered (across many compute hosts) database instance\n",
"that can be treated as a single storage device for many clients (possibly the many ranks\n",
"of an MPI program) where there is a single address space for keys across all hosts.\n",
"\n",
"`co-located` mode launches a orchestrator instance on each compute host used by a,\n",
"`colocated` mode launches a orchestrator instance on each compute host used by a,\n",
"possibly distributed, application. each instance contains their own address space\n",
"for keys. In SmartSim, `Model` instances can be launched with a co-located orchetrator\n",
"through `Model.colocate_db_tcp` or `Model.colocate_db_udp`. Co-located `Model`s are used for\n",
"for keys. In SmartSim, `Model` instances can be launched with a colocated orchetrator\n",
"through `Model.colocate_db_tcp` or `Model.colocate_db_udp`. Colocated `Model`s are used for\n",
"highly scalable inference where global aggregations aren't necessary for inference.\n",
"\n",
"The code below launches the `Orchestrator` database using the `standard` deployment\n",
@@ -199,6 +203,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "58615a9e-bb53-4025-90de-c1eee4e315eb",
"metadata": {},
@@ -254,6 +259,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0ab54aba-f6d7-4ecb-907e-1efdba9657a9",
"metadata": {},
@@ -295,6 +301,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "209a8db0-249a-4345-b3de-9c309ae5ebe6",
"metadata": {},
@@ -382,6 +389,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "37c6f801-deb5-463c-bfc7-565a79b8bcfb",
"metadata": {},
@@ -393,6 +401,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "638cdd2a-c0ad-4a54-a7ce-9eef0096da30",
"metadata": {},
@@ -526,6 +535,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "48b1efb6-4f39-4ad6-8a27-58648ec66bde",
"metadata": {},
@@ -593,6 +603,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a23d248c-6280-44b0-88a7-2babbaed3f3f",
"metadata": {},
@@ -631,6 +642,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4bf422e5-bece-4402-a2ef-8bca1e009b5c",
"metadata": {},
@@ -698,6 +710,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7da6c43d-b1b2-46ae-89f8-cb7050d9592b",
"metadata": {},
@@ -811,13 +824,14 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5daa7402-f62b-4710-a269-e078b2ce08ac",
"metadata": {},
"source": [
"# Co-Located Deployment\n",
"# Colocated Deployment\n",
"\n",
"A co-located Orchestrator is a special type of Orchestrator that is deployed\n",
"A colocated Orchestrator is a special type of Orchestrator that is deployed\n",
"on the same compute hosts an a Model instance defined by the user. In this\n",
"deployment, the database is not connected together in a cluster and each shard\n",
"of the database is addressed individually by the processes running on that compute\n",
2 changes: 1 addition & 1 deletion tutorials/ml_inference/colo-db-torch-example.py
@@ -10,7 +10,7 @@ def calc_svd(input_tensor):

# connect a client to the database
# no address required since this `Model` was launched through SmartSim
# Cluster=False since co-located databases are never clustered.
# Cluster=False since colocated databases are never clustered.
client = Client(cluster=False)

tensor = np.random.randint(0, 100, size=(5, 3, 2)).astype(np.float32)