Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmartSim Multidb example #409

Merged
merged 23 commits into from
Dec 12, 2023
Merged

SmartSim Multidb example #409

merged 23 commits into from
Dec 12, 2023

Conversation

amandarichardsonn
Copy link
Contributor

@amandarichardsonn amandarichardsonn commented Nov 2, 2023

This PR merges the SmartSim mutlidb example into the craylab docs. It is step by step instructions on how to setup multiple orchestrators, connect to the orchs using smartredis clients and send, retrieve and manipulate data using the launched dbs and the ConfigOptions object. The manipulation of tensors is done with a torch function. Lastly, the source code is provided at the end of the example.

Copy link

codecov bot commented Nov 2, 2023

Codecov Report

Merging #409 (a8cf2a9) into develop (9ecfbed) will decrease coverage by 0.06%.
Report is 6 commits behind head on develop.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #409      +/-   ##
===========================================
- Coverage    89.65%   89.60%   -0.06%     
===========================================
  Files           59       59              
  Lines         3617     3617              
===========================================
- Hits          3243     3241       -2     
- Misses         374      376       +2     

see 2 files with indirect coverage changes

@al-rigazzi al-rigazzi self-requested a review November 2, 2023 15:37
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a very good start. Some improvements are left inline, but otherwise thanks for starting to tackle this! Also, can you rename your PR?

doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@billschereriii billschereriii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor changes and I think this will be good to go!

Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done, some small change to request. I also agree with Bill's comments so be sure to address both of ours

doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@juliaputko juliaputko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to message me to clarify anything!

doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved

.. code-block:: bash

Model: colo logger@00-00-00:The colocated db has tensor_1: True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you maybe print the tensors like you did above? Or just be consistent with how you should the results, so is it clear the same thing is happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like how it's polling for the tensors. I think its kinda the same a little? Checks if the tensor are in the database, and it shows another client.function(). I think we could really do either but I like your idea of remaining consistent throughout the example hmmmmm, let me add a description on what poll_tensor does!

experiment printed before it is launched. The summary
will stay for 10 seconds, and it is useful as a last
check. If we set `summary=False`, then the experiment
would be launched immediately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The summary will show each instance that has been launched and completed in this Experiment" - from the SS API. A little misleading to say it will show a summary before it is launched? Is it true that it only appears for 10 seconds and then disappears?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It does not disappear tho, it just waits to push output, so you basically have 10 seconds to scan the summary before it continue in the experiment, Ill change this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok changed, let me know if its more clear!

doc/orchestrator.rst Outdated Show resolved Hide resolved
exp.stop(single_shard_db, multi_shard_db)
logger.info(exp.summary())

Application Script
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a good idea to add the name of the scripts that contain this information that you are using in your examples above (eg. exe_args="/lus/scratch/richaama/model_ex.py") is the application script. Also probably a good idea to have step by step instructions that show where files should be located in relation to smartsim, the contents of those files, what should be run (eg. only running experiment script, which uses the application script).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a step by step on what is needed to run the example would be useful (allocation, any module load stuff, the structure of the file system, contents of files, what exactly to run)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this but let me know what u think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure how specific I can be on the setup instructions bc people might be installing in many dif ways, but hmmm @ashao and @juliaputko let me know if I can put submitting an allocation or anything else? It should be the same on all slurm machines right? But someone might want to change the launcher? so I could just say make sure you have allocated resources, lemme know what yall think

Copy link
Member

@ashao ashao Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say point them at the beginning of the section, you could just say something along the lines of "This section assumes that you have already installed and built SmartSim and SmartRedis. Please refer to Section "Installing SmartSim" for further details. For simplicity, this example assumes that you are running on a SLURM-based HPC-platform. Refer to "Running this example" for more details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dern!

logger = get_logger("Multidb Experiment Log")
exp = Experiment("getting-started-multidb", launcher="auto")

single_shard_db = exp.create_database(port=6379, db_nodes=1, interface="ib0", db_identifier="single_shard_db_identifier")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a note on which interface to use depending on how you are running the example? I had to change mine to run it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting! I believe smartsim is suppose to find the interface for you, hmmm let me ask someone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok ok this is being changed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making a note that ill need to rerun example in future and make sure this does not happen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to omit the ib0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omitted

doc/orchestrator.rst Show resolved Hide resolved
visualization workflow connected to the simulation (requiring a standard orchestrator).

Below is a simple example demonstrating the process of:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have the description of these steps directly below the list of the steps. I was a bit confused when the first heading was "Application script" when I was expecting an explanation od the launching of two standard Orchestrators, which I think you have lower down. Might be good just for easily finding the information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay so I changed this, let me know if this its more clear, good find btw I can very much see where ur coming from

|----|------------------------------|---------------|-------------|---------|---------|-----------|--------------|
| 1 | colo_model | Model | 1538905.8 | 0 | 3.5465 | Completed | 0 |
| 2 | single_shard_db_identifier_0 | DBNode | 1538905.5 | 0 | 73.1798 | Cancelled | 0 |
| 3 | multi-shard-db-identifier | DBNode | 1538905.6+2 | 0 | 49.8503 | Cancelled | 0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it true that these are inconsistent with whether they have an _0 appended or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg good catch, this is true, I pasted this from the terminal, I think I might know why this is happening tho

@@ -497,12 +507,12 @@ to the ``create_model()`` function and assign to the variable ``model``.

Step 2: Colocate
""""""""""""""""
To colocate the model, use the ``Model.colocate_db_tcp()`` function to
To colocate the model, use the ``Model.colocate_db_uds()`` function to
Colocate an Orchestrator instance with this Model over TCP/IP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UDS isn't a TCP/IP connection, but a Unix domain socket connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that I need to watch a vid on this

Copy link
Contributor

@billschereriii billschereriii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your hard work on this! A couple minor typos, but nothing that will require another round of review from me for. Please get signoff from Andrew and Julia before merging.

Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more minor comments, but it's getting very close!

doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for all the hard work on this

doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
@mellis13 mellis13 added area: orchestrator Issues related to the Ochestrator API, launch, and runtime area: docs Issues related to documentation labels Dec 12, 2023
@amandarichardsonn amandarichardsonn merged commit 37fb2bb into develop Dec 12, 2023
26 checks passed
@amandarichardsonn amandarichardsonn deleted the orch-multi-ex-docs branch December 12, 2023 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: docs Issues related to documentation area: orchestrator Issues related to the Ochestrator API, launch, and runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants