
Reinforcement learning example with tensorflow/agents #1

Closed · wants to merge 27 commits

Conversation

@cwbeitel (Contributor) commented Feb 2, 2018

Outstanding tasks for this PR:

  • Manually verify that a long run functions correctly, i.e. the run produces a render that appears to perform the task well by picking up blocks.
  • Verify the configuration is correct for the separate TensorBoard deployment steps; launch TensorBoard from the notebook and confirm the logs appear.
  • Check whether the previous error of successful jobs exiting with an error code still occurs and, if so, fix it.


Status:
* Short runs function without issue (see notebook); a long run is in progress.
* Needs an E2E test.
* Need GCP project permissions to modify service account roles (or someone with them to do so) in order to test-run the TensorBoard pieces.
@cwbeitel (Contributor, Author) commented Feb 4, 2018

A long training run with the most recent Agents version (https://github.com/tensorflow/agents/tree/14669421f6a1699b9100cc42071a79c4dca2f1c3) produced a low-quality result, evident in the low (negative) mean_score and the renders (below).

Reverting to the last version (https://github.com/tensorflow/agents/tree/459c4f88ece996eac3489e6e97a6ee0b30bdd6b3) that was used to produce good results for this environment.

[TensorBoard screenshot]

Tboard: http://127.0.0.1:8001/api/v1/proxy/namespaces/rl/services/tboard-0204-0911-f303-tb:80/#scalars&run=eval
Renders: https://console.cloud.google.com/storage/browser/kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98/render/?project=kubeflow-rl
Notebook: https://github.com/cwbeitel/examples/blob/b94b9a12597bda21c3f242c164b345772f6d0a13/apps/agents/demo.ipynb
Agents hparams:

{'algorithm': <class 'agents.algorithms.ppo.ppo.PPO'>,
 'debug': True,
 'discount': 0.995,
 'dump_dependency_versions': False,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': 'gs://kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7ff0334ae938>,
 'normalize_ranges': True,
 'num_agents': 30,
 'num_gpus': 0,
 'optimizer': <class 'tensorflow.python.training.adam.AdamOptimizer'>,
 'policy_layers': (200, 100),
 'render_out_dir': None,
 'render_secs': 600,
 'run_base_tag': '0e90193e',
 'run_mode': 'train',
 'save_checkpoint_secs': 600,
 'steps': 40000000,
 'sync_replicas': False,
 'update_epochs': 25,
 'update_every': 30,
 'use_gpu': False,
 'use_monitored_training_session': True,
 'value_layers': (200, 100),
 'weight_summaries': {'all': '.*',
                      'policy': '.*/policy/.*',
                      'value': '.*/value/.*'}}

@jlewi (Contributor) commented Feb 4, 2018

Thanks for this.

Is this PR ready for review? The PR description indicates there's work TBD. If it's still in progress and not ready for review, please prefix the title with [WIP] and remove the prefix when it's ready for review.

@jlewi (Contributor) commented Feb 4, 2018

There will be multiple examples in this repository, and every example should be in its own top-level directory, so please create an appropriate directory for your example and put all of its files in it.

@@ -0,0 +1,60 @@
local params = std.extVar("__ksonnet/params").components["kubeflow-core"];
Contributor:

Should we prune the ksonnet app so that it doesn't contain core Kubeflow components?
i.e. should the app assume you already have Kubeflow deployed?

@@ -0,0 +1,70 @@
#!/usr/bin/env bash
Contributor:

What is this script for?
If it's a bash file it should have a .sh extension.

# See the License for the specific language governing permissions and
# limitations under the License.

get_project_id() {
Contributor:

Isn't this function defined in the other file as well?

@cwbeitel cwbeitel changed the title reinforcement learning example with tensorflow/agents [WIP] reinforcement learning example with tensorflow/agents Feb 4, 2018
@cwbeitel (Contributor, Author) commented Feb 4, 2018

Sure, I'll make those changes. Thanks for the feedback.

@jlewi (Contributor) commented Feb 6, 2018

@nkashy1 Do you want to be a reviewer of this PR when it's ready?

@cwbeitel (Contributor, Author) commented Feb 6, 2018

Did some experiments and found:

  1. There are combinations of gym, pybullet, TF, and Agents versions that do not work together (these have been updated since this work was started).
  2. For a combination that does work, I did a run of four replicas with the same hparam set; see TensorBoard (requires kubeflow-rl access, creds for cluster kubeflow-5fc52116, and kubectl proxy running), and also see the mean_score plot below.

[Screenshot: mean_score across the four replicas]

Here there seems to be both instability of good solutions and variability in the overall outcome (two runs are clear failures).

There may very well be hparam configurations that improve this in the individual case; this may be ironed out with the right distributed training / shared-parameter scheme, or it might be best solved by early-stopping failing param sets and replicating working ones. @danijar, I don't know if this is interesting to you or whether you have thoughts.

The following versions were used in the above:

tensorflow==1.4.1
pybullet==1.7.5
gym==0.9.4
agents@459c4f88ece996eac3489e6e97a6ee0b30bdd6b3

Also, here's a full screen grab from TensorBoard for various other scalars:

[Screenshot: TensorBoard scalars]

Renders are in progress and will be landing in subdirectories of gs://kubeflow-rl-kf/jobs/agents that correspond to the replica jobs.

@danijar commented Feb 6, 2018

Thanks for the update! What task is this? Gym recently had a few API changes; updating all the libraries should work. I recently added a config for AntBulletEnv-v0 to the main repository that sets update_every = 60. The larger batch size makes training quite a bit more stable.

It would still be super cool to get the SyncReplicaOptimizer working, but I'm not sure how feasible that is. What do you think?

@cwbeitel (Contributor, Author) commented Feb 6, 2018

Dude, it would be super cool!!

I don't know about the feasibility, but I'm interested in exploring it.

  • One split in approach is between staying with the existing Loop training manager vs. using Supervisor or MonitoredTrainingSession.
    • If using one of the latter, we first need to be able to do a full run with them without error; see below [1].
      • Disabling the checkpoint-writing behavior of MTS may be a simpler approach than disabling that of Agents (see the sketch after this list).
    • If the former, then Loop will need to handle distributed variable initialization (local initialization waits for globals).
  • Independent of what manages the init ops of SRO, and perhaps more fundamental, is understanding exactly how SRO works both at initialization and at runtime. I think my lack of clarity on solving the problem is due in part to my understanding of SRO being a little cursory.
    • One question is why the problem still occurs even if SRO is initialized outside any tf.cond or tf.scan contexts.
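As a rough illustration of the MTS option above (a minimal sketch, not the example's actual training code; the toy train op and single-process setup are stand-ins):

import tensorflow as tf

# Toy graph; a real run would build the Agents training graph here.
global_step = tf.train.get_or_create_global_step()
train_op = tf.assign_add(global_step, 1)

# save_checkpoint_secs=None disables MTS's periodic CheckpointSaverHook;
# setting both summary arguments to None disables its SummarySaverHook,
# leaving checkpoint/summary writing to Agents.
with tf.train.MonitoredTrainingSession(
    is_chief=True,
    checkpoint_dir=None,
    save_checkpoint_secs=None,
    save_summaries_steps=None,
    save_summaries_secs=None) as sess:
  sess.run(train_op)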

Also, it's worth noting that this may be more feasible than I (or we) previously thought, back when other errors were masking visibility into synchronous distribution-related issues. Those errors [1] were caused either by conflicts in checkpoint saving, by Saver not supporting MonitoredTrainingSession (it expects a Session), or by successful jobs "success-failing" (for lack of a better label: jobs that finished successfully would still exit with an error code, then crash for a different reason after restarting).

The task is 'KukaBulletEnv-v0'.

And that's great to hear about update_every!! I'll try that out.

@cwbeitel cwbeitel changed the title [WIP] reinforcement learning example with tensorflow/agents Reinforcement learning example with tensorflow/agents Feb 8, 2018
@cwbeitel (Contributor, Author) commented Feb 9, 2018

Ready for review.

Training performance seems less variable with update_every=60, but this only used 4 replicas.

Played around with managing a set of experiments, from submission, through comparing runs with TensorBoard, to managing renders. The last part could be a lot better; it might be really useful to be able to manage renders via a tab in TensorBoard.

@zomglings commented:

@cwbeitel, @jlewi: Sorry for the long silence; I was on the road. Back now and ready to review this. Will have a look tonight.

@cwbeitel : Are you reachable on the kubeflow slack?

@zomglings zomglings self-requested a review February 9, 2018 02:04
@zomglings commented:

@cwbeitel, @danijar : Did a quick skim of the demo notebook - I didn't understand the reason for running the same job over four different workers. In general, though, looks nice. I do have plenty of suggestions and will probably have more once I actually run the code.

This did get me wondering about tensorflow/agents and Kubeflow: how difficult would it be to deploy each simulation process as a different pod and collect all the results into the same training pod?

Just to make sure I understand, this isn't what is happening now, right? Could we implement this with a common GCS bucket? Need to dig into tensorflow/agents code more.

@cwbeitel (Contributor, Author) commented Feb 9, 2018

Now reachable on the Slack.

I can clarify that or change the approach. The reason for running four replicas of the same job was that I have been seeing variability in performance from one job to the next; see #1 (comment). This seems less pronounced since switching to update_every=60, but my sample size in each case is only n=4. It may also not occur after updating to the most recent Agents version, which you might want for this PR. Currently using the combination of versions of Agents and its dependencies last known to produce a functioning result; some combination of modifications to these yielded consistently poor results, see #1 (comment).

Yeah that's interesting. With OpenAI Universe you can automatically launch local or remote simulations, e.g.

# Local
import gym
import universe  # register the universe environments
env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)  # automatically creates a local docker container
observation_n = env.reset()
# Remote
env = gym.make('flashgames.DuskDrive-v0')
# Manually specify the addresses of the remotes
env.configure(remotes='vnc://localhost:5900+15900,vnc://localhost:5901+15901') 
observation_n = env.reset()

The quay.io/openai/universe.flashgames entrypoint starts up a VNC server which I think listens for actions and sends rewards on ports 5900 and 15900 by default. So accessing a bunch of simulator pods from the training pod in this approach would just involve the training pod knowing the addresses of all the simulator pods, which could be accomplished in a similar way to, or by extending, the TF_CONFIG cluster spec; a sketch of the idea follows.
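A minimal sketch of that last idea (the environment variable name and JSON layout here are hypothetical, loosely modeled on TF_CONFIG, not an existing Kubeflow or Universe convention):

import json
import os

import gym
import universe  # registers the universe environments

# Hypothetical variable injected into the training pod, listing simulator pod addresses, e.g.
# {"simulators": ["vnc://sim-0.rl.svc:5900+15900", "vnc://sim-1.rl.svc:5900+15901"]}
sim_config = json.loads(os.environ.get("SIMULATOR_CONFIG", '{"simulators": []}'))

env = gym.make('flashgames.DuskDrive-v0')
# Point Universe at the discovered simulator pods instead of hard-coded addresses.
env.configure(remotes=",".join(sim_config["simulators"]))
observation_n = env.reset()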

@zomglings left a comment:

@cwbeitel : First pass, without having run a thing.

I did learn a lot. The content is really good, and it would have taken me a long time and a lot of documentation digging to figure much of it out if your example didn't exist.

Most of my comments in this pass are about polishing up the presentation and adding more exposition on what kubeflow is doing under the hood.

Next pass (tomorrow) will be more about the actual process of running this example.

agents/build.sh Outdated
SALT=`python -c 'import datetime; import uuid; now=datetime.datetime.now(); print(now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4])'`

VERSION_TAG=tf${TENSORFLOW_IMAGE_TAG}-${SALT}
IMAGE_TAG=gcr.io/${PROJECT_ID}/${IMAGE_BASE_NAME}:${VERSION_TAG}


@cwbeitel , @jlewi : Can we make images publicly available under a kubeflow project? Would save users having to go through the trouble of this build step.

We should be publishing to DockerHub (re: kubeflow/kubeflow#211) as the primary source of images (as well as GCR, if we have a project handy).

Contributor Author (cwbeitel):

True it would be simpler to use if the example were contained in a publicly-accessible image. People would be able to either run the example from that image or make modifications and build their own.

Contributor:

What is the image for?
In general, users have to build their own Docker images, so we should figure out how to make that CUJ work within the cluster.

I think the general solution is going to be building Docker images in-cluster using Docker-in-Docker with Argo.
You can look at the Seldon example:
https://github.com/SeldonIO/kubeflow-seldon-example

Contributor:

That could probably be done in a follow up PR though.

agents/build.sh Outdated
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

IMAGE_BASE_NAME=agents
TENSORFLOW_IMAGE_TAG=1.4.1


Let's parametrize this:

TENSORFLOW_IMAGE_TAG=${TENSORFLOW_IMAGE_TAG:-1.4.1}

tf.logging.info("Preparing local files for upload:\n %s" % local_files)

# Initialize the GCS API client
storage_client = storage.Client()


Would it be possible to use tf.gfile here instead?

(Even if the workers are writing to local disk, you can use tf.gfile.Copy.)
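A minimal sketch of that suggestion (paths are placeholders; in the example they would come from flags/hparams):

import os
import tensorflow as tf

local_render_dir = "/tmp/renders"
gcs_render_dir = "gs://example-bucket/jobs/some-job/render"

tf.gfile.MakeDirs(gcs_render_dir)
for filename in tf.gfile.ListDirectory(local_render_dir):
    # tf.gfile.Copy handles both local paths and GCS URIs, so no separate
    # GCS client is needed.
    tf.gfile.Copy(os.path.join(local_render_dir, filename),
                  os.path.join(gcs_render_dir, filename),
                  overwrite=True)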

FLAGS.hparam_set_id, log_dir, run_config.is_chief)

if FLAGS.run_mode == 'train':
  if run_config.is_chief:


This is only intended to run on a single node, right?

Contributor Author (cwbeitel):

Right.

@@ -0,0 +1,76 @@
# tf-job


Should anything under vendor/kubeflow get checked in?

Contributor Author (cwbeitel):

No, now that I think of it; thank you. It could either be brought in as a git submodule or there could be a command that obtains the dependency at setup.

"cell_type": "markdown",
"metadata": {},
"source": [
"Download and install ksonnet if needed"


Can we run this notebook in a Docker container with the right packages installed? (That same container could be used with JupyterLab)

"output_type": "stream",
"text": [
"NAME READY STATUS RESTARTS AGE\r\n",
"kuka-0208-1010-10df-master-znly-0-srftb 0/1 Completed 0 5h\r\n",


Noisy?

"metadata": {},
"outputs": [],
"source": [
"KEY_FILE=\"/Users/cb/Downloads/kubeflow-rl-ec0f4f646339.json\"\n",


Messy to have your local path + filename in the notebook.
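One small, illustrative alternative (the fallback filename is a placeholder) would be to read the key path from the standard GOOGLE_APPLICATION_CREDENTIALS environment variable:

import os

# Falls back to a placeholder name in the working directory if the variable is unset.
KEY_FILE = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "service-account-key.json")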

"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use [ksonnet](https://ksonnet.io/) to parameterize and apply a TFJob configuration (i.e. run a job). Here you can change the image to be a custom job image, such as one built and deployed with build.sh, or use the one provided here if you only want to change parameters. Below we'll display the templated job YAML for reference."


I would like more explanation about what Kubeflow is doing at each step, if the objective is to teach readers about more fundamental Kubeflow-related concepts.

It would be good to link back to the Kubeflow docs, maybe the user guide, as part of that.

"\u001b[34mINFO \u001b[0mParameter 'name' successfully set to '\"render-0208-1335-6d81\"' for component 'agents_render'\n",
"\u001b[34mINFO \u001b[0mParameter 'log_dir' successfully set to '\"gs://kubeflow-rl-kf/studies/replicated-env-comparison/pendulum-0208-1010-7b7f\"' for component 'agents_render'\n",
"\u001b[34mINFO \u001b[0mParameter 'image' successfully set to '\"gcr.io/kubeflow-rl/agents:tf1.4.1-0208-0959-a276\"' for component 'agents_render'\n",
"\u001b[34mINFO \u001b[0mUpdating tfjobs rl.render-0208-1335-6d81\n",


All this output is distracting; was leaving it in the notebook intentional?

@danijar commented Feb 9, 2018

Running multiple training runs with the same hyperparameters is a good first step. In RL, we often use this to show how robust (or not) a method is to the random seed. After getting this merged, it'd be great to also be able to run different configurations in parallel in order to find good hyperparameters.

Regarding distributing a single agent, I've heard several people complain about the SyncReplicasOptimizer. How much does it actually help us? Do you think it would be an option to manually average the gradients? In pseudocode:

all_gradients = []
for worker in workers:
  with tf.device(worker):
    experience = collect_experience()
    loss = compute_loss(experience)
    gradients = optimizer.compute_gradients(loss)
    all_gradients.append(gradients)
combined_gradients = tf.reduce_sum(all_gradients)
for worker in workers:
  with tf.device(worker):
    optimizer.apply_gradients(combined_gradients)
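A slightly more concrete (but still toy) version of that pattern in TF 1.x graph code; the device list and the loss are stand-ins for the real experience collection and PPO loss:

import tensorflow as tf

# Stand-in device list; a real TFJob would use "/job:worker/task:<i>" devices
# taken from the cluster spec.
worker_devices = ["/cpu:0", "/cpu:0"]

def compute_loss():
  # Toy loss over a shared variable, standing in for the PPO loss on
  # collected experience.
  with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
    w = tf.get_variable("w", shape=[1], initializer=tf.zeros_initializer())
  return tf.reduce_sum(tf.square(w - 1.0))

optimizer = tf.train.AdamOptimizer(1e-4)
per_worker_grads = []
for device in worker_devices:
  with tf.device(device):
    per_worker_grads.append(optimizer.compute_gradients(compute_loss()))

# Average each variable's gradient across workers, then apply the result once.
averaged = []
for grads_and_vars in zip(*per_worker_grads):
  grads = [g for g, _ in grads_and_vars if g is not None]
  var = grads_and_vars[0][1]
  averaged.append((tf.add_n(grads) / float(len(grads)), var))

train_op = optimizer.apply_gradients(averaged)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(train_op)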

@zomglings commented:

@danijar: As the different parameters all live in distinct tf-job processes, would it be easy to get access to them with something like tf.device?

If so, your pattern is great (tried and true).

Out of curiosity, what are the complaints about SyncReplicasOptimizer?

@cwbeitel (Contributor, Author) commented Feb 9, 2018

Tasks

  • Remove mess and distracting output from the demo notebook.
  • Move setup docs to the README and partition them into a local direct container run vs. a remote JupyterHub run; make the setup documentation clear and usable.
  • Build a demonstration container containing the necessary packages, compatible both with running a local Jupyter server and with being run on JupyterHub.

@cwbeitel (Contributor, Author) commented Feb 9, 2018

FYI, the steps related to managing GCP credentials, both for running the demo container locally and for running it on JupyterHub, are a work in progress.

@cwbeitel (Contributor, Author) commented:

There's a question of whether to combine the training and demo containers, in the interest of having smaller images to ship around, simpler development, and less redundancy.

On the demo side it would be nice to be able to run renders in the notebook, so having the ffmpeg etc. dependencies from the training container seems to fit.

But on the training-container side, the base image size would go up many times:

gcr.io/kubeflow/tensorflow-notebook-cpu   latest                   169f860f283c        6 days ago          5.6GB
tensorflow/tensorflow                     1.5.0                    a2d1671e8a93        3 weeks ago         1.25GB
tensorflow/tensorflow                     1.4.1                    3cac069c491d        3 weeks ago         1.28GB
gcr.io/kubeflow/tensorflow-notebook-gpu   latest                   e68d36c67064        2 months ago        7.11GB

This is only relevant the first time someone runs a training job, and if jobs are being initiated from the demo notebook then caches of the above will already exist in the cluster, so the size difference will be irrelevant.

But when building containers with Argo, a different Docker daemon is used, and build time would go up significantly, at least on the first build and, until builds are cached, on every build; it takes a rather long time to pull 7GB.

Therefore, at the moment I have added dependencies from the trainer to the demo notebook but have not combined the two.

@cwbeitel cwbeitel changed the title Reinforcement learning example with tensorflow/agents [WIP] Reinforcement learning example with tensorflow/agents Feb 20, 2018
@cwbeitel (Contributor, Author) commented:

Thinking more about it, it looks like all of the setup steps in the notebook could be encapsulated in a setup function or a DevEnvironment object (sketched below).

In addition, being able to build images from the notebook dev environment would simplify things and be more familiar for developers, but I'm not sure whether a Docker daemon is accessible there.

One outstanding issue seems to be providing users with credentials to perform operations within their own namespace on the underlying Kubernetes cluster without them needing to obtain these through gsutil or other means. Also, with the way things are configured now, "!gcloud container clusters get-credentials kubeflow-dev --zone us-east1-d" just works by default without the need to upload a service account, but the idea is to make it vendor-agnostic...
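A rough sketch of what such an object might look like (class, method names, and arguments are illustrative only):

import subprocess

class DevEnvironment(object):
  """Encapsulates the notebook's setup steps (credentials, namespace, etc.)."""

  def __init__(self, project, cluster, zone, namespace="rl"):
    self.project = project
    self.cluster = cluster
    self.zone = zone
    self.namespace = namespace

  def _run(self, cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

  def get_cluster_credentials(self):
    self._run(["gcloud", "container", "clusters", "get-credentials",
               self.cluster, "--zone", self.zone, "--project", self.project])

  def create_namespace(self):
    self._run(["kubectl", "create", "namespace", self.namespace])

  def setup(self):
    self.get_cluster_credentials()
    self.create_namespace()
    # Further steps (ksonnet init, package install, etc.) would go here.

# Usage in the notebook (values are placeholders):
# DevEnvironment("my-project", "kubeflow-dev", "us-east1-d").setup()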

@cwbeitel (Contributor, Author) commented:

So, simplified things significantly by using Google Container Builder and the credentials that are already available. Not caching container builds / waiting on builds is a significant point of friction. Will experiment with GCB caching, but this shouldn't block a review of everything else.

Added Docker to the demo notebook, but there is no Docker daemon available. Having one would make it a more complete dev environment and would simplify builds of the training container. But you don't really need to re-build the training container for each run; if the dependencies haven't changed you could just mount the code via a PVC shared with the notebook.

Also, in my experience it's important to be able to install system packages when developing in the notebook. Currently various pip install commands fail for lack of sufficient permissions.

@cwbeitel cwbeitel changed the title [WIP] Reinforcement learning example with tensorflow/agents Reinforcement learning example with tensorflow/agents Feb 22, 2018
@cwbeitel (Contributor, Author) commented:

Thinking it would make sense to re-work each step of the notebook as a Python command (instead of a system command) so the example could be tested end-to-end; something like the sketch below.
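For instance, a cell that shells out to ksonnet could become an ordinary function that an end-to-end test can call directly (component, parameter, and environment names below are placeholders):

import subprocess

def ks_param_set(app_dir, component, param, value):
  """Python equivalent of a `ks param set <component> <param> <value>` notebook cell."""
  subprocess.check_call(["ks", "param", "set", component, param, value], cwd=app_dir)

def ks_apply(app_dir, env, component):
  """Python equivalent of `ks apply <env> -c <component>`."""
  subprocess.check_call(["ks", "apply", env, "-c", component], cwd=app_dir)

# An end-to-end test could then drive the same steps the notebook performs:
# ks_param_set("app", "agents_ppo", "image", "gcr.io/my-project/agents:test")
# ks_apply("app", "default", "agents_ppo")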

@jlewi (Contributor) commented Feb 22, 2018

@cwbeitel It seems like you are continuing to make some big changes to this PR. Should we move those changes into a new PR/branch or should we move this PR back to the WIP stage?

@cwbeitel (Contributor, Author) commented:

@jlewi Sure, I think it would be good to start turning this into a series of small PRs, so maybe we can let the current version be the stopping point for this PR; I'll then make individual issues and PRs for subsequent changes, discussing those at least briefly ahead of time. Does that sound good to you?

@jlewi (Contributor) commented Feb 23, 2018 via email

@cwbeitel cwbeitel changed the title Reinforcement learning example with tensorflow/agents [WIP] Reinforcement learning example with tensorflow/agents Feb 23, 2018
@jlewi (Contributor) commented Mar 1, 2018

@cwbeitel What needs to happen for this PR to be ready for review?

@cwbeitel (Contributor, Author) commented Mar 1, 2018

I don't know. Feel free to review it now and let me know what you think still needs to be done for this particular PR.

Adding the triggering of automated tests might be an optional addition or it might be a necessary requirement. If I were reviewing this I'd say you need some basic tests that trigger automatically when you push a commit, but testing it interactively may be sufficient initially.

Being able either to (1) quickly build training containers or (2) avoid needing to build training containers is needed, but not necessarily for this PR.

@cwbeitel cwbeitel changed the title [WIP] Reinforcement learning example with tensorflow/agents Reinforcement learning example with tensorflow/agents Mar 1, 2018
@jlewi (Contributor) commented Mar 1, 2018

/assign @nkashy1

@nkashy1 Can you please review?

@cwbeitel (Contributor, Author) commented Mar 3, 2018

Not sure how to rebase given that the example has been on the master branch, so I'm closing this, re-forking, creating a branch for the example, and adding the current version of the PR with tests.

@cwbeitel cwbeitel closed this Mar 3, 2018
texasmichelle pushed a commit that referenced this pull request Sep 4, 2019
k8s-ci-robot pushed a commit that referenced this pull request Mar 16, 2020
* update simple notebook