Commit
* [autoscaler/AWS] Updated AWS Node Provider threading logic (ray-project#11422)
* [autoscaler] Add rsync_exclude and rsync_filter options to cluster config (ray-project#11512)
* Add --worker-port-list option to ray start (ray-project#11481)
* [hotfix] Pin node version (fix linux wheel build) (ray-project#11532)
  Co-authored-by: Max Fitton <max@semprehealth.com>
* [Core] Allow creating tasks/actors in a detached actor when driver has exited (ray-project#11493)
* Allow creating tasks/actors in a detached actor when driver has exited
* lint
* Address comment
* [Autoscaler] Do not count unmanaged nodes in load metrics (ray-project#11458)
* fixed
* lint
* fixed other test case
* .
  Co-authored-by: Alex Wu <alex@anyscale.com>
* [RaySGD] Docs for SGD+Tune usage (ray-project#11479)
* Clean up release tests (ray-project#11420)
* [tune] a tiny ptl example (ray-project#11497)
* [yaml] HotFix for correct example full (ray-project#11584)
* [releng]: Quiet Docker Push (and explain why) (ray-project#11623)
* [release] Do not tag docker latest on release builds (ray-project#11694)
* fix
* Added comment
  Co-authored-by: Alex Wu <alex@anyscale.com>
* [tune] fixed validation for search metrics (ray-project#11583)
* fixed validation for search metrics
* formatting
* made error report better
* if only one metric is missing extract it from list
* any can take a generator
* Fix asyncio plasma integration in cluster mode (ray-project#11665)
* [tune] PB2 (ray-project#11466)
  Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
  Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
  Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
  Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Version bump 1.0.1
* Disable validation of cluster config on the cluster to allow for cluster configs with new properties. (ray-project#11693)
* [Hotfix] Pin Pydantic Version (ray-project#11622)
* [docker] Fix docker regex (ray-project#11726)
  Co-authored-by: Alex Wu <alex@anyscale.com>
* [GCS] Decouple node failure detector from resource-related operations (ray-project#11465)
* [Placement Group] Placement group automatic cleanup. (ray-project#11546)
* In progress. Done with all placement group manager code.
* It is working with job.
* Finished detached actor implementation.
* Fix minor issue.
* In progress.
* Addressed code review.
* Addressed code review.
* Addressed code review.
* Fix a build error.
* [docker] Push to DockerHub in CI (ray-project#11442)
* [docker] Disable Readme push to avoid errors (ray-project#11770)
* Release testing things
* rllib regression results
* [Metrics] Implement basic metrics changes (ray-project#11769)
* Implement basic metrics changes
* Addressed code review.
* Fix build issue.
* Fix build issue.
* [Core] Fix ray start failure due to a bug in redis address detection (ray-project#11735)
* Fix ray start failure due to redis address detection bug
* Address comment
* [Test] Ignore setproctitle for local mode (ray-project#11819)
* [Dashboard] Patch issue in 1.0.1 release where worker stats are not present for a node (ray-project#12062)
* [autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique (ray-project#11600)
* Set version to 1.0.1.post1
* Sync Bonsai Changes in 1.0.1 (#47)
* Bump up the version to 0.8.6
* Linting fix.
* Add release test running full asan python test (ray-project#8836)
* [MERGE TO MASTER] Add microbenchmark result.
* Fix asyncio re-entry error message (ray-project#8842)
* Change os.uname()[1] and socket.gethostname() to the portable and faster platform.node() (ray-project#8839)
  Co-authored-by: Mehrdad <noreply@github.com>
* [serve] Fix long running failure test (ray-project#8863)
* [Serve] Serve long running test fix (ray-project#8864)
* Replace ps call with psutil (ray-project#8851)
* Replace ps call with psutil
* Minor formatting
  Co-authored-by: Mehrdad <noreply@github.com>
  Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
* [Core] Fix a detached actor bug when GCS actor management is off. (ray-project#8843)
* [Testing] Fix LINT/sphinx errors. (ray-project#8874)
* Node failure test fix (ray-project#8882)
* [core] Check that port is unused before assigning to worker (ray-project#8773)
* [rllib] Set framework to tf by default and remove import checks; "Auto" option (ray-project#8748)
* tf by default
* Update rllib/agents/trainer.py
  Co-authored-by: Sven Mika <sven@anyscale.io>
* remove it
* fix
* remove
* fix
* lint
  Co-authored-by: Sven Mika <sven@anyscale.io>
* [RLlib] Issue 8889: action clipping bug, PPO not learning MuJoCo (ray-project#8898)
* Fix Windows build (ray-project#8905)
  Co-authored-by: Mehrdad <noreply@github.com>
* Use no_restart=False for ray.kill in Serve failure test (ray-project#8952)
* Display GPU Utilization in the Dashboard (ray-project#8564)
* Update incorrect detached actor docs (ray-project#8930)
* [Dashboard] Dashboard pubsub hotfix. (ray-project#8944)
* [CI] Fix Conda Permission on MacOS GitHub Action (ray-project#9004)
  Co-authored-by: Mehrdad <noreply@github.com>
* Update pandas to 1.0.5 (ray-project#9065)
  Co-authored-by: Mehrdad <noreply@github.com>
* Do not add reference count in local mode. (ray-project#8979)
* [Dashboard] Update the Ray dashboard documentation to explain memory view. (ray-project#8945)
* Windows compatibility (#93)
  Co-authored-by: mehrdadn <mehrdadn@users.noreply.github.com>
  Co-authored-by: Mehrdad <noreply@github.com>
  Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Preparing 0.8.6 (#26)
* Updated Version to 0.8.5.
* Formatting.
* Fix Serve long running test (ray-project#8223)
* Fix release 0.8.5 tests for PPO torch Breakout. (ray-project#8226)
* Remove logging (ray-project#8211)
* [BRING BACK TO MASTER] Fix cluster.yaml config.
* [rllib] Copy plasma memory before adding data to replay buffer
* [sgd] Resource limit lift for GPU test (ray-project#8238)
* Fix resource_ids_ data race (ray-project#8253)
* [rllib] [hotfix] Remove assert that trips on pytorch multiagent (ray-project#8241)
* [BRING BACK TO MASTER] add torch download for rllib regression test.
* [serve] Master actor fault tolerance (ray-project#8116)
* [serve] Add delete_backend call (ray-project#8252)
* Fix resource_ids_ data race (ray-project#8253)
* [serve] Add delete_endpoint call (ray-project#8256)
* [serve] Refactor BackendConfig (ray-project#8202)
* Delete example files.
* Fix serve long running test (ray-project#8268)
* [tune] Avoid breakage - soft deprecation warning for search algs (ray-project#8258)
* [tune] Hotfix Ax breakage when fixing backwards-compat (ray-project#8285)
* Async actor microbenchmark Script (ray-project#8275)
* [core] Disable GCS actor management (ray-project#8271)
* Pin redis-py version (ray-project#8290)
* [BRING BACK TO MASTER] add pip install upgrade to the command.
* Add ipython as dependency for autoscaler container (ray-project#8297)
  Co-authored-by: rbusche <rbusche@inserve.de>
* Revert "Async actor microbenchmark Script (ray-project#8275)"
  This reverts commit 6a6eead.
* Docs and LINT.
* [RLlib] Increasing reusability v0 (#8)
* Set up CI with Azure Pipelines
  Specifically, we are setting up a Travis-like ADO pipeline following what is already present in the .travis.yml file in the root of the repo.
* Separating travis-like pipeline from main pipeline
* Adding Jenkins jobs equivalent
* Making some improvements
* Adding validation of the upstream CI
* Disabling Tune and large memory tests
* Changing threshold for simple reservoir sampling test
* Addressing comments
* Updating Azure Pipelines with travis updates
* Updating Azure Pipelines with more travis updates
* Updating CI with new cpp worker tests
* Setting code owners
* Fixing the version number generation
* Making main pipeline also our release pipeline
* Updating Azure Pipelines with travis updates
* Fixing wheels test
* Fixing codeowners
* Updating Azure Pipelines with travis updates
* Bumping up MACOSX_DEPLOYMENT_TARGET
* Updating Azure Pipelines with travis updates
* Updating Azure Pipelines with travis updates
* Updating Azure Pipelines with travis updates
* Disabling Serve tests
* Making explicit which branches GitHub Actions workflows should watch
* Disabling Ray serve tests
* Installing numpy explicitly
* consolidating Ray test steps in one yml
* Making worker set, apex and ppo a little bit more reusable for custom agents
* Making Dynamic TF policy more reusable
* Allow the actions dict to carry user data defined for the episodes
* Forcing RLlib tests to run always
* Making SAC model more extensible
* Adapting exploration API
* Reverting the random worker index change
* Making epsilon configurable
* Fixing method doc
* Fixing arguments check in reset_schedule
* Fixing per worker epsilon greedy
* Activating logs for failing test
* Making original_space check more robust
* Allow normalized actions rescaling to happen outside RLlib
* Passing infos values from agents to callbacks
* Installing node js using a task
* Adding kwargs in TFModels
* Fixing npm and node in mac
* Fixing the num workers value passed
* Forcing RLlib tests
* Merging 0.8.5
* Running some RLlib test in custom agent
* Adding echo bazelisk
* Force CI
* Force CI
* Relaxing an installation
* Using container jobs
* Fixing container jobs
* Change base image for container job
* Install with sudo
* Exec with sudo
* Test
* Changing agent pool
* Remove python selection
* Fix version replacement
* Fix version replacement
* Trying Bazel
* Installing node with sudo
* Run all install as sudo
* Reverting sudo -s
* Fixing omitted param
* install python manually
* Fixing missing param
* Making NVM available
* Fix nvm installation
* Fix copy-paste
* renaming to req file
* fix typo
* Install JDK 8
* Install req in other jobs
* Install JDK with sudo
* Removing docker clean up
* Install Docker
* fix installation issue
* Adding azure package source
* Fix docker permissions
* Install jq
* downloading with sudo
* Install llvm as root
* Skipping flaky test
* copy artifacts as sudo
* Fix Bazel build in MacOS (#23)
* Fixing mac os building issue
* Bazelisk check
* Increase bazel version
* Fixing typos
* Update hash
* Include unzip
* Improved compilation and convergence tests
  Added compilation tests that follow proper PyTest conventions. These tests use parametrized settings, and allow multiple algorithms to be tested with a single test. I've commented out the tests that these two tests can replace, to show the improvement. Only about half of the algorithms have been transitioned to the new tests, in the interest of keeping the PR small.
* Increasing bazel version
* Increasing bazel version only in mac pipelines
* Printing system info in Ubuntu wheels pipeline
* making docker install optional
* Compilation and convergence tests for more algos
  Added compilation and convergence tests for Apex DQN, Apex DDPG
  Added convergence tests for SAC
  Removed old (commented out) compilation test code from `rllib.agents.dqn.tests.test_apex`
* Clean up
  Deleted old (commented out) test code
* Updated BUILD file
  Split tests into test_compilation and test_learning.py to work with BAZEL build files.
* Updated BUILD file
  Fixed bug in BUILD - wrong files passed in.
* BugFix: Improper imports causing test failures
* BugFix: Improper imports causing test failures
* Removed test_appo from BUILD file
* Fixing copy-paste error
* Applying some bazel fixes
* Fixing installation issues
* Update hash
* Fixing NVM/NODE installation
* Applying latest changes in travis.yml
* Fixing fixture data exclusions
* Disable some java tests
* Adgudime/apex sac (#25)
* WIP: Compilation tests work
* Fixed bugs with Apex SAC continuous action spaces
* Bugfix: Bad imports
* Fixing PyArrow issue
* Fixing guava check
* Fix datetime java format
* Fixing Bazel issues finding or loading conftest
* Fixing pytest module loading order
* Trying different approach to pickle check
* Installing latest pickle5 explicitly
* Fixing conftest resolution
* Temporarily disabling pickle5 validation
* Fixing fixture data exclusions
* Fixing data files treated as src
* Disable some java tests
  Co-authored-by: Edilmo Palencia <edilmo@gmail.com>
* Fix multiple CI errors
* Update hash
* Fixing more build issues
* Fixing more build issues
* Fix pipeline cache path
* More fixes
* Fix cache
* Fixing bazel test command
* Fix bazel test
* Allowing custom summarize episodes
* Adding custom metrics ops in exec plan
* Apex SAC exploration should be stochastic
* Letting DQN deal with reshaping for Discrete spaces
* Commenting the cache
  Co-authored-by: SangBin Cho <rkooo567@gmail.com>
  Co-authored-by: Simon Mo <xmo@berkeley.edu>
  Co-authored-by: Sven Mika <sven@anyscale.io>
  Co-authored-by: ijrsvt <ian.rodney@gmail.com>
  Co-authored-by: Eric Liang <ekhliang@gmail.com>
  Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
  Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
  Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
  Co-authored-by: Rüdiger Busche <rbusche@posteo.net>
  Co-authored-by: rbusche <rbusche@inserve.de>
  Co-authored-by: sven1977 <svenmika1977@gmail.com>
  Co-authored-by: Aditya Gudimella <aditya.gudimella@gmail.com>
* Fix system info step (#29)
* Fix system info step (#30)
* adding testing framework (#28)
* adding testing framework
* install kubebuilder for testing
* adding correct hash
  Co-authored-by: Ali Kanso <ali.kanso@microsoft.com>
* add shared mem max flag
* change readme
* Tuned hyperparams for ApexSAC
* Bugfix for exploration config.
* Allowing PPO to handle async sampling (#34)
* Making ppo ParallelRollouts mode configurable
* Making dqn ParallelRollouts mode configurable
* Making RolloutWorker generator function public
* Missing argument
* Stop iteration if round robin proportion is not met
* fixing wheels parsing
* Improving iter union stop-iteration conditions
* Fixing DDPG
* Fixing MADDPG
* Fix tflite compat issue (#35)
* Fix tflite compat issue
* Fixing iter corner case
* Manual stride with ellipsis
* Fix unnecessary stop iteration
* Allow replay ops to stop if they are unhealthy (#36)
* Allow the replay ops to stop if they are unhealthy
* Allowing the dqn execution plan to be configured consistently
* Making the concurrency mode in DQN and metric collection in Apex configurable (#37)
* Fixing concurrency op in dqn (#38)
* Replaced Prioritized Experience Replay with normal Experience replay to create AsyncSAC.
* Setting prioritized_replay in config now uses PrioritizedReplay correctly.
* Renamed LocalAsyncReplayBuffer and AsyncReplayActor to better reflect usage
* Added test with prioritized_replay set to True
* Cleaned up code.
* Fixing manual slicing (#40)
* Fixing manual slicing
* Handling the Box space explicitly
* Including the force stop in gather_async (#41)
* Including the force stop in gather_async
* Fix missing bar
* Fix for gather across shards
* Fix for gather async extreme case
* Making env-runner an explicit iterator and Local Iterator regenerable (#42)
* Making env-runner an explicit iterator
  And also making the LocalIterator able to regenerate.
* Fix multi agent test
* Fix union
* Making infinite sequence explicit
  For the sake of the parallel iterators: one that holds an infinite sequence could be called again after a stop-iteration message. In other words, a StopIteration for an infinite sequence must be seen as a "no items available" message.
* Fix unexpected error
* Fixing gym version
* Update hash
* Addressing comments
* Improve gathering async and by shards (#44)
* Improve gathering async and by shards
* Making ParallelIteratorWorker an explicit Iterator in all cases
* Making ParallelIteratorWorker an explicit Iterator in all cases
* Fixing inverted condition
* Removing ForceStopIteration
* Make seeding possible even if env cannot be seeded.
* Fix grep versions (#46)
* Fix grep versions
* Splitting the stages
* Using pool for all rllib
* Update hash
* fixing path permissions
* Changing node version
* Reverting some OS changes
* Fixing compilation errors
* More compilation errors
* More compilation errors
* Fix node installation
* Fixing some package versions
* Using right bazel version
* Fix mac os version in wheels
* Fix mac os version in wheels
* Some minor fixes
* Force the target mac os
* Fix path
* Disable stress test temporarily
* Fixing gitignore
* Fixing Sampler merge mistakes
* Fixing epsilon greedy merge mistakes and requirements versions
* Fix merge error
* Apply changes in travis.yml
* Fix several issues
* Fixing more compatibility bugs
* Fix more incompatibilities
* More incompatibilities
* Fixing more compat issues
* Disable tune horovod torch tests
* Fixing more tests
  Co-authored-by: SangBin Cho <rkooo567@gmail.com>
  Co-authored-by: Simon Mo <xmo@berkeley.edu>
  Co-authored-by: mehrdadn <mehrdadn@users.noreply.github.com>
  Co-authored-by: Mehrdad <noreply@github.com>
  Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
  Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
  Co-authored-by: Sven Mika <sven@anyscale.io>
  Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
  Co-authored-by: Eric Liang <ekhliang@gmail.com>
  Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
  Co-authored-by: Max Fitton <mfitton@berkeley.edu>
  Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
  Co-authored-by: Rüdiger Busche <rbusche@posteo.net>
  Co-authored-by: rbusche <rbusche@inserve.de>
  Co-authored-by: sven1977 <svenmika1977@gmail.com>
  Co-authored-by: Aditya Gudimella <aditya.gudimella@gmail.com>
  Co-authored-by: Ali Kanso <akanso@us.ibm.com>
  Co-authored-by: Ali Kanso <ali.kanso@microsoft.com>
* Applying travis.yml changes
* Use latest pip
* Update the hash
* Fix rllib issues
* Fix rllib issues 2
* Fix tune errors
* Fix ray issues
* Remove old operator
* revert some rllib test deletions
* revert changes on release folder
* Revert more changes
* Logging dashboard building
* Use previous docker image
* Use centos docker image
* more logging
* Comment step
* hash
* installing node 14
* Fix hash

Co-authored-by: Gekho457 <62982571+Gekho457@users.noreply.github.com>
Co-authored-by: Alan Guo <aguo@aguo.software>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
Co-authored-by: Max Fitton <max@semprehealth.com>
Co-authored-by: Kai Yang <kfstorm@outlook.com>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Alex Wu <alex@anyscale.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Barak Michener <me@barakmich.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
Co-authored-by: Raoul Khouri <69156393+raoul-khour-ts@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Jack Parker-Holder <jackph@robots.ox.ac.uk>
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Alan Guo <aguo@anyscale.com>
Co-authored-by: Tao Wang <wangtaothetonic@163.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
Co-authored-by: Simon Mo <xmo@berkeley.edu>
Co-authored-by: mehrdadn <mehrdadn@users.noreply.github.com>
Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
Co-authored-by: Rüdiger Busche <rbusche@posteo.net>
Co-authored-by: rbusche <rbusche@inserve.de>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
Co-authored-by: Aditya Gudimella <aditya.gudimella@gmail.com>
Co-authored-by: Ali Kanso <akanso@us.ibm.com>
Co-authored-by: Ali Kanso <ali.kanso@microsoft.com>