* update (×4)
* Create README.md
* Update README.md (×14)
* update
* Update README.md (×4)
* update
* Update README.md (×7)
* added default Spark configs to init_notebook.py
* update (×3)
* add time delay to allow Jupyter to start (×2)
* revert log change
* update (×2)
* updated README
* set maxResultSize property to unlimited
* changed default worker to n1-standard-8
* merged init_default.py functionality into init_notebook.py (×2)
* updated README (×11)
* fixed order of code in init script
* fixed ports argument in start-up script
* moved waiting-for-Jupyter code to init script
* updated alias code block
* fixed init script filename
* Google Chrome check needs to be fixed
* added gitignore
* highmem worker by default with --vep
* added --hash option to start_cluster.py to reference older Hail builds
* Merge pull request #5 from Nealelab/dev: added --hash option to start_cluster.py to reference older Hail builds
* decoupled default conf in Jupyter notebook Spark from /etc/spark/conf/spark-defaults.conf
* typo in submit_cluster
* modified init_notebook script
* Update stop_cluster.py
* Now passes extra properties to gcloud
* added ability to specify custom Hail jar and zip for Jupyter notebook on startup
* Some tightening of options
* Moving into main
* Removed duplicate keyword argument
* remove duplicate argument
* Added diagnose_cluster.py

  Compiles log files for a cluster to a local directory or a Google bucket:

  ```
  python diagnose_cluster.py -n my-cluster -d my-cluster-diagnose/
  python diagnose_cluster.py -n my-cluster -d gs://my-bucket/my-cluster-diagnose/
  ```

  ```
  usage: diagnose_cluster.py [-h] --name NAME --dest DEST [--hail-log HAIL_LOG]
                             [--overwrite] [--no-diagnose] [--compress]
                             [--workers [WORKERS [WORKERS ...]]] [--take TAKE]

  optional arguments:
    -h, --help            show this help message and exit
    --name NAME, -n NAME  Cluster name
    --dest DEST, -d DEST  Directory for diagnose output -- must be local
    --hail-log HAIL_LOG, -l HAIL_LOG
                          Path for hail.log file
    --overwrite           Delete dest directory before adding new files
    --no-diagnose         Do not run gcloud dataproc clusters diagnose
    --compress, -z        GZIP all files
    --workers [WORKERS [WORKERS ...]]
                          Specific workers to get log files from
    --take TAKE           Only download logs from the first N workers
  ```

  - Runs `gcloud dataproc clusters diagnose`
  - Grabs the following log files from the master node:

  ```
  /var/log/hive/hive-*
  /var/log/google-dataproc-agent.0.log
  /var/log/dataproc-initialization-script-0.log
  /var/log/hadoop-mapreduce/mapred-mapred-historyserver*
  /var/log/hadoop-hdfs/*-m.*
  /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*-m.*
  /home/hail/hail.log   # can be modified with a command-line argument
  ```

  - Grabs the following log files from the workers:

  ```
  /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.*
  /var/log/dataproc-startup-script.log
  /var/log/hadoop-yarn/yarn-yarn-nodemanager-*.*
  /var/log/hadoop-yarn/userlogs/*
  ```

  The output directory has the following structure:

  ```
  diagnostic.tar
  master/my-cluster-m/...
  workers/my-cluster-w-*/...
  hadoop-yarn/userlogs/application*/container*
  ```

* sec worker fix
* Break apart ssh options. Saw failures with some versions of gcloud/ssh.
* Exposed --metadata and fixed a problem with creating the directory
* Recapitulating subprocess fixes of PR #11
* Fix typo in README
* Added executable
* Updates to support multiple Hail versions and new deployment locations:
  - init_notebook is now versioned for compatibility. This commit uses version 2, which I've uploaded to gs://hail-common/init_notebook-2.py.
  - Hail now deploys both 0.1 and devel versions, so I added an argument to allow either to be used. The stable version should of course be used by default.
  - The init arg is now empty by default, because the init_notebook script should always be run (and requires the compatibility version to decide the correct path). It is still possible to use additional init actions.
* small fix in init_notebook; updated submit script to reflect new Hail deployment
* packaged commands under umbrella 'cluster' module
* updated diagnose
* updated readme; added --quiet flag to stop command
* updated readme with optional arguments
* Update LICENSE.txt
* make notebook default for cluster connect
* Overhaul CLI using argparse subparsers; interface change:
  - More informative help messages
  - Added default args to --help output
  - Interface change: module comes before name
* Fixed HAIL_VERSION metadata variable
* updated setup.py to reflect v1.1
* changed some instances of check_call to call to avoid redundant errors
* Remove zsh artifacts from README
* added --args option to submit script to allow passing arguments to submitted Hail scripts
* incremented to v1.1.2
* Update README.md
* Remove sleep
* removed Anaconda from notebook init; added --pkgs option to cluster start
* Fix deployment issues by bumping compatibility version
* fixed jar distribution issues
* forgot something
* Updating spark version to dataproc 1.2
* a few fixes for 2.2.0
* COMPAT version changes
* Made the os.mkdir statements safer and free from race conditions (a sketch of the pattern follows this list)
* Fix cloudtools to work with Hail devel / 0.2 (hail-is#47)
* Update README.md (hail-is#48)
* Unify hail 0.1 and 0.2 again, fix submit (hail-is#49)
  * Fixed submit help message
  * Bump version
* Update init_notebook.py (hail-is#51)
* Add parsimonious (hail-is#52)
* Parameterize master memory fraction (hail-is#53)
* add bokeh to imports (hail-is#54)
* Use specific version of decorator (hail-is#56)
* Update README.md (hail-is#57)
* add modify jar and zip (hail-is#59)
* Fixed zip copying (hail-is#60)
  * Added gs:// support
* rolling back google-cloud version (hail-is#62)
* moved up package installation in init script (hail-is#63)
* use beta for max-idle option (hail-is#61)
  * bug fix
* added Intel MKL to init script (hail-is#64)
  * fix
  * another fix
* Update default version to devel / spark 2.2.0; update README (hail-is#65)
  * fix
* Added initialization time-out option (hail-is#71)
* add async option to stop (hail-is#73)
* check for errors in start, stop, submit, and list (hail-is#74)
* update version to 1.14 (hail-is#75)
* Syntax error (hail-is#76)
  * fix syntax error
  * bump version
* add a bucket parameter (hail-is#78)
  * also document deployment
* use config files to set some default properties (hail-is#77)
  * do... something
  * set image based on spark version
  * tweak to run using paths that deploy will spit out
  * fix
  * fix rebase
* Set up Continuous Integration (hail-is#80)
  * wip hail ci
  * fix formatting
  * ignore emacs temp files
  * add cluster sanity checks
  * Update setup.py
  * Update cluster-sanity-check-0.2.py
* Fix CI Build (hail-is#81)
  * Update hail-ci-build.sh (×2)
  * add more necessary things
  * fix build image and update file
  * fix build image maybe
  * use python2
  * fix image
  * Update hail-ci-build.sh (×2)
* Continuous Deployment (hail-is#82)
  * add deploy script
  * document deployment secret creation
  * fix readme
  * fix if check
  * ensure twine is in build image
  * kick ci
  * set required property? apparently?
* bump to 0.2 (hail-is#79)
* add make to image (hail-is#85)
* fix deploy (hail-is#86)
* copy some lessons from hail (hail-is#84): copying some ideas from the discussion at hail-is#4241
* Update hail-ci-deploy.sh (hail-is#87)
* fix (hail-is#88)
* fix cloudtools published check (hail-is#89)
* add warning, versioned hash lookup (hail-is#90)
* fix deploy script version checking (hail-is#92)
* Test python 3.6 and fix python 3.7 incompatibility (hail-is#91)
  * test python3
  * also fix: async is a reserved word
  * checked in bad build file
  * unneeded var
  * shush pip
  * kick ci
  * update build hash
* Ignore INT and TERM in shutdown_cluster
* parse init script list (hail-is#94)
  * Update __init__.py
* switched devel vep to use docker init (hail-is#96)
* bump version for vep init (hail-is#98)
* deploy python2 and python3 to pypi (hail-is#93)
* Update start.py (hail-is#99)
* Update __init__.py (hail-is#100)
* fix python3 deploy (hail-is#101)
* Fix pkgs logic (hail-is#102)
* Adding more options to modify (hail-is#67)
  * Added options to modify clusters
  * Update modify.py
* Add a max-idle 40m to test clusters (hail-is#103)
  * need gcloud beta components
* Pin dependency versions (hail-is#105)
  * update the version of cloudtools
  * install all packages together to ensure dependencies are calculated together
  * fail when subprocess fails
  * fix conda invocation
  * compatibility with python2
  * Revert "fail when subprocess fails" (reverts commit 25e7c0a524823d91894b538427f179611e79f271)
  * blah
  * wtf
  * if was backwards
  * restart tests
* Improve Error Messages when Subprocesses Fail (hail-is#111) (a safe_call sketch follows this list)
  * add and use safe_call
  * fail when subprocess fails
  * use safe_call
  * use safe_call extensively
  * simplify and make correct safe_call
  * fix splat
  * fix
  * foo
* update version (hail-is#113)
* Added describe to get Hail file info/schema (hail-is#112)
  * f -> format
  * Update setup.py
* Update __init__.py (hail-is#115)
* Fix cloudtools (hail-is#116)
  * fix
  * bump version
* fix (hail-is#117)
* bump ver (hail-is#118)
* fixed describe ordering for python2 (hail-is#119)
* devel => 0.2 (hail-is#121)
* add latest (hail-is#120)
* added --max-age option (hail-is#123)
  * bump version
* update to 3.0.0 (hail-is#122)
  * bump (×2)
  * s/devel/0.2
* Fix packages again (hail-is#124)
  * fix
* Add 'modify' and 'list' command docs (hail-is#125)
* Update connect.py (hail-is#126)
* Rollout fix for chrome 72 (hail-is#130)
* Add python files or folders from environment variable; zip files together (hail-is#127) (a bundling sketch follows this list)
  * bumping version
  * files -> pyfiles
  * missed one
  * overloaded variable
* updating VEP init script (hail-is#129)
  * Update __init__.py
* files -> pyfiles once more (hail-is#131)
* fix for jupyter/tornado incompatibility (hail-is#133)
* Adding project flag (hail-is#134)
  * Adding configuration option as well
* Adding support for GRCh38 VEP (hail-is#135)
  * version bump
* Fixing VEP version for 38 (hail-is#136)
  * Adding support for GRCh38 VEP
  * version bump
  * fix for 38 VEP version
  * Update __init__.py
* Disable stackdriver on cloudtools clusters (hail-is#138)
* Update default spark version (hail-is#139)
  * Clean up imports
* allowing pass-through args for submit (hail-is#140)
  * bump version
  * moar version
* moved cloudtools to subdirectory project for inclusion in monorepo
* moved .gitignore
* bump (×2)
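The "os.mkdir statements safer and free from race conditions" entry refers to a standard pattern: checking `os.path.exists()` and then calling `os.mkdir()` is racy, because another process can create the directory between the two calls. A minimal sketch of the pattern, not the repository's exact code:

```python
import errno
import os

def mkdir_safe(path):
    # Racy version (what the fix replaces):
    #   if not os.path.exists(path):
    #       os.mkdir(path)
    # Safe version: just attempt the create, and swallow only the
    # "already exists" error; re-raise everything else.
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
```

On Python 3.2+, `os.makedirs(path, exist_ok=True)` does essentially the same thing in one call; the try/except form also runs on the Python 2 interpreters this project still supported at the time.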
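The `safe_call` work in hail-is#111 is about surfacing a failed subprocess's output once, attached to the error, rather than letting Python tracebacks and gcloud's own error text print redundantly. The repository's actual helper may differ; a minimal sketch of the idea:

```python
import subprocess
import sys

def safe_call(*args):
    """Run a command; on failure, print its combined stdout/stderr and
    exit with the command's return code instead of a Python traceback."""
    try:
        subprocess.check_output(args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        sys.stderr.write(e.output.decode())
        sys.exit(e.returncode)

# example: safe_call('gcloud', 'dataproc', 'clusters', 'list')
```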
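hail-is#127 ("Add python files or folders from environment variable; zip files together") bundles user Python code into a single archive that can be shipped to the cluster alongside a submitted script. The helper name and the way paths are sourced below are assumptions for illustration, not the PR's actual code:

```python
import os
import zipfile

def zip_pyfiles(paths, out='pyscripts.zip'):
    """Bundle .py files and package directories into one zip, in the
    style of a Spark --py-files archive. `paths` might come from an
    environment variable, e.g. os.environ.get('PYFILES', '').split(',')
    (variable name hypothetical)."""
    with zipfile.ZipFile(out, 'w') as z:
        for path in paths:
            if os.path.isdir(path):
                # Keep packages importable: store entries relative to
                # the directory's parent, so 'pkg/mod.py' survives.
                parent = os.path.dirname(os.path.abspath(path))
                for root, _, files in os.walk(path):
                    for f in files:
                        if f.endswith('.py'):
                            full = os.path.join(root, f)
                            z.write(full, os.path.relpath(full, parent))
            else:
                z.write(path, os.path.basename(path))
    return out
```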