README tweak to update quickstart link.

LLNL · Aug 21, 2018 · 8f54031
1 parent fceeffa
Showing 1 changed file with 24 additions and 158 deletions.
diff --git a/README.md b/README.md
@@ -5,139 +5,31 @@
 [![Stars](https://img.shields.io/github/stars/LLNL/maestrowf.svg)](https://github.com/LLNL/maestrowf/stargazers)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/LLNL/maestrowf/master/LICENSE)
 
-A Python package that implements the workflow and run specification. The
-package provides users with a generalized way to define a workflow, configure
-parameters sweeps, and manage dependencies.
+## Introduction
 
-MaestroWF is designed with the following core principles in mind:
+Maestro Workflow Conductor is a Python tool and library for specifying and automating multi-step computational workflows both locally and on supercomputers. Maestro parses a human-readable YAML specification that is self-documenting and portable from one user and environment to another.
 
-##### Reproducibility
-All simulation studies should be easily reproducible with just a single (or
-small set of) file(s). Person A should be able to hand off to Person B without
- large amounts of effort.
+On the backend, Maestro implements a set of standard interfaces and data structures for handling "study" construction. These objects offer you the ability to use Maestro as a library, and construct your own workflows that suit your own custom needs. We also offer other structures that make portable execution on various schedulers much easier than porting scripts by hand.
 
-##### Repeatability
-All simulation studies should be easily repeatable. That is to say, it is not
-enough to reproduce old studies -- executing the same exact flow on new studies
-is just as important and should be easy to achieve in a simple manner.
-
-##### Self-Documentation
-
-It is not enough that a workflow runs. Getting to results is just as important
- as how you get there. Even more important, documentation of how to execute
-studies and what a workflow is doing at each step.
-
-##### Consistency
-
-Standard documentation and management of studies allows for an ecosystem to
- be built around a common infrastructure. This concept allows for new tools and
- services to be provided (in most cases) in a manner transparent to the end
- user. Even more so, consistency allows different users to communicate about
- a workflow using the same language and core concepts.
-
-##### Dependency Management
-
- An expandable framework for pulling dependencies from a wide array of different
- sources. So long as a programming interface can be defined for acquiring a
- dependency it can be added and managed in a study.
-
-----------------
-
-## External Information and Documentation
-
-We are actively collecting and documenting requirements and user stories. If
-you'd like to contribute information about your own use cases and workflow
-process, please see the links below. Generally, we separate requirements into
-two categories: study and workflow definition, and simulation management.
-
-External Location for requirements pending.**
+### Core Concepts
 
-##### Study and Workflow Definition
+There are many definitions of workflow, so we try to keep it simple and define the term as follows:
+```
+A set of high level tasks to be executed in some order, with or without dependencies on each other.
+```
 
-Anything related to describing the definition of the methodology and process.
-These requirements currently directly refer to the YAML study specification,
-which is a general way to describe workflow processes, their computing
-environment, and the steps in the methodology for producing results.
+We have designed Maestro around the core concept of what we call a "study". A study is defined as a set of steps that are executed (a workflow) over a set of parameters. A study in Maestro's context is analogous to an actual tangible scientific experiment, which has a set of clearly defined and repeatable steps that are repeated over multiple specimens.
 
-##### Simulation Management
+Maestro's core tenets are defined as follows:
 
-Functional requirements about what management capabilities the tool must be
-able to perform. Capabilities such as automatic job tracking, job restarts, and
-other functionality that a user would expect a backend system to handle without
-user intervention.
-
-----------------
+##### Repeatability
+A study should be easily repeatable. Like any well-planned and implemented science experiment, the steps themselves should be executed the exact same way each time a study is run over each set of parameters or over different runs of the study itself.
 
-## MaestroWF Core concepts
-
-The foundations of the MaestroWF package are built on classes designed to
-represent a few high level concepts which aim to have extremely clear APIs:
-* A ```StudyEnvironment``` class that contains all data representing variables,
-sourcing scripts, and dependencies that the Study requires to run.
-* A ```ParameterGenerator``` class that contains all parameters, which
-yields ```Combination``` objects that represent a valid combination of parameters
-to be used in a single instance of a Study.
-* A ```Study``` class (derived from a ```DAG```) which represents the high level
-parameterized workflow and constructs the full study from parameters and
-environment objects that it stores.
-
-### Environment
-
-The environment of a Study is represented by two classes: the ```StudyEnvionment```
-and ```ParameterGenerator``` classes.
-
-#### StudyEnvironment
-The ```StudyEnvironment``` class stores all of the fundamental items a user
-expects in the environment when executing a particular study. These items include:
-* Variables
-* Scripts
-* Dependencies
-
-Each of items stored within the ```StudyEnvironment``` is derived from the
-appropriate abstract class with the appropriate interface. Each abstract type
-requires a derived class know how to apply itself to the item being passed to it;
-and if it must acquire some external item must provide the appropriate method to
-do so. This design aims to make it so that a study is much easier to repeat (and
-with metadata easy to reproduce).
-
-#### ParameterGenerator
-The goal of the ```ParameterGenerator``` class is to provide one centralized location
- for managing and storing parameters. The implementation of the ParameterGenerator,
- currently, is very basic. It takes lists of parameters and uses those to construct
- combinations. Essentially, if you were to view this as an Excel table, you would
- have a row for each valid combination you wanted to study.
-
-The other goal is to make it so that by having the ParameterGenerator manage
-parameters, functionality can be added without affecting how the end user interacts
-with this class. The ParameterGenerator has an Iterator built in and will generate
-each combination one by one. The end user should NEVER SEE AN INVALID COMBINATION.
-Because this class generates the combinations as specified by the parameters added
-(eventually with types or enforced inheritance), it opens up being able to quietly
-change how this class generates its combinations. The iterable interface that the
-end user sees will remain constant, allowing the internal workings of
-the ```ParameterGenerator``` to remain abstracted.
-
-### Study
-
-The ```Study``` class is part of the meat and potatoes of this whole package. A
-Study object is where the intersection of the major moving parts are
-collected. These moving parts include:
-- ParameterGenerator for getting combinations of user parameters
-- StudyEnvironment for managing and applying the environment to studies
-- Study flow, which is a DAG of the abstract workflow
-
-The class is responsible for a number of the major key steps in study setup
-as well. Those responsibilities include (but are not limited to):
-- Setting up the workspace where a simulation campaign will be run.
-- Applying the StudyEnvionment to the abstract flow DAG:
-    - Creating the global workspace for a study.
-    - Setting up the parameterized workspaces for each combination.
-    - Acquiring dependencies as specified in the StudyEnvironment.
-- Intelligently constructing the expanded ExecutionDAG to be able to:
-    - Recognize when a step executes in a parameterized workspace
-    - Recognize when a step executes in the global workspace
-- Expanding the abstract flow to the full set of specified parameters.
+##### Consistent
+Studies should be consistently documented and able to be run in a consistent fashion. The removal of variation in the process means fewer mistakes when executing studies, ease of picking up studies created by others, and uniformity in defining new studies.
 
+##### Self-documenting
+Documentation is as important in computational studies as it is in physical science. The YAML specification defined by Maestro provides a few required keys that encourage human-readable documentation. Furthermore, the specification itself documents a complete workflow.
 
 ----------------
 
@@ -161,48 +53,22 @@ Once set up, test the environment. The paths should point to a virtual environme
     $ which python
     $ which pip
 
+### Installation
+
+For general installation, you can install MaestroWF using the following:
+
+    $ pip install maestrowf
+
 If you plan to develop on MaestroWF, install the repository directly using:
 
     $ pip install -r requirements.txt
     $ pip install -e .
 
 ----------------
 
-## Quickstart Example
-
-MaestroWF comes packed with a basic example using LULESH, a proxy application provided
-by LLNL. Information and source code for LULESH can be found [here](https://codesign.llnl.gov/lulesh.php).
-
-The example performs the following workflow locally:
-- Download LULESH from the webpage linked above and decompress it.
-- Substitute all necessary variables with their serial compilers and make LULESH.
-- Execute a small parameter sweep of varying size and iterations (a simple sensitivity study)
-
-Two copies of the workflow are in the ```samples/lulesh``` directory for unix and macosx.
-This is due to differences with ```sed```. In order to execute the sample study simply
-execute from the root directory of the repository:
-
-Unix:
-
-    $ maestro run ./samples/lulesh/lulesh_sample1_unix.yaml
-
-MacOSX:
-
-    $ maestro run ./samples/lulesh/lulesh_sample1_macosx.yaml
-
-When prompted, reply in the affirmative:
-
-    $ Would you like to launch the study?[yn] y
-
-Maestro will create a timestamped directory in ```sample_output/lulesh```.
-
-To monitor the study run:
-
-    $ maestro status sample_output/lulesh/<study_dir>
-
-To cancel the study:
+### Quickstart Example
 
-    $ maestro cancel sample_output/lulesh/<study_dir>
+MaestroWF comes packed with a basic example using LULESH, a proxy application provided by LLNL. You can find the Quick Start guide [here](https://maestrowf.readthedocs.io/en/latest/quick_start.html#).
 
 ----------------