diff --git a/docs/docs/hydroflow/ecosystem.md b/docs/docs/hydroflow/ecosystem.md
index 62e10157ec9d..44199e74523b 100644
--- a/docs/docs/hydroflow/ecosystem.md
+++ b/docs/docs/hydroflow/ecosystem.md
@@ -4,24 +4,22 @@ sidebar_position: 4
 
 # The Hydro Ecosystem
 The Hydro Project is an evolving stack of libraries and languages for distributed programming.
-A rough picture of the envisioned Hydro stack is below:
+A rough picture of the Hydro stack is below:
 
 ![Hydro Stack](./img/hydro_stack.png)
 
-The core of the Hydro stack is shown in in the grey box; components that have not been implemented are in orange.
+Working down from the top:
 
-Working up from the bottom:
+- [*Hydroflow+*](../hydroflow_plus) is an end-user-facing high-level [choreographic](https://en.wikipedia.org/wiki/Choreographic_programming) [dataflow](https://en.wikipedia.org/wiki/Dataflow_programming) language. Hydroflow+ is a *global* language for programming a fleet of transducers. Programmers author dataflow pipelines that start with streams of events and data, and span boundaries across multiple `process` and (scalable) `cluster` specifications.
 
-- [Hydroplane](https://github.com/hydro-project/hydroplane) is a service for launching and monitoring Hydroflow transducers. It works both on local developer machines with Docker, and in cloud services like AWS EKS. Over time we expect to add autoscaling features to Hydroplane, to allow users to configure the cluster to grow and shrink based on the monitored behavior and a pluggable policy.
+- *Hydrolysis* is a compiler that translates a global Hydroflow+ spec to multiple single-threaded Hydroflow IR programs, which collectively implement the global spec.
+This compilation phase is currently a part of the Hydroflow+ codebase, but will evolve into a standalone optimizing compiler inspired by database query optimizers and [e-graphs](https://en.wikipedia.org/wiki/E-graph).
 
-- [Hydroflow](https://github.com/hydro-project/hydroplane) is the subject of this book; a library for defining individual transducers in a distributed system. It uses the Rust compiler to generate binaries for deployment.
+- [Hydroflow IR and its compiler/runtime](https://github.com/hydro-project/hydroflow/tree/main/hydroflow) are the subject of this book.
+Where Hydroflow+ is a *global* language for programming a fleet of processes, Hydroflow is a *local* language for programming a single process that participates in a distributed system. More specifically, Hydroflow is an internal representation (IR) language and runtime library that generates the low-level Rust code for an individual transducer. As a low-level IR, Hydroflow is not intended for the general-purpose programmer. For most users it is intended as a readable compiler target from Hydroflow+; advanced developers can also use it to manually program individual transducers.
 
-- *Hydrolysis* is a compiler we envision translating from Hydrologic to Hydroflow.
+- [HydroDeploy](../deploy) is a service for launching Hydroflow transducers on a variety of platforms.
 
-- *Hydrologic* is a high-level domain-specific language that we envision for distributed programming. Unlike Hydroflow, we expect Hydrologic to abstract away many notions of distributed computing. In particular, Hydrologic will be insensitive to the specific deployment of the code—the partitioning of functionality and data across transducers, the number of replicas of each transducer, etc. Instead, programmers will provide specifications for desired properties like the number of failures to tolerate, the consistency desired at a given endpoint, the latency of a given endpoint, etc. The Hydrolysis compiler will then generate Hydroflow transducers that can be deployed by Hydroplane to meet those specifications.
-
-- [Metalift](https://github.com/metalift/metalift) is a framework for "lifting" code from one language to a (typically higher-level) language. We envision that Metalift will be used to translate from one of many distributed programming models/languages into our common Internal Representation, Hydrologic.
+- Hydro also supports *Deterministic Simulation Testing* to aid in debugging distributed programs. Documentation on this feature is forthcoming.
 
 The Hydro stack is inspired by previous language stacks including [LLVM](https://llvm.org) and [Halide](https://halide-lang.org), which similarly expose multiple human-programmable Internal Representation langauges.
-
-An early paper on the Hydro vision appeared in CIDR 2021, under the title [New Directions in Cloud Programming](https://www.cidrdb.org/cidr2021/papers/cidr2021_paper16.pdf).
\ No newline at end of file
diff --git a/docs/docs/hydroflow/img/hydro_stack.png b/docs/docs/hydroflow/img/hydro_stack.png
index d6fa04edcb0e..a5d87b7b1ad3 100644
Binary files a/docs/docs/hydroflow/img/hydro_stack.png and b/docs/docs/hydroflow/img/hydro_stack.png differ
diff --git a/docs/docs/hydroflow/todo.md b/docs/docs/hydroflow/todo.md
deleted file mode 100644
index b10714f83fd6..000000000000
--- a/docs/docs/hydroflow/todo.md
+++ /dev/null
@@ -1,95 +0,0 @@
----
-sidebar_position: 8
----
-
-# TODO
-
-## Concepts
-- p1 (Mingwei) Hydroflow and Rust: how do they go together?
-    - State, control, scoping
-- p1 State over time
-    - lifetimes
-    - explicit deletion
-- p3 Coordination tricks?
-    - End-of-stream to Distributed EOS?
-
-## Docs
-- p1 `hydroflow` struct and its methods
-- p2 Review the ops docs
-
-## Operators not discussed
- - dest_sink
- - identity
- - unzip
- - p1 *fold* -- add to chapter on [state](./syntax/state.md)
 - p1 *reduce* -- add to chapter on [state](./syntax/state.md)
 - p1 *fold_keyed* -- add to chapter on [state](./syntax/state.md)
 - p3 *sort_by_key* -- add to chapter on [state](./syntax/state.md)
 - p2 *next_stratum* -- add to chapter on [time](./concepts/life_and_times.md)
 - p2 *next_tick* -- add to chapter on [time](./concepts/life_and_times.md)
 - p2 *inspect* -- add to chapter on [debugging](./concepts/debugging.md)
 - p2 *null* -- add to chapter on [debugging](./concepts/debugging.md)
-
-## How-Tos and Examples
-- p1 Lamport clocks
-- p2 Vector clocks
-- p2 A partitioned Service
-- p2 A replicated Service
-- p2 Interfacing with external data
-- p2 Interfacing with external services
-- p1 Illustrate `'static` and `'tick` lifetimes (KVS)
-- p3 Illustrate the `next_stratum` operator for atomicity (eg Bloom's upsert `<+-` operator)
-- p3 Illustrate ordered streams (need `zip` operator ... what's the example?)
-- p3 Actor model implementation (Borrow an Akka or Ray Actors example?)
-- p3 Futures emulation? (Borrow a Ray example)
-- p2 Illustrate external storage source and sink (e.g. for WAL of KVS)
-
-## Odds and ends taken out of other chapters
-- **Document the methods on the `hydroflow` struct** -- especially the run methods.
-    - The [`run_tick()`](https://hydro-project.github.io/hydroflow/doc/hydroflow/scheduled/graph/struct.Hydroflow.html#method.run_tick), [`run_stratum()`](https://hydro-project.github.io/hydroflow/doc/hydroflow/scheduled/graph/struct.Hydroflow.html#method.run_stratum), [`run()`](https://hydro-project.github.io/hydroflow/doc/hydroflow/scheduled/graph/struct.Hydroflow.html#method.run), and [`run_async()`](https://hydro-project.github.io/hydroflow/doc/hydroflow/scheduled/graph/struct.Hydroflow.html#method.run_async) methods provide other ways to control the graph execution.
-    - Also `run_available()` `next_stratum()` and `recv_events` are important
-
-- **Make sure `src/examples/echoserver` is the same as the template project** -- or better, find a way to do that via github actions or a github submodule
-
-## What's covered in examples
-- Concepts covered
-    - cargo generate for templating
-    - Hydroflow program specs embedded in Rust
-    - Tokio Channels and how to use them in Hydroflow
-    - Network sources and sinks (`source_stream`)
-    - Built-in serde (`source_stream_serde`, `dest_sink_serde`)
-    - Hydroflow syntax: operators, ->, variables, indexing multi-input/output operators
-    - running Hydroflow via `run_available` and `run_async`
-    - Recursion via cyclic dataflow
-    - Fixpoints and Strata
-    - Template structure: `clap`, message types
-    - `source_stdin`
-    - Messages and `demux`
-    - broadcast pattern
-    - the `persist` operator to store and replay dataflow
-    - the `defer_signal` operator to gate a dataflow
-    - bootstrapping pipelines: `initialize`
-
-- Operators covered
-    - cross_join
-    - defer_signal
-    - demux
-    - dest_sink_serde
-    - difference
-    - filter
-    - filter_map
-    - flatten
-    - flat_map
-    - for_each
-    - initialize
-    - join
-    - map
-    - persist
-    - union
-    - source_iter
-    - source_stdin
-    - source_stream
-    - source_stream_serde
-    - tee
-    - union
-    - unique
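The revised `ecosystem.md` above describes Hydroflow as a *local* IR language and runtime for a single transducer. For orientation, here is a minimal sketch of what such a single-process Hydroflow program looks like, using only operators and methods already named in this diff (`source_iter`, `map`, `for_each`, `run_available`). It follows the hello-world pattern from the Hydroflow book; the specific values and closures are illustrative, and this code is not part of the patch itself.

```rust
use hydroflow::hydroflow_syntax;

fn main() {
    // One transducer: a self-contained dataflow over a local stream of integers.
    let mut flow = hydroflow_syntax! {
        source_iter(0..5)
            -> map(|n| n * 10)
            -> for_each(|n| println!("got: {}", n));
    };
    // Drain all work that is immediately available, then return.
    flow.run_available();
}
```

In the full stack, Hydroflow+ programs are compiled by Hydrolysis into many per-process programs of roughly this shape, which HydroDeploy can then launch.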