Commit 433f018: Update README.

PiperOrigin-RevId: 449791185
CLRSDev authored and copybara-github committed May 24, 2022
Showing 1 changed file (README.md) with 126 additions and 44 deletions.

```
python -m clrs.examples.run
```

If this is the first run of the example, the dataset will be downloaded and
stored in `--dataset_path` (default '/tmp/CLRS30').
Alternatively, you can download and extract https://storage.googleapis.com/dm-clrs/CLRS30_v1.0.0.tar.gz
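
For example, a manual download along those lines might look like this — a sketch
assuming the default `--dataset_path` of `/tmp/CLRS30` and that the archive
unpacks directly into that layout:

```python
import tarfile
import urllib.request

# Fetch the archive and unpack it where --dataset_path expects it.
# Adjust the target if the archive contains its own top-level CLRS30/ folder.
URL = 'https://storage.googleapis.com/dm-clrs/CLRS30_v1.0.0.tar.gz'
archive, _ = urllib.request.urlretrieve(URL)
with tarfile.open(archive) as tar:
  tar.extractall('/tmp/CLRS30')
```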

## Algorithms as graphs

For each algorithm, we provide a canonical set of *train*, *eval* and *test*
trajectories for benchmarking out-of-distribution generalization.

|       | Trajectories    | Problem Size |
|-------|-----------------|--------------|
| Train | 1000            | 16           |
| Eval  | 32 x multiplier | 16           |
| Test  | 32 x multiplier | 64           |


where "problem size" refers to e.g. the length of an array or number of nodes in
a graph, depending on the algorithm. These trajectories can be used like so:
Here, "problem size" refers to e.g. the length of an array or number of nodes in
a graph, depending on the algorithm. "multiplier" is an algorithm-specific
factor that increases the number of available *eval* and *test* trajectories
to compensate for paucity of evaluation signals. "multiplier" is 1 for all
algorithms except:

- Maximum subarray (Kadane), for which "multiplier" is 32.
- Quick select, minimum, binary search, string matchers (both naive and KMP),
and segment intersection, for which "multiplier" is 64.

The trajectories can be used like so:

```python
train_ds, num_samples, spec = clrs.create_dataset(
    folder='/tmp/CLRS30', algorithm='bfs',
    split='train', batch_size=32)

for i, feedback in enumerate(train_ds.as_numpy_iterator()):
  if i == 0:
    model.init(feedback.features, initial_seed)  # Initialize parameters on the first batch.
  loss = model.feedback(rng_key, feedback.features)  # One training step; returns the loss.
```
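
This snippet assumes a `model` exposing `init` and `feedback` methods (such as
the baselines described below), plus an integer `initial_seed` and a JAX PRNG
key `rng_key`. A minimal sketch of the latter two, assuming nothing beyond JAX
itself:

```python
import jax

initial_seed = 42  # Any integer; seeds the model's parameter initialization.
rng_key = jax.random.PRNGKey(initial_seed)  # Threaded through each training step.
```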

Here, `feedback` is a `namedtuple` with the following structure:
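
In the `clrs` package it is defined as follows (reconstructed here, since the
diff view collapses this part of the README):

```python
import collections

Feedback = collections.namedtuple('Feedback', ['features', 'outputs'])
Features = collections.namedtuple('Features', ['inputs', 'hints', 'lengths'])
```

where `features` are used for training and `outputs` are reserved for
evaluation.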
### Algorithms

Our initial CLRS-30 benchmark includes the following 30 algorithms. We aim to
support more algorithms in the future.

- Sorting
  - Insertion sort
  - Bubble sort
  - Heapsort (Williams, 1964)
  - Quicksort (Hoare, 1962)
- Searching
  - Minimum
  - Binary search
  - Quickselect (Hoare, 1961)
- Divide and conquer
  - Maximum subarray (Kadane's variant) (Bentley, 1984)
- Greedy
  - Activity selection (Gavril, 1972)
  - Task scheduling (Lawler, 1985)
- Dynamic programming
  - Longest common subsequence
  - Matrix chain order
  - Optimal binary search tree (Aho et al., 1974)
- Graphs
  - Depth-first search (Moore, 1959)
  - Breadth-first search (Moore, 1959)
  - Topological sorting (Knuth, 1973)
  - Articulation points
  - Bridges
  - Kosaraju's strongly connected components algorithm (Aho et al., 1974)
  - Kruskal's minimum spanning tree algorithm (Kruskal, 1956)
  - Prim's minimum spanning tree algorithm (Prim, 1957)
  - Bellman-Ford algorithm for single-source shortest paths (Bellman, 1958)
  - Dijkstra's algorithm for single-source shortest paths (Dijkstra et al., 1959)
  - Directed acyclic graph single-source shortest paths
  - Floyd-Warshall algorithm for all-pairs shortest-paths (Floyd, 1962)
- Strings
  - Naïve string matching
  - Knuth-Morris-Pratt (KMP) string matcher (Knuth et al., 1977)
- Geometry
  - Segment intersection
  - Graham scan convex hull algorithm (Graham, 1972)
  - Jarvis' march convex hull algorithm (Jarvis, 1973)

### Baselines

Models consist of a *processor* and a number of *encoders* and *decoders*.
We provide JAX implementations of the following GNN baseline processors:

- Deep Sets (Zaheer et al., NIPS 2017)
- End-to-End Memory Networks (Sukhbaatar et al., NIPS 2015)
- Graph Attention Networks (Veličković et al., ICLR 2018)
- Graph Attention Networks v2 (Brody et al., ICLR 2022)
- Message-Passing Neural Networks (Gilmer et al., ICML 2017)
- Pointer Graph Networks (Veličković et al., NeurIPS 2020)

If you want to implement a new processor, the easiest way is to add
it in the `processors.py` file and make it available through the
`get_processor_factory` method there. A processor should have a `__call__`
method like this:

```
__call__(self,
         node_fts, edge_fts, graph_fts,
         adj_mat, hidden,
         nb_nodes, batch_size)
```

where `node_fts`, `edge_fts` and `graph_fts` will be float arrays of shape
`batch_size` x `nb_nodes` x H, `batch_size` x `nb_nodes` x `nb_nodes` x H,
and `batch_size` x H with encoded features for
nodes, edges and graph respectively, `adj_mat` a
`batch_size` x `nb_nodes` x `nb_nodes` boolean
array of connectivity built from hints and inputs, and `hidden` a
`batch_size` x `nb_nodes` x H float array with the previous-step outputs
of the processor. The method should return a `batch_size` x `nb_nodes` x H
float array.
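
As an illustration, here is a minimal hypothetical processor obeying this
contract; it simply averages each node's state over its neighbours. This is a
sketch, not one of the provided baselines, and it ignores the edge and graph
features:

```python
import jax.numpy as jnp


class MeanNeighbourProcessor:
  """Toy processor: each node's new state is the mean of its neighbours'."""

  def __call__(self, node_fts, edge_fts, graph_fts,
               adj_mat, hidden, nb_nodes, batch_size):
    # Combine the encoded node features with the previous processor state.
    state = node_fts + hidden                                  # [B, N, H]
    # Row-normalise the adjacency matrix, avoiding division by zero.
    adj = adj_mat.astype(jnp.float32)                          # [B, N, N]
    degree = jnp.maximum(adj.sum(axis=-1, keepdims=True), 1.0)
    # Aggregate neighbour states; the output is again [B, N, H].
    return jnp.einsum('bij,bjh->bih', adj / degree, state)
```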

For more fundamentally different baselines, it is necessary to create a new
class that extends the Model API (as found within `clrs/_src/model.py`).
`clrs/_src/baselines.py` provides one example of how this can be done.

## Creating your own dataset

The dataset generator class in `dataset.py` can be modified to generate
different versions of the available algorithms, and it can be built by using
`tfds build` after following the installation instructions at
https://www.tensorflow.org/datasets.

Alternatively, you can generate samples without going through `tfds` by
instantiating samplers with the `build_sampler` method in
`clrs/_src/samplers.py`, like so:

```
sampler, spec = clrs.build_sampler(
    name='bfs',
    seed=42,
    num_samples=1000,
    length=16)

def _iterate_sampler(batch_size):
  while True:
    yield sampler.next(batch_size)

for feedback in _iterate_sampler(batch_size=32):
  ...
```

## Adding new algorithms

Adding a new algorithm to the task suite requires the following steps:

1. Determine the input/hint/output specification of your algorithm, and include
   it within the `SPECS` dictionary of `clrs/_src/specs.py`.
2. Implement the desired algorithm in an abstractified form. Examples of this
   can be found throughout the `clrs/_src/algorithms/` folder; a schematic
   sketch also follows this list.
   - Choose appropriate moments within the algorithm's execution to create
     probes that capture the inputs, outputs and all intermediate state (using
     the `probing.push` function).
   - Once generated, probes must be formatted using the `probing.finalize`
     method, and should be returned together with the algorithm output.
3. Implement an appropriate input data sampler for your algorithm,
   and include it in the `SAMPLERS` dictionary within `clrs/_src/samplers.py`.
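
As a schematic example of step 2, an abstractified algorithm with probes might
look roughly like the sketch below. It assumes the `probing` and `specs`
helpers behave as in the existing algorithms; `my_minimum` and its spec entry
are made-up names, and the exact probe signatures may differ:

```python
import numpy as np

from clrs._src import probing
from clrs._src import specs


def my_minimum(A):
  """Schematic 'find the minimum' with probes (hypothetical spec key)."""
  probes = probing.initialize(specs.SPECS['my_minimum'])  # Spec added in step 1.
  probing.push(probes, specs.Stage.INPUT, next_probe={'key': np.copy(A)})
  best = 0
  for i in range(1, A.shape[0]):
    if A[i] < A[best]:
      best = i
    # Expose the intermediate state after each step of execution.
    probing.push(probes, specs.Stage.HINT, next_probe={'best': best})
  probing.push(probes, specs.Stage.OUTPUT, next_probe={'min': best})
  probing.finalize(probes)
  return best, probes
```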

Once the algorithm has been added in this way, it can be accessed with the
`build_sampler` method, and will also be incorporated into the dataset if it is
regenerated with the generator class in `dataset.py`, as described above.

## Citation

To cite the CLRS Algorithmic Reasoning Benchmark:
