Commit

Updates for weasel (#200)
* Update maintenance scripts for weasel

* Update README for weasel

* Update project READMEs for weasel

* Update spacy requirements across projects for weasel

* CI: Add WEASEL_CONFIG_OVERRIDES

* CI: Switch to python 3.8
adrianeboyd authored Nov 9, 2023
1 parent e24a085 commit 782da3a
Showing 86 changed files with 400 additions and 403 deletions.
3 changes: 2 additions & 1 deletion .github/update_category_docs.py
@@ -1,5 +1,6 @@
from pathlib import Path
from spacy.cli._util import PROJECT_FILE, load_project_config
from weasel.cli.main import PROJECT_FILE
from weasel.util import load_project_config
from wasabi import msg, MarkdownRenderer
import typer

3 changes: 2 additions & 1 deletion .github/update_projects_jsonl.py
@@ -1,5 +1,6 @@
from pathlib import Path
from spacy.cli._util import PROJECT_FILE, load_project_config
from weasel.cli.main import PROJECT_FILE
from weasel.util import load_project_config
from wasabi import msg
import json
import typer
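
Both maintenance scripts make the same change: `PROJECT_FILE` now comes from `weasel.cli.main` and `load_project_config` from `weasel.util`, instead of both coming from `spacy.cli._util`. A script that has to run in environments with or without Weasel installed could resolve the names with a fallback. The sketch below is only an illustration of that idea, not part of this commit; `import_first` and `load_project_helpers` are hypothetical helper names, and only the module paths are taken from the diff.

```python
import importlib


def import_first(candidates):
    """Return the first importable attribute from (module_name, attribute) pairs."""
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package not installed; try the next location
        if hasattr(module, attribute):
            return getattr(module, attribute)
    tried = ", ".join(f"{m}.{a}" for m, a in candidates)
    raise ImportError(f"none of these could be imported: {tried}")


def load_project_helpers():
    """Resolve PROJECT_FILE and load_project_config, preferring the weasel paths.

    Hypothetical helper: the weasel locations are the ones this commit
    switches to, the spacy location is the one it removes.
    """
    project_file = import_first(
        [("weasel.cli.main", "PROJECT_FILE"), ("spacy.cli._util", "PROJECT_FILE")]
    )
    load_project_config = import_first(
        [
            ("weasel.util", "load_project_config"),
            ("spacy.cli._util", "load_project_config"),
        ]
    )
    return project_file, load_project_config
```

The commit itself imports from Weasel unconditionally, which is the simpler choice when the environment is known to have Weasel available.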
5 changes: 3 additions & 2 deletions .github/workflows/tests.yml
@@ -12,6 +12,7 @@ on:
env:
# Make sure we're exiting training as early as possible
SPACY_CONFIG_OVERRIDES: '--training.max_epochs=1 --training.max_steps=1'
WEASEL_CONFIG_OVERRIDES: '--training.max_epochs=1 --training.max_steps=1'
WASABI_LOG_FRIENDLY: 1

jobs:
@@ -23,9 +24,9 @@ jobs:
matrix:
include:
- os: windows-2019
python_version: "3.7"
python_version: "3.8"
- os: ubuntu-20.04
python_version: "3.7"
python_version: "3.8"
runs-on: ${{ matrix.os }}

steps:
46 changes: 26 additions & 20 deletions README.md
@@ -2,17 +2,19 @@

# 🪐 Project Templates

[spaCy projects](https://spacy.io/usage/projects) let you manage and share
**end-to-end spaCy workflows** for different **use cases and domains**, and
[Weasel](https://github.com/explosion/weasel), previously
[spaCy projects](https://spacy.io/usage/projects), lets you manage and share
**end-to-end workflows** for different **use cases and domains**, and
orchestrate training, packaging and serving your custom pipelines. You can start
off by cloning a pre-defined project template, adjust it to fit your needs, load
in your data, train a pipeline, export it as a Python package, upload your
outputs to a remote storage and share your results with your team.

> ⚠️ spaCy project templates require [**spaCy v3**](https://spacy.io). You can
> install it from pip with `pip install spacy` or conda with
> `conda install spacy -c conda-forge`. Make sure to use a fresh virtual
> environment.
> ⚠️ Weasel project templates require
> [**Weasel**](https://github.com/explosion/weasel), which is also included by
> default with spaCy v3.7+. You can install it from pip with
> `pip install weasel` or conda with `conda install weasel -c conda-forge`. Make
> sure to use a fresh virtual environment.
>
> See the [`master` branch](https://github.com/explosion/projects/tree/master)
> for the previous version of this repo.
@@ -32,31 +34,35 @@ outputs to a remote storage and share your results with your team.

## 🚀 Quickstart

Projects can be used via the new
[`spacy project`](https://spacy.io/api/cli#project) CLI. To find out more about
a command, add `--help`. For detailed instructions, see the
[usage guide](https://spacy.io/usage/projects).

<!-- TODO: update example -->
Projects can be used via the
[`weasel`](https://github.com/explosion/weasel/blob/main/docs/cli.md) CLI, or
through the [`spacy project`](https://spacy.io/api/cli#project) alias. To find
out more about a command, add `--help`. For detailed instructions, see the
[Weasel documentation](https://github.com/explosion/weasel/tree/main#-documentation)
or [spaCy projects usage guide](https://spacy.io/usage/projects).

1. **Clone** the project template you want to use.
```bash
python -m spacy project clone tutorials/ner_fashion_brands
python -m weasel clone tutorials/ner_fashion_brands
```
2. **Fetch assets** (data, weights) defined in the `project.yml`.
2. **Install** any project requirements.
```bash
cd ner_fashion_brands
python -m spacy project assets
python -m pip install -r requirements.txt
```
3. **Fetch assets** (data, weights) defined in the `project.yml`.
```bash
python -m weasel assets
```
3. **Run a command** defined in the `project.yml`.
4. **Run a command** defined in the `project.yml`.
```bash
python -m spacy project run preprocess
python -m weasel run preprocess
```
4. **Run a workflow** of multiple steps in order.
5. **Run a workflow** of multiple steps in order.
```bash
python -m spacy project run all
python -m weasel run all
```
5. **Adjust** the template for **your specific use case**, load in your own
6. **Adjust** the template for **your specific use case**, load in your own
data, adjust the settings and model and share the result with your team.
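
The Quickstart changes above follow one pattern: `spacy project <subcommand>` becomes `weasel <subcommand>`. For scripts or docs that still contain the old invocations, the rewrite can be sketched as a small string transformation. This is a minimal illustration, assuming the old commands appear literally as `spacy project ...`; `to_weasel_command` is a hypothetical helper and not something provided by Weasel or spaCy.

```python
import re

# "spacy project" followed by whitespace and a subcommand (clone, assets, run, ...).
_OLD_COMMAND = re.compile(r"\bspacy\s+project\s+(?=\S)")


def to_weasel_command(command: str) -> str:
    """Rewrite `spacy project <subcommand>` to `weasel <subcommand>`.

    Other spaCy commands (for example `spacy train`) are left untouched.
    """
    return _OLD_COMMAND.sub("weasel ", command)


if __name__ == "__main__":
    for old in (
        "python -m spacy project clone tutorials/ner_fashion_brands",
        "python -m spacy project assets",
        "python -m spacy project run preprocess",
    ):
        print(to_weasel_command(old))
```

Per the updated README text, the `spacy project` alias is still available, so this kind of rewrite is optional.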

## 👷‍♀️Repository maintanance
14 changes: 7 additions & 7 deletions benchmarks/healthsea_spancat/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Healthsea-Spancat
# 🪐 Weasel Project: Healthsea-Spancat

This spaCy project uses the Healthsea dataset to compare the performance between the Spancat and NER architecture.

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -29,7 +29,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -42,11 +42,11 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/annotation.jsonl` | URL | NER annotations exported from Prodigy with 5000 examples and 2 labels |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
12 changes: 6 additions & 6 deletions benchmarks/nel/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: NEL Benchmark
# 🪐 Weasel Project: NEL Benchmark

Pipeline for benchmarking NEL approaches (incl. candidate generation and entity disambiguation).

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -36,7 +36,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -45,7 +45,7 @@ inputs have changed.
| `all` | `download_mewsli9` &rarr; `download_model` &rarr; `wikid_clone` &rarr; `preprocess` &rarr; `wikid_download_assets` &rarr; `wikid_parse` &rarr; `wikid_create_kb` &rarr; `parse_corpus` &rarr; `compile_corpora` &rarr; `train` &rarr; `evaluate` &rarr; `compare_evaluations` |
| `training` | `train` &rarr; `evaluate` |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->

Notes:
> **Warning**: Parts of this project are currently not platform-agnostic and run only on Linux. Making the entire
1 change: 0 additions & 1 deletion benchmarks/nel/project.yml
@@ -1,6 +1,5 @@
title: 'NEL Benchmark'
description: "Pipeline for benchmarking NEL approaches (incl. candidate generation and entity disambiguation)."
spacy_version: ">=3.0.0,<3.6.0"
vars:
run: "cg-default"
language: "en"
1 change: 1 addition & 0 deletions benchmarks/nel/requirements.txt
@@ -7,3 +7,4 @@ rapidfuzz>=2.0.0
spacyfishing
virtualenv
pysqlite3-binary
spacy>=3.0.0,<3.6.0
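
For the NEL benchmark, the commit drops `spacy_version: ">=3.0.0,<3.6.0"` from `project.yml` and adds `spacy>=3.0.0,<3.6.0` to `requirements.txt`. The same move could be scripted for other projects along the following lines. This is a rough sketch that treats `project.yml` as plain text and assumes `spacy_version` sits on a single top-level line; `move_spacy_version` is a hypothetical helper, not a Weasel feature.

```python
import re

# A top-level `spacy_version: "<specifier>"` line, with optional quotes.
_SPACY_VERSION = re.compile(
    r"""^spacy_version:[ \t]*["']?([^"'\n]+?)["']?[ \t]*$""", re.MULTILINE
)


def move_spacy_version(project_yml: str):
    """Strip the spacy_version line and return (new_yml, pip_requirement).

    Returns the text unchanged and None when no spacy_version line is found.
    """
    match = _SPACY_VERSION.search(project_yml)
    if match is None:
        return project_yml, None
    requirement = "spacy" + match.group(1).strip()
    start, end = match.span()
    # Also drop the newline that terminated the removed line.
    if end < len(project_yml) and project_yml[end] == "\n":
        end += 1
    return project_yml[:start] + project_yml[end:], requirement
```

The returned requirement string would then be appended to the project's `requirements.txt`, as was done by hand in this commit.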
14 changes: 7 additions & 7 deletions benchmarks/ner_conll03/README.md
@@ -1,17 +1,17 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Named Entity Recognition (CoNLL-2003)
# 🪐 Weasel Project: Named Entity Recognition (CoNLL-2003)

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -25,7 +25,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -36,7 +36,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -47,4 +47,4 @@ in the project directory.
| `assets/conll2003/train.iob` | Local | Training data (not available publicly so you have to add the file yourself) |
| `assets/orth_variants.json` | URL | A file containing orth variants for data augmentation |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
14 changes: 7 additions & 7 deletions benchmarks/ner_embeddings/README.md
@@ -1,6 +1,6 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Comparing embedding layers in spaCy
# 🪐 Weasel Project: Comparing embedding layers in spaCy

This project contains the code to reproduce the results of the
[Multi hash embeddings in spaCy](https://arxiv.org/abs/2212.09255) technical report by Explosion.
@@ -29,12 +29,12 @@ the hash embedding layers. We apologize for the inconvenience.

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -54,7 +54,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -66,7 +66,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -76,4 +76,4 @@ in the project directory.
| `assets/fasttext.nl.gz` | URL | Dutch fastText vectors. |
| `span-labeling-datasets` | Git | |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
14 changes: 7 additions & 7 deletions benchmarks/parsing_penn_treebank/README.md
@@ -1,17 +1,17 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Dependency Parsing (Penn Treebank)
# 🪐 Weasel Project: Dependency Parsing (Penn Treebank)

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -25,7 +25,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -36,7 +36,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -47,4 +47,4 @@ in the project directory.
| `assets/vectors.zip` | URL | GloVe vectors |
| `assets/orth_variants.json` | URL | A file containing orth variants for data augmentation |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->