spaCy-Prodigy workflow for NER Citation model on eCFR Banking Regulation

wesslen/spacy-ecfr-ner

🪐 spaCy Project: NER Citations of ECFR Banking Regulation in a spaCy pipeline.

A custom NER project for spaCy v3, adapted from the spaCy v3 ner_demo example script for creating an NER component in a new pipeline.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| download | Download a spaCy model with pretrained vectors |
| data-to-spacy | Merge your annotations and create data in spaCy's binary format |
| data-to-asset-senter | Export senter annotations to assets |
| train-curve-ner | Train curve for NER |
| data-to-asset-ner | Export NER annotations to assets |
| train | Train pipeline models |
| evaluate | Evaluate the model and export metrics |
| prodigy-al-ner | NER Prodigy active learning annotations |
| prodigy-manual-ner | NER Prodigy manual annotations |
| package | Package the trained model as a pip package |
| visualize-model | Visualize the model's output interactively using Streamlit |
| setup | Install dependencies |
| clean | Remove intermediate files |
| document | Export README for project details |
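
When a command needs to be triggered from a script rather than typed by hand, a thin wrapper around the CLI is usually enough. The following is a minimal sketch, assuming spaCy v3 is installed in the active environment; the helper names `project_run_argv` and `run_project_command` are made up for illustration and are not part of this project.

```python
import subprocess
import sys


def project_run_argv(name, project_dir="."):
    """Build the argument list for `python -m spacy project run <name> <project_dir>`."""
    return [sys.executable, "-m", "spacy", "project", "run", name, project_dir]


def run_project_command(name, project_dir="."):
    """Run one project command and raise CalledProcessError if it fails."""
    subprocess.run(project_run_argv(name, project_dir), check=True)
```

Calling `run_project_command("train")` from the project directory would then be equivalent to running `python -m spacy project run train` in a shell.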

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| all | download → train → evaluate → package |
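
The "only re-run if inputs have changed" behaviour can be pictured as a checksum comparison. The sketch below only illustrates that general idea with file hashes stored in a JSON file; it is not spaCy's implementation, and the function names and the state-file layout are assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def file_checksum(path):
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def needs_rerun(inputs, state_file):
    """Return True if any input's checksum differs from the recorded one."""
    state_path = Path(state_file)
    # No state file yet means the step has never been recorded as run.
    recorded = json.loads(state_path.read_text()) if state_path.exists() else {}
    current = {str(p): file_checksum(p) for p in inputs}
    return current != recorded


def record_run(inputs, state_file):
    """Store the current input checksums after a successful run."""
    current = {str(p): file_checksum(p) for p in inputs}
    Path(state_file).write_text(json.dumps(current, indent=2))
```

A step would call `needs_rerun` before doing any work and `record_run` once it has finished, so an unchanged set of inputs is skipped on the next invocation.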

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets/ecfr_ner_labels.jsonl | Local | 400 initial NER labels of sections, cites, and laws |
| assets/patterns.jsonl | Local | Patterns for sections, cites, and laws for initial NER training |
| assets/ecfr_senter_labels.jsonl | Local | 150 initial sentence segmentations of eCFR sub-sections |
| assets/raw-files/ecfr-sample-sents.jsonl | Local | Sample of Prodigy annotated sentences from ecfr-sample-title-12.jsonl file |
| assets/raw-files/ecfr-sample-title-12.jsonl | Local | Sample of 47 records (sub-sections) from ecfr-title-12.jsonl |
| assets/raw-files/ecfr-title-12.jsonl | Local | eCFR Title 12 (Banking) parsed as a jsonl file |
| assets/raw-files/ecfr-title-12-sent.jsonl | Local | Senter scored model segmenting ecfr-title-12.jsonl |
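
Before training, it can help to check how the annotated labels are distributed. The sketch below assumes the NER asset follows the usual Prodigy export layout, with one JSON object per line and an optional `spans` list whose items carry a `label` key; the actual label names in this project are not stated here, so any names used when trying it out are placeholders.

```python
import json
from collections import Counter


def count_span_labels(lines):
    """Count span labels across an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Records without annotations may lack "spans" or set it to null.
        for span in record.get("spans") or []:
            counts[span["label"]] += 1
    return counts
```

Because the function accepts any iterable of lines, an open file handle for `assets/ecfr_ner_labels.jsonl` can be passed to it directly.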
