MaintKG: Automated Maintenance Knowledge Graph Construction

MaintKG (Maintenance Knowledge Graph) is a framework for automatically constructing knowledge graphs from maintenance work order records. It processes CMMS (Computerized Maintenance Management System) records to create structured, graph-based knowledge representations.

🚀 Features

Automated knowledge graph construction from maintenance records
Built-in normalization and information extraction (NoisIE)
Neo4j integration for graph storage and querying
Comprehensive data processing pipeline

🔧 Installation

Clone the Repository

git clone https://github.com/nl-tlp/maintkg.git
cd maintkg

Set Up Virtual Environment

python -m venv env
# On Unix/macOS:
source env/bin/activate
# On Windows:
.\env\Scripts\activate

Install Dependencies

pip install -e .
pip install -r requirements.txt

📦 Prerequisites

Python 3.9+
Neo4j Database Server
PyTorch (CUDA-enabled recommended)
Virtual Environment (recommended)

🔑 Environment Variables

Update the .env file in the project root with your own configuration if you wish to create MaintKG from your own data. Otherwise the default will create the default graph.

# Input Settings
INPUT__CSV_FILENAME='your_file.csv'
INPUT__ID_COL='id'
INPUT__TYPE_COL='type'
# ... other settings

# Full configuration example available in `.env`

💻 Usage

Prepare Your Data
- Place your CMMS data in the ./input directory
- Configure column mappings by updating the .env file.
Run the Pipeline
```
python ./src/maintkg/main.py
```
View Results
- Generated knowledge graphs are stored in Neo4j
- Output files are saved in ./output/YYYY-MM-DD_HH_MM-SS-MM/

📁Project Structure

maintkg/
├── cache/                          # Cache directory
│   └── .gitkeep                    # Placeholder for git
├── input/                          # Input data directory
│   └── README.md                   # Input data specifications
├── notebooks/                      # Jupyter notebooks
│   ├── assets/                     # Notebook resources
│   │   ├── images/                 # Visualization images
│   │   └── data/                   # Sample datasets
│   └── example_queries.ipynb       # MaintKG competency queries
├── output/                         # Generated artifacts
│   ├── .gitkeep
│   └── YYYY-MM-DD_HH_MM-SS-MM/    # Timestamped outputs
├── src/                           # Source code
│   ├── maintkg/                   # Core MaintKG package
│   │   ├── __init__.py           # Package initialization
│   │   ├── builder.py            # Graph construction logic
│   │   ├── main.py              # Entry point script
│   │   ├── models.py            # Data models and schemas
│   │   ├── settings.py          # Configuration management
│   │   └── utils/               # Utility functions
│   ├── noisie/                   # NoisIE package
│       ├── __init__.py
│       ├── download_checkpoint.py  # Model checkpoint downloader
│       ├── lightning_logs/      # Model checkpoints
│       │   └── .gitkeep
│       ├── data/                # MaintNormIE corpus
│           └── README.md        # Data documentation

├── .git/                        # Git repository
├── .gitignore                   # Git ignore patterns
├── .pre-commit-config.yaml      # Pre-commit hooks
├── requirements.txt             # Project dependencies
├── pyproject.toml              # Project configuration
├── LICENSE                     # MIT License
└── README.md                   # Project documentation

🤖 NoisIE Model

NoisIE is a sequence-to-sequence normalization and semantic information extraction model that processes raw maintenance text into high-quality semantically structured output using specialised tags for normalisations, entities, and relations.

Pretrained Model Setup

By default, the MaintKG process uses a pretrained NoisIE checkpoint. To use the pretrained NoisIE checkpoint:

python ./src/noisie/download_checkpoint.py

This will:

Create the ./src/noisie/lightning_logs/ directory
Download and verify the model checkpoints
Make the model available for the MaintKG pipeline

Training Custom Models

Prerequisites

Dataset Access: The original MaintNormIE dataset used in the thesis research requires special access. Please contact us to:
- Access the MaintNormIE dataset
- Use MaintNormIE for pretraining your own models
- Discuss custom training requirements

To retrain NoisIE on the MaintNormIE dataset or to use it as pretraining for your own dataset, please contact us.

Dataset Format

Training data should be in JSONL format with paired input-output examples:

{
    "input": "1570-3week service 2-3/3/10",
    "output": "<entity> service <activity>"
}
{
    "input": "pedestal bearing 3 guage faulty",
    "output": "<norm> guage [ gauge ] <relation> faulty <state> gauge <object> has patient <relation> pedestal bearing <object> bearing <object> is a <relation> pedestal bearing <object> gauge <object> has part"
}

The input-output pairs follow these conventions:

Input: Raw maintenance text
Output: Linearized text with semantic tags:
- <norm>: Normalization annotations
- <entity>: Entity spans
- <relation>: Relationship markers

For detailed information about the tagging scheme, please refer to the thesis documentation.

Training Steps

Data Preparation:
- Place your JSONL dataset in ./src/noisie/data/
- Update the data path in train.py:
```
# In ./src/noisie/train.py
data_path = base_dir / "data" / "your_dataset.jsonl"
```
Start Training:
```
python ./src/noisie/train.py
```
Monitor Progress:
- Checkpoints and logs are saved in ./src/noisie/lightning_logs/
- Track training progress using TensorBoard
- Model checkpoints are saved at regular intervals

Evaluating NoisIE

Important

Status Update: The evaluation pipeline is currently undergoing final refinements and code review. For immediate evaluation needs, please see ./model_data.py::evaluate_model.

🗄️ Neo4j Database

Installation

Download Neo4j

Get Neo4j Desktop or use Docker:

docker run \
  --name maintkg-neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:4.4

Configure Database

# Default credentials in .env
NEO4J__URI=bolt://localhost:7687
NEO4J__USERNAME=neo4j
NEO4J__PASSWORD=password
NEO4J__DATABASE=neo4j

Thesis Reference Database

To explore the exact database used in the MaintKG thesis:

Download the dump file:
- Download Neo4j Dump File

Restore the database:

# Using neo4j-admin
neo4j-admin load --from=/path/to/dump.dump --database=neo4j

# Or with Docker
docker exec maintkg-neo4j \
  neo4j-admin load --from=/imports/dump.dump --database=neo4j

Access the database:
- Web interface: http://localhost:7474
- Bolt connection: bolt://localhost:7687

Example Queries

Example queries that correspond to the competency questions (CQs) outlined in the MaintKG thesis chapter can be found in ./notebooks/example_queries.ipynb.

🤝 Contributing

We welcome contributions! Please follow these steps:

Fork & Clone
Create Feature Branch
```
git checkout -b feature/amazing-feature
```

Follow Commit Convention

<type>(<scope>): <subject>

Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation
- style: Formatting
- refactor: Code restructuring
- test: Testing
- chore: Maintenance

Submit PR
- Ensure tests pass
- Update documentation
- Follow code style guidelines

📄 License

This project is licensed under the MIT License - see LICENSE for details.

🔍 Attribution

If you use MaintKG in your research, please cite:

COMING SOON

🙏 Acknowledgments

This work was made possible by the Australian Research Centre for Transforming Maintenance through Data Science.

📧 Contact

For questions, support, or collaboration:

Email: tyler.bikaun@research.uwa.edu.au

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MaintKG: Automated Maintenance Knowledge Graph Construction

🚀 Features

📋 Table of Contents

🔧 Installation

📦 Prerequisites

🔑 Environment Variables

💻 Usage

📁Project Structure

🤖 NoisIE Model

Pretrained Model Setup

Training Custom Models

Prerequisites

Dataset Format

Training Steps

Evaluating NoisIE

🗄️ Neo4j Database

Installation

Thesis Reference Database

Example Queries

🤝 Contributing

📄 License

🔍 Attribution

🙏 Acknowledgments

📧 Contact

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
cache		cache
input		input
notebooks		notebooks
output		output
src		src
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

License

nlp-tlp/MaintKG

Folders and files

Latest commit

History

Repository files navigation

MaintKG: Automated Maintenance Knowledge Graph Construction

🚀 Features

📋 Table of Contents

🔧 Installation

📦 Prerequisites

🔑 Environment Variables

💻 Usage

📁Project Structure

🤖 NoisIE Model

Pretrained Model Setup

Training Custom Models

Prerequisites

Dataset Format

Training Steps

Evaluating NoisIE

🗄️ Neo4j Database

Installation

Thesis Reference Database

Example Queries

🤝 Contributing

📄 License

🔍 Attribution

🙏 Acknowledgments

📧 Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages