MaintKG (Maintenance Knowledge Graph) is a framework for automatically constructing knowledge graphs from maintenance work order records. It processes CMMS (Computerized Maintenance Management System) records to create structured, graph-based knowledge representations.
- Automated knowledge graph construction from maintenance records
- Built-in normalization and information extraction (NoisIE)
- Neo4j integration for graph storage and querying
- Comprehensive data processing pipeline
- Installation
- Prerequisites
- Usage
- Project Structure
- NoisIE Model
- Neo4j Database
- Contributing
- License
- Attribution
- Acknowledgements
- Contact
- Clone the Repository
    git clone https://github.com/nl-tlp/maintkg.git
    cd maintkg
- Set Up Virtual Environment
    python -m venv env
    # On Unix/macOS:
    source env/bin/activate
    # On Windows:
    .\env\Scripts\activate
- Install Dependencies
    pip install -e .
    pip install -r requirements.txt
- Python 3.9+
- Neo4j Database Server
- PyTorch (CUDA-enabled recommended)
- Virtual Environment (recommended)
Update the .env file in the project root with your own configuration if you want to build MaintKG from your own data. Otherwise, the default settings build the default graph.
# Input Settings
INPUT__CSV_FILENAME='your_file.csv'
INPUT__ID_COL='id'
INPUT__TYPE_COL='type'
# ... other settings
# Full configuration example available in `.env`
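The setting names above use a double underscore to separate a section (`INPUT`) from a key (`CSV_FILENAME`). As a minimal sketch of how such names can be grouped into nested settings, the helper below parses `.env`-style text with the standard library only. The function name `parse_env` is hypothetical, and this is not necessarily how `settings.py` loads its configuration.

```python
from collections import defaultdict


def parse_env(text: str, delimiter: str = "__") -> dict:
    """Group KEY=VALUE lines into {section: {key: value}} using the delimiter.

    Hypothetical helper for illustration; MaintKG's own loader may differ.
    """
    settings = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        name, value = line.split("=", 1)
        section, _, key = name.strip().partition(delimiter)
        if not key:
            # No delimiter in the name: file it under an empty section.
            section, key = "", section
        # Drop surrounding single or double quotes from the value.
        settings[section.lower()][key.lower()] = value.strip().strip("'\"")
    return dict(settings)


example = """
# Input Settings
INPUT__CSV_FILENAME='your_file.csv'
INPUT__ID_COL='id'
INPUT__TYPE_COL='type'
"""

print(parse_env(example))
```

For the three input settings shown above, this prints `{'input': {'csv_filename': 'your_file.csv', 'id_col': 'id', 'type_col': 'type'}}`.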
- Prepare Your Data
    - Place your CMMS data in the ./input directory
    - Configure column mappings by updating the .env file
- Run the Pipeline
    python ./src/maintkg/main.py
- View Results
    - Generated knowledge graphs are stored in Neo4j
    - Output files are saved in ./output/YYYY-MM-DD_HH_MM-SS-MM/
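Before running the pipeline on your own data, it can help to confirm that the columns named in `.env` actually appear in the CSV header. The sketch below is one way to do that with the standard library. The function `missing_columns` is hypothetical and not part of MaintKG, and the example values `id` and `type` are the ones shown for `INPUT__ID_COL` and `INPUT__TYPE_COL`.

```python
import csv
from pathlib import Path


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header.

    Hypothetical pre-flight check; not part of the MaintKG package.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [column for column in required if column not in present]


def check_input(csv_path, required):
    """Raise a readable error if any configured column is missing."""
    missing = missing_columns(csv_path, required)
    if missing:
        raise ValueError(f"{Path(csv_path).name} is missing columns: {missing}")
```

A call such as `check_input(Path("input") / "your_file.csv", ["id", "type"])` would then fail early, before any model inference or graph construction starts.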
maintkg/
├── cache/                          # Cache directory
│   └── .gitkeep                    # Placeholder for git
├── input/                          # Input data directory
│   └── README.md                   # Input data specifications
├── notebooks/                      # Jupyter notebooks
│   ├── assets/                     # Notebook resources
│   │   ├── images/                 # Visualization images
│   │   └── data/                   # Sample datasets
│   └── example_queries.ipynb       # MaintKG competency queries
├── output/                         # Generated artifacts
│   ├── .gitkeep
│   └── YYYY-MM-DD_HH_MM-SS-MM/     # Timestamped outputs
├── src/                            # Source code
│   ├── maintkg/                    # Core MaintKG package
│   │   ├── __init__.py             # Package initialization
│   │   ├── builder.py              # Graph construction logic
│   │   ├── main.py                 # Entry point script
│   │   ├── models.py               # Data models and schemas
│   │   ├── settings.py             # Configuration management
│   │   └── utils/                  # Utility functions
│   └── noisie/                     # NoisIE package
│       ├── __init__.py
│       ├── download_checkpoint.py  # Model checkpoint downloader
│       ├── lightning_logs/         # Model checkpoints
│       │   └── .gitkeep
│       └── data/                   # MaintNormIE corpus
│           └── README.md           # Data documentation
├── .git/                           # Git repository
├── .gitignore                      # Git ignore patterns
├── .pre-commit-config.yaml         # Pre-commit hooks
├── requirements.txt                # Project dependencies
├── pyproject.toml                  # Project configuration
├── LICENSE                         # MIT License
└── README.md                       # Project documentation
NoisIE is a sequence-to-sequence normalization and semantic information extraction model that processes raw maintenance text into high-quality semantically structured output using specialised tags for normalisations, entities, and relations.
By default, the MaintKG process uses a pretrained NoisIE checkpoint. To download it, run:
python ./src/noisie/download_checkpoint.py
This will:
- Create the ./src/noisie/lightning_logs/ directory
- Download and verify the model checkpoints
- Make the model available for the MaintKG pipeline
- Dataset Access: The original MaintNormIE dataset used in the thesis research requires special access. Please contact us to:
- Access the MaintNormIE dataset
- Use MaintNormIE for pretraining your own models
- Discuss custom training requirements
To retrain NoisIE on the MaintNormIE dataset or to use it as pretraining for your own dataset, please contact us.
Training data should be in JSONL format with paired input-output examples:
{
"input": "1570-3week service 2-3/3/10",
"output": "<entity> service <activity>"
}
{
"input": "pedestal bearing 3 guage faulty",
"output": "<norm> guage [ gauge ] <relation> faulty <state> gauge <object> has patient <relation> pedestal bearing <object> bearing <object> is a <relation> pedestal bearing <object> gauge <object> has part"
}
The input-output pairs follow these conventions:
- Input: Raw maintenance text
- Output: Linearized text with semantic tags:
    - <norm>: Normalization annotations
    - <entity>: Entity spans
    - <relation>: Relationship markers
For detailed information about the tagging scheme, please refer to the thesis documentation.
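To make the conventions above concrete, here is a minimal sketch that reads input-output pairs and splits each output at the three markers listed above. It assumes one JSON object per line, which is what JSONL normally means, although the examples above are shown pretty-printed. The helper names `load_pairs` and `split_segments` are hypothetical. The split is purely lexical; it does not interpret type tags such as `<activity>` or `<object>`, which are defined by the tagging scheme in the thesis.

```python
import json
import re

# The three segment markers named in the conventions above.
_MARKER = re.compile(r"<(norm|entity|relation)>")


def load_pairs(lines):
    """Parse JSONL lines into (input, output) tuples, skipping blank lines."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        if not {"input", "output"} <= record.keys():
            raise ValueError(f"line {number}: expected 'input' and 'output' keys")
        pairs.append((record["input"], record["output"]))
    return pairs


def split_segments(output):
    """Split linearized output into (marker, body) pairs at each marker."""
    # re.split with one capture group yields [prefix, marker, body, marker, body, ...]
    parts = _MARKER.split(output)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


sample = '{"input": "1570-3week service 2-3/3/10", "output": "<entity> service <activity>"}'
for text, output in load_pairs([sample]):
    print(text, "->", split_segments(output))
```

For a file on disk, `load_pairs(open(path, encoding="utf-8"))` works the same way, since a file object iterates line by line.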
- Data Preparation:
    - Place your JSONL dataset in ./src/noisie/data/
    - Update the data path in train.py:
        # In ./src/noisie/train.py
        data_path = base_dir / "data" / "your_dataset.jsonl"
- Start Training:
    python ./src/noisie/train.py
- Monitor Progress:
    - Checkpoints and logs are saved in ./src/noisie/lightning_logs/
    - Track training progress using TensorBoard
    - Model checkpoints are saved at regular intervals
Important
Status Update: The evaluation pipeline is currently undergoing final refinements and code review. For immediate evaluation needs, please see ./model_data.py::evaluate_model.
- Download Neo4j
    - Get Neo4j Desktop or use Docker:
        docker run \
          --name maintkg-neo4j \
          -p 7474:7474 -p 7687:7687 \
          -e NEO4J_AUTH=neo4j/password \
          neo4j:4.4
- Configure Database
    # Default credentials in .env
    NEO4J__URI=bolt://localhost:7687
    NEO4J__USERNAME=neo4j
    NEO4J__PASSWORD=password
    NEO4J__DATABASE=neo4j
To explore the exact database used in the MaintKG thesis:
- Download the dump file:
- Restore the database:
    # Using neo4j-admin
    neo4j-admin load --from=/path/to/dump.dump --database=neo4j
    # Or with Docker
    docker exec maintkg-neo4j \
      neo4j-admin load --from=/imports/dump.dump --database=neo4j
- Access the database:
    - Web interface: http://localhost:7474
    - Bolt connection: bolt://localhost:7687
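Once the database is running, a quick connectivity check from Python can confirm that the Bolt endpoint and credentials work. The sketch below reads the `NEO4J__*` variables shown in the configuration, falling back to the documented defaults, and counts the nodes in the graph. It assumes the official `neo4j` Python driver is installed (`pip install neo4j`). The helper names `neo4j_settings` and `count_nodes` are hypothetical, and this is not how MaintKG itself connects.

```python
import os


def neo4j_settings(env=None):
    """Collect connection settings from NEO4J__* variables, with the documented defaults."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J__URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J__USERNAME", "neo4j"),
        "password": env.get("NEO4J__PASSWORD", "password"),
        "database": env.get("NEO4J__DATABASE", "neo4j"),
    }


def count_nodes(settings):
    """Return the number of nodes in the configured database."""
    # Imported lazily so neo4j_settings() works without the driver installed.
    from neo4j import GraphDatabase

    auth = (settings["username"], settings["password"])
    with GraphDatabase.driver(settings["uri"], auth=auth) as driver:
        with driver.session(database=settings["database"]) as session:
            record = session.run("MATCH (n) RETURN count(n) AS nodes").single()
            return record["nodes"]
```

Calling `count_nodes(neo4j_settings())` against a freshly restored dump should return a non-zero count; an authentication or connection error points to the credentials or port mapping instead.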
Example queries that correspond to the competency questions (CQs) outlined in the MaintKG thesis chapter can be found in ./notebooks/example_queries.ipynb.
We welcome contributions! Please follow these steps:
- Fork & Clone
- Create Feature Branch
    git checkout -b feature/amazing-feature
- Follow Commit Convention
    <type>(<scope>): <subject>
    Types:
    - feat: New feature
    - fix: Bug fix
    - docs: Documentation
    - style: Formatting
    - refactor: Code restructuring
    - test: Testing
    - chore: Maintenance
- Submit PR
    - Ensure tests pass
    - Update documentation
    - Follow code style guidelines
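The commit convention above can be checked mechanically before opening a PR. The following is a minimal sketch of such a check. It assumes the scope is mandatory, since the format is written as `<type>(<scope>): <subject>`; the project may in practice accept commits without one. The function `check_commit_subject` is hypothetical and is not one of the repository's pre-commit hooks.

```python
import re

# Types listed in the commit convention above.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Assumption: scope is required and contains no parentheses.
_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[^()]+)\): (?P<subject>\S.*)$"
)


def check_commit_subject(line):
    """Return the parsed parts of a commit subject line, or None if it does not match."""
    match = _PATTERN.match(line)
    return match.groupdict() if match else None


print(check_commit_subject("docs(readme): fix installation steps"))
print(check_commit_subject("updated some files"))
```

The first call returns a dictionary with the `type`, `scope`, and `subject` fields; the second returns `None` because the line has no recognized type prefix.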
This project is licensed under the MIT License - see LICENSE for details.
If you use MaintKG in your research, please cite:
COMING SOON
This work was made possible by the Australian Research Centre for Transforming Maintenance through Data Science.
For questions, support, or collaboration: