A Python package for managing and uploading datasets to Argilla, providing a streamlined interface for dataset creation, configuration, and management.
pip install argilla-dataset-manager
- Easy dataset creation and configuration
- Predefined templates for common dataset types
- Flexible settings management
- Built-in support for various data formats
- Type-safe implementation with mypy support
from argilla_dataset_manager.datasets import SettingsManager
from argilla_dataset_manager.utils import DatasetManager, get_argilla_client
# Initialize Argilla client
client = get_argilla_client()
# Create dataset managers
dataset_manager = DatasetManager(client)
settings_manager = SettingsManager()
# Create dataset settings
settings = settings_manager.create_text_classification(
    labels=["positive", "negative", "neutral"],
    guidelines="Classify the sentiment of the text",
    include_metadata=True
)
# Create dataset
dataset = dataset_manager.create_dataset(
    workspace="my_workspace",
    dataset="sentiment_analysis",
    settings=settings
)
argilla_dataset_manager/
├── datasets/
│   ├── __init__.py
│   └── settings_manager.py    # Dataset settings and templates
├── utils/
│   ├── __init__.py
│   ├── argilla_client.py      # Argilla client configuration
│   ├── dataset_manager.py     # Dataset operations
│   ├── data_loader.py         # Data loading utilities
│   ├── data_processor.py      # Data processing utilities
│   └── logger.py              # Logging configuration
└── __init__.py
The package provides several predefined dataset templates:
# Text classification
settings = settings_manager.create_text_classification(
    labels=["label1", "label2"],
    guidelines="Classification guidelines",
    include_metadata=True
)
# Question answering
settings = settings_manager.create_qa_dataset(
    include_context=True,
    include_keywords=True,
    guidelines="QA dataset guidelines"
)
# Text generation
settings = settings_manager.create_text_generation(
    include_prompt_template=True,
    include_context=True,
    guidelines="Text generation guidelines"
)
# Text summarization
settings = settings_manager.create_text_summarization(
    include_metadata=True,
    include_keywords=True,
    guidelines="Text summarization guidelines"
)
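When the dataset type is only known at runtime (for example, read from a config file), the template methods above can be selected by name. The following is a minimal sketch, not part of the package: `TEMPLATE_METHODS` and `build_settings` are hypothetical names, and the only package calls assumed are the four `create_*` methods shown above.

```python
# Hypothetical helper: maps a short task name to one of the
# SettingsManager template methods listed in this README.
TEMPLATE_METHODS = {
    "text_classification": "create_text_classification",
    "qa": "create_qa_dataset",
    "text_generation": "create_text_generation",
    "text_summarization": "create_text_summarization",
}


def build_settings(settings_manager, task, **options):
    """Call the template method for `task`, forwarding keyword options unchanged."""
    try:
        method_name = TEMPLATE_METHODS[task]
    except KeyError:
        known = ", ".join(sorted(TEMPLATE_METHODS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
    # Look the method up on the manager instance and invoke it.
    return getattr(settings_manager, method_name)(**options)
```

With a `SettingsManager` instance, a call such as `build_settings(settings_manager, "qa", include_context=True, guidelines="QA dataset guidelines")` would be equivalent to calling `create_qa_dataset` directly with the same arguments.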
Environment variables (ARGILLA_API_URL and ARGILLA_API_KEY are required; HF_TOKEN is optional):
ARGILLA_API_URL=your_argilla_instance_url
ARGILLA_API_KEY=your_api_key
HF_TOKEN=your_huggingface_token # Optional, for private spaces
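It can help to check these variables before creating the client, so that a missing value fails early with a clear message. The sketch below uses only the standard library; `load_argilla_env` is a hypothetical helper, not a function provided by this package, and it makes no assumption about how `get_argilla_client` reads its configuration.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("ARGILLA_API_URL", "ARGILLA_API_KEY")
OPTIONAL_VARS = ("HF_TOKEN",)


def load_argilla_env(environ=None):
    """Return the Argilla-related variables as a dict.

    Raises RuntimeError if a required variable is missing or empty.
    `environ` defaults to os.environ; pass a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    config = {name: env[name] for name in REQUIRED_VARS}
    # Optional variables are included only when set to a non-empty value.
    for name in OPTIONAL_VARS:
        if env.get(name):
            config[name] = env[name]
    return config
```

Calling `load_argilla_env()` at the top of a script, before `get_argilla_client()`, surfaces configuration problems before any request is made.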
- Clone the repository:
git clone https://github.com/jordanrburger/argilla_dataset_manager.git
cd argilla_dataset_manager
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install development dependencies:
pip install -e ".[dev]"
- Run tests:
pytest tests/
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.