Refine high-quality datasets and visual AI models
Updated Nov 2, 2024 - Python
The Open Source Feature Store for Machine Learning
Always know what to expect from your data.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
A demo of Bufstream, a drop-in replacement for Apache Kafka that's 10x less expensive to operate
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
On this site I share personal thoughts about data, data governance, data quality, metadata, and side projects.
Source-available data quality tool
lakeFS - Data version control for your data lake | Git for data
Scalable data preprocessing and curation toolkit for LLMs
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
A library for Spark that helps to standardize any input data (DataFrame) to adhere to the provided schema.
Client interface for all things Cleanlab Studio
Examples for trying out the harpin AI identity resolution and data quality toolkit
Numerical data imputation methods for extremely missing data contexts
Data quality profiling and exploratory data analysis in one line of code for Pandas and Spark DataFrames.
This Chrome Extension automatically performs SRM checks and flags potential data quality issues on supported experimentation platforms.