ETL, Analytics, Versioning for Unstructured Data
Updated Aug 6, 2025 (Python)
A Python toolbox for gaining geometric insights into high-dimensional data
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Machine learning with dataframes
Tools for test-driven data wrangling and data validation.
Python package to remove common ugliness from a CSV-like file
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions: an alternative to map-reduce and join-groupby
Data Cleaning with Python
A framework for data piping in Python
Data wrangling with simplicity, complete audit transparency, and speed
Execute OpenRefine JSON scripts without OpenRefine (or Java)
Omnipy is a high level Python library for type-driven data wrangling and scalable workflow orchestration (under development)
Library to make the MongoDB aggregation framework and pipelines easy to use in Python.
A Python package built for data scientists, analysts, and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, enabling quick analysis before data wrangling and feature extraction.
🚀🤖 Cognito - Simplifies AutoML Data Preprocessing.
Fluent dataset operations, compatible with your favorite libraries
Import, maintain, and export tag metadata to/from audio files and a dynamically created SQLite table. Automates incremental tag cleanup, enrichment, and standardisation for your digital audio library at scale using pre-scripted SQL queries and Polars, achieving a quality and consistency in your metadata that is not possible with a tagger.
Makes quick-and-dirty data mining easier in Sublime Text
Wrangle messy numerical, image, and text data into consistent well-organized formats