Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Oct 14, 2024 - Python
A lightweight, flexible, and expressive statistical data testing library
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Large-scale pretraining for dialogue
Extract Transform Load for Python 3.5+
Python Stream Processing
Data and tools for generating and inspecting OLMo pre-training data.
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
All-in-one text de-duplication
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Production-ready data processing made easy and shareable
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Python Adaptive Signal Processing
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
Manipulating VASP files with Python.
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning