Thund, a DAG processor based on Apache Arrow.

A modern, performant, and robust DAG processor for data pipelines. It moves data in and out of storage such as S3 or Iceberg/Delta lakes without interruptions.

Why?

For whenever an Apache Airflow / NiFi / Hadoop flying circus or the like is not feasible. Legacy software can keep operating on your storage/lake data while Thund handles the in/out. For a complete modern stack, combine Apache Arrow's Ballista/DataFusion with Thund.

If you don't get it, no worries, it's an early experiment. Perhaps "Grímnismál" (Year 1300-1325) in the Poetic Edda explains its goal better:
"Thunda's waters hast'ning fleet,
Touch not Valgom! with thy feet."

Design goals

The goals below are still to be sorted into V1, V2, or V never.

Functional Goals V0

  • Fix event handler/step arguments: replace the simple reader with functions that create the reader and the writer.
  • Picture of the watcher -> event handler mechanics and the passing of parameters.

Functional Goals V1

  • Alloy component: could Arrow references be used between Golang and Rust?
  • Support for Arrow's HDFS filesystem
  • Incorporate rclone
  • Graph support
  • Add handlers for Arrow -> Tantivy / Apache Arrow Flight / Kafka / delta-rs
  • Handlers deployable/callable from MiNiFi

Functional Goals V2

  • Steps spread out across multiple processors
  • Jaeger
  • Metrics
  • Static deployment via IPMI
  • Deployment via Kubernetes, as static as possible.

Thund in the literature

Translations of poems describing Thund in Germanic mythology
Learn to pronounce it in Icelandic: ÓÐSMÁL