forked from ptiger10/pd

A fast, tested, and predictable way to clean, aggregate, and transform data

gloudx/pd

 
 

pd

Go Report Card GoDoc Build Status codecov License: MIT

pd (informally known as "GoPandas") is a library for cleaning, aggregating, and transforming data using Series and DataFrames. GoPandas combines a flexible API familiar to Python pandas users with the qualities of Go, including type safety, predictable error handling, and fast concurrent processing.

The API is still version 0 and subject to major revisions. Use in production code at your own risk.

Some notable features of GoPandas:

  • flexible constructor that supports float, int, string, bool, time.Time, and interface Series
  • seamlessly handles null data and type conversions
  • well-suited to either the Jupyter notebook style of data exploration or conventional programming
  • advanced filtering, grouping, and pivoting
  • hierarchical indexing (i.e., multi-level indexes and columns)
  • reads from CSV, or from any spreadsheet or tabular data structured as [][]interface{} (e.g., Google Sheets)
  • complete test coverage
  • minimal dependencies (total package size is <10MB, compared to Pandas at >200MB)
  • uses concurrent processing to achieve faster speeds than Pandas on many fundamental operations; the performance differential becomes more pronounced with scale (6x+ faster when summing two columns in a 500k-row spreadsheet; see the most recent benchmarking table)
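
To give a feel for the kind of concurrent column arithmetic mentioned in the last bullet, here is a minimal standard-library sketch. It is not pd's implementation and does not use the pd API: the function name sumColumns and its workers parameter are hypothetical, chosen only for illustration. It splits two equal-length columns into chunks and adds each chunk in its own goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// sumColumns is a hypothetical helper (not part of pd). It adds two columns
// element-wise, dividing the rows into roughly equal chunks that are each
// processed by a separate goroutine.
func sumColumns(a, b []float64, workers int) []float64 {
	// Only the rows present in both columns are summed.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]float64, n)
	if workers < 1 {
		workers = 1
	}
	// Ceiling division so that every row lands in exactly one chunk.
	chunk := (n + workers - 1) / workers

	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		// Each goroutine writes to a disjoint range of out, so no lock is needed.
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = a[i] + b[i]
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	colA := []float64{1, 2, 3}
	colB := []float64{10, 20, 30}
	fmt.Println(sumColumns(colA, colB, 2)) // prints "[11 22 33]"
}
```

On small inputs like this one the goroutine overhead outweighs any gain; chunked concurrency of this kind generally only pays off on large columns, which is consistent with the scaling behavior described above.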

Getting Started

Check out the Jupyter notebook examples in the guides. GitHub sometimes has trouble rendering .ipynb files, so backup views are available here: Series, DataFrame, Options.

To run the Jupyter notebooks yourself, I recommend lgo (Docker required):

  • cd guides/docker
  • start: ./up.sh
  • stop: ./down.sh
  • rebuild package to newest version: ./up.sh -r

Replicating Benchmark Tests

  • Requires Python 3.x and pandas
  • Download data from here and save in benchmarking/profiler
  • go run -tags=benchmarks benchmarking/profiler/main.go
