pd (informally known as "GoPandas") is a library for cleaning, aggregating, and transforming data using Series and DataFrames. GoPandas combines a flexible API familiar to Python pandas users with the qualities of Go, including type safety, predictable error handling, and fast concurrent processing.
The API is still version 0 and subject to major revisions. Use in production code at your own risk.
Some notable features of GoPandas:
- flexible constructor that supports float, int, string, bool, time.Time, and interface Series
- seamlessly handles null data and type conversions
- well-suited to either the Jupyter notebook style of data exploration or conventional programming
- advanced filtering, grouping, and pivoting
- hierarchical indexing (i.e., multi-level indexes and columns)
- reads from either CSV or any spreadsheet or tabular data structured as [][]interface{} (e.g., Google Sheets)
- complete test coverage
- minimal dependencies (total package size is <10MB, compared to pandas at >200MB)
- uses concurrent processing to achieve faster speeds than pandas on many fundamental operations; the performance differential becomes more pronounced with scale (6x+ faster when summing two columns in a 500k-row spreadsheet; see the most recent benchmarking table)
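To give a feel for the kind of concurrency behind the last point, here is a minimal standalone sketch of summing two columns in parallel chunks. It does not use or reproduce the pd API: `sumColumns` and its `workers` parameter are hypothetical names made up for this illustration, and the library's actual implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// sumColumns adds two equal-length columns element-wise. The rows are split
// into at most `workers` contiguous chunks, and each chunk is summed in its
// own goroutine. Hypothetical helper for illustration; not part of pd.
func sumColumns(a, b []float64, workers int) ([]float64, error) {
	if len(a) != len(b) {
		return nil, fmt.Errorf("column lengths differ: %d vs %d", len(a), len(b))
	}
	if workers < 1 {
		workers = 1
	}
	out := make([]float64, len(a))
	chunk := (len(a) + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for start := 0; start < len(a); start += chunk {
		end := start + chunk
		if end > len(a) {
			end = len(a)
		}
		wg.Add(1)
		// Each goroutine writes to a disjoint range of out, so no lock is needed.
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = a[i] + b[i]
			}
		}(start, end)
	}
	wg.Wait()
	return out, nil
}

func main() {
	a := []float64{1, 2, 3, 4, 5}
	b := []float64{10, 20, 30, 40, 50}

	sum, err := sumColumns(a, b, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sum) // prints [11 22 33 44 55]
}
```

Splitting by contiguous row ranges keeps each goroutine on its own slice of memory, which is why the benefit of this approach tends to grow with the number of rows. On small inputs the cost of starting goroutines can outweigh the gain.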
Check out the Jupyter notebook examples in the guides. GitHub sometimes has trouble rendering .ipynb files, so backup views are available here: Series, DataFrame, Options.
To run the Jupyter notebooks yourself, I recommend lgo (Docker required):
cd guides/docker
- start:
./up.sh
- stop:
./down.sh
- rebuild package to newest version:
./up.sh -r
To run the benchmarking profiler yourself:
- Requires Python 3.x and pandas
- Download data from here and save in benchmarking/profiler
go run -tags=benchmarks benchmarking/profiler/main.go