From 53c25e55b94e3873d51b71276a1c31ac5473ee52 Mon Sep 17 00:00:00 2001
From: GenerateMe
Date: Thu, 13 Aug 2020 23:08:26 +0200
Subject: [PATCH] alpha9

---
 CHANGELOG.md | 17 +
 README.md | 75 +-
 deps.edn | 2 +-
 docs/index.Rmd | 4 +-
 docs/index.html | 4611 ++++++++++++++++----------
 docs/index.md | 1481 +++++----
 docs/index.pdf | Bin 483032 -> 496332 bytes
 docs/index.tex | 2720 ++++++++-------
 src/tablecloth/api.clj | 3 +-
 src/tablecloth/api/aggregate.clj | 22 +-
 src/tablecloth/api/columns.clj | 44 +-
 src/tablecloth/api/fold_unroll.clj | 2 +-
 src/tablecloth/api/group_by.clj | 93 +-
 src/tablecloth/api/join_separate.clj | 8 +-
 src/tablecloth/api/order_by.clj | 4 +-
 src/tablecloth/api/reshape.clj | 25 +
 src/tablecloth/api/rows.clj | 11 +-
 src/tablecloth/api/unique_by.clj | 4 +-
 src/tablecloth/api/utils.clj | 11 +
 19 files changed, 5596 insertions(+), 3541 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ebc5dd2..1ddbe24 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,22 @@
 # Change Log
 
+## [1.0.0-pre-alpha9]
+
+`tech.ml.dataset` version 4.03
+
+### Added
+
+* some operations on grouped dataset can be parallel (`parallel?` option set to `true`). These are: `aggregate`, `unique-by`, `order-by`, `join-columns`, `separate-columns`, `ungroup`
+
+### Fixed
+
+* #2 - docs typo
+* #3 - recover datatypes after ungrouping
+
+### Changed
+
+* `aggregation` uses now in-place ungrouping which is much faster
+
 ## [1.0.0-pre-alpha8]
 
 `tech.ml.dataset` version 3.06
diff --git a/README.md b/README.md
index 5a09ac4..f12ce3c 100644
--- a/README.md
+++ b/README.md
@@ -1,34 +1,55 @@
-[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth) [![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth) [![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
+[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)
+[![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth)
+[![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
 
 Introduction
 ------------
 
-[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a great and fast library which brings columnar dataset to the Clojure. Chris Nuernberger has been working on this library for last year as a part of bigger `tech.ml` stack.
-
-I've started to test the library and help to fix uncovered bugs. My main goal was to compare functionalities with the other standards from other platforms. I focused on R solutions: [dplyr](https://dplyr.tidyverse.org/), [tidyr](https://tidyr.tidyverse.org/) and [data.table](https://rdatatable.gitlab.io/data.table/).
-
-During conversions of the examples I've come up how to reorganized existing `tech.ml.dataset` functions into simple to use API. The main goals were:
-
-- Focus on dataset manipulation functionality, leaving other parts of `tech.ml` like pipelines, datatypes, readers, ML, etc.
-- Single entry point for common operations - one function dispatching on given arguments.
-- `group-by` results with special kind of dataset - a dataset containing subsets created after grouping as a column.
-- Most operations recognize regular dataset and grouped dataset and process data accordingly.
+[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a
+great and fast library which brings columnar dataset to the Clojure.
+Chris Nuernberger has been working on this library for last year as a
+part of bigger `tech.ml` stack.
+
+I’ve started to test the library and help to fix uncovered bugs. My main
+goal was to compare functionalities with the other standards from other
+platforms. I focused on R solutions:
+[dplyr](https://dplyr.tidyverse.org/),
+[tidyr](https://tidyr.tidyverse.org/) and
+[data.table](https://rdatatable.gitlab.io/data.table/).
+
+During conversions of the examples I’ve come up how to reorganized
+existing `tech.ml.dataset` functions into simple to use API. The main
+goals were:
+
+- Focus on dataset manipulation functionality, leaving other parts of
+  `tech.ml` like pipelines, datatypes, readers, ML, etc.
+- Single entry point for common operations - one function dispatching
+  on given arguments.
+- `group-by` results with special kind of dataset - a dataset
+  containing subsets created after grouping as a column.
+- Most operations recognize regular dataset and grouped dataset and
+  process data accordingly.
 - One function form to enable thread-first on dataset.
 
-Important! This library is not the replacement of `tech.ml.dataset` nor a separate library. It should be considered as a addition on the top of `tech.ml.dataset`.
+Important! This library is not the replacement of `tech.ml.dataset` nor
+a separate library. It should be considered as a addition on the top of
+`tech.ml.dataset`.
 
-If you want to know more about `tech.ml.dataset` and `tech.ml.datatype` please refer their documentation:
+If you want to know more about `tech.ml.dataset` and `tech.ml.datatype`
+please refer their documentation:
 
 - [Datatype](https://github.com/techascent/tech.datatype/blob/master/docs/cheatsheet.md)
 - [Date/time](https://github.com/techascent/tech.datatype/blob/master/docs/datetime.md)
 - [Dataset](https://github.com/techascent/tech.ml.dataset/blob/master/docs/walkthrough.md)
 
-Join the discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
+Join the discussion on
+[Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
 
 Documentation
 -------------
 
-Please refer [detailed documentation with examples](https://scicloj.github.io/tablecloth/index.html)
+Please refer [detailed documentation with
+examples](https://scicloj.github.io/tablecloth/index.html)
 
 Usage example
 -------------
@@ -50,18 +71,18 @@ Usage example
 
 \_unnamed \[10 3\]:
 
-| :symbol | :year | :summary|
-|---------|-------|-------------:|
-| AAPL | 2000 | 21.74833341|
-| AAPL | 2001 | 10.17583323|
-| AAPL | 2002 | 9.40833330|
-| AAPL | 2003 | 9.34749989|
-| AAPL | 2004 | 18.72333336|
-| AAPL | 2005 | 48.17166678|
-| AAPL | 2006 | 72.04333369|
-| AAPL | 2007 | 133.35333379|
-| AAPL | 2008 | 138.48083242|
-| AAPL | 2009 | 150.39333344|
+| :symbol | :year | :summary |
+|---------|-------|--------------|
+| AAPL | 2000 | 21.74833333 |
+| AAPL | 2001 | 10.17583333 |
+| AAPL | 2002 | 9.40833333 |
+| AAPL | 2003 | 9.34750000 |
+| AAPL | 2004 | 18.72333333 |
+| AAPL | 2005 | 48.17166667 |
+| AAPL | 2006 | 72.04333333 |
+| AAPL | 2007 | 133.35333333 |
+| AAPL | 2008 | 138.48083333 |
+| AAPL | 2009 | 150.39333333 |
 
 TODO
 ----
diff --git a/deps.edn b/deps.edn
index 4c56496..14ba556 100644
--- a/deps.edn
+++ b/deps.edn
@@ -1,3 +1,3 @@
 {:extra-paths ["data"]
  :deps {org.clojure/clojure {:mvn/version "1.10.1"}
-        techascent/tech.ml.dataset {:mvn/version "3.06"}}}
+        techascent/tech.ml.dataset {:mvn/version "4.03"}}}
diff --git a/docs/index.Rmd b/docs/index.Rmd
index a1cf4bc..8359a27 100644
--- a/docs/index.Rmd
+++ b/docs/index.Rmd
@@ -1291,7 +1291,7 @@ Additionally you may want to precalculate some values which will be visible for
 
 #### Select
 
-Select fourth row
+Select fifth row
 
 ```{clojure results="asis"}
 (api/select-rows DS 4)
@@ -4466,7 +4466,7 @@ Expression chaining using >
 (-> DS
     (api/group-by [:V4])
     (api/aggregate {:V1sum #(dfn/sum (% :V1))})
-    (api/select-rows #(> (:V1sum %) 5) ))
+    (api/select-rows #(>= (:V1sum %) 5)))
 ```
 
 ```{clojure results="asis"}
diff --git a/docs/index.html b/docs/index.html
index 6a7f1fd..b1d5833 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -11,20 +11,1271 @@ - + Dplyr-like API for tech.ml.dataset - + - - - - - - - - + + + + + + + +