alpha9
genmeblog committed Aug 13, 2020
1 parent 33785dc commit 53c25e5
Showing 19 changed files with 5,596 additions and 3,541 deletions.
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,22 @@
 # Change Log

+## [1.0.0-pre-alpha9]
+
+`tech.ml.dataset` version 4.03
+
+### Added
+
+* some operations on grouped dataset can be parallel (`parallel?` option set to `true`). These are: `aggregate`, `unique-by`, `order-by`, `join-columns`, `separate-columns`, `ungroup`
+
+### Fixed
+
+* #2 - docs typo
+* #3 - recover datatypes after ungrouping
+
+### Changed
+
+* `aggregation` uses now in-place ungrouping which is much faster
+
 ## [1.0.0-pre-alpha8]

 `tech.ml.dataset` version 3.06
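
As a rough illustration of the `parallel?` option listed under "Added" in the changelog entry above, a grouped aggregation might look like the sketch below. It assumes the `tablecloth.api` namespace aliased as `api` (as in the docs examples), assumes the option is passed in a trailing options map, and uses a small made-up dataset `DS`; none of the data comes from this commit.

```clojure
;; Minimal sketch, not taken from the commit itself.
;; Assumption: tablecloth is on the classpath and exposes `tablecloth.api`.
(require '[tablecloth.api :as api])

;; Hypothetical toy dataset, for illustration only.
(def DS (api/dataset {:V1 [1 2 1 2 1 2]
                      :V4 ["A" "B" "C" "A" "B" "C"]}))

;; Group by :V4, then sum :V1 within each group.
;; Assumption: `:parallel? true` goes in the options map given as the
;; last argument; the changelog only names the option, not its position.
(def result
  (-> DS
      (api/group-by [:V4])
      (api/aggregate {:V1sum #(reduce + (% :V1))}
                     {:parallel? true})))
```

Whether the parallel path pays off likely depends on the number and size of groups, so it seems worth measuring on real data before enabling it everywhere.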
75 changes: 48 additions & 27 deletions README.md
@@ -1,34 +1,55 @@
-[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth) [![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth) [![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
+[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)
+[![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth)
+[![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)

 Introduction
 ------------

-[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a great and fast library which brings columnar dataset to the Clojure. Chris Nuernberger has been working on this library for last year as a part of bigger `tech.ml` stack.
-
-I've started to test the library and help to fix uncovered bugs. My main goal was to compare functionalities with the other standards from other platforms. I focused on R solutions: [dplyr](https://dplyr.tidyverse.org/), [tidyr](https://tidyr.tidyverse.org/) and [data.table](https://rdatatable.gitlab.io/data.table/).
-
-During conversions of the examples I've come up how to reorganized existing `tech.ml.dataset` functions into simple to use API. The main goals were:
-
-- Focus on dataset manipulation functionality, leaving other parts of `tech.ml` like pipelines, datatypes, readers, ML, etc.
-- Single entry point for common operations - one function dispatching on given arguments.
-- `group-by` results with special kind of dataset - a dataset containing subsets created after grouping as a column.
-- Most operations recognize regular dataset and grouped dataset and process data accordingly.
+[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a
+great and fast library which brings columnar dataset to the Clojure.
+Chris Nuernberger has been working on this library for last year as a
+part of bigger `tech.ml` stack.
+
+I’ve started to test the library and help to fix uncovered bugs. My main
+goal was to compare functionalities with the other standards from other
+platforms. I focused on R solutions:
+[dplyr](https://dplyr.tidyverse.org/),
+[tidyr](https://tidyr.tidyverse.org/) and
+[data.table](https://rdatatable.gitlab.io/data.table/).
+
+During conversions of the examples I’ve come up how to reorganized
+existing `tech.ml.dataset` functions into simple to use API. The main
+goals were:
+
+- Focus on dataset manipulation functionality, leaving other parts of
+`tech.ml` like pipelines, datatypes, readers, ML, etc.
+- Single entry point for common operations - one function dispatching
+on given arguments.
+- `group-by` results with special kind of dataset - a dataset
+containing subsets created after grouping as a column.
+- Most operations recognize regular dataset and grouped dataset and
+process data accordingly.
 - One function form to enable thread-first on dataset.

-Important! This library is not the replacement of `tech.ml.dataset` nor a separate library. It should be considered as a addition on the top of `tech.ml.dataset`.
+Important! This library is not the replacement of `tech.ml.dataset` nor
+a separate library. It should be considered as a addition on the top of
+`tech.ml.dataset`.

-If you want to know more about `tech.ml.dataset` and `tech.ml.datatype` please refer their documentation:
+If you want to know more about `tech.ml.dataset` and `tech.ml.datatype`
+please refer their documentation:

 - [Datatype](https://github.com/techascent/tech.datatype/blob/master/docs/cheatsheet.md)
 - [Date/time](https://github.com/techascent/tech.datatype/blob/master/docs/datetime.md)
 - [Dataset](https://github.com/techascent/tech.ml.dataset/blob/master/docs/walkthrough.md)

-Join the discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
+Join the discussion on
+[Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)

 Documentation
 -------------

-Please refer [detailed documentation with examples](https://scicloj.github.io/tablecloth/index.html)
+Please refer [detailed documentation with
+examples](https://scicloj.github.io/tablecloth/index.html)

 Usage example
 -------------
@@ -50,18 +71,18 @@ Usage example

 \_unnamed \[10 3\]:

-| :symbol | :year | :summary|
-|---------|-------|-------------:|
-| AAPL | 2000 | 21.74833341|
-| AAPL | 2001 | 10.17583323|
-| AAPL | 2002 | 9.40833330|
-| AAPL | 2003 | 9.34749989|
-| AAPL | 2004 | 18.72333336|
-| AAPL | 2005 | 48.17166678|
-| AAPL | 2006 | 72.04333369|
-| AAPL | 2007 | 133.35333379|
-| AAPL | 2008 | 138.48083242|
-| AAPL | 2009 | 150.39333344|
+| :symbol | :year | :summary |
+|---------|-------|--------------|
+| AAPL | 2000 | 21.74833333 |
+| AAPL | 2001 | 10.17583333 |
+| AAPL | 2002 | 9.40833333 |
+| AAPL | 2003 | 9.34750000 |
+| AAPL | 2004 | 18.72333333 |
+| AAPL | 2005 | 48.17166667 |
+| AAPL | 2006 | 72.04333333 |
+| AAPL | 2007 | 133.35333333 |
+| AAPL | 2008 | 138.48083333 |
+| AAPL | 2009 | 150.39333333 |

 TODO
 ----
2 changes: 1 addition & 1 deletion deps.edn
@@ -1,3 +1,3 @@
 {:extra-paths ["data"]
 :deps {org.clojure/clojure {:mvn/version "1.10.1"}
-techascent/tech.ml.dataset {:mvn/version "3.06"}}}
+techascent/tech.ml.dataset {:mvn/version "4.03"}}}
4 changes: 2 additions & 2 deletions docs/index.Rmd
@@ -1291,7 +1291,7 @@ Additionally you may want to precalculate some values which will be visible for

 #### Select

-Select fourth row
+Select fifth row

 ```{clojure results="asis"}
 (api/select-rows DS 4)
@@ -4466,7 +4466,7 @@ Expression chaining using >
 (-> DS
 (api/group-by [:V4])
 (api/aggregate {:V1sum #(dfn/sum (% :V1))})
-(api/select-rows #(> (:V1sum %) 5) ))
+(api/select-rows #(>= (:V1sum %) 5)))
 ```

 ```{clojure results="asis"}
4,611 changes: 2,944 additions & 1,667 deletions docs/index.html

Large diffs are not rendered by default.
