Extend documentation for reduce_agg (facebookincubator#7160)
Summary:
Clarify requirements for reduce_agg inputs.

Also, fix formatting issues throughout the documentation.

Pull Request resolved: facebookincubator#7160

Reviewed By: xiaoxmeng

Differential Revision: D50495517

Pulled By: mbasmanova

fbshipit-source-id: 130f714d89e9a3ff1d84ea24f859fee805902c12
mbasmanova authored and facebook-github-bot committed Oct 20, 2023
1 parent dde33b8 commit 7acc337
Showing 4 changed files with 18 additions and 11 deletions.
7 changes: 2 additions & 5 deletions velox/docs/configs.rst
@@ -30,8 +30,7 @@ Generic Configuration
* - table_scan_getoutput_time_limit_ms
- integer
- 5000
- TableScan operator will exit getOutput() method after this many milliseconds even if it has no data to return yet.
- Zero means 'no time limit'.
- TableScan operator will exit getOutput() method after this many milliseconds even if it has no data to return yet. Zero means 'no time limit'.
* - abandon_partial_aggregation_min_rows
- integer
- 100,000
@@ -211,9 +210,7 @@ Spilling
* - aggregation_spill_all
- boolean
- false
- If true and spilling has been triggered during the input processing, the spiller will spill all the remaining
- in-memory state to disk before output processing. This is to simplify the aggregation query OOM prevention in
- output processing stage.
- If true and spilling has been triggered during the input processing, the spiller will spill all the remaining in-memory state to disk before output processing. This is to simplify the aggregation query OOM prevention in output processing stage.
* - join_spill_memory_threshold
- integer
- 0
3 changes: 1 addition & 2 deletions velox/docs/develop/operators.rst
@@ -233,8 +233,7 @@ followed by the group ID column. The type of group ID column is BIGINT.
* - Property
- Description
* - groupingSets
- List of grouping key sets. Keys within each set must be unique, but keys can repeat across the sets.
- Grouping keys are specified with their output names.
- List of grouping key sets. Keys within each set must be unique, but keys can repeat across the sets. Grouping keys are specified with their output names.
* - groupingKeyInfos
- The names and order of the grouping key columns in the output.
* - aggregationInputs
11 changes: 11 additions & 0 deletions velox/docs/functions/presto/aggregate.rst
@@ -143,6 +143,17 @@ General Aggregate Functions
The final state is returned. Throws an error if ``initialState`` is NULL or
``inputFunction`` or ``combineFunction`` returns a NULL.

Take care when designing ``initialState``, ``inputFunction`` and ``combineFunction``.
These need to support evaluating aggregation in a distributed manner using partial
aggregation on many nodes, followed by shuffle over group-by keys, followed by
final aggregation. Make sure that

combineFunction(s1, s2) = combineFunction(s2, s1) for any s1 and s2;

inputFunction(inputFunction(initialState, x), y) = combineFunction(inputFunction(initialState, x), inputFunction(initialState, y)) for any x and y

Check out `blog post about reduce_agg <https://velox-lib.io/blog/reduce-agg>`_ for more context.
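The two requirements above can be spot-checked on sample values before a query is written. The following is a minimal Python sketch, not part of Velox: the helper name ``check_reduce_agg_contract`` and the sample values are hypothetical, and it only tests states reachable from the samples, so passing it does not prove the properties for all inputs.

```python
from itertools import product


def check_reduce_agg_contract(initial_state, input_fn, combine_fn, samples):
    """Return True if both requirements hold for every pair drawn from samples."""
    # States that a partial aggregation could produce from a single input value.
    states = [input_fn(initial_state, x) for x in samples]

    # Requirement 1: combineFunction(s1, s2) = combineFunction(s2, s1).
    commutative = all(
        combine_fn(a, b) == combine_fn(b, a)
        for a, b in product(states, repeat=2)
    )

    # Requirement 2: folding x then y into one state must match combining
    # two states that each saw only one of the inputs.
    consistent = all(
        input_fn(input_fn(initial_state, x), y)
        == combine_fn(input_fn(initial_state, x), input_fn(initial_state, y))
        for x, y in product(samples, repeat=2)
    )
    return commutative and consistent


samples = [-3, 0, 1, 7]


def add(s, x):
    return s + x


def subtract(a, b):
    return a - b


# Sum with initial state 0 satisfies both requirements.
sum_ok = check_reduce_agg_contract(0, add, add, samples)

# A non-neutral initial state is counted once per partial state, so the
# second requirement fails.
sum_bad_init = check_reduce_agg_contract(10, add, add, samples)

# Subtraction is not commutative, so the first requirement fails.
sub_bad = check_reduce_agg_contract(0, add, subtract, samples)
```

In this sketch, a non-zero initial state breaks the sum because each partial aggregation starts from its own copy of ``initialState``, so combining two partial states includes it twice.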

Note that reduce_agg doesn't support evaluation over sorted inputs. ::

-- Compute sum (for illustration purposes only; use SUM aggregate function in production queries).
8 changes: 4 additions & 4 deletions velox/docs/functions/spark/datetime.rst
@@ -34,21 +34,21 @@ These functions support TIMESTAMP and DATE input types.
Returns the day of year of the date/timestamp. ::

SELECT dayofyear('2016-04-09'); -- 100
SELECT dayofyear('2016-04-09'); -- 100

.. spark:function:: dayofmonth(date) -> integer
Returns the day of month of the date/timestamp. ::

SELECT dayofmonth('2009-07-30'); -- 30
SELECT dayofmonth('2009-07-30'); -- 30

.. spark:function:: dayofweek(date/timestamp) -> integer
Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).
We can use `dow` as an alias for this function. ::

SELECT dayofweek('2009-07-30'); -- 5
SELECT dayofweek('2023-08-22 11:23:00.100'); -- 3
SELECT dayofweek('2009-07-30'); -- 5
SELECT dayofweek('2023-08-22 11:23:00.100'); -- 3
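The numbering conventions in the examples above can be cross-checked outside of a query engine. This is a small Python sketch using only the standard library; the ``spark_*`` helper names are hypothetical and are not Velox or Spark APIs. It assumes the inputs are already parsed into ``datetime.date`` values.

```python
from datetime import date


def spark_dayofyear(d):
    # Day of year, counting January 1 as 1.
    return d.timetuple().tm_yday


def spark_dayofmonth(d):
    # Day of month, 1 through 31.
    return d.day


def spark_dayofweek(d):
    # isoweekday() numbers Monday=1 ... Sunday=7; shift so that
    # Sunday=1, Monday=2, ..., Saturday=7 as described above.
    return d.isoweekday() % 7 + 1


day_of_year = spark_dayofyear(date(2016, 4, 9))
day_of_month = spark_dayofmonth(date(2009, 7, 30))
thursday = spark_dayofweek(date(2009, 7, 30))
tuesday = spark_dayofweek(date(2023, 8, 22))
```

The modulo shift is the only non-obvious step: it maps ISO Sunday (7) to 1 and moves every other day up by one.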

.. function:: dow(x) -> integer

