Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another issue related to 'errata-query-11' (sequence versus multiset) #95

Closed
hartig opened this issue Jun 9, 2023 · 4 comments
Closed

Comments

@hartig
Copy link
Contributor

hartig commented Jun 9, 2023

When looking at errata-query-11, another question came to my mind. The suggest fix for the errata also proposes to

replace "multiset of partial functions" by "set of partial functions"

in the definition of the Aggregation operator. In particular, this proposal is about the first line of the definition, which ends as follows.

"[...] and let { key1→Ω1, ..., keym→Ωm } be a multiset of partial functions from keys to solution sequences as produced by the grouping step."

I certainly agree with the proposal to replace "multiset" by "set". However, I wonder why the definition talks about "solution sequences"?

While I see that the corresponding definition of "the grouping step"---that is, the definition of the Group operator---also says that it "produc[es] a set of partial functions from keys to solution sequences," the partial functions produced by the actual formula of this definition (as copied below) are from keys to multisets (or sets?) of solution mappings, not to sequences of solution mappings.

Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ) = ListEval(exprlist, μ') } | μ in Ω }

(Notice that Ω is a multiset of solution mappings because that is passed to the Group operator by the Evaluation of Group, where it is obtained from eval(D(G), P) with P being "the algebra translation of the GroupGraphPattern of the query level" according to the translation algorithm for aggregates in Section 18.2.4.1 Grouping and Aggregation).

Consequently, in addition to replacing "multiset" by "set", I propose to also replace "solution sequences" by "sets of solution mappings", and this latter replacement should be done not only in the definition of the Aggregation operator but also in the definition of the Group operator.

@kasei
Copy link
Contributor

kasei commented Jun 9, 2023

On the other hand, we know there is interest in things like sorted GROUP_CONCAT results. I think it might be prudent to stick with the "solution sequences" language and fix the definitions to align with that interpretation (and likely tweak the translation to ensure correct conversion from sets to sequences).

@hartig
Copy link
Contributor Author

hartig commented Jun 13, 2023

@kasei I have been looking at the options to implement your proposal. I see two main options.

Option 1
The first option is to extend the translation algorithm for aggregates in Section 18.2.4.1 Grouping and Aggregation by changing the first if block. Currently, this block looks as follows.

If Q contains GROUP BY exprlist
   Let G := Group(exprlist, P)
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
   Let G := Group((1), P)
Else
   skip the rest of the aggregate step
   End

The change would be to replace each of the two occurrences of P by ToList(P), resulting in the following.

If Q contains GROUP BY exprlist
   Let G := Group(exprlist, ToList(P))
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
   Let G := Group((1), ToList(P))
Else
   skip the rest of the aggregate step
   End

Option 2
The second option is to keep the translation algorithm as is and, instead, extend the Evaluation of Group, which currently looks as follows.

eval(D(G), Group(exprlist, P)) = Group(exprlist, eval(D(G), P))

The change would be:

eval(D(G), Group(exprlist, P)) = Group(exprlist, ToList(eval(D(G), P)))

The effect of both of these options is the same: The second argument that is passed to the Group operator is a sequence of solution mappings (rather than a multiset of solution mappings), exactly as is already claimed in the current definition of the Group operator.

The next step would then be to change the definitions of both the Group operator and the Aggregation operator to really produce and operate on partial functions from keys to solution sequences. In this context, the function M that is used in the definition of the Aggregation operator would then produce a list of tuples (rather than a set of tuples as is the case at the moment). A follow-up consequence of that is then that the set functions take a list of tuples as argument, rather than a (multi)set of tuples.

The latter makes me wonder whether we should still call them set functions in this case?

@kasei
Copy link
Contributor

kasei commented Jun 14, 2023

I think I prefer the change to happen in the translation stage. That seems to me likely to be easier to extend for things like adding an ORDER BY clause to GROUP_CONCAT. We could just add an extra step in the translation inside the For each aggregate R(args ; scalarvals) now in X loop so that handling of each aggregate gets the opportunity to add sorting to G before its use in Ai := Aggregation(args, R, scalarvals, G).

For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q
  For each unaggregated variable V in X
      Replace V with Sample(V)
      End
  For each aggregate R(args ; scalarvals) now in X

      [[[HANDLE OPTIONAL SORTING HERE]]]

      # note scalarvals may be omitted, then it's equivalent to the empty set
      Ai := Aggregation(args, R, scalarvals, G)
      Replace R(...) with aggi in Q
      i := i + 1
      End
  End

hartig added a commit that referenced this issue Jun 14, 2023
… into solution sequences rather then sets/multisets of solutions; see #95 (comment)
@hartig
Copy link
Contributor Author

hartig commented Jun 29, 2023

PR #98 has addressed this issue.

@hartig hartig closed this as completed Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants