SNOW-1374896 unify structured types string representation #1882

Merged

sfc-gh-mkubik
Contributor

@sfc-gh-mkubik sfc-gh-mkubik commented Sep 2, 2024

Overview

SNOW-1374896

Build string representations of Snowflake structured types recursively, reusing the existing converter design for specific logical types (e.g. timestamps/binary).

The code replaces the existing structured type converters, which called the native getObject method, with a solution that reads the field vectors within the structured type and runs the proper converter on each nested type. Changes are made to the Array, Map and Struct converters; helper methods are added to the ArrowVectorConverter interface; and new ArrowStringRepresentationBuilder classes abstract away the logic of building a string out of the Arrow structured type.
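
A minimal sketch of the recursive idea described above, not the code from this PR. All names here (StructuredToStringSketch, Converter, render, arrayOf) are hypothetical stand-ins, plain Java lists replace Arrow field vectors, and string escaping is left out for brevity. It only shows how an array converter can delegate to the converter of its element type and let a shared helper decide on quoting and null handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: these names are illustrative, not the driver's actual API.
final class StructuredToStringSketch {

  // Stand-in for a per-type converter that renders one value as a string.
  interface Converter {
    String toString(Object value);

    // Whether the rendered value must be wrapped in double quotes (e.g. text).
    boolean shouldQuote();
  }

  static final Converter DOUBLE = new Converter() {
    public String toString(Object value) { return Double.toString((Double) value); }
    public boolean shouldQuote() { return false; }
  };

  static final Converter TEXT = new Converter() {
    public String toString(Object value) { return (String) value; }
    public boolean shouldQuote() { return true; }
  };

  // Renders a single value: nulls become "null", quoted types get double quotes.
  // Escaping of quotes inside text is omitted in this sketch.
  static String render(Converter converter, Object value) {
    if (value == null) {
      return "null";
    }
    String rendered = converter.toString(value);
    return converter.shouldQuote() ? "\"" + rendered + "\"" : rendered;
  }

  // An array converter is itself a Converter, so arrays nest recursively.
  static Converter arrayOf(final Converter element) {
    return new Converter() {
      public String toString(Object value) {
        // Prefix and suffix are fixed when the joiner is constructed.
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (Object item : (List<?>) value) {
          joiner.add(render(element, item));
        }
        return joiner.toString();
      }
      public boolean shouldQuote() { return false; }
    };
  }

  public static void main(String[] args) {
    List<Object> doubles = Arrays.<Object>asList(12.0, 10.0, 5.0, null);
    System.out.println(render(arrayOf(DOUBLE), doubles));
  }
}
```

Because arrayOf returns a Converter, an array of arrays needs no extra code: arrayOf(arrayOf(TEXT)) renders each inner list through the same path.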


Follow ups:

  • pretty print - currently the builders don't add new lines or tabs to the string representation, as I think that keeps the code more readable. The downside is some divergence between ARROW and JSON (which is pretty printed). A potential solution is adding a setting that enables pretty print and converting the string once it is built (to avoid passing the depth to recursive toString calls)
  • recursive calls of ARROW converters return null while JSON has undefined, which is also a divergence, but not necessarily something to fix, as ARROW's null sounds more reasonable

example for SELECT [12, 10, 5, NULL]::ARRAY(DOUBLE)

JSON                                                         | ARROW
[                                                            | [12.0,10.0,5.0,null]
  1.200000000000000e+01,
  1.000000000000000e+01,
  5.000000000000000e+00,
  undefined
]
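
One way the pretty-print follow-up could look, as a sketch only: the compact string is reformatted after it has been built, so the recursive toString calls never need to know their depth. The class and method names (PrettyPrintSketch, prettyPrint) and the two-space indent are assumptions for illustration; nothing like this is part of the PR. Note that it only changes whitespace, so number formatting and null vs. undefined would still differ from the JSON output.

```java
// Hypothetical sketch: post-processes a compact representation into an indented one.
final class PrettyPrintSketch {

  static String prettyPrint(String compact) {
    StringBuilder out = new StringBuilder();
    int depth = 0;
    boolean inString = false;
    boolean escaped = false;
    for (int i = 0; i < compact.length(); i++) {
      char c = compact.charAt(i);
      if (inString) {
        // Inside a quoted value nothing is structural; copy characters verbatim.
        out.append(c);
        if (escaped) {
          escaped = false;
        } else if (c == '\\') {
          escaped = true;
        } else if (c == '"') {
          inString = false;
        }
        continue;
      }
      switch (c) {
        case '"':
          inString = true;
          out.append(c);
          break;
        case '[':
        case '{':
          out.append(c);
          if (i + 1 < compact.length() && isClosing(compact.charAt(i + 1))) {
            out.append(compact.charAt(++i)); // empty container stays on one line
          } else {
            newLine(out, ++depth);
          }
          break;
        case ']':
        case '}':
          newLine(out, --depth);
          out.append(c);
          break;
        case ',':
          out.append(c);
          newLine(out, depth);
          break;
        default:
          out.append(c);
      }
    }
    return out.toString();
  }

  private static boolean isClosing(char c) {
    return c == ']' || c == '}';
  }

  // Starts a new line indented by two spaces per nesting level.
  private static void newLine(StringBuilder out, int depth) {
    out.append('\n');
    for (int i = 0; i < depth; i++) {
      out.append("  ");
    }
  }

  public static void main(String[] args) {
    System.out.println(prettyPrint("[12.0,10.0,5.0,null]"));
  }
}
```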

Pre-review self checklist

  • PR branch is updated with all the changes from master branch
  • The code is correctly formatted (run mvn -P check-style validate)
  • New public API is not unnecessarily exposed (run mvn verify and inspect target/japicmp/japicmp.html)
  • The pull request name is prefixed with SNOW-XXXX:
  • Code is in compliance with internal logging requirements

@sfc-gh-mkubik sfc-gh-mkubik requested a review from a team as a code owner September 2, 2024 07:49
Collaborator

@sfc-gh-astachowski sfc-gh-astachowski left a comment

It might be worth adding a custom toString for vectors, in case vector ever accepts types other than int and float.

@sfc-gh-mkubik
Contributor Author

expected:<[FALSE]> but was:<[false]>

It seems I haven't changed all occurrences of upper-case booleans in the tests; will fix in the next commit.

Base automatically changed from init-converters-refactor to master September 2, 2024 14:01
sfc-gh-mkubik and others added 9 commits September 4, 2024 13:06
Move prefix and suffix configuration to the constructor of base builder, remove unnecessary comments, extract shouldQuote check to a super method, make valueType a constructor parameter for Array toString builder, fix tests failing due to the lowercase booleans
Add helper ArrowStringRepresentationBuilders that take care of converting recursive toString results into a valid json, taking logical type into account. Extract fetching logical type from field metadata to a separate static function, change boolean string representations to lowercase, add tests.
Move prefix and suffix configuration to the constructor of base builder, remove unnecessary comments, extract shouldQuote check to a super method, make valueType a constructor parameter for Array toString builder, fix tests failing due to the lowercase booleans
@sfc-gh-pbulawa sfc-gh-pbulawa dismissed their stale review September 4, 2024 13:04

Comments to be addressed

@sfc-gh-mkubik sfc-gh-mkubik merged commit 9e221ea into master Sep 27, 2024
140 checks passed
@sfc-gh-mkubik sfc-gh-mkubik deleted the SNOW-1374896-unify-structured-types-string-representation branch September 27, 2024 15:24
@github-actions github-actions bot locked and limited conversation to collaborators Sep 27, 2024
4 participants