Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utility for transforming Trino ROW type columns into Ruby hashes #117

Merged
merged 5 commits into from
Sep 19, 2023

Conversation

kmcq
Copy link
Contributor

@kmcq kmcq commented Sep 11, 2023

Hello! I'm a longtime user of this gem, thanks so much for creating and maintaining it ✨

Purpose

My team uses Ruby to parse a lot of data with Trino and many of the columns of the tables we handle are Trino ROW types (e.g. objects like a User). I'd like to upstream the code we've been using to parse these into Ruby hashes; we find it to be quite useful.

Overview

What's in here

Mostly I've added a new class, Trino::Client::ColumnValueParser, that parses the Trino type of a column in order to correctly transform the array that Trino returns for a ROW type into a Ruby hash. It handles ROW types that have ROW types (and ARRAY types too for what it's worth) embedded within them, as this is something we need.

We also exclusively stream results with something like

client.query("select * ...") { |q| ... }

so I've added methods to Trino::Client::Query to accommodate transforming results like this:

client.query("select * ...") do |q|
  q.each_row do |row|
    transformed_row = q.transform_row(row)
  end
end

It seemed like the best tests to modify were the ones in client_spec.rb. I updated an existing test to show that run_with_names returns arrays and then added a new spec that transforms the ROW types into hashes uses a helper method I added for folks who don't have access to the query object, like those using run_with_names.

What's not in here

Within my team's work, having ROW types returned as arrays of elements, as Trino does by default, isn't as useful as having them transformed into Hashes like this PR enables. Therefore, I wonder if modifying Trino::Client#run_with_names to do this makes sense. However, this is a breaking change, so I didn't put it in here. What do you think? If you'd like, I could do that change in a subsequent PR. We don't use run_with_names so I don't have an opinion here. I also considered adding a new method that is a run_with_names clone but with this change, but figured I'd ask what you think would be useful before doing so.

Finally, I haven't updated the README at all. I'd be happy to! But I wanted to see if you think this is useful enough for folks to add it there.

Cheers, thanks for reading ✨
-- @kmcq

Checklist

  • Code compiles correctly
  • Created tests which fail without the change (if possible)
  • All tests passing
    • I think they will, but I didn't have success running them all locally
  • Extended the README / documentation, if necessary

@kmcq kmcq requested a review from a team as a code owner September 11, 2023 17:22
Comment on lines +110 to +116
def blank?(obj)
obj.respond_to?(:empty?) ? !!obj.empty? : !obj
end

def starts_with?(str, prefix)
prefix.respond_to?(:to_str) && str[0, prefix.length] == prefix
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are adopted from ActiveSupport.

@kmcq kmcq force-pushed the parse-rows branch 2 times, most recently from 90727cb to fcb8c10 Compare September 11, 2023 21:52
Copy link
Member

@takezoe takezoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pull request! I really like this idea.

Transforming structured types to objects looks fine, but transforming primitive types by default may be a problematic as the current logic doesn't seem to cover all use cases. Isn't it better to have a mechanism to control transformations and/or to allow using custom transformers?

Therefore, I wonder if modifying Trino::Client#run_with_names to do this makes sense. However, this is a breaking change, so I didn't put it in here. What do you think?

Backward compatibility is important. It would be better to add an option to run_with_names or new method.

lib/trino/client/column_value_parser.rb Show resolved Hide resolved
lib/trino/client/column_value_parser.rb Show resolved Hide resolved
lib/trino/client/column_value_parser.rb Outdated Show resolved Hide resolved
@kmcq
Copy link
Contributor Author

kmcq commented Sep 13, 2023

@takezoe thank you for reviewing this!

Transforming structured types to objects looks fine, but transforming primitive types by default may be a problematic as the current logic doesn't seem to cover all use cases. Isn't it better to have a mechanism to control transformations and/or to allow using custom transformers?

I like this idea a lot and I'll work on it 👍 I think this will cover your questions around timestamps and varbinary; we can keep these implementations that work for us, but move their definition into something that gets configured vs being a part of the library.

Map type is not handled? Also, I wonder if we can transform JSON type as well.

Great idea to handle these! I'll look into the Trino spec for these to see what will work.

Copy link
Member

@takezoe takezoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Scalar transformation is now configurable and it looks really good to me. I think including your scalar parser in this library is also fine if you want.

@kmcq
Copy link
Contributor Author

kmcq commented Sep 14, 2023

@takezoe thanks again for reviewing! I'm still looking into including support for MAP types, but that could come in a future PR as it might take me a bit to get to as I have some other things to work on for a bit. Regarding JSON, I thought about it and I think leaving it as a string could be good and letting users of the library choose to JSON.parse that if they want a Hash.

I think including your scalar parser in this library is also fine if you want.

What do you think about me adding an example to the README of using #transform_row? In that I could include a small example scalar_parser so folks can see how to pass those lambdas in.

@takezoe
Copy link
Member

takezoe commented Sep 14, 2023

What do you think about me adding an example to the README of using #transform_row? In that I could include a small example scalar_parser so folks can see how to pass those lambdas in.

Sounds good! Would you include it in this PR?

@takezoe
Copy link
Member

takezoe commented Sep 19, 2023

@kmcq For now, we will merge this PR and release new version. It would be great if you can create another PR to update README.

@takezoe takezoe merged commit e14251c into treasure-data:master Sep 19, 2023
@kmcq kmcq deleted the parse-rows branch September 19, 2023 02:56
@kmcq
Copy link
Contributor Author

kmcq commented Sep 19, 2023

Will do @takezoe! Thanks for merging and sorry I was slow to update the docs.

@yuokada
Copy link
Contributor

yuokada commented Sep 19, 2023

@kmcq Thanks for your contribution. 🎉 I published a new version, 2.1.0.
https://rubygems.org/gems/trino-client/versions/2.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants