Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to use symbols as field names instead of strings #306

Merged
merged 7 commits into from
Nov 21, 2019

Conversation

larskanis
Copy link
Collaborator

@larskanis larskanis commented Nov 10, 2019

This adds:

  • PG::Result#field_name_type=
  • PG::Result#field_name_type
  • PG::Result#field_names_as
  • PG::Connection#field_name_type=
  • PG::Connection#field_name_type

Comparison and lookup is faster on symbols than on strings, since the string hash of symbols is generated at Symbol creation. So retrieval of Symbol based field names is somewhat slower, but this is easily outperformed in frameworks that do a lot of lookups.

There are currently two ways to create hashes in C: dynamic and static symbols. Dynamic symbols are GC'ed and static symbols aren't. I added both ways to this PR. Non-GC'able symbols can have unexpected memory bloat, when field names are dynamically generated. Since both ways showed comparable timings, I think we could remove static symbols again.

Mysql2 implements symbol keys since ages, but only as static symbols. Maybe the possible memory bloat is a reason why symbolize_keys isn't used in ActiveRecord.

This raises ruby version requirement to 2.2, since it depends on GC'able dynamic symbols.

Fixes #278

@larskanis larskanis force-pushed the fields-as-symbol branch 2 times, most recently from d63dbd0 to 48f1163 Compare November 10, 2019 20:56
Copy link
Contributor

@cbandy cbandy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good.

@@ -390,6 +395,27 @@ pgresult_get(VALUE self)
return this->pgresult;
}

static VALUE pg_cstr_to_sym(char *cstr, unsigned int flags, int enc_idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of this function could be more clear that it is specific to field names.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shrug! I don't care much about static function names.

this->nfields = i + 1;

RB_GC_GUARD(fname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the call to (or return from?) pg_cstr_to_sym makes this redundant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A static function like pg_cstr_to_sym() could be inlined by the compiler, so it's not safe to protect the VALUE from being GC'ed. However a call to a non-static function is sufficient to protect it from being GC'ed, so the call to rb_obj_freeze() is (and was) already enough.

@@ -583,24 +606,23 @@ pgresult_nfields(VALUE self)

/*
* call-seq:
* res.fname( index ) -> String
* res.fname( index ) -> String or Symbol
*
* Returns the name of the column corresponding to _index_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems worth mentioning field_name_type in these cases where "string or symbol" may be returned.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change this.

* It can be set to one of:
* * +:string+ to use String based field names
* * +:symbol+ to use Symbol based field names
* * +:static_symbol+ to use pinned Symbol (can not be garbage collected) - Don't use this, it will probably removed in future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why deprecate :static_symbol the moment it is introduced? Is there a use case for having it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:static_symbol is the classic way to create symbols from C and it is also used by mysql2 gem. So I wanted to be able compare and benchmark it a bit more. So far I think it is mostly obsolete.

ext/pg_result.c Outdated
UNUSED(nfields);
newthis->flags = this->flags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't grok pg_new_result, so this is a naive question: should this copy happen inside pg_new_result?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good question! If I think about it a bit, I notice that the initial struct values are copied from this->connection in pg_new_result, but they should be copied from this instead. So currently typemap= is broken on stream_each_tuple, unless it's already set on the connection. I'll think a bit more about it, how this can be solved best.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this issue in commit 6a450fa on the master branch by introducing a result copy function instead of reusing pg_new_result. Thank you for highlighting this!

@larskanis
Copy link
Collaborator Author

Thank you @cbandy for your valuable feedback! I'll come back with an improved version.

@larskanis
Copy link
Collaborator Author

@ged I would appreciate an opinion from you. Is the API OK for you?

@ged
Copy link
Owner

ged commented Nov 18, 2019

@larskanis : Yes, this looks good to me. Just to clarify: you're including static_symbol so you can benchmark against mysql2, but it will likely be removed in favor of just symbol?

@larskanis
Copy link
Collaborator Author

@ged : No, not to compare against mysql2, just to compare the two ways of symbol allocation. I noticed that String allocation is faster, but dynamic vs. static Symbol allocation seems to have mostly the same speed. I still kept both versions, so far, just to have them easily available. And as it turned out, the static symbols have to be used on Truffleruby (and they are dynamic there). I'll remove :static_symbol type, if you don't like it. Otherwise it can be removed similar to conn.guess_result_memsize= when we know that no one needs the alternative way.

What do you think about conn.field_name_type= to set a default field_name_type for results?

@ged
Copy link
Owner

ged commented Nov 18, 2019

Ah, okay. The method name makes sense to me, as does keeping the :static_symbol setting. 👍

@larskanis
Copy link
Collaborator Author

@ged The second question was more about adding field_name_type= to PG::Connection to set the default type per connection?

@ged
Copy link
Owner

ged commented Nov 19, 2019

Oh, then yes I think that'd be useful. I guess I took its utility for granted. ;)

This adds:
* PG::Result#field_name_type=
* PG::Result#field_name_type
* PG::Result#field_names_as

Symbol comparison and lookup is faster, since the string hash is generated at Symbol creation.
So retrieval of Symbol based field names is somewhat slower, but this is easily outperformed in frameworks that do a lot of lookups.

This raises ruby version requirement to 2.2, since it depends on GC'able symbols.

Fixes ged#278
@larskanis larskanis force-pushed the fields-as-symbol branch 2 times, most recently from 0ae39a9 to fd66f35 Compare November 19, 2019 21:35
Truffleruby doesn't support rb_check_symbol_cstr() so far, but all symbols in Truffleruby are GC'd anyway.
So we can use classic ID2SYM to allocate the Symbol objects.

Discussed here: oracle/truffleruby#1814
It allows to set the field name type on a connection scope.

Added methods are:
    * PG::Connection#field_name_type=
    * PG::Connection#field_name_type
ext/pg_result.c Outdated
{
VALUE fname;
if( flags & PG_RESULT_FIELD_NAMES_SYMBOL ){
#ifndef TRUFFLERUBY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here looks correct for :symbol and :string on truffle ruby, but it looks like :static_symbol will produce strings.

It was using strings before.
@larskanis larskanis merged commit dffb658 into ged:master Nov 21, 2019
@larskanis larskanis deleted the fields-as-symbol branch November 21, 2019 20:23
@ged
Copy link
Owner

ged commented Nov 22, 2019

Excellent work as usual sir! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symbol hashes
3 participants