Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new: typetraits.getTypeId #13305

Closed
wants to merge 9 commits into from

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Jan 31, 2020

new: typetraits.getTypeId: get a unique id (int) for a type (note: static, not dynamic), can be used for type equality, reliable type hashing (more reliable than via $T), + many other use cases that would be really hard without this

this is much simpler and more efficient than something like getTypeId from https://github.com/yglukhov/variant/blob/master/variant.nim which relies on stringification

@timotheecour timotheecour changed the title new: typetraits.getTypeId [pending #13303] new: typetraits.getTypeId Jan 31, 2020
@cooldome
Copy link
Member

cooldome commented Jan 31, 2020

There is already signatureHash and symBodyHash in macros module to solve the same problem.

@timotheecour
Copy link
Member Author

timotheecour commented Jan 31, 2020

@cooldome

There is already signatureHash and symBodyHash in macros module to solve the same problem.

could you please elaborate, and show how signatureHash (or symBodyHash) can be used in place of getTypeid?
getTypeid returns a stable id (a simple int) indicating the type; not sure how I can do that with symBodyHash:

  import std/macros
  macro getTypeid2(T: typed): untyped =
    newLit signatureHash(T)

  template getTypeid3(T: typed): untyped =
    type T2 = T
    getTypeid2(T2)
  type MyInt = int
  echo getTypeid2(MyInt)
  echo getTypeid2(int)
  echo getTypeid2(int)
  # echo getTypeid2(type(1)) # Error: node is not a symbol
  echo getTypeid3(type(1))
  echo getTypeid3(type(1))
  echo getTypeid3(int)
  echo getTypeid3(int)

prints:

__9aprACdypoaazK8UMIsdo4Q
__GkpzIdbuOm8RKb9cCiLpthA
__GkpzIdbuOm8RKb9cCiLpthA
__3OdjBnrClJioDGW47qXstw
__6ZKf3oBRvMMUY1riPSCGcQ
__wO8y53sSYaadjVh1yHbxFg
__xOE39bc63vaZr1kKLr5If2g

furthermore, even if it did work with signatureHash, it's a lot more efficient as it simply returns what the compile already knows (the id field) instead of re-inventing a hash (which, alghough unlikely, could have collisions etc)

@Araq
Copy link
Member

Araq commented Feb 3, 2020

The type IDs are not stable though and will end up in some .compileTime cache if we expose them and this can produce terrible bugs once IC arrives. Whatever we expose must be as stable as possible. Git also exposes commit hashes, not simple int32 values. For good reasons.

@timotheecour
Copy link
Member Author

timotheecour commented Feb 3, 2020

I know, and I can improve the docs to make that clear, and maybe rename to unsafeGetTypeid or hide it behind an experimental flag. Note that there should still be a way to make sure compiler has stable id's even with IC anyways, eg:

We cache generic instantiations and need to ensure this caching works well with the incremental compilation feature.

Regardless, this feature solves real problems that are hard/impossible to solve without it.

Here are just 2 examples:

passing a callback to a proc that handles multiple types:

this defines a generic pretty that works with any type and can be customized by callers via ca callback function (simplified for brevity)

## prettys.nim:
  type Callback* = proc(result: var string, a: pointer, id: int): bool

  import std/typetraits

  proc pretty[T](result: var string, a: T, callback: Callback) =
    when T is object:
      result.add "("
      for k,v in fieldPairs(a):
        result.add $k & ": "
        pretty(result, v, callback)
        result.add ", "
      result.add ")"
    elif T is ref|ptr:
      if callback(result, cast[pointer](a), getTypeid(T)):
        discard
      elif a == nil:
        result.add "nil"
      else:
        pretty(result, a[], callback)
    else:
      result.add $a

  proc pretty*[T](a: T, callback: Callback): string = pretty(result, a, callback)

## main.nim:
  import prettys, std/typeid
  proc main()=
    type Foo = ref object
      x: int
    type Bar = object
      b1: Foo
      b2: string

    let f = Bar(b1: Foo(x: 12), b2: "abc")

    proc callback(ret: var string, a: pointer, id: int): bool =
      case id
      of Foo.getTypeid:
        ret.add $("custom:", cast[Foo](a).x)
        return true
      else:
        discard

    echo pretty(f, callback)

    proc callback2(ret: var string, a: pointer, id: int): bool =
      case id
      of Foo.getTypeid:
        ret.add $("custom2:", cast[Foo](a).x)
        return true
      else:
        discard

    echo pretty(f, callback2)

  main()

prints:

(b1: ("custom:", 12), b2: abc, )
(b1: ("custom2:", 12), b2: abc, )

I'm using getTypeId in my compiler patch to pretty-print (or generate html with dynamically expandable nodes) for things like PNode, PSym etc, while applying custom rules depending on the type, eg call typeToString for PType, so that it avoids showing things I don't care about, or that would cause infinite recursion, etc.

exportc callback that handles multiple types

# module a:
import ./types
use callbackFun(...)
# module types:
define types ...
proc callbackFun(a: pointer, id: int) {.importc.}
import std/typetraits
proc callbackFun*[T](a: T) = callbackFun(cast[pointer](a), getTypeid(T))

implementation of callbackFun: say, this carries a lot of dependencies that we don't want to add to all modules that use callbackFun, because it would cause cyclic imports etc

# module impl:
proc callbackFun(a: pointer, id: int) {.exportc.} =
  case id
  of getTypeid(Foo1): echo cast[Foo1](a)
  of getTypeid(Foo2): echo cast[Foo2](a)
  else: doAssert false, $id

I'm using this to customize the compiler using callbacks and it requires with very little changes to compiler sources, unlike other solutions which would require a lot of refactorings for this to work.

@Varriount
Copy link
Contributor

Wouldn't a more natural implementation use typedescs and compare them?

@timotheecour
Copy link
Member Author

timotheecour commented Feb 3, 2020

Wouldn't a more natural implementation use typedescs and compare them?

I don't see how this could work, see the 2 examples I gave in #13305 (comment) (nor does it help with type hashing); you'd be forced to use a generic in order to use typedescs, which can't work in those use cases. If you disagree, please post an answer with more details (or working code, ideally)

@Araq
Copy link
Member

Araq commented Feb 4, 2020

Your use cases do not depend on the fact that a type ID is an int. I'm arguing it should be a string.

@Varriount
Copy link
Contributor

Varriount commented Feb 5, 2020

What about making type descriptions values at runtime, as well as compile time? Transparently represent as a pointer (or string, or int or whatever)? After all, that's what getTypeID is doing anyway - retrieving a representation of a typedesc that can be used at runtime.

@timotheecour
Copy link
Member Author

What about making type descriptions values at runtime

I don't understand; if it's available at compile time it's available at runtime too, both these works:

const c1 = getTypeid(type(12))
let c2 = getTypeid(type(12))
doAssert c1 == c2

Unless you'd talking about dynamic class id in the context of object inheritance, in which case that's out of scope for this PR.

Your use cases do not depend on the fact that a type ID is an int. I'm arguing it should be a string.

I've now changed it to string instead of int (simply by stringifying id); the id itself can be changed in future implementations, and is (obviously) not assumed stable across compiler releases

@Araq
Copy link
Member

Araq commented Feb 5, 2020

I've now changed it to string instead of int (simply by stringifying id);

What's wrong with using the type's hash value though?

@timotheecour
Copy link
Member Author

timotheecour commented Feb 5, 2020

What's wrong with using the type's hash value though?

type_info::hash_code. As hashes for two different types can collide, it is useless for determining type equality [...]

std::type_index or std::type_info are better for guaranteeing uniqueness; if we're also considering shared libraries maybe the picture is a bit different, but here i'm concerned about use cases given above and id fits well

  • is there an API for that? or at least a not-yet-exposed internal compiler API ? not aware of it
    looking for this specifically:
type Foo = object
const id: string = typeHash(Foo)

the advantage of id is that the compiler already did that work for us, and in theory it should work:

  TType* {.acyclic.} = object of TIdObj # \
                              # types are identical iff they have the
                              # same id; there may be multiple copies of a type
                              # in memory!

note that's it's still "in theory", because that comment is a lie right now; which this PR over-comes to some extent (at least for all primitive types; not yet for things like tuple or seq); nothing that can't be fixed though IMO

@Varriount
Copy link
Contributor

Varriount commented Feb 6, 2020

@timotheecour As far as I know, you currently cannot create a Table[typedesc, string] at runtime - you must either use getTypeId (this proposal) or get the name of the type.

What I'm trying to suggest is allowing the above by essentially calling getTypeId behind the scenes (or representing a typedesc as some other opaque value).

@timotheecour
Copy link
Member Author

@Varriount that's because an instance of a typedesc is not a value (eg something with a fixed sizeof) so Table[typedesc, string] can't work.
However since #13064 you can do this:

import std/typetraits
type Foo = object
  x1: string
type t = (int, Foo)
var names = ["myint", "myfoo"]
echo ($t.get(0), names[0])

anyhow what you're describing is out of scope for this PR

@Araq
Copy link
Member

Araq commented Feb 6, 2020

Hash collisions are very rare and we can always change the implementation to compress(canonicalTypeRepresentationAsAString) later on. we can only do that though if the type ID is a string or a distinct string.

@timotheecour timotheecour changed the title [pending #13303] new: typetraits.getTypeId new: typetraits.getTypeId Feb 9, 2020
@timotheecour timotheecour marked this pull request as ready for review February 9, 2020 23:13
@timotheecour
Copy link
Member Author

timotheecour commented Feb 9, 2020

PTAL now that the pending PR got merged.

we can only do that though if the type ID is a string or a distinct string

it now is a string; implementation simply uses $id, but is un-specified and allowed to change in future PRs

compress(canonicalTypeRepresentationAsAString)

that doesn't seem robust, depending on what canonicalTypeRepresentationAsAString is; eg "Foo" could mean different types (bar.Foo or baz.Foo); furthermore it looks like a roundabout way to do it and less efficient than simply accessing id which the compiler already knows about

@Araq
Copy link
Member

Araq commented Feb 10, 2020

Well it has to be package.module.Foo but even that isn't correct since nowadays you can have multiple modules of the same name in the same package and nobody can ever accept useful restrictions. So now we need $relativePathWeHopeWeGotRight/Foo as the canonical name but that's a story for another day.

@Varriount
Copy link
Contributor

Would it be better to make the string distinct?

@timotheecour
Copy link
Member Author

done, PTAL

@timotheecour
Copy link
Member Author

ping @Araq

@Araq
Copy link
Member

Araq commented Feb 24, 2020

it can, but that's really something that can be done independently of this PR for reasons already mentioned.

Yeah well, but the integer ID (even if exposed as a distinct string) is fragile, who will read its documentation and ensure they are not serialized? This might be one foundation for an entity-component-system. One where every recompile is a gamble -- a recipe for instabilities.

@Araq
Copy link
Member

Araq commented Apr 20, 2020

Can't we instead make signatureHash(getTypeImpl(t)) work? For me it's the same thing.

@Varriount
Copy link
Contributor

Yeah well, but the integer ID (even if exposed as a distinct string) is fragile, who will read its documentation and ensure they are not serialized?

Yeah, that's one thing that would make me much more comfortable with this PR - disclaimers stating what API/ABI promises are being made with regards to the value being returned.

@stale
Copy link

stale bot commented Apr 20, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Apr 20, 2021
@stale stale bot removed the stale Staled PR/issues; remove the label after fixing them label Apr 21, 2021
@timotheecour timotheecour force-pushed the pr_typeid_getTypeId branch from b2af1f1 to d3b0665 Compare June 8, 2021 04:02
@timotheecour
Copy link
Member Author

timotheecour commented Jun 8, 2021

ping @Araq , I've fixed bitrot again. This PR is useful, solves real problems that are hard/impossible to do without this feature, as mention in description as well as illustrated in the tests and I'd like this feature to get merged.

I've made the return type a distinct string as was requested.

The doc comment in getTypeId clearly mentions that ids are not intended to be stable across re-compilation if sources change, at some point users just have to read the docs.

This feature is present in lots of languages, for good reasons; eg:

and even these:

signatureHash(getTypeImpl(t)) has its own issues (no uniqueness guarantee, however rare but more importantly it's not in sync with what the compiler uses to assess type equality, unlike this PR which uses id and is guaranteed to be in sync); The premise that it would offer stability or be superior because of it is questionable; comparing typeids across re-compilation after sources change seems error-prone no matter what (eg is the hash structural or nominal for objects/tuples; does it allow renames/reorderings; more crucially, is it consistent with compiler's notion of type equality).

In any case, if someone needs signatureHash(getTypeImpl(t)) later, it can always be added in future work.

@timotheecour timotheecour added the Ready For Review (please take another look): ready for next review round label Jun 8, 2021
@disruptek
Copy link
Contributor

This makes pooling continuation allocators easier.

@timotheecour
Copy link
Member Author

timotheecour commented Jun 10, 2021

signatureHash(getTypeImpl(t)) work? For me it's the same thing.

@Araq this cannot work.

  • you'd have to solve the inverse problem of finding a hash that satisfies the equivalence relationship defined by sameTypeAux(x, y: PType, c: var TSameTypeClosure) (high likelyhood of being buggy and not representative of it, it's a complex inverse problem that'd have to duplicate a lot of logic)
  • you'd have to live with fact that collisions could happen (see Collisions in type_id rust-lang/rust#10389, type_id collisions in rust, unsolved problem since 2013)
  • you'd still have to rely on compiler id's eg for symbols (which are not stable across re-compilation + source changes), which defeats the whole purpose of ignoring type ids in the first place, e.g:

A the the 1st and 2nd block are structurally equivalent, so there's no way for a signature based hash to tell them apart unless you look at symbol id's

when true:
  import std/typetraits
  proc fn[T](a: T): auto =
    static: echo "compiling fn" # this will print twice, both A's are distinct types
  block:
    type A = object
    fn(A.default)
    echo A.getTypeId
  block:
    type A = object
    fn(A.default)
    echo A.getTypeId

with this PR, I get 2 distinct ids (754974726, 754974746), which is correct. With a signature based hash, you'd either get a collision or you'd have to look at the symbol id (defeating whole purpose of ignoring the typeid, as you'd still loose any guarantees wrt stability against unrelated changes after re-compilation).

just for kicks I tried to see what hashType(t) would give, it fails even for simple cases like this:

import std/typetraits
type Foo[T] = object
echo Foo[int].getTypeIdViaHashType # __SpGTocwyb9chaUYWzXQMGSg
echo Foo[float].getTypeIdViaHashType # __SpGTocwyb9chaUYWzXQMGSg 

@n0bra1n3r
Copy link

n0bra1n3r commented Oct 14, 2021

This is useful for generating type IDs for an ECS framework for game engines, especially ones with hot code reloading. The best C++ solution I've seen seems quite hacky: https://skypjack.github.io/2020-03-14-ecs-baf-part-8/ (uses __PRETTY_FUNCTION__).

@stale
Copy link

stale bot commented Oct 14, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Oct 14, 2022
@stale stale bot closed this Nov 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready For Review (please take another look): ready for next review round stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants