Define limitations on the set of characters to be used for the xdm:id property #432

Open
fmeschbe opened this issue Jul 13, 2018 · 4 comments
Labels
v0.9.4 Scheduled for v0.9.4

Comments

@fmeschbe
Collaborator

fmeschbe commented Jul 13, 2018

In issue #419 @jbeckert comments:

Does xdm:id for Identity need a "pattern" property to reject ids with prohibited characters, e.g. characters that would mess up the usage of the value in URL path components?

This is a very valid concern and we should absolutely work through it and properly handle it.

I assume the intent is to restrict this to the "URL-safe" characters, correct?

I propose to add two pieces:

  • Amend the description to state that only a restricted set of characters is supported for the xdm:id property.
  • Define a pattern property codifying the valid set of characters for validation.

The question is which set of characters we should support.

Looking at section 3.3 (Path) of RFC 3986, one option would be to support pchar except for pct-encoded, which would be forbidden:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

But then, I am not so sure there is any value in most of the sub-delims in an identifier. So I propose to just use unreserved plus ":", "@", and "+". This gives us good coverage for identifiers such as UUIDs, email addresses, and even some URNs.

So the proposed pattern would be:

pattern = "^[a-zA-Z0-9:@+._~-]+$"
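
As a rough illustration of how this proposed pattern behaves, here is a minimal Python sketch. The function name `is_valid_id` and the sample values are made up for this example and are not part of XDM. It uses `re.fullmatch` because Python's `$` also matches just before a trailing newline, which a schema validator's regex dialect may treat differently.

```python
import re

# First proposed pattern from this issue, copied verbatim.
PROPOSED_PATTERN = r"^[a-zA-Z0-9:@+._~-]+$"
_ID_RE = re.compile(PROPOSED_PATTERN)


def is_valid_id(value: str) -> bool:
    """Return True if value uses only the proposed characters (hypothetical helper)."""
    # fullmatch requires the whole string to match, so "abc\n" is rejected
    # even though Python's "$" can match before a trailing newline.
    return _ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # Illustrative sample values only.
    samples = [
        "user@example.com",
        "urn:uuid:123e4567-e89b-12d3-a456-426614174000",
        "a/b",      # "/" would break a URL path component
        "a%20b",    # percent-encoding is forbidden
        "",         # at least one character is required
    ]
    for sample in samples:
        print(repr(sample), is_valid_id(sample))
```

The character class keeps "-" at the end so that it is read as a literal hyphen rather than as a range.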

What are the schemas that are affected by the issue

Identity, EndUserIds, Profile, ExperienceEvent (and their extensions)

What are examples of products that are impacted by the issue

Analytics, Campaign, Ad Cloud, Target

@jbeckert
Contributor

This looks to me like a very sensible (and safe) character set. The only downside I can see is that Base64 encoded secure hashes aren't supported as identifiers any longer. We don't allow '=' but also don't support '/' which was a problem to begin with. Oh well.

@fmeschbe
Collaborator Author

For base64 it would be base64url which uses - and _ instead of + and /.

And we can add = to it.
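
To make the difference concrete, here is a small Python sketch comparing the two alphabets using only the standard library. The helper `hashed_id` and the choice of SHA-256 are assumptions for illustration; the thread only mentions "Base64 encoded secure hashes" in general.

```python
import base64
import hashlib

# Two bytes chosen so that the standard alphabet produces both "+" and "/".
raw = b"\xfb\xff"
standard = base64.b64encode(raw).decode("ascii")          # "+/8="
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-_8="


def hashed_id(value: str) -> str:
    """Hypothetical helper: base64url-encoded SHA-256 digest of value."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # urlsafe_b64encode keeps the "=" padding, hence the need to allow "=".
    return base64.urlsafe_b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(standard, url_safe)
    print(hashed_id("user@example.com"))
```

A 32-byte SHA-256 digest always encodes to 44 characters ending in a single "=", so hashed identifiers of this kind only fit the character set if padding is either allowed or stripped.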

@jbeckert
Contributor

Yes, base64url would do the trick. Nice.

fmeschbe added the "v0.9.4" (Scheduled for v0.9.4) label Jul 14, 2018
@fmeschbe
Collaborator Author

fmeschbe commented Jul 14, 2018

To recap, then: valid xdm:id properties must comply with the following ABNF production:

id     =  1*char
char   =  ALPHA / DIGIT / "@" / "+" / "."  / "-" / "_" / "~" / "="
ALPHA  =  %x41-5A / %x61-7A               ; A-Z / a-z
DIGIT  =  %x30-39                         ; 0-9

This production is encoded for validation as the following pattern:

pattern = "^[a-zA-Z0-9@+._~=-]+$"

This allows for email addresses, numbers, and UUIDs, but also Base64URL encoded values (with or without padding).
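
As a sanity check that the pattern and the ABNF production describe the same character set, here is a hedged Python sketch. The names `matches_pattern` and `matches_abnf` and the sample identifiers are illustrative assumptions, not part of the schema.

```python
import re
import string

# Pattern from the recap, copied verbatim.
FINAL_PATTERN = r"^[a-zA-Z0-9@+._~=-]+$"

# Character set spelled out from the ABNF "char" production.
ABNF_CHARS = set(string.ascii_letters + string.digits + "@+.-_~=")


def matches_pattern(value: str) -> bool:
    """Check value against the regular expression (hypothetical helper)."""
    return re.fullmatch(FINAL_PATTERN, value) is not None


def matches_abnf(value: str) -> bool:
    """Check value against id = 1*char (hypothetical helper)."""
    return len(value) >= 1 and all(c in ABNF_CHARS for c in value)


# Every single ASCII character should be treated the same by both checks.
mismatches = [
    chr(code)
    for code in range(128)
    if matches_pattern(chr(code)) != matches_abnf(chr(code))
]

if __name__ == "__main__":
    print("mismatches:", mismatches)
    # Illustrative identifiers: email, UUID, base64url with and without padding.
    for sample in ["user@example.com",
                   "123e4567-e89b-12d3-a456-426614174000",
                   "-_8=", "-_8",
                   "urn:uuid:1234"]:  # ":" is not in the recap's char production
        print(repr(sample), matches_pattern(sample))
```

Note that ":" appeared in the first proposed pattern but is absent from this recap, so colon-separated values such as URNs no longer match.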

If we go for the URN proposal in #434, the same pattern would have to be applied to the namespace xdm:code property.
