Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix jvm compile unicode issues when using Python 3 #6987

Merged

Conversation

Eric-Arellano
Copy link
Contributor

When running ./pants3 test tests/python/pants_test/goal::, it was discovered that there were several unicode issues affecting the backend/jvm set of modules that the tests had not picked up.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great :) Thanks!

for arg in zinc_args:
fp.write(arg)
fp.write(b'\n')
# TODO: `zinc_args` include mixed bytes and strings. The values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to fix this, if you happen to have the context in your head for the root cause...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I discovered this is a much bigger issue than just zinc_compile! I think we are registering default values for list option types as bytes. I've narrowed it down to this diff so far, which intentionally throws an error to help debug when simply running ./pants.

diff --git a/src/python/pants/option/options.py b/src/python/pants/option/opti
ons.py
index e974cd096..7c2c022d6 100644
--- a/src/python/pants/option/options.py
+++ b/src/python/pants/option/options.py
@@ -379,6 +379,9 @@ class Options(object):
     if inherit_from_enclosing_scope:
       self._check_and_apply_deprecations(scope, values)

+    ba = [a for a in values if isinstance(a, bytes)]
+    if len(ba) != 0:
+      raise ValueError(ba)
     return values

   def get_fingerprintable_for_scope(self, bottom_scope, include_passthru=False,

I'll keep investigating! Will wait to merge this PR.

Copy link
Contributor Author

@Eric-Arellano Eric-Arellano Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I investigated this for a long time today, and @OniOni checked it out on Friday.

I think I found why ./pants is using both unicode and bytes for options: Python 2 stores attributes as a map between byte strings to the corresponding values.

# in Py2
>>> [type(x) for x in dir("test")]

[<type 'str'>, ...]

I resolved the issue down to running ./pants with the following diff:

diff --git a/src/python/pants/option/option_value_container.py b/src/python/pants/option/option_value_container.py
index 3e6d824ca..c5fb50c12 100644
--- a/src/python/pants/option/option_value_container.py
+++ b/src/python/pants/option/option_value_container.py
@@ -120,6 +122,8 @@ class OptionValueContainer(object):
 
   # Support attribute setting, e.g., opts.foo = 42.
   def __setattr__(self, key, value):
+    if isinstance(key, bytes):
+     raise ValueError(key)
     if key == '_value_map':
       return super(OptionValueContainer, self).__setattr__(key, value)
     self._set(key, value)
@@ -128,6 +132,8 @@ class OptionValueContainer(object):
   # Note: Called only if regular attribute lookup fails,
   # so method and member access will be handled the normal way.
   def __getattr__(self, key):
+    if isinstance(key, bytes):
+      raise ValueError(key)
     if key == '_value_map':
       # In case we get called in copy/deepcopy, which don't invoke the ctor.
       raise AttributeError(key)

I do not think it makes sense to force the OptionValueContainer to use unicode always, because that's not how Py2 works. Instead, we keep Py2 as is, and this ends up not even being an issue in Py3 because Py3 represents attributes as unicode strings.

Allowing either bytes or unicode is bad! And confusing. We should be explicit about what the value is.

This keeps the same semantics, but is more explicit.
Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great :) Thanks!

@Eric-Arellano Eric-Arellano merged commit 0bc0178 into pantsbuild:master Jan 9, 2019
@Eric-Arellano Eric-Arellano deleted the fix-jvm-compile-unicode-issues branch January 9, 2019 11:31
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this pull request Mar 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants