Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid force_reset to make patching thread-safe #213

Closed
wants to merge 1 commit into from

Conversation

mgeisler
Copy link

Before, force_reset was used twice when instantiating the original baseclasses. It was necessary to unpatch things, since the classes were subclasses of patched classes and because the subclass referred to the super class explicitly by name.

That is, httplib has code like:

class HTTPConnection:
    ...

class HTTPSConnection(HTTPConnection):
    "This class allows communication via SSL."

    def __init__(self, host, port=None, ...):
        HTTPConnection.__init__(self, host, port, ...)

Creating a real HTTPSConnection instance is now impossible as long as httplib.HTTPConnection is patched.

To get around this, force_reset was used to temporary unpatch things. However, this means modifying global state and is thus not thread-safe. Modifying the classes once when vcr is loaded is better since no further patching and unpatching needs to be done after that.

This code in this commit uses byteplay, which is not Python 3 compatible. I think byteplay could be upsed with Python 3 compatibility, but I haven't looked much into it. The project also seems semi-abandoned with the last release done in 2010.

Also, I will agree that this kind of bytecode level patching is at best controversial :-) However, since I only use it to fix a reference to a super-class, I think it could be acceptable here.

Fixes #212 and makes the test program there run stable.

Before, force_reset was used twice when instantiating the original
baseclasses. It was necessary to unpatch things, since the classes
were subclasses of patched classes *and* because the subclass referred
to the super class explicitly by name.

That is, httplib has code like:

  class HTTPConnection:
      ...

  class HTTPSConnection(HTTPConnection):
      "This class allows communication via SSL."

      default_port = HTTPS_PORT

      def __init__(self, host, port=None, ...):
          HTTPConnection.__init__(self, host, port, ...)

Creating a real HTTPSConnection instance is now impossible as long as
httplib.HTTPConnection is patched.

To get around this, force_reset was used to temporary unpatch things.
However, this means modifying global state and is thus not
thread-safe. Modifying the classes once when vcr is loaded is better
since no further patching and unpatching needs to be done after that.

This code in this commit uses byteplay, which is not Python 3
compatible.
@kevin1024
Copy link
Owner

Wow, this is pretty mindblowing. I have to admit I haven't thought much about making VCR.py threadsafe. Is this all that is needed for it to work?

I'm obviously not too excited about adding another dependency / doing bytecode-level patching. The goal is noble though.

@colonelpanic8
Copy link
Collaborator

Before, force_reset was used twice when instantiating the original baseclasses. It was necessary to unpatch things, since the classes were subclasses of patched classes and because the subclass referred to the super class explicitly by name.

Sigh. This is why you are supposed to use super.

This code in this commit uses byteplay, which is not Python 3 compatible. I think byteplay could be upsed with Python 3 compatibility, but I haven't looked much into it. The project also seems semi-abandoned with the last release done in 2010.

That would be a dealbreaker for its inclusion in this library.

We could do something sort of hacky where we just patch the function with an exact rewrite of the existing function except with real super calls instead of explicit superclass calls (instead of the force reset).

Is this really the only reason we do a force reset here? I feel like I vaguely remember some other reason.

Wow, this is pretty mindblowing. I have to admit I haven't thought much about making VCR.py threadsafe. Is this all that is needed for it to work?

This is only threadsafety assuming that there is exactly one active cassette at any given moment, but i do believe it would give us threadsafety in thise case.

@kevin1024
Copy link
Owner

Sigh. This is why you are supposed to use super.

Even better: they are old-style classes so they actually can't use super.

@mgeisler
Copy link
Author

mgeisler commented Oct 5, 2015

@kevin1024 wrote:

Wow, this is pretty mindblowing.

Thanks, that's a much better response than I had hoped for :-)

@IvanMalison wrote:

This code in this commit uses byteplay, which is not Python 3 compatible. I think byteplay could be updated with Python 3 compatibility, but I haven't looked much into it. The project also seems semi-abandoned with the last release done in 2010.

That would be a dealbreaker for its inclusion in this library.

Yeah, I would call the patch a proof of concept at best. I plan to rewrite it as a standalone piece of code that uses the custom_patches argument for a project at work. That way people who use Python 2.7 can take it and play with it without you having to actually merge it into vcr.py directly.

I think making vcr.py threadsafe would be a good and reasonable thing. After all, making parallel HTTP requests is the prime use case of packages like concurrent.futures :-)

We could do something sort of hacky where we just patch the function with an exact rewrite of the existing function except with real super calls instead of explicit superclass calls (instead of the force reset).

Is this really the only reason we do a force reset here? I feel like I vaguely remember some other reason.

I think it is the main reason. One force_reset was added in e1f65bc to fix #130. That issue was also getting

TypeError: unbound method connect() must be called with
VCRHTTPConnection/something.yaml instance as first argument
(got HTTPSConnection instance instead)

errors, albeit in the single-threaded case.

@colonelpanic8
Copy link
Collaborator

Yeah, I would call the patch a proof of concept at best. I plan to rewrite it as a standalone piece of code that uses the custom_patches argument for a project at work. That way people who use Python 2.7 can take it and play with it without you having to actually merge it into vcr.py directly.

Cool, I'd love to take a look at that when you are done

I think making vcr.py threadsafe would be a good and reasonable thing. After all, making parallel HTTP requests is the prime use case of packages like concurrent.futures :-)

yeah agreed, I just want to make sure we do so in a way that doesn't break any existing features/compatibility

@mgeisler
Copy link
Author

In the end I simply went with a bit of code that is imported at the top of my tests:

import byteplay

def _patch_name(func, old, new):
    """Replace references to old with new in func bytecode.

    This effectively renames variables and attributes in the code for
    the function passed. When the function used to access 'old', it
    will now instead access 'new'.
    """
    bytecode = byteplay.Code.from_code(func.func_code)
    wanted_opcodes = [byteplay.LOAD_GLOBAL, byteplay.LOAD_ATTR]
    for i, (opcode, arg) in enumerate(bytecode.code):
        if opcode in wanted_opcodes and arg == old:
            bytecode.code[i] = (opcode, new)
    func.func_code = bytecode.to_code()

# Make the original HTTPSConnection class refer to the original
# HTTPConnection class, always. This avoids "unbound method __init__()
# must be called with VCRHTTPConnection<something>.yaml instance as
# first argument (got HTTPSConnection instance instead)" errors.
#
# Please see https://github.com/kevin1024/vcrpy/issues/212 for more
# background and discussion.
httplib._HTTPConnection = httplib.HTTPConnection
httplib._HTTPSConnection = httplib.HTTPSConnection
_patch_name(httplib._HTTPSConnection.__init__.im_func,
            'HTTPConnection', '_HTTPConnection')
_patch_name(httplib._HTTPSConnection.connect.im_func,
            'HTTPConnection', '_HTTPConnection')

Since the code is safe to run always (it doesn't change the behavior when no patching is done), I decided that was the easiest way.

The force_reset done by vcrpy no longer matter when __init__ and connect is patched like above, so the part of my original pull request that removed the force_reset calls can be left in.

@MSeal
Copy link

MSeal commented Apr 5, 2016

FYI there is now a python3 library for byteplay:
https://pypi.python.org/pypi/byteplay3/3.5.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

force_reset makes vcr non thread-safe
4 participants