I would like to share an idea about RAII-like approach
for the resource management there.
I don’t follow the Python world (my preference go for Scala, C++ or Rust),
so it’s most likely the idea I present here is not new;
someone must have "discovered" it before me.
TL;DR The idea is to use coroutine trampolining to avoid nesting with
blocks in the code, and so to allow RAII approach:
when an instance goes out of its scope, a release action is executed.
By a resource we mean anything which follows the pattern Acquire — Use — Release.
That is, a resource which is aquired should be eventually released.
A typical example from the native world is a block of memory allocated on the free store,
in C++ acquired with new
and released with delete
(C: malloc
/free
).
An example which everybody knows is opening a file (C: fopen
) which returns
a handle to a just opened file which we should later pass to some close mechanism
(C: fclose
).
Some other examples most developers have met:
TCP sockets, database connections or mutexes.
The main problem with a proper resource managemnt is ensuring that once a resource is acquired, it is released somewhere later, even in the presence of exceptions (in languages having them). This is a well known problem, so I will make it short:
def program():
f = open('file.txt', 'w')
...
f.write('Hello, world')
...
f.close()
What if the code between open
and close
returns or throws
(in the Python lingo, raises)?
Then we leave the file open "forever", which is a resource leak.
For scripts which exit immediately, this might not be a big problem,
but for every long running application we need to ensure this will not happen:
If we open more and more files and don’t close them,
the operation system will eventually reject opening one more;
if we don’t close a mutex, we have a deadlock.
An approach which C++ (and Rust and more) has taken is the
Resource acquisition is initialization idiom — RAII.
The idea is that for each resource we introduce a scope-bound resource manager
(also known as owner).
The resource is usually acquired as part of instantiation of the manager
(C++: manager contructor, mgr m(…)
),
and the release action is called automatically when the manager goes out of the scope
(C++: manager destructor, m.~mgr()
).
For brevity I will not speak about the move semantics which allows us to pass the ownership
(and hence the responsiblity to release the managed resource)
between different contexts. The file example in C++:
void program(const std::string & fname) {
std::fstream f(fname);
...
f << "Hello, world";
...
// f.~fstream() is called here, right before }
}
The important thing is that the compiler ensures that the release action is called always, on all exit paths, including the exceptional ones. One nice property is that RAII approach nests really well:
void program() {
resource1 r1(...); // acquire the resoure #1 -------+
resource2 r2(...); // acquire the resoure #2 --+ |
... // use them | |
// automatically called (note the order): | |
// r2.~resource2() // release the resoure #2 --+ |
// r1.~resource1() // release the resoure #1 -------+
}
This is the approach taken by many mainstream languages with automatic
memory management: Java, C#
(for the Python version of this, please refer to the next section).
The idea starts with try
-finally
block:
BufferedReader br = new BufferedReader(new FileReader(path));
try {
... // use `br` here
} finally {
br.close();
}
(an example from Oracle Java tutorial).
The runtime guarantees that whatever happens inside the try
block,
the finally
block will be executed.
The pattern is so common that the languages introduced some special measures. In Java, it is the try-with-resources block:
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
return br.readLine();
}
We create an instance derived from a special interface Closeable
(C#: IDisposable
)
which defines the release action close()
(C#: dispose()
) called in case
of successful construction of Closeable
instance.
This pattern can lead (and often leads) to a spaghetti code
(similar in nature to the callback hell known from Javascript before await
/async
syntax)
with nested resources:
try (Resource1 r1 = new Resource1()) {
try (Resource1 r2 = new Resource2()) {
try (Resource1 r3 = new Resource3()) {
...
}
}
}
There are ways to deal with this nesting and the code shifted "too much right", but it’s not the purpose of this text to discuss them.
The Python approach is similar, but more functional-programming like (quite surprisingly in Python) — we will use this fact to our advantage later. The correct way to open a file for writing in Python is the following:
def program():
with open('file.txt', 'w') as f:
...
f.write('Hello, world')
...
The with
statement starts a block of code, introducing a new variable (as …
)
for a handle (f
) for the just acquired resource (opened file).
What happens underneath here:
-
open(…)
creates an instance of a context manager (call itctx
) which saves the parameters, but itself does nothing, just represents what it means to acquire and release the resource. (This is the functional programming face of it: to defer doing the actual side effect to somewhere else.) -
Then,
ctx.__init__()
is called, making the system call to open the file with parameters saved from thectx
initialization. -
When leaving the block of code indented after
with
, Python callsctx.__exit__()
which closes the file. The__exit__
method is called both on standard return and if an exception is raised.
As with try-with-resources in Java or C#, we often see Python codebases nesting
with
blocks and code there shifted too much right:
def program():
with resource1(...) as r1:
with resource2(...) as r2:
with resource3(...) as r3:
...
Again, there are ways to deal with this, but it requires some care which is not always seen in real codebases.
This text is not meant as an exhaustive resource on resource management — there are definitely other approaches,
the most interesting I know is a Resource[IO, T]
abstraction in the IO monad world.
See the Cats Effect implementation of it.
As mentioned above, there is a problem that nesting with
blocks causes our code
to look like spaghetti shifted too much right.
I would like to present an idea how this can be prevented
using another Python language feature, coroutines.
I have not seen this before, but I’m not a Python developer (meh)
and so it’s very likely somebody got the idea before me — yet I was not able to find any reference on Google for this
(maybe I searched for bad words).
I would like to know any prior knowledge on this:
please let me know at marekscholle@gmail.com.
The idea is to have something like RAII in Python — when a variable goes out of the scope, we want a release action to run:
def program():
r1 = <RAII> resource1()
r2 = <RAII> resource2()
r3 = <RAII> resource3()
...
# on program exit, run the release actions
# for `r1`, `r2`, `r3`, in reversed order, similarly to RAII in C++
The <RAII>
stands for some "magic" to convince Python to "register" release actions
to be run when we leave the scope.
This looks like as an impossible task in Python, but it is not.
What we want to do in the runtime is what Python allows us to do with
the with
statement at the time of writing the code:
def program():
with resource1(...) as r1:
with resource2(...) as r2:
with resource3(...) as r3:
...
i.e. we want to delegate the guarantee to call release actions to Python itself
and not "invent" some new "runtime" on top of Python runtime
(which is what IO libraries do in JVM).
At the same time, we want to avoid using with
blocks and their inherent nesting
(which probably is by design and in accordance with the rule
Explicit is better than implicit).
The idea is to not call program
directly,
but manage its execution as a coroutine execution:
def program():
r1 = yield resource1(...)
r2 = yield resource2(...)
r3 = yield resource3(...)
...
# for the implementation of the "driver" of this coroutine
# please continue reading
An oversimplified introduction:
a coroutine is a "function" from which you can return back to the caller
with yield
(instead of return
), but unlike with ordinary functions,
the caller can pass the execution back to the callee
to the point where it was leaved before (after the last yield
),
possibly passing a value there — all you need is to assign a result of yield
to a variable.
So, a coroutine execution can be driven from outside.
In the example above, the code driving the program
needs to execute it
as if it was an ordinary function
def program():
with resource1(...) as r1:
with resource2(...) as r2:
with resource3(...) as r3:
...
Without further ado, here it is:
def program():
r1 = yield resource1(...)
r2 = yield resource2(...)
r3 = yield resource3(...)
...
def run(program):
coro = program()
def stack(res):
with res as r:
next_res = coro.send(r)
stack(next_res)
stack(next(coro))
run(program)
What happens here?
-
The
run
function creates a generator from the suppliedprogram
. We save this generator ascoro
. Note thatcoro
is now suspended, i.e. prepared to be run; nothing has happened yet. -
Next,
next(coro)
is called. This actually enters the body ofprogram
and executesresource1(…)
which returns a context managerres1
for the resource #1 (not the resource handle itself as mentioned above — this is the crucial point). -
The context manager
res1
is yielded ("sent" in the sense of message passing) fromprogram
back torun
, and passed there tostack
asres
. -
Now we are at the line with the
with
statement which callsr = res1.__enter__()
. Ther
is the handle to to the just acquired resource #1. -
coro.send(r)
resumes theprogram
where it was left and sends there the handler
which is saved as local variabler1
. -
Now, the
program
continues and creates a context managerres2
for the resource #2 which is again yielded (sent) torun
and saved tonext_res
variable. -
run
continues by executingstack(next_res)
and the history repeats: we acquire the resource #2 byres2.__enter__()
ing it, theprogram
is resumed again provided the resource handle which is there saved to a local variabler2
-
And so on.
So, we gradually build the nested with
blocks inside the run
driver
and each time we make a new with
block, we resume the program
with the resource handle — and since the nesting is done inside run
(with the help of recursion instead of hardcoding it),
the program
itself is relieved from it.
Let me show you a concrete example:
from contextlib import contextmanager
@contextmanager
def resource(r):
print('resource::acquire', r)
try:
yield r
finally:
print('resource::release', r)
def program():
a = yield resource(1)
print('use a =', a)
b = yield resource(2)
print('use b =', b)
c = yield resource(3)
print('use c =', c)
assert False, "intentional error"
def run(program):
coro = program()
def stack(r):
with res as r:
next_res = coro.send(r)
stack(next_res)
stack(next(coro))
run(program)
The output:
resource::acquire 1
use a = 1
resource::acquire 2
use b = 2
resource::acquire 3
use c = 3
resource::release 3
resource::release 2
resource::release 1
Traceback (most recent call last):
...
assert False, "intentional error`
The @contextmanager
part is just a convenient way to create a context manager.
You can see that the program
itself is a nice function
(more precisely, a generator function)
without any syntactic noise
and without any nesting, yet even in the presence of exception (assert False
),
the release actions are called for r3
, r2
and r1
(in the right, reversed order).
Again: This idea I have not seen anywhere, but this does not mean I am the first person who "discovered" it. Please let me know if you have seen this before.
This is the idea itself and what follows is just an iteration / warning that there are caveats.
If we change our program
to
def program():
for i in range(1000):
a = yield resource(i)
print('use a =', a)
assert False, "intentional error"
and run
it, we get an unpleseant
RecursionError: maximum recursion depth exceeded while calling a Python object
caused by recursive calling of stack
.
I’m not a Python person, so I will present a simple solution for this, but I wouldn’t be surprised if this had a better solution — I just want to demonstrate a solution exists:
async def run(program):
coro = program()
async def stack(res):
with res as x:
next_res = coro.send(x)
next_stack = asyncio.create_task(stack(next_res))
await next_stack
await stack(next(coro))
asyncio.run(run(program))
Instead of letting the execution stack grow,
we use asyncio
to turn stack
into an "awaitable" Task
we we submit to the underlying executor.
This way, every call of stack
gets its own indepedent context
and no RecursionError
will happen.
Let us try again with this asyncio
version of run
.
The output of program
is then
resource::acquire 0
use 0
resource::acquire 1
use 1
...
resource::acquire 998
use 998
resource::acquire 999
use 999
resource::release 999
resource::release 998
...
resource::release 1
resource::release 0
Traceback (most recent call last):
...
assert False, "intentional error"
AssertionError: intentional error
Please let me know if you find this intersting
or if you have seen this trick before, making the with
statement nesting
inside a function driving a coroutine execution.
To my best knowledge, this is not published anywhere as of today,
but I don’t follow Python world and googling is often not much helpful
in getting this kind of information.
I can imagine that for the use case which made me think about the ways of resource management in Python and which requires acquiring many nested resources, this can be a revolution in code safety / clarity.
Waiting for your feedback 🙏