Manual
This manual is for Fibers (version 0.3.0, updated 12 October 2016)
Copyright 2016 Andy Wingo
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
• Introduction: What’s this all about?
• Reference: API reference.
• Pitfalls: Stay on the happy path.
• Status: Fibers is a work in progress.
Fibers is a facility for lightweight concurrency in Guile.
• Context: How do other systems handle concurrency?
• Design: Fibers’ point in the design space.
Modern machines have the raw capability to serve hundreds of thousands of simultaneous long-lived connections, but it’s often hard to manage this at the software level. Fibers tries to solve this problem in a nice way. Before discussing the approach taken in Fibers, it’s worth spending some time on history to see how we got here.
One of the most dominant patterns for concurrency these days is
“callbacks”, notably in the Twisted library for Python and the
Node.js run-time for JavaScript. The basic observation in the
callback approach to concurrency is that the efficient way to handle
tens of thousands of connections at once is with low-level operating
system facilities like poll or epoll. You add all of
the file descriptors that you are interested in to a “poll set” and
then ask the operating system which ones are readable or writable, as
appropriate. Once the operating system says “yes, file descriptor
7145 is readable”, you can do something with that socket; but what?
With callbacks, the answer is “call a user-supplied closure”: a
callback, representing the continuation of the computation on that
socket.
Building a network service with a callback-oriented concurrency system means breaking the program into little chunks that can run without blocking. Wherever a program could block, instead of just continuing the program, you register a callback. Unfortunately this requirement permeates the program, from top to bottom: you always pay the mental cost of inverting your program’s control flow by turning it into callbacks, and you always incur the run-time cost of closure creation, even when the particular I/O could proceed without blocking. It’s a somewhat galling requirement, given that this contortion is required of the programmer, but could be done by the compiler. We Schemers demand better abstractions than manual, obligatory continuation-passing-style conversion.
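To make the cost of that inversion concrete, here is a minimal sketch in plain Scheme of the same two-step computation written in direct style and in callback style. The names make-reader, make-reader/k, sum-two/direct and sum-two/callback are hypothetical and exist only for this illustration; the "reader" is a stand-in for a socket and never really blocks.

```scheme
;; Hypothetical stand-in for a socket: each call returns the next item.
(define (make-reader items)
  (lambda ()
    (let ((v (car items)))
      (set! items (cdr items))
      v)))

;; The same source in callback style: instead of returning a value,
;; it passes the value to the continuation K.
(define (make-reader/k items)
  (let ((next (make-reader items)))
    (lambda (k) (k (next)))))

;; Direct style: the control flow reads top to bottom.
(define (sum-two/direct read-value)
  (let* ((a (read-value))
         (b (read-value)))
    (+ a b)))

;; Callback style: every point that could block becomes a nested
;; closure, and the result is delivered to K rather than returned.
(define (sum-two/callback read-value/k k)
  (read-value/k
   (lambda (a)
     (read-value/k
      (lambda (b)
        (k (+ a b)))))))
```

Both versions compute the same sum, but the callback version allocates a closure per step and forces every caller to accept a continuation argument as well.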
Callback-based systems also encourage unstructured concurrency, as in practice callbacks are not the only path for data and control flow in a system: usually there is mutable global state as well. Without strong patterns and conventions, callback-based systems often exhibit bugs caused by concurrent reads and writes to global state.
Some of the problems of callbacks can be mitigated by using “promises” or other library-level abstractions; if you’re a Haskell person, you can think of this as lifting all possibly-blocking operations into a monad. If you’re not a Haskeller, that’s cool, neither am I! But if your typey spidey senses are tingling, it’s for good reason: with promises, your whole program has to be transformed to return promises-for-values instead of values anywhere it would block.
An obvious solution to the control-flow problem of callbacks is to use threads. In the most generic sense, a thread is a language feature which denotes an independent computation. Threads are created by other threads, but fork off and run independently instead of returning to their caller. In a system with threads, there is implicitly a scheduler somewhere that multiplexes the threads so that when one suspends, another can run.
In practice, the concept of threads is often conflated with a particular implementation, kernel threads. Kernel threads are very low-level abstractions that are provided by the operating system. The nice thing about kernel threads is that they can use any CPU that the kernel knows about. That’s an important factor in today’s computing landscape, where Moore’s law seems to be giving us more cores instead of more gigahertz.
However, as a building block for a highly concurrent system, kernel threads have a few important problems.
One is that kernel threads simply aren’t designed to be allocated in huge numbers, and instead are more optimized to run in a one-per-CPU-core fashion. Their memory usage is relatively high for what should be a lightweight abstraction: some 10 kilobytes at least and often some megabytes, in the form of the thread’s stack. There are ongoing efforts to reduce this for some systems but we cannot expect wide deployment in the next 5 years, if ever. Even in the best case, a hundred thousand kernel threads will take at least a gigabyte of memory, which seems a bit excessive for book-keeping overhead.
Kernel threads can be a bit irritating to schedule, too: when one thread suspends, it’s for a reason, and it can be that user-space knows a good next thread that should run. However, because kernel threads are scheduled in the kernel, it’s rarely possible for the kernel to make informed decisions. There are some “user-mode scheduling” facilities that are in development for some systems, but again only for some systems.
The other significant problem is that building non-crashy systems on top of kernel threads is hard to do, not to mention “correct” systems. It’s an embarrassing situation. For one thing, the low-level synchronization primitives that are typically provided with kernel threads, mutexes and condition variables, are not composable. Also, as with callback-oriented concurrency, one thread can silently corrupt another via unstructured mutation of shared state. It’s worse with kernel threads, though: a kernel thread can be interrupted at any point, not just at I/O. And though callback-oriented systems can theoretically operate on multiple CPUs at once, in practice they don’t. This restriction is sometimes touted as a benefit by proponents of callback-oriented systems, because in such a system, the callback invocations have a single, sequential order. With multiple CPUs, this is not the case, as multiple threads can run at the same time, in parallel.
Kernel threads can work. The Java virtual machine does at least manage to prevent low-level memory corruption and to do so with high performance, but still, even Java-based systems that aim for maximum concurrency avoid using a thread per connection because threads use too much memory.
In this context it’s no wonder that there’s a third strain of concurrency: shared-nothing message-passing systems like Erlang. Erlang isolates each thread (called processes in the Erlang world), giving each its own heap and “mailbox”. Processes can spawn other processes, and the concurrency primitive is message-passing. A process that tries to receive a message from an empty mailbox will “block”, from its perspective. In the meantime the system will run other processes. Message sends never block, oddly; instead, sending to a process with many messages pending makes it more likely that Erlang will pre-empt the sending process. It’s a strange tradeoff, but it makes sense when you realize that Erlang was designed for network transparency: the same message send/receive interface can be used to send messages to processes on remote machines as well.
No network is truly transparent, however. At the most basic level, network sends will be much slower than local sends. Whereas a message sent to a remote process has to be written out byte-by-byte over the network, there is no need to copy immutable data within the same address space. The complexity of a remote message send is O(n) in the size of the message, whereas a local immutable send is O(1). This suggests that hiding the different complexities behind one operator is the wrong thing to do. And indeed, given byte read and write operators over sockets, it’s possible to implement remote message send and receive as a process that serializes and parses messages between a channel and a byte sink or source. In this way we get cheap local channels, and network shims are under the programmer’s control. This is the approach that the Go language takes, and is the one we use in Fibers.
Structuring a concurrent program as separate threads that communicate over channels is an old idea that goes back to Tony Hoare’s work on “Communicating Sequential Processes” (CSP). CSP is an elegant tower of mathematical abstraction whose layers form a pattern language for building concurrent systems that you can still reason about. Interestingly, it does so without any concept of time at all, instead representing a thread’s behavior as a trace of instantaneous events. Threads themselves are like functions that unfold over the possible events to produce the actual event trace seen at run-time.
This view of events as instantaneous happenings extends to communication as well. In CSP, one communication between two threads is modelled as an instantaneous event, partitioning the traces of the two threads into “before” and “after” segments.
Practically speaking, this has ramifications in the Go language, which was heavily inspired by CSP. You might think that a channel is just an asynchronous queue that blocks when writing to a full queue, or when reading from an empty queue. That’s a bit closer to the Erlang conception of how things should work, though as we mentioned, Erlang simply slows down writes to full mailboxes rather than blocking them entirely. However, that’s not what Go and other systems in the CSP family do; sending a message on a channel will block until there is a receiver available, and vice versa. The threads are said to “rendezvous” at the event.
Unbuffered channels have the interesting property that you can
select between sending a message on channel a or channel
b, and in the end only one message will be sent; nothing happens
until there is a receiver ready to take the message. In this way
messages are really owned by threads and never by the channels
themselves. You can of course add buffering if you like, simply by
making a thread that waits on either sends or receives on a channel,
and which buffers sends and makes them available to receives. It’s
also possible to add explicit support for buffered channels, as Go
does, which can reduce the number of context switches as there is no
explicit buffer thread.
Whether to buffer or not to buffer is a tricky choice. It’s possible
to implement singly-buffered channels in a system like Erlang via an
explicit send/acknowledge protocol, though it seems difficult to
implement completely unbuffered channels. As we mentioned, it’s
possible to add buffering to an unbuffered system by the introduction
of explicit buffer threads. In the end though in Fibers we follow
CSP’s lead so that we can implement the nice select behavior
that we mentioned above.
As a final point, select is OK but is not a great language abstraction. Say you call a function and it returns some kind of asynchronous result which you then have to select on. It could return this result as a channel, and that would be fine: you can add that channel to the other channels in your select set and you are good. However, what if what the function does is receive a message on a channel, then do something with the message? In that case the function should return a channel, plus a continuation (as a closure or something). If select results in a message being received over that channel, then we call the continuation on the message. Fine. But, what if the function itself wanted to select over some channels? It could return multiple channels and continuations, but that becomes unwieldy.
What we need is an abstraction over asynchronous operations, and that is the main idea of a CSP-derived system called “Concurrent ML” (CML). Originally implemented as a library on top of Standard ML of New Jersey by John Reppy, CML provides this abstraction, which in Fibers is called an operation [1]. Calling put-operation on a channel returns an operation, which is just a value. Operations are like closures in a way; a closure wraps up code in its environment, which can be later called many times or not at all. Operations likewise can be performed [2] many times or not at all; performing an operation is like calling a function. The interesting part is that you can compose operations via the wrap-operation and choice-operation combinators. The former lets you bundle up an operation and a continuation. The latter lets you construct an operation that chooses over a number of operations. Calling perform-operation on a choice operation will perform one and only one of the choices. Performing an operation will call its wrap-operation continuation on the resulting values.
While it’s possible to implement Concurrent ML in terms of Go’s
channels and baked-in select statement, it’s more expressive to
do it the other way around, as that also lets us implement other
operation types besides channel send and receive, for example
timeouts and condition variables.
In Fibers, the unit of computation is the fiber, a lightweight
thread managed by Guile. A fiber communicates with the world via
normal Guile ports: get-bytevector, put-string, and all
that. Between themselves, fibers send and receive Scheme values over
channels.
Whenever a fiber tries to read but no data is available, or tries to write but no data can be written, Guile will suspend the fiber and arrange for it to be resumed when the port or channel operation can proceed. In the meantime, Guile will run other fibers. When no fiber is runnable, Guile will use efficient system facilities to sleep until input or output can proceed.
When a fiber would block, it suspends to the scheduler from the
current thread. The scheduler will arrange to re-start the fiber when
the port or channel becomes readable or writable, as appropriate. For
ports, the scheduler adds the file descriptor associated with the port
to an epoll set. In either case, the scheduler remembers which
fibers are waiting and for what, so that the user can inspect the
state of their system.
If no scheduler has been installed in the current thread, the fiber will... well, we don’t know yet! Either it blocks its thread, or it aborts. We don’t know.
On the Scheme level, a fiber is a delimited continuation. When a scheduler runs a fiber, it does so within a prompt; when the fiber suspends, it suspends to the prompt. The scheduler saves the resulting continuation as part of the fiber’s state. In this way the per-fiber computational state overhead is just the size of the pending stack frames of the fiber, which can be just a handful of words.
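The prompt mechanism described above can be illustrated with a toy round-robin scheduler built only on Guile’s core call-with-prompt and abort-to-prompt. This is a sketch of the idea, not the Fibers implementation: the names toy-tag, toy-yield, run-one-turn, run-toy-fibers and toy-trace are hypothetical, and there is no I/O or polling involved.

```scheme
;; A prompt tag that delimits each toy fiber.
(define toy-tag (make-prompt-tag "toy-fiber"))

;; Suspend the running toy fiber by aborting to the enclosing prompt.
(define (toy-yield)
  (abort-to-prompt toy-tag))

;; Run THUNK inside a prompt. If it suspends, return the captured
;; continuation (the saved state of the fiber); if it finishes,
;; return #f.
(define (run-one-turn thunk)
  (call-with-prompt toy-tag
    (lambda () (thunk) #f)
    (lambda (k) k)))

;; Round-robin over a queue of thunks. A suspended fiber goes to the
;; back of the queue as a thunk that resumes its continuation.
(define (run-toy-fibers thunks)
  (let loop ((queue thunks))
    (unless (null? queue)
      (let ((k (run-one-turn (car queue))))
        (loop (if k
                  (append (cdr queue) (list (lambda () (k))))
                  (cdr queue)))))))

;; Two toy fibers that each record a step, yield, then record another.
(define (toy-trace)
  (let ((trace '()))
    (define (note! x) (set! trace (cons x trace)))
    (run-toy-fibers
     (list (lambda () (note! 'a1) (toy-yield) (note! 'a2))
           (lambda () (note! 'b1) (toy-yield) (note! 'b2))))
    (reverse trace)))
```

Calling (toy-trace) interleaves the two thunks, so the recorded order is a1, b1, a2, b2. The only state kept for a suspended toy fiber is the continuation captured by the prompt, which is the property the paragraph above relies on.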
Currently fibers are pinned to the kernel thread in which they are created. We should probably implement some kind of work-stealing behavior so that if you choose to devote multiple CPU cores to servicing fibers, they can share the workload.
Ports are how fibers communicate with the world; channels are how fibers communicate with each other. Channels are meeting places between fibers. A fiber that goes to send a message over a channel will block until there is a fiber ready to receive the message, and vice versa. Once both parties are ready, the message is exchanged and both parties resume. There can be multiple fibers waiting to read and write on a channel, allowing channels to express not only pipelines but also common concurrency patterns such as fan-in and fan-out.
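As an illustration of the fan-in pattern, the following sketch spawns one fiber per input and collects the results over a single channel. It assumes a make-channel constructor exported by (fibers channels), which this manual’s reference section does not list; fan-in-sum is a hypothetical name.

```scheme
(use-modules (fibers)
             (fibers channels))

;; Spawn one fiber per element of INPUTS. Each fiber squares its
;; number and sends it on a shared channel; the initial fiber
;; receives exactly one message per worker and adds them up.
(define (fan-in-sum inputs)
  (run-fibers
   (lambda ()
     (let ((results (make-channel)))   ; assumed constructor
       (for-each
        (lambda (n)
          (spawn-fiber
           (lambda ()
             ;; Blocks until the collector is ready to receive.
             (put-message results (* n n)))))
        inputs)
       (let loop ((remaining (length inputs)) (total 0))
         (if (zero? remaining)
             total
             (loop (- remaining 1)
                   (+ total (get-message results)))))))))
```

Because every send is a rendezvous, no worker gets past put-message until the collecting fiber has taken its value. The order of arrival is not specified, which is harmless here because addition is commutative.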
Unlike Erlang’s mailboxes, channels in Fibers are purely local and do not attempt to provide the illusion of network transparency. This has the advantage that we can provide better backpressure support than Erlang, blocking when no receiver is available to handle a message instead of letting the sender keep sending many messages.
On the other hand, currently fibers are not preemptively scheduled. A fiber will only suspend when it would block on channel receive or send, or on read or write on a port. This would be an OK point in the design space if only one kernel thread could be running fibers at once, as in Node.js. However, given that this is not the case, Fibers does not have many of the concurrency invariants that such systems enjoy, so perhaps we should support preemption in the future.
To avoid starvation, a fiber can only run once within a “turn”. Each turn starts with a poll on file descriptors of interest and marks the associated fibers as runnable. If no fiber is runnable at the start of the poll, the poll call will ask the kernel to wait for a runnable descriptor. Otherwise the poll call will ask the kernel to return immediately. There is an additional FD added to the poll set that is used to interrupt a blocking poll, for example if a fiber becomes runnable due to I/O on a channel from a separate kernel thread while the first scheduler was still polling.
To enable expressive cross-kernel-thread communications, channel sends and receives are atomic and thread-safe.
To start scheduling fibers, user code will typically create a scheduler, install it on the thread, add some fibers, then run the scheduler. That call to run the scheduler will only return when there are no more fibers waiting to be scheduled.
Fibers is a library built on Guile, consisting of a public interface, base support for asynchronous operations, implementations of operations for channels and timers, and an internals interface.
• Using Fibers: User-facing interface to fibers.
• Operations: Composable abstractions for concurrency.
• Channels: Share memory by communicating.
• Timers: Operations on time.
• Internals: Scheduler and fiber objects and operations.
The public interface of fibers right now is quite minimal. To use it,
import the (fibers) module:
(use-modules (fibers))
To create a new fibers scheduler and run it in the current Guile
thread, use run-fibers.
- Function: run-fibers [init-thunk=#f] [#:install-suspendable-ports?=#t] [#:scheduler=(make-scheduler)] [#:keep-scheduler?]
Run init-thunk within a fiber in a fresh scheduler, blocking until the scheduler has no more runnable fibers. Return the value(s) returned by the call to init-thunk.
For example:
(run-fibers (lambda () 1)) ⇒ 1
(run-fibers (lambda () (spawn-fiber (lambda () (display "hey!\n"))))) -| hey!
Calling run-fibers will ensure that Guile’s port implementation allows fibers to suspend if a read or a write on a port would block. See Non-Blocking I/O in Guile Reference Manual, for more details on suspendable ports. If for some reason you want port reads or writes to prevent other fibers from running, pass #f as the #:install-suspendable-ports? keyword argument.
By default, run-fibers will create a fresh scheduler. If you happen to have a pre-existing scheduler (because you used the internals interface to create one), you can pass it to run-fibers using the #:scheduler keyword argument.
The scheduler will be destroyed when run-fibers finishes, unless the scheduler was already “current” (see Internals). If you need to keep the scheduler, either make sure it is current or explicitly pass #t as the #:keep-scheduler? keyword argument.
- Function: spawn-fiber thunk [#:scheduler=(require-current-scheduler)]
Spawn a new fiber that will run thunk. Return the new fiber. The new fiber will run concurrently with other fibers.
The fiber will be added to the current scheduler, which is usually what you want. It’s also possible to spawn the fiber on a specific scheduler, which is useful to ensure that the fiber runs on a different kernel thread. In that case, pass the #:scheduler keyword argument.
Currently, fibers will only ever be run within the scheduler to which they are first added, which effectively pins them to a single kernel thread. This limitation may be relaxed in the future.
- Function: current-fiber
Return the current fiber, or #f if not called within the dynamic extent of a thunk passed to spawn-fiber.
- Function: sleep seconds
Wake up the current fiber after seconds of wall-clock time have elapsed. This definition will replace the binding for sleep in the importing module, effectively overriding Guile’s “core” definition.
Operations are first-class abstractions for asynchronous events.
There are primitive operation types, such as waiting for a timer
(see Timers) or waiting for a message on a channel
(see Channels). Operations can also be combined and transformed
using the choice-operation and wrap-operation combinators
from this module:
(use-modules (fibers operations))
- Function: wrap-operation op f
Given the operation op, return a new operation that, if and when it succeeds, will apply f to the values yielded by performing op, and yield the result as the values of the wrapped operation.
- Function: choice-operation . ops
Given the operations ops, return a new operation that if it succeeds, will succeed with one and only one of the sub-operations ops.
Finally, once you have an operation, you can perform it using perform-operation.
- Function: perform-operation op
Perform the operation op and return the resulting values. If the operation cannot complete directly, block until it can complete.
See Introduction, for more on the “Concurrent ML” system that introduced the concept of the operation abstraction.
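As a small sketch of how these three procedures fit together, the following example waits on two channels at once and tags the result with the channel it came from. It borrows get-operation and put-message from (fibers channels) and assumes a make-channel constructor, which this manual does not list; first-of-two is a hypothetical name.

```scheme
(use-modules (fibers)
             (fibers channels)
             (fibers operations))

;; Wait for a message on either of two channels. wrap-operation
;; attaches a continuation to each alternative, so the caller learns
;; which channel delivered the message.
(define (first-of-two)
  (run-fibers
   (lambda ()
     (let ((a (make-channel))    ; assumed constructor
           (b (make-channel)))
       ;; Only channel b ever gets a sender in this sketch.
       (spawn-fiber (lambda () (put-message b "hello")))
       (perform-operation
        (choice-operation
         (wrap-operation (get-operation a)
                         (lambda (msg) (list 'from-a msg)))
         (wrap-operation (get-operation b)
                         (lambda (msg) (list 'from-b msg)))))))))
```

Since nothing is ever sent on a, the choice can only complete through b, and (first-of-two) returns the list (from-b "hello"). The operation built by choice-operation is an ordinary value, so a function could return it to its caller instead of performing it on the spot.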
There is also a low-level constructor, make-base-operation, for other modules that implement primitive operation types. If you ever feel the need to call make-base-operation, make sure you’re familiar with the Concurrent ML literature. Godspeed!
Channels are the way to communicate between fibers. To use them, load the channels module:
(use-modules (fibers channels))
- Function: put-operation channel message
Make an operation that if and when it completes will rendezvous with a receiver fiber to send message over channel.
- Function: get-operation channel
Make an operation that if and when it completes will rendezvous with a sender fiber to receive one value from channel.
- Function: put-message channel message
Send message on channel, and return zero values. If there is already another fiber waiting to receive a message on this channel, give it our message and continue. Otherwise, block until a receiver becomes available.
Equivalent to:
(perform-operation (put-operation channel message))
- Function: get-message channel
Receive a message from channel and return it. If there is already another fiber waiting to send a message on this channel, take its message directly. Otherwise, block until a sender becomes available.
Equivalent to:
(perform-operation (get-operation channel))
Channels are thread-safe; you can use them to send and receive values between fibers on different kernel threads.
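The introduction mentions that buffering can be added on top of unbuffered channels with an explicit buffer fiber. A minimal sketch of that idea follows, combining put-operation and get-operation with choice-operation and wrap-operation from (fibers operations). The names spawn-buffer-fiber and buffer-demo are hypothetical, the buffer is unbounded for simplicity, and make-channel is assumed to be exported by (fibers channels) even though it is not listed above.

```scheme
(use-modules (fibers)
             (fibers channels)
             (fibers operations))

;; Spawn a fiber that forwards messages from IN to OUT, holding any
;; number of pending messages in a list, oldest first.
(define (spawn-buffer-fiber in out)
  (spawn-fiber
   (lambda ()
     (let loop ((pending '()))
       (loop
        (if (null? pending)
            ;; Nothing buffered: all we can do is wait for a sender.
            (list (get-message in))
            ;; Otherwise accept a new message or hand over the oldest
            ;; one, whichever rendezvous becomes possible first.
            (perform-operation
             (choice-operation
              (wrap-operation (get-operation in)
                              (lambda (msg)
                                (append pending (list msg))))
              ;; put-operation yields zero values, so the wrapper
              ;; takes no arguments.
              (wrap-operation (put-operation out (car pending))
                              (lambda () (cdr pending)))))))))))

(define (buffer-demo)
  (run-fibers
   (lambda ()
     (let ((in (make-channel))     ; assumed constructor
           (out (make-channel)))
       (spawn-buffer-fiber in out)
       ;; These sends complete although nobody is reading OUT yet.
       (put-message in 'a)
       (put-message in 'b)
       (put-message in 'c)
       (let* ((x (get-message out))
              (y (get-message out))
              (z (get-message out)))
         (list x y z))))))
```

Here (buffer-demo) returns (a b c): the three sends rendezvous with the buffer fiber rather than with the final reader, and the messages come out in the order they went in. A production version would want a bound on the pending list so that backpressure is preserved.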
Timers are a kind of operation that, you guessed it, let you wait until a certain time.
(use-modules (fibers timers))
- Function: wait-operation seconds
Make an operation that will succeed with no values when seconds have elapsed.
- Function: timer-operation expiry
Make an operation that will succeed when the current time is greater than or equal to expiry, expressed in internal time units. The operation will succeed with no values.
These internal interfaces are a bit dangerous, in the sense that if they are used wrongly, they can corrupt the state of your program. For example, the scheduler has some specific mechanisms to ensure thread-safety, and not all of the procedures in this module can be invoked on a scheduler from any thread. We will document them at some point, but for now this section is a stub.
(use-modules (fibers internal))
- Special Form: with-scheduler scheduler body ...
Evaluate (begin body ...) in an environment in which scheduler is bound to the current kernel thread and marked as current. Signal an error if scheduler is already running in some other kernel thread.
- Function: run-scheduler sched
Run sched until there are no more fibers ready to run, no file descriptors being waited on, and no more timers pending to run. Return zero values.
- Function: resume-on-readable-fd fd fiber
Arrange to resume fiber when the file descriptor fd becomes readable.
- Function: resume-on-writable-fd fd fiber
Arrange to resume fiber when the file descriptor fd becomes writable.
- Function: resume-on-timer fiber expiry get-thunk
Arrange to resume fiber when the absolute real time is greater than or equal to expiry, expressed in internal time units. The fiber will be resumed with the result of calling get-thunk. If get-thunk returns #f, that indicates that some other operation performed this operation first, and so no resume is performed.
- Function: create-fiber sched thunk
Spawn a new fiber in sched with the continuation thunk. The fiber will be scheduled on the next turn.
- Function: kill-fiber fiber
Try to kill fiber, causing it to raise an exception. Note that this is currently unimplemented!
- Function: fold-all-schedulers f seed
Fold f over the set of known schedulers. f will be invoked as (f name scheduler seed).
- Function: scheduler-by-name name
Return the scheduler named name, or #f if no scheduler of that name is known.
- Function: fold-all-fibers f seed
Fold f over the set of known fibers. f will be invoked as (f name fiber seed).
- Function: suspend-current-fiber [after-suspend]
Suspend the current fiber. Call the optional after-suspend callback, if present, with the suspended fiber as its argument.
- Function: resume-fiber fiber thunk
Resume fiber, adding it to the run queue of its scheduler. The fiber will start by applying thunk. A fiber must only be resumed when it is suspended. This function is thread-safe even if fiber is running on a remote scheduler.
- Function: scheduler-kernel-thread sched
Return the kernel thread on which sched is running, or #f if sched is not running.
Running Guile code within a fiber mostly “just works”. There are a couple of pitfalls to be aware of though.
• Blocking: Avoid calling blocking operations.
• Mutation: Avoid unstructured mutation of shared data.
When you run a program under fibers, the fibers library arranges for port operations to suspend the fiber instead of blocking. This generally works, with some caveats.
- The port type has to either never block, or support non-blocking I/O. Currently the only kind of port in Guile are file ports (including sockets), and for them this condition is fulfilled. However notably non-blocking I/O is not supported for custom binary I/O ports, not yet anyway. If you need this, get it fixed in Guile :)
- You have to make sure that any file port you operate on is opened in nonblocking mode. See Non-Blocking I/O in Guile Reference Manual, for the obscure fcntl incantation to use on your ports.
- You have to avoid any operation on ports that is not supported yet in Guile for non-blocking I/O. Since non-blocking I/O is new in Guile, only some I/O operations are expressed in terms of the primitive operations. Notably, Scheme read, display, and write are still implemented in C, which prevents any fiber that uses them from suspending and resuming correctly. What will happen instead is that the call blocks instead of suspending. If you find a situation like this, talk to Guile developers to get it fixed :)
- You can enable non-blocking I/O for local files, but Linux at least will always say that the local file is ready for I/O even if it has to page in data from a spinning-metal device. This is a well-known limitation for which the solution is apparently to do local I/O via a thread pool. We could implement this in Fibers, or in Guile... not sure what the right thing is!
You also have to avoid any other library or system calls that would
block. One common source of blocking is getaddrinfo and
related network address resolution library calls. Again, apparently
the solution is thread pools? Probably in Fibers we should implement
a thread-pooled address resolver.
The (fibers) module exports a sleep replacement. Code
that sleeps should import the (fibers) module to be sure that
it isn’t using Guile’s core sleep function.
Finally, a fiber itself has to avoid blocking other fibers; it must
reach a “yield point” some time. A yield point includes a read or
write on a port or a channel that would block, or a sleep.
Other than that, nothing will pre-empt a fiber, at least not
currently. If you need to yield to the scheduler, then at least do a
(sleep 0) or something.
Although code run within fibers looks like normal straight-up Scheme,
it runs concurrently with other fibers. This means that if you mutate
shared state and other fibers mutate the same shared state using
normal Scheme procedures like set!, vector-set!, or the
like, then probably you’re going to have a bad time. Although
currently fibers aren’t scheduled pre-emptively, multi-step
transactions may be suspended if your code reaches a yield point in
the middle of performing the transaction.
Likewise if you have multiple kernel threads running fiber schedulers, then it could indeed be that you have multiple fibers running in parallel.
The best way around this problem is to avoid unstructured mutation, and to instead share data by communicating over channels. Using channels to communicate data produces much more robust, safe systems.
If you need to mutate global data, do so while holding a mutex.
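One way to share data by communicating is to let a single fiber own the state and have every other fiber talk to it over a channel. The following is a minimal sketch under that assumption: the message protocol (the symbol increment, or a reply channel to ask for the current count) and the names spawn-counter-fiber and count-with-channel are invented for this example, and make-channel is assumed to be exported by (fibers channels).

```scheme
(use-modules (fibers)
             (fibers channels))

;; The counter fiber is the only code that ever touches COUNT.
;; Protocol (hypothetical): the symbol 'increment bumps the count;
;; any other message is taken to be a reply channel on which the
;; current count is sent back.
(define (spawn-counter-fiber requests)
  (spawn-fiber
   (lambda ()
     (let loop ((count 0))
       (let ((msg (get-message requests)))
         (cond
          ((eq? msg 'increment)
           (loop (+ count 1)))
          (else
           (put-message msg count)
           (loop count))))))))

(define (count-with-channel n-workers n-increments)
  (run-fibers
   (lambda ()
     (let ((requests (make-channel))   ; assumed constructor
           (done (make-channel)))
       (spawn-counter-fiber requests)
       ;; Each worker sends its increments, then reports completion.
       (do ((w 0 (+ w 1))) ((= w n-workers))
         (spawn-fiber
          (lambda ()
            (do ((i 0 (+ i 1))) ((= i n-increments))
              (put-message requests 'increment))
            (put-message done #t))))
       ;; Wait for every worker before asking for the total.
       (do ((w 0 (+ w 1))) ((= w n-workers))
         (get-message done))
       (let ((reply (make-channel)))
         (put-message requests reply)
         (get-message reply))))))
```

With four workers sending one hundred increments each, (count-with-channel 4 100) returns 400. No set! on shared state appears anywhere: the count lives in the loop variable of one fiber, so there is nothing for two fibers to race on.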
It’s early days. At the time of this writing, no one uses fibers in production that we are aware of. Should you be the first? Well maybe, if you feel like you understand the implementation, are prepared to debug, and have some time on your hands. Otherwise probably it’s better to wait.
See the TODO.md file in the repository for a current list of
to-do items.
[1] CML uses the term event, but we find this to be a confusing name.
[2] In CML, the term is synchronized.