
Abstract Machines

Just as Java compiles down to bytecode to be run on a virtual machine, functional programming languages compile down to some bytecode of their own. But, confusingly, the functional-programming community uses the term "abstract machine" instead of "virtual machine", even though there is not really much of a difference between the two.

Really, abstract machines are where the "rubber meets the road". Want to change semantics from eager evaluation to lazy? It's in the abstract machine. Swap out the garbage collector? Look at the abstract machine.

There has been some attempt to systematically study abstract machines.

Basic Structure

The basic structure of an abstract machine is like that of a virtual machine (usually a stack machine) or of a CPU (at least, it mimics the CPU's instruction cycle). There is a limited "instruction set" (some binary bytecode or low-level "assembly code") and a main loop, which schematically looks like:

#include <stdbool.h>
#include <stddef.h>

typedef unsigned char bytecode;            /* one opcode per byte */
enum { INST_1, /* ..., */ INST_TERMINATE };

struct MachineState;                       /* registers, stack, heap, ... */

/* Each instruction handler updates the state and the instruction pointer,
   returning false when the machine should halt. */
bool instruction_1(struct MachineState *state, bytecode *program,
                   size_t programSize, size_t *ip);

void main_loop(struct MachineState *state, bytecode *program, size_t programSize) {
    bool isRunning = true;
    size_t ip = 0;
    while (isRunning) {
        switch (program[ip]) {
            case INST_1:
                isRunning = instruction_1(state, program, programSize, &ip);
                break;
            // other cases
            case INST_TERMINATE:
                isRunning = false;
                break;
            default:
                // panic: unknown opcode
                isRunning = false;
                break;
        }
    }
}

There are other ways to do this, e.g., using an array of function pointers to emulate threaded code. (It is unclear to me whether using inline assembly and goto to avoid the function calls would be faster or better, that is, whether coroutine-style dispatch would be superior in performance.)

#include <stddef.h>

typedef void (*Instruction)(struct MachineState *state);

/* Here 'program' is a NULL-terminated array of instruction pointers. */
void main_loop(struct MachineState *state, Instruction *program) {
    while (NULL != *program) {   // stop at the terminating NULL entry
        (*program)(state);       // run the current instruction
        program++;               // then move on to the next one
    }
}
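
A third option, alluded to in the parenthetical above, is direct-threaded dispatch via computed goto. Here is a minimal sketch using the GCC/Clang "labels as values" extension (not portable ISO C); the opcodes and the MachineState layout are invented for this example:

#include <stddef.h>

enum { OP_INCR, OP_DECR, OP_HALT };

struct MachineState { long acc; };

long run(struct MachineState *state, const unsigned char *program) {
    /* One label per opcode; the table maps an opcode to a label address. */
    static void *dispatch[] = { &&do_incr, &&do_decr, &&do_halt };
    size_t ip = 0;

    #define NEXT() goto *dispatch[program[ip++]]

    NEXT();
do_incr:
    state->acc++;
    NEXT();
do_decr:
    state->acc--;
    NEXT();
do_halt:
    return state->acc;

    #undef NEXT
}

Running a zero-initialized state over the program { OP_INCR, OP_INCR, OP_DECR, OP_HALT } returns 1; the point is that each handler jumps directly to the next one with no central switch.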

(Perhaps using inline functions would give the best of both worlds? The advantage of writing a function for each instruction is that it makes unit testing easier; the disadvantage is that the call overhead can noticeably penalize performance. The approach to take is (i) make it work, then (ii) make it fast.)
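
For example, a sketch of the one-function-per-instruction style (the instruction and its semantics are invented here): each handler is a small static inline function, so it can be unit-tested directly while the compiler is still free to inline it at the dispatch site:

#include <assert.h>
#include <stdbool.h>

struct MachineState { long acc; };

/* Each instruction is an ordinary (static inline) function, so it can be
   tested in isolation, while the compiler may inline it at the dispatch
   site and remove the call overhead. */
static inline bool inst_add1(struct MachineState *state) {
    state->acc += 1;
    return true;   /* keep running */
}

/* A unit test that exercises the instruction without the main loop. */
int main(void) {
    struct MachineState s = { .acc = 41 };
    assert(inst_add1(&s) && s.acc == 42);
    return 0;
}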

There is no typechecking at the abstract machine level; it is all done in the earlier stages of the compiler.

But there is a need for memory management via garbage collection.
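
For instance, here is a sketch of what a heap-object header might look like (the layout and field names are my assumptions, not any particular implementation): the source-level types are gone, but each object still records enough layout information for the collector to find the pointers it must trace:

#include <stddef.h>

/* Sketch of a heap-object header; the layout and field names are invented.
   No source-level types survive to this level: all the collector needs is
   to know which payload words are pointers. */
struct HeapObject {
    size_t n_pointers;    /* payload words the GC must follow   */
    size_t n_raw_words;   /* unboxed payload words the GC skips */
    /* payload words follow the header in memory */
};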

Overviews

SECD Machine

Landin's SECD machine was the first "abstract machine" for functional programming languages, at least in theory.
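
The name comes from the four components of its state: Stack, Environment, Control, and Dump. A rough sketch of that state (the cons-cell representation is my assumption, not Landin's original formulation):

/* Rough sketch of the SECD machine's state. */
typedef struct Cell Cell;   /* cons cells making up lists and closures */

typedef struct {
    Cell *stack;     /* S: intermediate values                            */
    Cell *env;       /* E: the current environment (variable bindings)    */
    Cell *control;   /* C: the code still to be evaluated                 */
    Cell *dump;      /* D: saved (S, E, C) triples, i.e. the call stack   */
} SECD;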

Categorical Abstract Machine (CAM)

ZINC Machine

A slight optimization of the CAM.

Krivine Machine

Warren Abstract Machine

Three Instruction Machine

G-Machine

The G-machine has various enhancements, notably the spineless G-machine and the spineless tagless G-machine (STG-machine for short). From what I understand, the first people who worked on G-machines for graph reduction were:

Subsequent reviews and enhancements culminated in the STG:

Calling Conventions

There are two main calling conventions for functional programming languages: push/enter and eval/apply. Traditionally, push/enter has been used by lazy (call-by-need) implementations and eval/apply by eager (call-by-value) ones, but neither convention is tied to an evaluation strategy, and either can implement either strategy. Marlow and Peyton Jones discuss the two (and conclude that eval/apply is preferable even for a lazy language); a toy sketch contrasting the conventions follows the references.

  • Simon Marlow and Simon L. Peyton Jones, "Making a fast curry: push/enter vs. eval/apply for higher-order languages". Eprint, 35 pages.

  • Maciej Piróg and Jeremy Gibbons, "From Push/Enter to Eval/Apply by Program Transformation". arXiv:1606.06380

  • Beniamino Accattoli and Giulio Guerrieri, "Implementing Open Call-by-Value (Extended Version)". 39 pages.

  • Stephen Chang, David Van Horn, Matthias Felleisen, "Evaluating Call-By-Need on the Control Stack". arXiv:1009.3174
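
To make the contrast concrete, here is a toy sketch in C; the representation (Closure, arg_stack) and the function names are invented for illustration and are not the scheme used in any of the papers above. In eval/apply the caller inspects the closure's arity and decides between a saturated call and a partial application; in push/enter the caller just pushes its arguments and enters the code, and the callee checks whether enough arguments are available:

#include <stdio.h>

#define MAX_ARGS 8

/* A toy closure: a code pointer, the arity its code expects, and any
   arguments captured so far by a partial application. */
typedef struct Closure {
    int arity;
    int nheld;
    int held[MAX_ARGS];
    int (*code)(const int *args);
} Closure;

/* eval/apply: the CALLER compares its argument count with the closure's
   arity.  Saturated calls run the body; unsaturated calls build a new
   partial application.  (Over-application is omitted from this sketch.) */
int eval_apply(Closure *f, const int *args, int nargs, Closure *partial_out) {
    int total = f->nheld + nargs;
    int all[MAX_ARGS];
    for (int i = 0; i < f->nheld; i++) all[i] = f->held[i];
    for (int i = 0; i < nargs; i++)    all[f->nheld + i] = args[i];

    if (total == f->arity)
        return f->code(all);                   /* saturated: run the body   */

    *partial_out = *f;                         /* unsaturated: capture args */
    for (int i = 0; i < nargs; i++)
        partial_out->held[f->nheld + i] = args[i];
    partial_out->nheld = total;
    return 0;                                  /* result is in partial_out  */
}

/* push/enter: the CALLER just pushes its arguments and enters the closure;
   the CALLEE checks whether enough arguments are on the stack. */
int arg_stack[MAX_ARGS];
int sp = 0;                                    /* arguments pushed so far   */

int push_enter(Closure *f) {
    if (sp >= f->arity) {
        sp -= f->arity;
        return f->code(&arg_stack[sp]);        /* enough args: run the body */
    }
    /* too few args: a real machine would build a partial application here */
    return 0;
}

static int add_code(const int *args) { return args[0] + args[1]; }

int main(void) {
    Closure add = { .arity = 2, .nheld = 0, .code = add_code };
    Closure spare;

    int args[] = { 3, 4 };
    printf("eval/apply: %d\n", eval_apply(&add, args, 2, &spare));

    arg_stack[sp++] = 3;                       /* caller pushes ...         */
    arg_stack[sp++] = 4;
    printf("push/enter: %d\n", push_enter(&add)); /* ... callee checks      */
    return 0;
}

Real machines additionally handle over-application, calls to functions of unknown arity, and (for push/enter) argument-satisfaction checks or stack marks; the sketch only shows where the arity check lives in each convention.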

Formal Verification

Abstracting Abstract Machines

David Van Horn and Matthew Might seem to have invented the notion of "abstracting abstract machines". The first papers they wrote on the subject are:

  • David Van Horn and Matthew Might, "Abstracting abstract machines: a systematic approach to higher-order program analysis." arXiv:1105.1743, 8 pages (includes discussion of "abstract garbage collection")
  • David Van Horn and Matthew Might, "Systematic abstraction of abstract machines". arXiv:1107.3539, 39 pages

Follow-up papers appeared:

  • J. Ian Johnson, Nicholas Labich, Matthew Might, David Van Horn, "Optimizing Abstract Abstract Machines". arXiv:1211.3722, 12 pages
  • J. Ian Johnson, David Van Horn, "Abstracting Abstract Control (Extended)". arXiv:1305.3163, 14 pages
  • David Darais, Nicholas Labich, Phuc C. Nguyen, David Van Horn, "Abstracting Definitional Interpreters". arXiv:1707.04755, 25 pages.

Verification

  • Ranjit Jhala, Rupak Majumdar, Andrey Rybalchenko, "HMC: Verifying Functional Programs Using Abstract Interpreters". arXiv:1004.2884, 12 pages
  • Sigurd Schneider, Gert Smolka, Sebastian Hack, "Verifying Programs via Intermediate Interpretation". arXiv:1705.06738, 37 pages.

Algebraic Data Types