Chris Marrin edited this page Jun 18, 2017 · 32 revisions

Welcome to the m8rscript wiki

A new IoT scripting language

As its name implies, m8rscript is a scripting language which follows C style syntax. But it's much more than that. It's designed to be a system for Internet of Things devices like the ESP8266. In fact, right now it runs only on the ESP8266 and the Mac, for debugging purposes. The language allows high level access to the features of IoT platforms: Wifi, Web, sensors, and general purpose computing.

But why make yet another scripting language? There are already tons of scripting language systems out there in use on the ESP8266. But almost all are attempts at making a tiny version of an existing complete language (Lua, JavaScript, Python). The result is either a crippled version or one that just barely fits in memory and crashes regularly.

I wanted to make a new language whose capabilities were tuned to the capabilities of the platform. I also wanted to make one that had familiar syntax (e.g., C) and modern features (e.g., OOP, block scoping), but which didn't try to duplicate all the features of a language like JavaScript or Python. These languages have great features and run well on full size computers. But on a small, memory constrained platform you need to trim down the features and expectations of the language.

m8rscript is pretty general purpose and can be used for many things. But it is very tuned to the abilities and feature set of the ESP8266. When you flash the firmware onto a new part it comes up in a Smart Config mode where it listens for a special UDP packet telling it the SSID and password of a wifi network. You run an app on an iPhone, Android or laptop to send this packet, and the device connects to your network, gives itself a default Bonjour name, and you're ready to write scripts. So it assumes your application needs wifi, but not much else.

The system comes with a macOS app which is used to write scripts, test them in a built-in simulator and then communicate with ESP8266 devices to upload the scripts and start them running.

The Basics

MaterScript is written in C++11, which works well with the toolchains I'm using. It can run on Mac, using the Xcode project file, or ESP8266, using the supplied Makefile. It has a separate Parser class which generates a Program object (which will eventually be serializable), and a separate ExecutionUnit which can execute the Program object. You can include one or both, depending on your footprint requirements. And that's the first tradeoff: MaterScript neither has nor depends on built-in source compilation (e.g., eval()). Perhaps I'll add an object that will supply this feature, but it will be optional.

Issues

m8rscript is really small. An empty ESP8266 app takes up 217K of flash and 26K of data. I started with a parser written in Bison. The problem is that Bison is table driven. That makes it super fast, but those tables have to go into data memory, 11K worth in the current version. It's possible to put those tables into flash, but doing so would require special accessors to align reads to 4 byte boundaries (a flash memory constraint). Adding those to generated code would be difficult and unmaintainable. So I replaced it with a hand-coded recursive descent parser, which uses hardly any data storage other than the stack.
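The "special accessors" mentioned above can be pictured with a small sketch. This is not code from m8rscript: the function name readTableByte and the table layout are assumptions made for illustration. It fetches a single byte from a table stored as 32-bit words while only ever issuing aligned 4 byte loads, which is the kind of wrapper every table lookup in generated parser code would have needed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical accessor, not taken from the m8rscript sources: reads one byte
// out of a table stored as 32-bit words, touching memory only with aligned
// 4-byte loads. Byte 0 is assumed to live in the low 8 bits of each word
// (little-endian layout).
inline uint8_t readTableByte(const uint32_t* table, size_t byteIndex)
{
    uint32_t word = table[byteIndex / 4];                           // aligned 32-bit load
    unsigned shift = static_cast<unsigned>(byteIndex % 4) * 8;      // bit position of the byte
    return static_cast<uint8_t>(word >> shift);                     // keep the low 8 bits
}
```

Every byte or 16-bit entry that Bison's generated code reads with a plain array index would have to go through a call like this, which is why patching generated code was not attractive.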

Language Choices

m8rscript started with a JavaScript-like language. It is typeless and follows a very JavaScript-like syntax. But it is somewhat simplified to make it smaller. There are also several language differences, chosen to solve some of the more confusing and unfortunate design choices made by JavaScript long ago.

Semicolons

JavaScript allows semicolons to be optional in most places. This was done to make the language simpler in the most common use cases of one statement per line. But it causes several ambiguities and is considered by many to be a poor design choice for the language. So m8rscript requires semicolons at the end of each statement.

Variables

m8rscript has no implicit variable creation. Attempting to use a variable before it is declared using the var statement is an error. This is true for members of Objects as well. To introduce a new member you would write:

var a = { }; // Create a new empty Object
a.b = 5;     // ERROR: b does not exist
var a.b = 5; // Correct: add b to a and set its value to 5.

Variables have block scope. A var statement inside a block makes those variables visible only inside that block.

Iteration

The iteration form of the for loop (JavaScript's for...in) iterates over a predefined set of values in an object. For Array and objects deriving from Array, the elements are iterated and the value of the iteration variable is the element value. For Object and objects deriving from Object, the properties are iterated and the value of the iteration variable is the property name.

The for loop is written with a colon in place of the in keyword. This C++ style syntax is used to distinguish the iteration behavior. In JavaScript, for...in iterates over all elements and properties, which is distinctly different from m8rscript.
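For readers who have not seen the C++ form that the colon syntax borrows from, here is a rough analogy in C++11. It is only an analogy: the containers and the function names sumElements and propertyNames are invented for this sketch and say nothing about how m8rscript implements iteration. A vector stands in for an Array (the loop variable is the element value) and a map stands in for an Object (the loop yields the property name).

```cpp
#include <map>
#include <string>
#include <vector>

// Array-like container: the loop variable is the element value.
inline int sumElements(const std::vector<int>& values)
{
    int sum = 0;
    for (int value : values) {
        sum += value;
    }
    return sum;
}

// Object-like container: m8rscript hands the loop the property name, which
// in this C++ analogy corresponds to the key of each map entry.
inline std::vector<std::string> propertyNames(const std::map<std::string, int>& object)
{
    std::vector<std::string> names;
    for (const auto& entry : object) {
        names.push_back(entry.first);
    }
    return names;
}
```

Note that std::map visits its keys in sorted order. That is a property of the C++ container, not a statement about the order in which m8rscript visits properties.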

Switch statements

m8rscript does not use the break keyword to identify the end of a case statement. Each case has a single statement, which can be a block statement. So there is no fall-through in m8rscript. That doesn't mean you can't have multiple cases for a single statement. The following is legal m8rscript:

switch(a) {
    case 1:
    case 2:
        b = 3;
    case 3:
    case 4: {
        b = 5;
        c = 10;
    }
    case 5: { }
    default: b = 0;
}

Note in the example above that an empty statement is used to distinguish between multiple cases for a single statement and a case which performs no actions.
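To make the semantics concrete, here is how the switch above would have to be written in C++ to behave the same way. This is a sketch for comparison only. The Result struct and the runSwitch function are made up for the example, and b and c are passed in so the function is self-contained. Every break shown here is implicit in m8rscript.

```cpp
struct Result {
    int b;
    int c;
};

// C++ rendering of the m8rscript switch above. b and c start out as the
// caller's values; each break below is implied by m8rscript's rules.
inline Result runSwitch(int a, int b, int c)
{
    switch (a) {
        case 1:
        case 2:
            b = 3;
            break;
        case 3:
        case 4:
            b = 5;
            c = 10;
            break;
        case 5:
            break;      // the empty block: matches 5 and does nothing
        default:
            b = 0;
            break;
    }
    return { b, c };
}
```

Forgetting any one of these break statements in C++ silently changes the result, which is exactly the class of error the m8rscript rule removes.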

The motivation for this break from the "tradition" of C-like languages is that in 40 years of programming I have not found a single case of fall-through which could not have been done more clearly and efficiently using another technique, like a call to an inline function or the use of nested switch statements. But I have had innumerable cases of errors introduced through the omission of a break in a switch case.

Floating Point Numbers

Because m8rscript uses 8 byte values, floating point numbers are 64 bits wide. And since the ESP8266 doesn't have a floating point unit, they are represented as a fixed point number with a one bit sign, 39 bit integer part and 26 bit fraction. The fraction actually loses 4 bits because they're used to hold the value type. So that leaves 22 bits of fraction. That is enough for a range of around +/-1e11 with 6 decimal digits of fraction.
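A fixed point number is just an integer with an implied scale factor. The sketch below shows the idea with the 22 fraction bits mentioned above. It is illustrative only: the names FractionBits, toFixed, toDouble, fixedAdd and fixedMul are invented here, the value is held in a plain int64_t, and the type bits that m8rscript packs into the same 8 bytes are ignored.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative only: 22 fractional bits, as described above. The real
// m8rscript value also carries type bits, which this sketch ignores.
const int FractionBits = 22;
const int64_t One = int64_t(1) << FractionBits;   // fixed-point form of 1.0

// Convert between double and the scaled integer form.
inline int64_t toFixed(double value) { return std::llround(value * One); }
inline double toDouble(int64_t fixed) { return static_cast<double>(fixed) / One; }

// Addition works directly on the raw integers.
inline int64_t fixedAdd(int64_t a, int64_t b) { return a + b; }

// The raw product carries 44 fractional bits, so scale it back down by 2^22.
// The intermediate product can overflow 64 bits for large operands; real
// code would need a wider intermediate or a split multiply.
inline int64_t fixedMul(int64_t a, int64_t b) { return (a * b) / One; }
```

With 22 fraction bits the smallest representable step is 2^-22, roughly 0.00000024, which is where the figure of about 6 decimal digits of fraction comes from.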

Memory sizes

The table below shows the current size of the code (as of 12/10/16). The ParseEngine based parser saves considerably on RAM usage over the yyparse based parser.

| Build Option | Flash | RAM | Remaining RAM (81,920 total) |
|---|---|---|---|
| Parser + EU | 321,765 | 30,440 | 51,480 |
| EU only | 307,657 | 29,892 | 52,028 |

Performance

m8rscript started with a stack based execution unit using a threaded dispatcher. Execution time was dominated by stack operations so performance was relatively poor. It was then switched to a register based unit, taking many ideas from the Lua 5 runtime. This resulted in an initial 2.5x speedup. After optimizations to reduce the overhead of accessing registers and constants that improvement went up to 6.5x compared to the stack based interpreter.
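Why stack operations dominate can be seen in a toy comparison. The two interpreters below are not m8rscript's execution units and do not use its instruction set. They are a minimal sketch, with invented names (StackInstr, RegInstr, runStack, runRegister), of how the statement a = b + c is executed in each style. Each function returns the number of instructions it dispatched.

```cpp
#include <vector>

// Stack machine: operands are pushed, operated on, and popped.
enum class StackOp { PushLocal, Add, PopLocal };
struct StackInstr { StackOp op; int index; };

// Runs stack code over `locals`; returns the number of instructions dispatched.
inline int runStack(const std::vector<StackInstr>& code, std::vector<int>& locals)
{
    std::vector<int> stack;
    int dispatched = 0;
    for (const StackInstr& instr : code) {
        ++dispatched;
        switch (instr.op) {
            case StackOp::PushLocal:
                stack.push_back(locals[instr.index]);
                break;
            case StackOp::Add: {
                int rhs = stack.back(); stack.pop_back();
                int lhs = stack.back(); stack.pop_back();
                stack.push_back(lhs + rhs);
                break;
            }
            case StackOp::PopLocal:
                locals[instr.index] = stack.back();
                stack.pop_back();
                break;
        }
    }
    return dispatched;
}

// Register machine: one instruction names its destination and both sources.
// Only an "add" instruction exists in this sketch.
struct RegInstr { int dst; int lhs; int rhs; };

// Runs register code over `regs`; returns the number of instructions dispatched.
inline int runRegister(const std::vector<RegInstr>& code, std::vector<int>& regs)
{
    int dispatched = 0;
    for (const RegInstr& instr : code) {
        ++dispatched;
        regs[instr.dst] = regs[instr.lhs] + regs[instr.rhs];
    }
    return dispatched;
}
```

With b and c in slots 1 and 2 and a in slot 0, the stack version needs four instructions (push b, push c, add, pop a) plus the associated stack traffic, while the register version needs a single add 0, 1, 2. The instruction count in this toy is not a prediction of the measured speedups quoted above. It only shows where the extra work in a stack design comes from.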

Here are some raw performance numbers. All times are in ms.

| Test | m8rscript (stack) | m8rscript - reg (first try/optimized) | lua | javascript | python |
|---|---|---|---|---|---|
| timing-mac (1000) | 560 | 249/85 | | | |
| timing-mac (3000) | 4950 | 2084/750 | 296 | | |
| timing-ESP (200) | | 871/405 | | | |

Debugging

GDBStub is used for source level debugging. If you compile with

make clean
make DEBUG=1

GDBStub will be automatically compiled in. To run the debugger, the gdb command from the Espressif SDK is used. The esp directory has a gdbcmds script to make running the debugger easier. It opens the .elf file, sets a log file, and sets the remote target to use the serial port connected to the ESP8266. Note that /dev/cu.xxx is used rather than /dev/tty.xxx, which is used to upload programs. The latter uses flow control, which the uploader understands, but GDB does not. The former does not use flow control and is required, at least on macOS.

Invoking the debugger

To invoke the debugger you simply run xtensa-lx106-elf-gdb and pass it the gdbcmds file. For instance:

~/esp8266/tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-gdb -x gdbcmds

If everything is set up correctly, this will show some gdb spew and then stop in gdbstub_do_break_breakpoint_addr, which is the beginning of the program. Then hit c and m8rscript runs.