DesignHints

We all have opinions about good design. Here are a couple of good, short papers about software design:

TODO

Do not use assert instead use raise AssertionError. See https://github.com/radiasoft/sirepo/issues/3592
Names should be consistent. In Sirepo, we changed some short names in codes to longer more descriptive names in the schema, but this makes it harder to debug. The names in the schema should match the codes.
Security Permeates Systems
Do not qualify imports in functions; Defer imports to avoid circularities in functions
Comments mean write a function
Use command line with pipe for complex APIs, e.g. docker and sftp, instead of non-language native API. The cli generated can be tested manually and debugged. Easier to figure out the "physics" of the API.
When implementing https://github.com/radiasoft/sirepo/pull/5064 I found it useful to add qcall=qcall to calls. This is a quasi typechecking thing which helped me out. I was adding this parameter to a lot of calls, and it caught some mistakes where the order of the parameters had to be different due to existing keywords vs new, required positional param (qcall).

I was thinking of noting in our style guide that passing keyword args is useful in this way, and also future proofs the interface more easily than passing positional args.

Random Hints

The rest of this page is a collection of principles, which you might find useful. There's some organization, but most hints stand alone.

All these hints have exceptions. However, please limit them for the sake of your users.

Fail Fast

Fail Fast by Jim Fowler

[https://www.usenix.org/legacy/events/usits03/tech/full_papers/oppenheimer/oppenheimer.pdf

Why Do Computer Systems Stop and What Can Be Done About it?

YAGNI: Keep it simple, stupid

You Aren't Gonna Need It (YAGNI)

OAOO: Don't repeat yourself

Once and Only Once (OAOO)

Explicit Coupling: Bind values programmatically

One way to bind values is through names. Constants are the most obvious example of this principle

Encapsulation is the more general concept, where you rely on a module to abstract a value via a name, which you do not make any assumptions about.

Use introspection to cascade values throughout a program. Do not assume a type, for example, use type to extract it.

Ensure Loops Exit

Loops that depend on a single exit can be problematic. Consider this loop:

while i:
    i -= 1

If i is negative when it enters the loop, it will not behave like you expect. Rather, do this:

while i >= 0:
    i -= 1

This ensures the loop exits when i is outside the expected range.

Another issue is when you are depending on an external event. For example, this code:

while not srdb.runner_socket_path().exists():
    time.sleep(0.1)

When depending on an external event, you should always have a timeout or some other counter that terminates the loop. If it never occurs, the loop will never terminate. Write the code this way:

for _ in range(30):
    if srdb.runner_socket_path().exists():
        break
    time.sleep(0.1)
else:
    pkunit.pkfail("runner daemon did not start up")

Here we ensure the loop exits after 3 seconds if the file never shows up.

Output for Programmers (logging)

Programmers need to know what a program is doing at various stages of deployment. This type of output goes by many names: print, debug, trace, logging, etc. We'll just call it output in this section.

The first type of output is when you are first developing software. Programmers often put "naked print" statements. In Python this is print or sys.stderr.write. This is a problem if you forget that you left that print statement in the code. Then, in production, you'll see random, useless output, and usually not know where it came from.

The next type of output is when you want to have a permanent output in production. In Python, people recommend to use the logging module for this. While that's not a "naked print", it is contextless output by default. In order to get context, you have to create a Logger object that puts the code execution context in the output. And, with logging, you have to specify a "level", which is a traditional means by which people specify the importance of the output. Usually, only a particular level (or higher) is output by default.

Python's logging module is too complex for this last type of output: filtered, or what is sometimes called trace output. The idea is that you would turn on "debug" output for a whole system simply by changing a level from "info" to "debug" is silly. What you want is to know about something, e.g. show me all output related to socket I/O. You want to filter on the context with a regular expression or other complex filter, not a simple integer.

In PyKern, the pkdebug module defines these three cases clearly:

pkdp is used for contextful output during development. It should never appear in commits, and you can easily write a grep assertion to validate that it doesn't appear in commits.
pkdlog is for permanent log messages that will always come out.
pkdc is for filtered log messages that only come out when the filter is turned on. Otherwise, they are almost no cost.

All three methods output the context (including process id and timestamp, if configured) of the output call. This means that you can always find the output message in your code. This is crucially important when you are trying to figure out a production problem in real-time.

Output Useful Information

When an error occurs, give more information than you think the user might need. I was running into a problem with JupyterHub, and it was printing:

docker: Operation not permitted

What operation? What parameter was causing the problem? It turned out it was an NFS issue with SELinux. The print statement didn't help with that.

Dump whatever data you might have. Catch exceptions on external operations and log (don't print!) the object and operation along with the error message.

Dispatch, Don't Decorate

https://www.robnagler.com/2015/08/24/Dispatch-Do-not-Decorate.html

Define kwargs for Dispatched Functions

In order to support forwards compatibility (future proof), functions which are dispatched to should accept **kwargs. This allows adding arguments to the dispatch without having to change all dispatched functions at once. Only certain functions will need the new argument (typically) so this reduces complexity around interface changes. For example, when adding qcall to python_source_for_model, the interace changed, and all functions had to be updated even when only a few functions needed access to qcall.

Type-Consistent Function Returns

Sometimes a function does two different things and returns two different results. For example, pksetdefault (briefly) returned self with multiple argument pairs and return the value of its first argument with one pair. This is confusing and error prone. Instead pksetdefault always returns self, and if you need to get at the first value, you can just reference it off the self return, e.g. in sirepo.sim_data:

return data.pksetdefault(computeJobHash=_op).computeJobHash

Impedance Matches: Data structures align with algorithms

When a data structure is misaligned with an algorithm, the code gets longer and more verbose. For example,

paths = []
for lf_prop in LOADED_FILE_PROPS:
    paths.extend(
        map(
            lambda file_name: _pkg_relative_path_static(
                file_type + "/" + lf_prop.dir, file_name
            ),
            schema[lf_prop.source][file_type],
        )
    )
return paths

LOADED_FILE_PROPS happens to be only used for this purpose and looks like:

LOADED_FILE_PROPS = [
    pkcollections.Dict(source="external", dir="ext"),
    pkcollections.Dict(source="local", dir=""),
]

A better data structure would be:

LOADED_FILE_PROPS = (
    ("external", "ext"),
    ("local", ""),
)

And the algorithm is shorter and clearer:

paths = []
for prop, path in LOADED_FILE_PROPS:
    paths.extend(
        map(
            lambda p: _pkg_relative_path_static(file_type + "/" + path, p),
            schema[prop][file_type],
        )
    )
return paths

Bonus change: Replacing file_name to f allows the lambda to be on one line, which makes it clearer. lambdas should be short. Ideally, _pkg_relative_path_static would be shorter, and can be, but thats another topic.

Functions match uses

_pkg_relative_path_static in the previous section is actually used in this only one place. The use of the lambda means there's a problem with the definition of the private function. Here's the definition:

def _pkg_relative_path_static(file_dir, file_name):
    if file_name[0] == "/":
        return file_name
    root = str(pkresource.filename(""))
    return str(static_file_path(file_dir, file_name))[len(root) :]

The function could easily be a loop, and hoisting the loop invariant:

def _pkg_static_paths(prefix, paths):
    root_len = len(str(pkresource.filename("")))
    return map(
        lambda p: p if p[0] == "/" else str(static_file_path(prefix, p))[root_len:],
        paths,
    )

Then the use becomes:

return list(
    itertools.chain_from_iterable(
        _pkg_static_paths(file_type + "/" + p, schema[k][file_type])
            for k, p in LOADED_FILE_PROPS,
    ),
)

Learn Libraries: There's an API for that

Often there's an API for some code you are writing. It's good to use other APIs. Then again, there's the anti-pattern: a little copying is better than a little dependency. This is where judgment comes in.

In the previous section, we have:

def _pkg_static_paths(prefix, paths):
    root_len = len(str(pkresource.filename("")))
    return map(
        lambda p: p if p[0] == "/" else str(static_file_path(prefix, p))[root_len:],
        paths,
    )

static_file_path returns a py.path.local, which can be used to compute the relative path. Really, though, this is computing a route from the pkresource so make a function for it, which describes what it is doing:

def _pkg_static_paths(prefix, paths):
    return map(
        lambda p: p if p[0] == "/" else _static_file_uri(prefix, p),
        paths,
    )


def _static_file_uri(prefix, path):
    return "/" + pkresource.filename("").bestrelpath(static_file_path(prefix, path))

This is another case of a function use mismatch. static_file_path was only used in this case so really, that function could be eliminated. Also, pkresource.filename('') could be a constant.

Differentiate Differences

We often have to balance between doing the simplest thing (YAGNI) and not repeating yourself (OAOO). One reason for not OAOO is that you want to decrease the cognitive load on the reader. Here's an example of a sequence of code with a high cognitive load:

axes.x.values = plotting.linspace(fullDomain[0][0], fullDomain[0][1], json.x_range[2]);
axes.y.values = plotting.linspace(fullDomain[1][0], fullDomain[1][1], json.y_range[2]);
axes.x.scale.domain(fullDomain[0]);
axes.x.indexScale.domain(fullDomain[0]);
axes.y.scale.domain(fullDomain[1]);
axes.y.indexScale.domain(fullDomain[1]);

The differences:

axes.x and axes.y
fulldomain[0] and fulldomain[1]
x_range and y_range
rows neither alternate nor group x and y

The problems with making programmatic:

0 and 1 are alternate names for x and y
x_range and y_range aren't directly indexable

Here's a refactoring assuming no global variables:

var i = 0;
for (d in ['x', 'y']) {
    var f = fullDomain[i++];
    var a = axes[d];
    a.values = plotting.linspace(f[0], f[1], json[d + "_range"][2]);
    a.scale.domain(f);
    a.indexScale.domain(f);
}

While this is more lines, each line is different. It reduces the cognitive load on the reader, because differences are differentiated.

Locals, Parameters, and Globals

Naming variables is hard. In the previous section, we used single letter variable names, which people would argue is bad, but this section argues differently.

Global variables should be self explanatory, because they are used out of context. If they are a constant (with respect to the module), then use upper case. Otherwise, lower. Avoid redundancy. Lots of global names usually indicates the module is too large. Try to use Dict's to reduce the global name space clutter and make the objects accessible programmatically.

zIf there's no existing external (non-private) for a global, prefix it with an underscore so that people know this is not a public interface.

Parameters should describe what they are with respect to the function.

Locals should be avoided. Holding state in intermediate locals is often unnecessary, just pass it on directly. Locals can be single letters. If you find yourself getting confused between the letters, that's a good sign that there are too many locals. Either break up the function or don't hold so much state in locals.

Single letter local names reduce the cognitive load in the function. The single character indicates that it is a local, not a parameter or global, so you can look for it's definition with a narrow range of lines. The single character shortens a line with other concepts on it that are more important: globals, parameters, and other functions.

Here's a function that has too many locals with names longer than necessary.

def api_pythonSource(simulation_type, simulation_id, model=None, report=None):
    data = simulation_db.read_simulation_json(simulation_type, sid=simulation_id)
    template = sirepo.template.import_module(data)
    sim_name = data.models.simulation.name.lower()
    report_rider = "" if report is None else "-" + report.lower()
    py_name = sim_name + report_rider
    py_name = re.sub(r"[\"&\'()+,/:<>?\[\]\\`{}|]", "", py_name)
    py_name = re.sub(r"\s", "-", py_name)
    return _as_attachment(
        flask.make_response(template.python_source_for_model(data, model)),
        "text/x-python",
        "{}.py".format(py_name),
    )

Here's the function after a simple refactoring:

def api_pythonSource(simulation_type, simulation_id, model=None, report=None):
    d = simulation_db.read_simulation_json(simulation_type, sid=simulation_id)
    n = d.models.simulation.name.lower()
    if report:
        n += "-" + report.lower()
    return _as_attachment(
        flask.make_response(
            sirepo.template.import_module(d).python_source_for_model(d, model),
        ),
        "text/x-python",
        "{}.py".format(
            re.sub(
                r"\s",
                "-",
                re.sub(r"[\"&\'()+,/:<>?\[\]\\`{}|]", "", n),
            ),
        ),
    )

Note that the function is longer, but the concepts are isolated on a line. The regular expressions are the most complex concepts to understand, and then are more easily distinguishable in the text.

The creation of the name n (formerly py_name) still takes three lines, but those lines have far fewer characters (79 vs 140). The number of symbols is the same so it just makes it easier to recognize.

If you have a constant within a function however, don't name it with a single letter. This isn't an exception to the single letter local variable rule, because a constant within a function is not a temporary variable, it's a named declaration. This improves readability similar to naming a function.

For example, the function below uses a as a named constant bound to the value 1000:

const millisecondsUntilNextFrameRequest = () => {
    if (! isPlaying || isHidden) {
        return 0;
    }
    const a = 1000;
    var x = appState.models[modelName].framesPerSecond;
    if (! x) {
        return  .5 * milliseconds;
    }
    return  milliseconds / parseInt(x);
}

There is no way for the reader to derive the meaning of 1000. Better to simply name it with its meaning as follows:

const millisecondsUntilNextFrameRequest = () => {
    if (! isPlaying || isHidden) {
        return 0;
    }
    const milliseconds = 1000;
    var x = appState.models[modelName].framesPerSecond;
    if (! x) {
        return  .5 * milliseconds;
    }
    return  milliseconds / parseInt(x);
}

The semantics of a constant can often be important like that of a function declaration.. The named constant gives context, which clears up the meaning of the contextless expression 1000.

The next section is cumbersome, but now it is clearer to the reader why: there are too many concepts. The conversion of the file name to a safe file name (stripping characters) should be moved into its own function. There might even be a function available.

Three Local Variables

Often people say keep functions "short", e.g. 15 lines, which is a pretty good rule. A better rule (perhaps) is to limit to three local variables. Once you hit four local variables, it's time to factor out a (nested) function.

Consider the original implementation of pykern.pkio.walk_tree:

def walk_tree(dirname, file_re=None):
    fr = file_re
    if fr and not hasattr(fr, "search"):
        fr = re.compile(fr)
    dirname = py_path(dirname).realpath()
    dn = str(dirname)
    res = []
    for r, d, files in os.walk(dn, topdown=True, onerror=None, followlinks=False):
        for f in files:
            p = py_path(r).join(f)
            if fr and not fr.search(dirname.bestrelpath(p)):
                continue
            res.append(p)
    return sorted(res)

There are too many local variables. Here's the improved implementation, which is actually longer, but reduces the reader's cognitive load:

def walk_tree(dirname, file_re=None):
    def _walk(dirname):
        for r, _, x in os.walk(dirname, topdown=True, onerror=None, followlinks=False):
            r = py_path(r)
            for f in x:
                yield r.join(f)

    if not file_re:
        res = _walk(dirname)
    else:
        if not hasattr(file_re, "search"):
            file_re = re.compile(file_re)
        d = py_path(dirname)
        res = []
        for p in _walk(dirname):
            if file_re.search(d.bestrelpath(p)):
                res.append(p)
    return sorted(res)

Several improvements happened as a result of this refactoring:

The tree walk and the filtering/result were separated cleanly so one can understand the two major concepts of walking and filtering.
The iterator simplifies the code, because it separates the local variables r, files, and f, which are only required for creating p, the output from _walk.
The py_path(r) was hoisted to eliminate an unnecessary loop invariant call without sacrificing clarity. _walk is focused on creating the paths to be returned.
The not file_re case was greatly simplified (one line), because and speeded up, because a loop invariant was hoisted and the unnecessary list creation was removed.

Even though it is longer, it's easier to read.

Deny by Default

Many APIs are permissive by default. For example, in Flask or Django, you create a view function, and any request can visit it. This is great for "hello world," but in all other cases, it's problematic. Here's how you require a user in Django:

from django.contrib.auth.decorators import login_required


@login_required(redirect_field_name="my_redirect_field")
def my_view(request):
    ...

What happens if you leave off @login_required, anybody can execute the view. In Sirepo, the decorator is similar:

@api_perm.require_user
def api_runSimulation():

However, without the decorator, the server won't start. There's explicit coupling between APIs (functions named api_<something>) and permissions (api_perm decorator).

Deny by default comes up in all kinds of contexts. Continuing on the example in the previous section, we have this formulation for eliminating problematic characters in file name attachments:

"{}.py".format(
    re.sub(
        r"\s",
        "-",
        re.sub(r"[\"&\'()+,/:<>?\[\]\\`{}|]", "", n),
    ),
)

What is missing? Semicolon, dollar, and a few others. Rather, we should use deny by default, that is, something like:

"{}.py".format(re.sub(r"[^\w]+", "-", n).strip("-") or "script")

Anything that's not an alphanumeric gets turned into a dash. Multiple sequences of dashes are replaced with one, and trailing/leading dashes are removed. As a bonus, if the string is empty, it's replaced with a word script, just so we don't have a blank file name.

This is solution is shorter and safer in multiple ways.

Gelignite Screwdrivers

Brian Reid helped deal with the Stanford Breakins in 1986. He got a lot of hate mail (for being the messenger). He responded with an admonition: do not make screwdriver handles out of plastic explosives. This would seem obvious, of course, but it is a hard lesson to internalize.

A modern day example of a gelignite screwdriver is app.config['SECRET_KEY'] in Flask. Miguel Grinberg, author of Flask and its gelignitely-named SECRET_KEY feature, warns programmers:

Many times I hear people say that user sessions in Flask are encrypted, so it is safe to write private information in them. Sadly, this is a misconception that can have catastrophic consequences for your applications and, most importantly, for your users.

Names are important, and SECRET_KEY was a poor choice. A better one would be COOKIE_HASH_PRIVATE_KEY. The word SECRET_KEY has nothing to do with secrets except that the value should be kept secret (private). The SECRET_KEY is a private value used to ensure the integrity of the hash of the clear text cookie value. Flask cookies are sent in the clear albeit obfuscated so it's pretty easy to assume that SECRET_KEY is used to encrypt the cookie, which is merely compressed, not encrypted.

It's unclear why Flask does not encrypt the cookie to ensure integrity and privacy. If that was done, it would be a well-designed tool. Instead, not only is it is poorly designed, it tricks the programmer into thinking it does something it doesn't. All the blog entries in the world (including this one) won't prevent programmers from misusing it.

POSIT: Assume the existence of

I propose in this section a new term for comments called POSIT, which documents the implied connections or assumptions in the subsequent piece of code. It's different than an assertion, which can be tested (asserted). When you posit a state in a piece of software, you are documenting a connection with another piece of code that cannot coupled explicitly through a bound reference. The reason might be explained in the posited comment, but it is not necessary. The purpose is merely to document the implicit connection between two or more pieces of code so that the reader can find all places related to the POSIT.

So, when you are starting to make an assumption, you should write it like these two examples:

# POSIT: "/jupyter" in salt-conf/srv/pillar/jupyterhub/base.yml
local notebook_dir_base=jupyter

and

# POSIT: notebook_dir_base in containers/radiasoft/beamsim-jupyter/build.sh
"bind": "/home/{{ pillar.jupyterhub.jupyter_guest_user }}/jupyter",

When a value becomes unposited, you won't necessarily fail fast. You probably can't know how the posited value. Something might fail, but you might not know why. The definition and reference are unbound, which means anything can happen, which is what makes posited values dangerous and something that should be avoided.

Sometimes it is easier (lazier, cleaner?) to make an implied connection between two pieces of code. It's a hard problem to always create explicitly bound names for things to be used in common with disparate code. Sometimes these unbound connections are posited in two different programming languages, two different systems/repos, etc.

We make a lot of suppositions in software. Many are almost postulates. For example, we assume the following commands on "Unix" (another basket of assumptions) do the same thing:

cd
cd $HOME
cd ~
cd ~$USER

No one would argue that these are all mean the same thing. There are syntactic differences and an error (environment variable references should be quoted). Yet, the concept being named by these four commands is the same.

It's important to acknowledge that command sequences are names for things. They identify the action know as "switching the execution thread's current directory to the user's home directory." This concept is important. It's the premise of all programming. The web relies on this view of naming: A link/URL/URI/etc. is binds a remote procedure call and its arguments into a single, structure name. We save that name in bookmarks and include it in email messages and documents such as this one.

Links do get "broken", which is a good thing. You have a name for something, and that name no longer refers to it. You get an error message. It is not posited that the name shall stay in existence forever. A "404" is a really important part of the web, because it's fail fast, which is a very important principle in software.

Don't Exit or Print

The introduction of web APIs has eliminated the problem of random error messages and system exits from APIs. The implementer of a web API cannot print error messages to the console of the client nor can it force the client to exit. Ordinary programming libraries (aka APIs) should follow this same principle.

System exits and print statements are side effects that assume a particular execution path, which in turn assumes the end-user has all the context necessary to interpret the event (exit or print). Libraries can't know all their uses, and by making assumptions about execution context, library designers limit their applicability.

When a library executes a system exit, it assumes that the caller cannot continue after the event that caused the system exit. That is a big assumption, especially given the complexity of execution contexts that exist today. You might be calling the function from within a web API, for example, which makes the server's job more complicated, because a response is expected on the other end of the HTTP request, always. You can't know that.

Instead of a system exit, through a reasonable exception which contains the context describing the error. If the programming language doesn't have exceptions, return an error, which describes the error. Give enough structured context that it can be passed on programmatically to the end-user. Consider that it might have to pass through an internationalization layer or be translated to some other medium besides text.

A common use for print statements is to provide information about the state of the computation. This might be to display error messages or to announce progress has been made. The former is often for programmers' eyes only and the later is often not delivered in a way that can be practical to transmit to the user -- again, think of progress bars in GUIs.

Instead of print statements, use a logger to help provide the context to the programmer debugging a problem. The log function can collect the execution context, e.g. from whence the function is called, the process ID, and the user's browser. That context can be passed along with the error message to give the programmer a complete picture of what's going on.

Stick to Conventions

Programming has been around long enough that we have some pretty good conventions to follow. Any organization should agree to a set of conventions and stick to them. The more conventions, the better. Why? Less thinking. Too much thought goes into arbitrary style decisions. Despite your brilliance, you aren't likely to think up something better than what 60+ years of programming high level languages.

Being consistent is hard work at first, but you'll get used to it. It saves a lot of time, because you can read other people's code and know where to find things. We can also build tools around the style so that it makes it easier to program.

In Python, we use the Google Python Guide for the most part. We use Sphinx Napoleon Google Style Docstrings. We follow PEP 8, of course.

There are always exceptions, but less is really more in this case.

Avoid Magic Names

The sh module dynamically imports names like this:

sh.ls("/")

The ls name does not exist in the module sh. It's dynamically generated through some quite complex code. The intention is to save the user "boilerplate".

The purpose of sh is to run shell commands. /bin/sh does this, too. If you find yourself writing a lot of shell commands in a row, the best thing to do is use /bin/sh or alternatively use itertools or a loop to reduce the repetition.

Magic names should be used very rarely, because they make reading code very difficult. You have to assume that people will want to chase references. Make it easy for them.

Another type of magic name is an operator overload. There was a time when people were afraid that operator overloading would corrupt languages like C++. That didn't really happen, but it seems to be a pattern in Python. SQLAlchemy uses == to mean "create a join" to be evaluated when the statement executes. That's arguably useful syntax, but it could be just as easy to say x.join(y). That's polymorphism, too, but it's easier to find when grepping and clearer what's going on.

It's important to understand that inline operators in math are the way they are because they are used frequently, and brevity is something valuable in this case. It's also arguable that simply saying add is not exactly onerous.

Using overloaded operators has a subtle cost in long term maintenance: you can't easily find all uses of a particular method. For example, py.path uses / to mean concatenation, but within a string, a / means something different (a literal). This is significant, because:

>>> os.chdir("/tmp")
>>> py.path.local("a/b")
local("/tmp/a/b")
>>> py.path.local("a")/py.path.local("b")
local("/tmp/a/tmp/b")

That's a problem. Is it a bug? If it is a bug and the semantics of py.path changes such that it produces the same result, all the existing uses of the / operator may need to be fixed. (This is identical to the Python 3 change to /, btw.) How do you find all uses of /? If the name of the method was join_paths, it would be a lot easier to find all uses.

Temporary Directories

pkunit.work_dir creates a directory relative to the test that executes the function. The directory stays around after the test finishes. The name is constant. This means you can always find the "work" done during the test after the test runs either successfully or not. This facilitates debugging.

If you must have a temporary file or directory that disappears after a program runs, make sure you can override the removal in an obvious way. That's not how tempfile.TemporaryFile works. This tool makes it hard to debug code which generates other code, which is already hard enough to debug.

Qualify Imports

The purpose of from is bring names into another module without further qualification. Any serious Python programmer never imports with *, but even important module attributes explicitly is often unnecessary or confusing. If you end up using a reference from another module multiple times, the Rule of Three would say to encapsulate that use to avoid repeating yourself. If you don't, then the further qualification necessary to use a module's attribute is not only clearer, but also not much extra typing.

When maintaining lots of code, you are unlikely to know all the modules involved. You need an easy path to finding the code and/or documentation. Unqualified imports require an extra step for the reader to chase down the reference. The writer is increasing readability at the moment of writing, not for the reader who shows up two years later to figure out a bug.

Longer article discusses in detail.

Versions in requirements.txt

When you specify a package, avoid specifying a version. With Docker, we will get the latest version of the package unless otherwise specified.

The problem with specifying a version is that you get version conflict very easily with other packages that require newer version of the package.

With local libraries, you can accidentally get your "editable versions" uninstalled. Unfortunately, an "editable version" has a version number specified, and that version doesn't increase as you edit the code. It stays the same. If a newer version of the library gets installed in pip, a pip install might install the library twice.

More info about setup.py:

Multipass vs Backpatching

Backpatching can sneak into a design pretty easily, especially emergent designs. It starts out small, which is fine, but sometimes it gets big, and turns into convoluted control flow. At this point, you want to switch to multipass programming. Fundamentally, backpatching was an optimization technique. In emergent design, it is cause by deferred refactoring.

Example: RSConf was generating a file called /etc/postfix/local-host-names by having modules backpatch the file. It also was backpatching /etc/postfix/main.cf using postconf. The refactoring to multipass greatly simplified the code and explicitly coupled at build time instead of at run-time. This not only simplified the code it made it more robust.

Unchecked destroy/cancel

Destroys and cancels (Destroys) should be idempotent. For example, HTTP DELETES are idempotent. Having Destroys be unchecked means that even if what you are trying to Destroy is already in the desired state (e.g. canceled or non-existent), or totally unknown to the system, the response to the Destroy is still the same as a successful Destroy. This encourages more concise code with less need for error handling. For example, imagine you have a client and a server. The client sends a request to run a job and the server starts the job and replies that is started. As the server is responding, the connection between the two dies and the client never receives the "started" message. If the client then sends a cancel, the server should reply back "canceled" regardless of whether the job was actually ever started and canceled. In this scenario the client doesn't care if the job was started or not. It just wants the job to be canceled. Bash "unset" works the same way. Regardless of whether the variable is set or not calling unset always succeeds.

Generate code simply

In code generator templates, avoid expressions in substitute values. This makes the templates harder to read.
The template should declare what is to be done, and call imported functions that interpret the declarations.
Prototype first manually, and then modularize.

Test What Might Break

One principle to keep in mind while testing is to test what might break. There are many ways to interpret this, but one of the main takeaways is to avoid testing too much, and instead prioritize testing what is most likely to actually break. Take the following example from this PR comment:

pkunit.pkeq(r.models.simulation.name, "Scooby Doo")

Instead of testing whether r.models.simulation.name is equal to a string that we’ve hard-coded somewhere, it would be more helpful to know if we’ve gotten models back at all. The following change makes the test more robust.

pkunit.pkok(r.get("models"), "no models r={}", r)

Quality Test Failures Output

When a test fails, the output should be informative about the cause of failure. Using the same example from the section above, changing from pkeq to pkok allows the use of an error message ('no models r={}'), which is more informative than a KeyError which may be thrown by the pkeq.

Guard Clauses

Deeply nested code can be hard to read. This is especially true in Python where nested code is only separated by indentation. One way to reduce nesting is to use guard clauses.

Invert the `if` condition

A close cousin of guard clauses (and usually used in conjunction with them) is inverting the if condition.

if my_sentinel:
    do_foo()
    do_bar()
    return x
return y

# Inverting the if condition to reduce nesting
if not my_sentinel:
    return y
do_foo()
do_bar()
return x

Explicit concatenation

Python will concatenate strings for you implicitly

print("hello" " world")

This has a tendency to lead to bugs. For example, someone working on that code may add a comma after 'hello' thinking they are different args to the print function. That leads to the string hello world. Notice they difference? After adding the comma there are now two spaces between hello and world. Subtle behavior like that leads to subtle bugs which are never fun to debug.

print("hello" + " world")

Use the Program Stack

If you find yourself implementing something using an explicit stack ask yourself, "how can I do this with the program stack?". For example, this code used an explicit stack that the program had to manage. It was refactored to just use the program stack. Each time we pass through that code p is put on the program stack and the programmer doesn't have to manually manage it.

Do What I Mean (DWIM)

DWIM is a programming construct where the program tries to do the right thing based on the context/args it is called with. For example, in Emacs M-; is a hotkey for the (comment-dwim) function. This function tries to figure out what it should do based on context. So, if the line of code is already commented-out then it will uncomment it and vice-versa.

DWIM is hard to get right. It is best used sparingly. If you can't cover all cases then make the user/caller be explicit.

In Sirepo we DWIM in cases like handling a string vs dict input. If it is a string we use it. Otherwise, we get the string out of a known place in the dict.

LBYL vs EAFP

I ran into a LBYL vs EAFP issue. The code before:

_timedelta = None
try:
d = int(days)
if d != 0:
_timedelta = datetime.timedelta(days=d)
except Exception:
pass
return _timedelta or 0

The code after:

_timedelta = None
if not days:
return 0
d = int(days)
if d != 0:
_timedelta = datetime.timedelta(days=d)
return d

The only time days would be a problem is if it were not set. This shouldn't technically happen, but it's a reasonable flexibility on the function (srtime.adjust_time).

The interface of adjust_time had to change. I mistakenly return _timedelta when I should have returned d. That was the defect that was fixed in the "code after" (above).

The defect would have been caught by a type check if the receiving variable had been typed, too. Python/mpy/pytype doesn't support this. The receiving variable was being used in a message send, which would have caught this:

days = adjust_time(days)
(
            await self.call_api("adjustSupervisorSrtime",
            kwargs=PKDict(days=days))
            ).destroy()
            ```

LBYL is more appropriate for this code, and would have given me a clue
as to the error. The `pass` above was far too coarse and the try block
was too broad.

EAFP should only be used in narrowly checked circumstances. It may be
that you understand the code at the time. That's not the problem. It's
when you are modifying the code that something like this is missed.

The test that was failing was supervisor_purge_free_sims_test.py. I
was also running into troubles in conftest.py which caused enough
confusion that I had no idea what was going on. To find the defect, I
had to fix conftest.py first, and then slowly figure out what was
failing where.

This is definitely one chalked up to "type checking is fail fast".