Skip to content

Latest commit

 

History

History
155 lines (140 loc) · 10 KB

background.org

File metadata and controls

155 lines (140 loc) · 10 KB

Genie - Background info

Introduction

Here we give some background information, including design decisions and challenges. If you just want to create scripts and use Genie, you probably don’t need to read this.

Class loading

Setting and using the root-loader

The clojure.lang.Compiler/LOADER is used when loading new libraries and source files. So we aim to set this to a dynamic classloader. This is done using both bindRoot() and set() methods on this class. If a current thread binding is already set, it will also be replaced by this new dynamic classloader. See bind-root-loader in classloader.clj. We can also pass this new dynamic classloader to Pomegranate (in add-dependencies).

Investigation

Finding a working solution took quite some time. Here are some pointers:

  • Loading a namespace (using Pomegranate) and using it (require) should be done in the same session.
  • After a namespace is loaded and required, it is available for later scripts as well.
  • Some libraries are needed by the daemon itself, and are marked as already loaded.
  • It looks like pomegranate uses the (.. Thread currentThread getContextClassLoader) to start the search of the classloader to modify, whereas core/load uses RT/baseLoader.
  • (pg/add-dependencies) can take a :classloader parameter

Currently only Maven/Clojars style libraries are supported. This includes local libraries in your .m2 folder. See todo.org

See also the diagnostics.clj source file.

The require function uses load-libs and load-lib, which uses loaded-libs and calls clojure.lang.RT/load.

The baseLoader function:

static public ClassLoader baseLoader(){
	if(Compiler.LOADER.isBound())
		return (ClassLoader) Compiler.LOADER.deref();
	else if(booleanCast(USE_CONTEXT_CLASSLOADER.deref()))
		return Thread.currentThread().getContextClassLoader();
	return Compiler.class.getClassLoader();
}

So the (dynamic) contextClassLoader could be used, but Compiler.LOADER is set/bound, so this one is used.

When setting a new (dynamic) loader, make sure to create it with the parent set as the existing loader.

When loading/compiling a script, a temporary classloader is created.

To investigate all this, some functionality as given in diagnostics.clj was used. For normal operation it should not be necessary, but it will give detailed classloader-hierarchy information running in –verbose mode. It will then show the information for two trees: one for the current thread, and one for the base-loader.

nRepl supports an operator ‘hotload-dependency’, but this is disabled for now with message ‘Temporarily disabled until a solution for java 10 is found.’

Links

Clients

Two clients are available for communicating with the daemon to start scripts. The second one in Babashka is the most complete one. See todo.org for ideas on other clients, mainly to improve performance.

Take 1 - Tcl and rep

The first client was made in Tcl using the rep binary for communicating with the nRepl server. It is mainly a proof-of-concept, but might serve as a base for future clients. The current version needs to start two processes (one for Tcl and one for Rep), so will be less efficient.

Take 2 - Babashka

The second client version runs in Babashka. It is quite big - about 800 lines - and implements the following main functionalities:

  • converting local and user (~) based paths to absolute paths
  • determine main namespace from the script
  • passing the script, context and cmdline parameters to the daemon
  • handling input, output and error streams from the environment to the nRepl session in the daemon, including logging and blocking I/O.
  • client logging functions
  • check cmdline options for minimal functionality: noload, nocheckdaemon, nosetloader, nomain, nonormalize
  • admin commands: list-sessions, kill-sessions, start/stop/restart daemon
  • a few wrappers around the bencode protocol for communicating with nRepl
  • killing the script daemon-side when a client shutdown signal is received

Protocol versions

Some preparations have been made to pass and check the protocol versions between client and daemon. This is mainly for future use, if needed.

What happens when the daemon starts?

See core.clj, but in short:

  • Initialize the logger
  • Initialize the dynamic classloader
  • Mark the libraries in project.clj as already loaded
  • Load the libraries mentioned in genie.clj in config-dir.
  • Save the out and err streams for later use.
  • Start the nRepl deamon on the given port
  • Initialize the client functions

What happens when a script is executed?

Client

  • Create the context for passing to the daemon. Including current-working-dir (cwd), specific deps.edn file, and name of the script.
  • Determine the main namespace and function to call by reading the script.
  • Normalize the given command line parameters
  • Open a TCP connection to the local nRepl server and create a new session
  • Pass an eval-command to the daemon (nRepl server)
  • Then, in a loop:
    • Get stdout/stderr output from the daemon and print it to the local stream
    • Pass local stdin to nRepl session stdin when a :need-input message is received
    • If an exception occurs, print it and stop the script.

Daemon

On the daemon side, when client/exec-script is called:

  • The dynamic classloader is set to the one created at startup
  • Script libraries are loaded by checking a deps.edn file in the same dir, the parent dir or a client command line parameter given
  • The script is loaded with the standard load-file function:
  • The main function is executed. This is a function called ‘main’ in the last namespace declaration in the script

Logging

Logging can be somewhat complicated in Clojure. Moreso with client sessions, as the correct out and err stream needs to be used. Some notes:

  • We use the logger library as a wrapper around log4j. This does not need any config XML.
  • With a new nRepl sessions the dynamic vars out and err get bound to a new instance.
  • The logger uses this err stream
  • When the script logs something (e.g. log/info), this is received by the Genie client in the :err slot of the result and put on the stderr stream connected to the client.
  • When the Genie daemon wants to log something in its own log, it needs to rebind the err stream first. This and the out stream are kept in state.clj.
  • The Genie client does not use an external library; it uses some simple logging functions including generating a timestamp (you have to have timestamps)

Context

A JVM does not really have a concept of a changeable working directory. There is a constant startup-directory (user.dir property), but this is not useful for scripts, that have different working directories, especially when running at the same time. So we give the working directory in the context (ctx) from client to daemon.

The environment is currently not given from client to daemon at runtime, so it should be set at daemon start time.

Command line parameters

For the Genie daemon we use the cmdline library, which uses values according to this priority:

  • values given on the command line
  • values given in the config file (genie.edn)
  • default values defined in the tool

(Mutable) state

There is quite some state involved:

  • the dynamic classloader to use for all client sessions.
  • loaded libraries including different versions
  • required libraries
  • scripts loaded, could be different versions
  • sessions including streams (stdin, stdout, stderr)

See state.clj for specifics.

Error handling

When an exception occurs, it should be communicated to the client. This is done quite trivially by catching and logging the error, and then rethrowing it. nRepl wil then catch it and communicate to the client.

Another possible issue is a hanging or long running script, or a script that crashes but keeps file handles or similar open. For this we have the option of listing and killing client sessions, using –list-sessions and –kill-sessions. nRepl assigns a session-id for each new session, so this is relatively easy.

On the client side a Babashka shutdown hook is defined, which - when triggered with e.g. C-c - will also close the daemon-session.

Scripts

genie_new.clj

This script can be used to create new scripts. It will:

  • use the template.clj and deps.edn files as a base.
  • replace {{namespace}} and {{script}} with appropriate values
  • create a main function with a default implementation to call script-function using the cmdline library.
  • create a -main function so the script can also be executed by clj. For this it also set :paths [“”] in the deps.edn file.
  • create a namespace declaration with references to some popular libraries. You can change this in the template.
  • convert dashes and underscores according to Clojure rules
  • create a root-namespace, with just a single segment. This should be fine for scripts, not for libraries.

sync_project_libraries.clj

With this script we can check if the libraries marked as already loaded are the same ones as mentioned in project.clj (Leiningen project file). It also serves as an example script.

Linters

We use several linters to keep the code mostly clean:

  • bikeshed
  • clj-kondo
  • cljfmt
  • check-namespace-decls
  • eastwood
  • kibit
  • ancient

And sometimes:

  • yagni
  • vizdeps
  • lein deps :tree (to check conflicting libraries)

To prevent code from executing (genie.clj and install.clj) we use this Babashka trick:

;; see https://book.babashka.org/#main_file
(if (= *file* (System/getProperty "babashka.file"))
  (main)
  (println "Loaded as library:" (str (fs/normalize *file*))))