Quickstart: https://kaixhin.github.io/FGLab/
FGLab is a machine learning dashboard, designed to make prototyping experiments easier. Experiment details and results are sent to a database, which allows analytics to be performed after their completion. The server is FGLab, and the clients are FGMachines.
FGMachine tries to follow the SemVer standard whenever possible. Releases can be found here.
- Install Node.js from the website or your package manager.
- Either clone this repository or download and extract a zip/tar.
- Move inside the FGMachine folder.
- Run npm install.
- FGMachine requires a .env file in this directory. For most installations, it should be possible to copy example.env to .env, but it may require customisation for non-standard FGLab or FGMachine ports. An alternative is to set the following environment variables:
  - FGLAB_URL (FGLab URL, including port if necessary)
  - FGMACHINE_URL (FGMachine URL, including port)
Run node machine (or npm start) to start FGMachine. On the first run it will create specs.json and register itself with FGLab. Please read the overview to understand how FGMachine can interface with your machine learning code.
Note: If you use a virtual environment, e.g. virtualenv, activate the environment before running node machine.
Note: If you delete your machine in FGLab, delete specs.json before running FGMachine again to re-register.
To update, run npm run update.
Start an FGMachine container and point it at your FGLab instance:
sudo docker run -d --name fgmachine -h $(hostname) -v /var/run/docker.sock:/var/run/docker.sock -e FGLAB_URL=<FGLab URL> -e FGMACHINE_URL=<FGMachine URL> -p 5081:5081 kaixhin/fgmachine
The FGLab URL will be the address of the host running FGLab, including the protocol ("http://") and port (:5080). Note that localhost will not work, because inside the container it refers to the container itself, but the local network IP/hostname should. The FGMachine URL will be the address of the current host (as accessible by FGLab), including the protocol ("http://") and port (:5081). Docker's socket is passed so that FGMachine can launch Docker containers itself. As these are sibling containers, volume mounts (-v) are relative to the host, not to the FGMachine container.
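A malformed URL is an easy mistake here, so a quick sanity check can save a failed registration. The function checkServiceUrl below is hypothetical and not part of FGMachine. It is a sketch, built on Node's built-in WHATWG URL class, that encodes only the rules stated above: a protocol, an explicit port, and, for the FGLab URL seen from inside a container, no localhost.

```javascript
// Hypothetical helper (not part of FGMachine): checks an FGLAB_URL or
// FGMACHINE_URL value against the rules described above.
// Returns an array of problem descriptions; an empty array means "looks fine".
function checkServiceUrl(value, { allowLocalhost = true } = {}) {
  let url;
  try {
    url = new URL(value);
  } catch {
    return ["not a valid absolute URL (did you include the protocol?)"];
  }
  const problems = [];
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    problems.push(`unexpected protocol ${url.protocol}`);
  }
  // URL drops the scheme's default port (80/443), so port is "" unless a
  // non-default port such as :5080 or :5081 was given explicitly.
  if (url.port === "") {
    problems.push("no explicit port (e.g. :5080 or :5081)");
  }
  // Inside a container, these names point at the container, not the host
  const localNames = ["localhost", "127.0.0.1", "[::1]"];
  if (!allowLocalhost && localNames.includes(url.hostname)) {
    problems.push("localhost is not reachable from inside the container");
  }
  return problems;
}
```

For instance, checkServiceUrl(process.env.FGLAB_URL, { allowLocalhost: false }) returns an empty array for http://192.168.1.10:5080 but flags http://localhost:5080.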
To launch NVIDIA Docker containers, use the following:
sudo docker run -d --name fgmachine -h $(hostname) -v /var/run/docker.sock:/var/run/docker.sock --net=host `curl -s localhost:3476/docker/cli` -e FGLAB_URL=<FGLab URL> -e FGMACHINE_URL=<FGMachine URL> -p 5081:5081 kaixhin/fgmachine
Note that --net=host is passed to allow access to the NVIDIA Docker API. When launching a sibling container, you will need to run `curl -s localhost:3476/docker/cli` and manually add the resulting arguments to the project implementation in the container, with docker as the command (do not use nvidia-docker).
After a project has been created on FGLab, a corresponding project implementation must be specified in projects.json. If this machine is available to run experiments for the project created on FGLab, then add the following field to projects.json (an example is available at example.projects.json). FGLab has an "Add to Machine" button which can automatically set up a template in projects.json for you (creating projects.json if it doesn't exist already). Note that <project_id> links the project created on FGLab to FGMachine's project implementations in projects.json.
"<project_id>": {
"cwd": "<working directory (e.g. .)>",
"command": "<program (e.g. caffe)>",
"args": "<first command line options (e.g. train)>",
"options": "<command line options style for options (e.g. double-dash)>",
"boolean": "<optional: only pass flag if true, mandatory: pass flag and true/false argument>",
"capacity": "<machine capacity needed (as a fraction) (e.g. 0.5)>",
"results": "<path to results directory (without experiment ID) (e.g. results)>"
}
cwd is the working directory for the machine learning code. cwd can either be an absolute path or a relative path, in which case it is relative to the FGMachine directory. command is the program/executable to be run. args is the first set of command line options to be sent to the program, prior to the experiment options. options selects one of 4 styles for passing the experiment options to the program. For the option settings {seed: 123, model: "cnn.v2", L2: true}, the styles would produce the following command lines (with boolean set to "mandatory"):
options | Program | Command Line ([command] [args] [options])
---|---|---
plain | node | node [args] seed 123 model cnn.v2 L2 true
single-dash | th | th [args] -seed 123 -model cnn.v2 -L2 true
double-dash | caffe | caffe [args] --seed=123 --model=cnn.v2 --L2=true
function | matlab | matlab [args w/o final arg] [final arg]('seed',123,'model','cnn.v2','L2',true)
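To make the table concrete, here is a hypothetical re-implementation of the four styles. The function name formatOptions and its argument layout are assumptions for illustration, not FGMachine's internal API. In particular, args is assumed to be already split into an array, and quoting in the function style is simplified. It returns an argument array of the kind child_process.spawn accepts.

```javascript
// Hypothetical re-implementation of the four "options" styles (not FGMachine's code).
// args: the projects.json "args" value, assumed here to be split into an array
// options: experiment options, e.g. { seed: 123, model: "cnn.v2", L2: true }
// booleanMode: "mandatory" (always pass the value) or "optional" (flag only when true)
const STYLES = ["plain", "single-dash", "double-dash", "function"];

function formatOptions(args, options, style, booleanMode = "mandatory") {
  if (!STYLES.includes(style)) throw new Error(`unknown options style: ${style}`);
  const optional = booleanMode === "optional";
  // In "optional" mode a false boolean is omitted entirely
  const entries = Object.entries(options).filter(([, v]) => !(optional && v === false));

  if (style === "function") {
    if (args.length === 0) throw new Error("function style needs a final arg to call");
    // Strings are quoted; numbers and booleans (including true) are passed as literals
    const pairs = entries.map(([k, v]) =>
      typeof v === "string" ? `'${k}','${v}'` : `'${k}',${v}`
    );
    return [...args.slice(0, -1), `${args[args.length - 1]}(${pairs.join(",")})`];
  }

  const out = [...args];
  for (const [k, v] of entries) {
    const flagOnly = optional && v === true; // e.g. -L2 with no value
    if (style === "plain") {
      out.push(k);
      if (!flagOnly) out.push(String(v));
    } else if (style === "single-dash") {
      out.push(`-${k}`);
      if (!flagOnly) out.push(String(v));
    } else {
      // double-dash
      out.push(flagOnly ? `--${k}` : `--${k}=${v}`);
    }
  }
  return out;
}
```

For example, formatOptions(["train"], { seed: 123, model: "cnn.v2", L2: true }, "double-dash") yields ["train", "--seed=123", "--model=cnn.v2", "--L2=true"], which could be passed to spawn("caffe", ...).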
boolean can be set to "optional" when boolean flags should be passed only when true, e.g. -L2, or set to "mandatory" if the value should always be passed, e.g. -L2 true and -L2 false. capacity is a number in the range 0-1 (inclusive) that represents (the inverse of) the number of instances of the program the FGMachine host system can run in parallel (as a heuristic); for example, a capacity of 0.5 indicates that the host is only capable of running 2 instances of the program at once. results is the directory into which the experiment results must be written (see below for more details). results can either be an absolute path or a relative path, in which case it is relative to the FGMachine directory.
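The "absolute, or relative to the FGMachine directory" rule for cwd and results matches what Node's path.resolve already does. The sketch below shows this with a hypothetical helper resolveProjectPaths and a hypothetical fgmachineDir argument standing in for wherever FGMachine is installed. The paths are POSIX-style.

```javascript
const path = require("node:path");

// Hypothetical helper: resolve the cwd and results fields of a projects.json entry.
// path.resolve keeps an absolute path (normalised) and resolves a relative one
// against fgmachineDir, which is the rule described above.
function resolveProjectPaths(entry, fgmachineDir) {
  return {
    ...entry, // other fields (command, args, ...) are copied unchanged
    cwd: path.resolve(fgmachineDir, entry.cwd),
    results: path.resolve(fgmachineDir, entry.results),
  };
}
```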
If you receive a "No machine capacity available" error message when submitting a new experiment, which can occur erroneously (for example, if experiments crash), then you can reset a machine's capacity on the machine's page in FGLab.
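The capacity heuristic can be pictured as each machine having a budget of 1 that running experiments draw from, which is also why crashed experiments that never return their share can leave the machine looking full. The class below is a hypothetical illustration of that bookkeeping, not FGMachine's implementation. The small EPSILON tolerance is an assumption, added so that sums such as ten experiments of 0.1 are not rejected because of floating-point rounding.

```javascript
// Hypothetical capacity bookkeeping (not FGMachine's code).
const EPSILON = 1e-9; // tolerance for floating-point drift, e.g. 10 x 0.1

class CapacityTracker {
  constructor() {
    this.available = 1; // a machine starts with its full capacity
  }

  // Reserve capacity for one experiment; false means "No machine capacity available"
  acquire(capacity) {
    if (capacity > this.available + EPSILON) return false;
    this.available -= capacity;
    return true;
  }

  // Give capacity back when an experiment finishes
  release(capacity) {
    this.available = Math.min(1, this.available + capacity);
  }

  // Counterpart of resetting the machine's capacity in FGLab
  reset() {
    this.available = 1;
  }
}

// How many experiments of a given capacity fit at once on an idle machine
function countParallel(capacity) {
  if (!(capacity > 0 && capacity <= 1)) throw new RangeError("capacity must be in (0, 1]");
  const tracker = new CapacityTracker();
  let n = 0;
  while (tracker.acquire(capacity)) n++;
  return n;
}
```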
FGMachine automatically reloads the projects.json file when it is changed.
To handle projects that require GPUs, add two further parameters to each such project in projects.json:
{
"gpu_capacity": "<gpu capacity needed (as a fraction of one GPU's capacity, e.g. 0.5)>",
"gpu_command": "<option to pass to script to identify card number, including command line option style (e.g. -gpu)>"
}
Note that gpu_capacity represents (the inverse of) the number of instances of the program the FGMachine host system can run on one GPU; for example, a machine with 4 GPUs will be able to run 8 instances of the program with capacity 0.1 and gpu_capacity 0.5. However, if the capacity were 0.25 in the previous example, the machine would only be able to run 4 instances of the program, since the lower of the two limits applies.
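The arithmetic in this example can be written as a one-line rule: the machine-wide limit from capacity and the combined per-GPU limit from gpu_capacity, whichever is lower. The function name below is hypothetical, and the EPSILON guard is an assumption that protects against results like 1 / 0.3 landing just below an integer.

```javascript
const EPSILON = 1e-9; // guards Math.floor against floating-point rounding

// Upper bound on parallel instances of one project, per the example above.
function maxParallelExperiments(capacity, gpuCapacity, numGpus) {
  const machineLimit = Math.floor(1 / capacity + EPSILON); // e.g. 0.1 -> 10
  const gpuLimit = numGpus * Math.floor(1 / gpuCapacity + EPSILON); // e.g. 4 GPUs x 2
  return Math.min(machineLimit, gpuLimit);
}
```

With capacity 0.1, gpu_capacity 0.5 and 4 GPUs this gives min(10, 8) = 8; with capacity 0.25 it gives min(4, 8) = 4.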
Setting gpu_capacity lets FGMachine assign a GPU to each experiment automatically, which makes it easier to run batch experiments. Note that, as with nvidia-smi, GPU IDs passed via gpu_command are 0-indexed. For manual control, it is recommended to use a GPU flag as part of the experiment hyperparameters in the project schema.
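The README does not describe the assignment policy itself, so the sketch below assumes a simple first-fit strategy: take the lowest-numbered GPU that still has room. It also assumes that the gpu_command flag and the 0-indexed ID are appended as two separate arguments. Both functions are hypothetical illustrations.

```javascript
const EPSILON = 1e-9; // tolerance for sums such as 0.1 + 0.1 + ...

// Hypothetical first-fit GPU assignment; FGMachine's real policy may differ.
// gpuLoads[id] is the gpu_capacity already in use on GPU `id` (0-indexed, as in nvidia-smi).
function assignGpu(gpuLoads, gpuCapacity) {
  for (let id = 0; id < gpuLoads.length; id++) {
    if (gpuLoads[id] + gpuCapacity <= 1 + EPSILON) {
      gpuLoads[id] += gpuCapacity; // reserve the slot on this GPU
      return id;
    }
  }
  return -1; // every GPU is full
}

// Hypothetical: append the gpu_command flag and the chosen ID as separate arguments.
function withGpuArgs(args, gpuCommand, id) {
  return [...args, gpuCommand, String(id)];
}
```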
Results and custom data must be saved as files in a subfolder of the specified results directory, where the name of the subfolder is the experiment ID, e.g. /data/mnist/55e069f9cf4e1fe075b76b95. For an example that uses the following features, see rand.js.
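On the machine learning side, the first step is usually to create that subfolder. The helper below is a hypothetical sketch, not an FGMachine API. It takes the results directory and experiment ID as plain arguments because this section does not say how the ID reaches your program (rand.js shows one way).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper for the machine learning code (not an FGMachine API):
// creates <resultsDir>/<experimentId> if needed and returns its path.
function experimentDir(resultsDir, experimentId) {
  const dir = path.join(resultsDir, String(experimentId));
  fs.mkdirSync(dir, { recursive: true }); // no error if it already exists
  return dir;
}
```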
Non-JSON files are uploaded to MongoDB GridFS via FGLab, which allows them to be downloaded later in their native format. Images and videos are automatically displayed on the experiment page, so plots created by the machine learning code can be viewed there. JSON files are automatically parsed, with their fields being added to the experiment object. An example, notes.json, may look like this:
{
"Framework": {
"Name": "Theano",
"Version Number": 0.7
},
"Notes": "Best parameters saved at epoch 55"
}
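Writing such a file from JavaScript takes only a JSON.stringify call. The helper name writeJsonResult is hypothetical, and experimentDir is assumed to be the experiment's subfolder described above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: write `data` as <experimentDir>/<filename>.
// FGLab parses the top-level fields of JSON results files into the experiment object.
function writeJsonResult(experimentDir, filename, data) {
  const target = path.join(experimentDir, filename);
  fs.writeFileSync(target, JSON.stringify(data, null, 2)); // pretty-printed JSON
  return target;
}
```

For example, writeJsonResult(dir, "notes.json", { Framework: { Name: "Theano", "Version Number": 0.7 }, Notes: "Best parameters saved at epoch 55" }) produces the file shown above.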
Multiple top-level fields can exist in the same file, but nested fields (e.g. Framework.Name) cannot be updated separately. Note that fields prefixed with _ are reserved for processing by FGLab. The currently supported fields are listed below:
The _scores field is a map that can be used to store multiple floats that represent the performance of the model. For example:
{
"_scores": {
"F1": "float",
"BLEU": "float",
"METEOR": "float"
}
}
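In a real results file the "float" placeholders are replaced by numbers. The hypothetical helper below builds the object and rejects non-finite values. This check matters because JSON.stringify silently turns NaN and Infinity into null.

```javascript
// Hypothetical helper: build a { _scores: {...} } object from named metrics,
// rejecting values that JSON cannot represent as numbers.
function makeScores(scores) {
  const out = {};
  for (const [name, value] of Object.entries(scores)) {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new TypeError(`score ${name} must be a finite number, got ${value}`);
    }
    out[name] = value;
  }
  return { _scores: out };
}
```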
The _notes field is a free-form text field. Its primary use is via the experiment page on FGLab, where text written in the "Notes" text box is automatically saved (at an interval of 0.5s) and displayed on both the experiment page itself and the table of experiment results.
The _charts field is either an object or an array of objects that can be used to store data to be charted on FGLab using C3.js, and hence mimics its API. Since FGLab already renders uploaded images, the purpose of this field is to provide the interactivity afforded by C3.js. It is therefore possible to create different chart types and adjust plotting options, with one minor change to the API so that numeric arrays can be exported directly: rather than prepending the column name to each array in the columns array, the column names are listed in the columnNames array, and FGLab performs the prepending.
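The columnNames convention amounts to a small transformation from the exported arrays to C3's native columns format. The sketch below illustrates that mapping; the function name is hypothetical and this is not FGLab's own code.

```javascript
// Hypothetical illustration of the columnNames convention: each name is
// prepended to the matching numeric array, giving C3.js's native columns format.
function toC3Columns(columnNames, columns) {
  if (columnNames.length !== columns.length) {
    throw new Error("columnNames and columns must have the same length");
  }
  return columns.map((values, i) => [columnNames[i], ...values]);
}
```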
Charts with many values are downsampled for performance reasons, using the Largest-Triangle-Three-Buckets algorithm for visualisation purposes. By default, the following options are added to disable points and enable zoom, but these can be overridden:
{
"point": {"show": false},
"zoom": {"enabled": true}
}
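For reference, the standard Largest-Triangle-Three-Buckets algorithm (as described by Sveinn Steinarsson) is sketched below. It always keeps the first and last points. It then splits the interior into buckets and, from each bucket, keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket. This section does not say which point threshold FGLab uses or how its implementation differs, so treat this as an illustration of the technique only.

```javascript
// Largest-Triangle-Three-Buckets downsampling (standard form).
// points: array of [x, y] sorted by x; threshold: number of points to keep.
function lttb(points, threshold) {
  const n = points.length;
  if (threshold >= n || threshold < 3) return points.slice(); // nothing to do
  const sampled = [points[0]]; // always keep the first point
  const bucketSize = (n - 2) / (threshold - 2); // interior points per bucket
  let a = 0; // index of the previously kept point

  for (let i = 0; i < threshold - 2; i++) {
    // Average of the next bucket: the third vertex of each candidate triangle
    const nextStart = Math.floor((i + 1) * bucketSize) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = nextStart; j < nextEnd; j++) {
      avgX += points[j][0];
      avgY += points[j][1];
    }
    avgX /= nextEnd - nextStart;
    avgY /= nextEnd - nextStart;

    // Keep the point in the current bucket with the largest triangle area
    const start = Math.floor(i * bucketSize) + 1;
    const end = Math.floor((i + 1) * bucketSize) + 1;
    const [ax, ay] = points[a];
    let maxArea = -1;
    let maxIndex = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs((ax - avgX) * (points[j][1] - ay) - (ax - points[j][0]) * (avgY - ay)) / 2;
      if (area > maxArea) {
        maxArea = area;
        maxIndex = j;
      }
    }
    sampled.push(points[maxIndex]);
    a = maxIndex;
  }
  sampled.push(points[n - 1]); // always keep the last point
  return sampled;
}
```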
An example Multiple XY Line Chart would be structured as follows:
{
"_charts": {
"columnNames": [
"train",
"val",
"x1",
"x2"
],
"data": {
"xs": {
"train": "x1",
"val": "x2"
},
"columns": [
[1.0, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.0],
[1.0, 0.9, 0.6, 0.4, 0.3],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[2, 6, 8, 9, 11]
]
},
"axis": {
"x": {
"label": {
"text": "Iterations"
}
},
"y": {
"label": {
"text": "Losses"
}
}
}
}
}
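When the losses are already held in arrays, a chart in this shape can be built programmatically. The helper below is hypothetical. It reproduces the example's layout: y columns first, then x columns named x1, x2, and so on. A series that is itself called x1 would clash with these generated names.

```javascript
// Hypothetical helper: build a Multiple XY Line Chart in the _charts format.
// series: { name: { x: [...], y: [...] }, ... } (key order sets column order)
function buildXYLineChart(series, xLabel, yLabel) {
  const names = Object.keys(series);
  for (const name of names) {
    if (series[name].x.length !== series[name].y.length) {
      throw new Error(`series ${name}: x and y lengths differ`);
    }
  }
  const xNames = names.map((_, i) => `x${i + 1}`); // x1, x2, ...
  const xs = {};
  names.forEach((name, i) => {
    xs[name] = xNames[i]; // pair each y column with its x column
  });
  return {
    _charts: {
      columnNames: [...names, ...xNames],
      data: {
        xs,
        columns: [...names.map((n) => series[n].y), ...names.map((n) => series[n].x)],
      },
      axis: {
        x: { label: { text: xLabel } },
        y: { label: { text: yLabel } },
      },
    },
  };
}
```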
Using _charts involves an inherent tradeoff between storing numerical results in a more intuitive place in the experiment object and easily visualising data. The recommendation is to use _charts for visualising data where desired (which may not be necessary if plots are generated by the machine learning code), and to extract the data according to the _charts structure when needed. However, it is still possible to duplicate the numerical results in a separate array under a custom field in a JSON file.
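Extracting the data back out of a single _charts object is mostly index bookkeeping. The hypothetical helper getSeries below looks up a series' y column by name and its x column through data.xs. When no x column is mapped, it falls back to 0-based indices, which is assumed to match C3.js's default x values.

```javascript
// Hypothetical helper: recover { x, y } arrays for one series of a _charts object.
function getSeries(chart, name) {
  const { columnNames, data } = chart;
  const yIndex = columnNames.indexOf(name);
  if (yIndex === -1) throw new Error(`no column named ${name}`);
  const y = data.columns[yIndex];
  const xName = data.xs ? data.xs[name] : undefined;
  if (xName === undefined) {
    return { x: y.map((_, i) => i), y }; // no x column: assume 0-based indices
  }
  const xIndex = columnNames.indexOf(xName);
  if (xIndex === -1) throw new Error(`x column ${xName} not found`);
  return { x: data.columns[xIndex], y };
}
```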
Examples utilising the range of abilities of FGLab/FGMachine can be found in the examples folder.