- This file corresponds to my lab book for my doctoral thesis tackling artifact correction in Fluorescence Correlation Spectroscopy (FCS) measurements using Deep Neural Networks. It also contains notes taken during the process of setting up this workflow for reproducible research.
- This file explains how things are organized and describes the workflow for doing experiments, the changes made to the code, and the observed behavior, which is recorded in the “* Data” section.
- The branching model used is described in this paper. Therefore, if you are interested in the “* Data” section, you have to git clone the data branch of the repository. The main branch is free of any results; it contains only source code and the analysis.
- This project is my take on Open-notebook science. The idea was postulated in a blog post in 2006:
… there is a URL to a laboratory notebook that is freely available and indexed on common search engines. It does not necessarily have to look like a paper notebook but it is essential that all of the information available to the researchers to make their conclusions is equally available to the rest of the world —Jean-Claude Bradley
- Proposal on how to deal with truly private data (e.g. notes from a confidential meeting with a colleague), which might otherwise be noted in a normal lab notebook: do not include them here. Only notes relevant to the current project should be taken.
# This is a sh block for shell / bash scripting. In the context of this file,
# these blocks are mainly used for operations on my local computer.
# In the LabBook.html rendering of this document, these blocks will have a
# light green colour (#F0FBE9)
# This block can open and access tmux sessions, used for shell scripting on
# remote computing clusters.
# In the LabBook.html rendering of this document, these blocks will have a
# distinct light green colour (#E1EED8)
# This is a python block. In the context of this file, it is seldom used
# (only for exemplary scripts).
# In the LabBook.html rendering of this document, these blocks will have a
# light blue colour (#E6EDF4)
# This is a jupyter-python block. The code is sent to a jupyter kernel running
# on a remote high performance computing cluster. Most of my jupyter code is
# executed this way.
# In the LabBook.html rendering of this document, these blocks will have a
# light orange colour (#FAEAE1)
;; This is an emacs-lisp block, the language used to customize Emacs, which is
;; sometimes necessary, since the reproducible workflow of this LabBook is
;; tightly integrated with Emacs and org-mode.
;; In the LabBook.html rendering of this document, these blocks will have a
;; light violet colour (#F7ECFB)
This is a literal example block. It can be used very flexibly; in the context of this document, the output of most code blocks is displayed this way. In the LabBook.html rendering of this document, these blocks will have a light yellow colour (#FBFBBF)
- Create a new branch from main
- Print out the git log from the latest commit and the metadata
- Call the analysis scripts, following the principles outlined in * Organization of code
- All machine learning runs are saved in data/mlruns, all other data in data/#experiment-name
- Add a “** exp-<date>-<name>” section to this file under * Data
- Commit/push the results of this separate branch
- Merge this new branch with the remote data branch
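The naming conventions of this workflow can be sketched in Python. This is a minimal sketch under assumptions, not code from the repository: the helper experiment_names is hypothetical, it assumes that the branch, the “** exp-<date>-<name>” heading and the data folder all share one name, and it assumes a six-digit YYMMDD date as suggested by the exp-XXXXXX-test link example in the archiving notes.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def experiment_names(day, descriptor):
    """Derive the names used for one experiment (hypothetical helper).

    Assumes a six-digit YYMMDD date, matching the exp-XXXXXX-test pattern,
    and one shared name for branch, Data heading and data folder.
    """
    # Normalise the descriptor: lowercase, runs of other characters -> "-"
    slug = re.sub(r"[^a-z0-9]+", "-", descriptor.lower()).strip("-")
    name = f"exp-{day:%y%m%d}-{slug}"
    return {
        "branch": name,                                 # branched from main
        "heading": f"** {name}",                        # section under * Data
        "data_dir": str(PurePosixPath("data") / name),  # all non-mlflow outputs
        "mlruns_dir": "data/mlruns",                    # shared by all mlflow runs
    }


names = experiment_names(date(2020, 12, 31), "Test Run")
print(names["branch"])    # exp-201231-test-run
print(names["data_dir"])  # data/exp-201231-test-run
```

Keeping all machine learning runs in one shared data/mlruns folder, instead of one per experiment, is what lets mlflow compare runs across experiments.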
- gitflow-avh (magit-flow) to follow the flow
- possibly https://github.com/magit/magit-annex for large files. Follow this: https://git-annex.branchable.com/walkthrough/
- maybe check out git-toolbelt at some point https://github.com/nvie/git-toolbelt#readme with https://nvie.com/posts/git-power-tools/
- emacs jupyter for running and connecting to kernel on server: https://github.com/dzop/emacs-jupyter
- if I still used .ipynb files, these might come in handy:
- jupytext: https://github.com/mwouts/jupytext
- nbstripout: https://github.com/kynan/nbstripout
- https://docs.faculty.ai/user-guide/experiments/index.html and https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/hls-image-processing/02-image-segmentation-dl.html
git log -1
import os
import pprint
# the last line of `free -th` is the "Total:" row: total, used, free
ramlist = os.popen('free -th').readlines()[-1].split()[1:]
print('No of CPUs in system:', os.cpu_count())
print('No of CPUs the current process can use:',
      len(os.sched_getaffinity(0)))
print('load average:', os.getloadavg())
print('os.uname(): ', os.uname())
print('PID of process:', os.getpid())
print('RAM total: {}, RAM used: {}, RAM free: {}'.format(
    ramlist[0], ramlist[1], ramlist[2]))
!echo the current directory: $PWD
!echo My disk usage:
!df -h
# _long is assumed to be defined before this block runs (e.g. via a header argument)
if _long:
    %conda list
    pprint.pprint(dict(os.environ), sort_dicts=False)
rm ~/.tmux-local-socket-remote-machine
REMOTE_SOCKET=$(ssh ara 'tmux ls -F "#{socket_path}"' | head -1)
echo $REMOTE_SOCKET
ssh ara -tfN \
-L ~/.tmux-local-socket-remote-machine:$REMOTE_SOCKET
Different applications can be run on the remote compute node. To access them from the local machine and open them in the browser, I use this tunneling script:
ssh -t -t ara -L $port:localhost:$port ssh $node -L $port:localhost:$port
Apps I use that way:
- JupyterLab for running Python 3 kernels
- TensorBoard
- MLflow UI
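The two-hop tunnel above (local machine → login host ara → compute node) can be spelled out as an argument list, which makes it explicit which -L forward belongs to which hop. A minimal sketch, assuming the ara ssh alias used throughout this file; the function name tunnel_command and the node name node042 are hypothetical.

```python
import shlex


def tunnel_command(port, node, login_host="ara"):
    """Return the argv for the two-hop tunnel: local -> login host -> compute node.

    node is hypothetical here; on the cluster it is the compute node that runs
    the application (e.g. jupyter lab on port 8889).
    """
    forward = f"{port}:localhost:{port}"
    return [
        "ssh", "-t", "-t", login_host,  # -t twice forces tty allocation for the nested ssh
        "-L", forward,                  # local port -> same port on the login host
        "ssh", node,                    # second hop, executed on the login host
        "-L", forward,                  # login host port -> same port on the compute node
    ]


cmd = tunnel_command(8889, "node042")
print(shlex.join(cmd))
# To open the tunnel (blocks while it is active), one could run:
# import subprocess; subprocess.run(cmd)
```

Building the command as a list rather than one string avoids shell quoting issues if it is ever passed to subprocess.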
Starting a jupyter instance on a server where the necessary libraries are installed is easy using this script:
conda activate tf
export PORT=8889
export XDG_RUNTIME_DIR=''
export XDG_RUNTIME_DIR=""
jupyter lab --no-browser --port=$PORT
On the compute node of the HPC, the users’ environment is managed through module files using the Lmod system. The export XDG_RUNTIME_DIR statements are needed because of a jupyter bug which prevented it from starting. Right now, ob-tmux does not support a :var header like normal org-babel does, so the $port variable has to be set here in the template.
Now this port has to be tunnelled to our local computer (see #ssh-tunneling). While the tmux session above keeps running regardless of whether Emacs is running, the ssh tunnel needs to be active locally to connect to the notebook. If you close Emacs, the tunnel has to be re-established.
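Whether the local end of the tunnel is still up can be checked before connecting. A minimal sketch using only the standard socket module; the function name tunnel_is_up is hypothetical, and a successful check only shows that some program accepts connections on the port, not that it is the jupyter instance.

```python
import socket


def tunnel_is_up(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that the forwarded port is open, not that jupyter lab
    (or TensorBoard, mlflow ui) is the program answering behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: the tunnel is not active (anymore)
        return False


if __name__ == "__main__":
    if not tunnel_is_up(8889):
        print("Nothing listens on localhost:8889; re-establish the ssh tunnel.")
```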
- prerequisite: the tmux versions need to be the same locally and on the server.
Let’s verify that now.
- the local tmux version:
tmux -V
- the remote tmux version:
ssh ara tmux -V
- as described in the ob-tmux README, the following code snippet creates a socket on the remote machine and forwards this socket to the local machine (note that socket_path was introduced in tmux version 2.2):
REMOTE_SOCKET=$(ssh ara 'tmux ls -F "#{socket_path}"' | head -1)
echo $REMOTE_SOCKET
ssh ara -tfN \
  -L ~/.tmux-local-socket-remote-machine:$REMOTE_SOCKET
- now a new tmux session with the name ob-NAME is created when using a code block which looks like this:
#+BEGIN_SRC tmux :socket ~/.tmux-local-socket-remote-machine :session NAME
- Commands can now be sent to the remote tmux session, but note that the output is not printed yet
- there is a workaround for getting output back into our LabBook.org: a script which prints the output of the tmux session in a #+begin_example block below the tmux block when you press C-c C-o or C-c C-v C-o with the point inside the tmux block.
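The version check from this list can be automated by parsing the output of tmux -V on both machines. A minimal sketch with hypothetical function names; it compares only major.minor and ignores letter suffixes such as the “a” in 3.2a, which is an assumption about what “the same version” needs to mean here. It also checks the 2.2 minimum required for socket_path.

```python
import re


def parse_tmux_version(output):
    """Turn `tmux -V` output such as 'tmux 3.2a' into a comparable tuple (3, 2).

    Letter suffixes (the 'a' in 3.2a) are ignored in this sketch.
    """
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unexpected tmux -V output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def tmux_problems(local_output, remote_output):
    """Return a list of problems; an empty list means the setup should work."""
    local = parse_tmux_version(local_output)
    remote = parse_tmux_version(remote_output)
    problems = []
    if local != remote:
        problems.append(f"versions differ: local {local}, remote {remote}")
    if min(local, remote) < (2, 2):
        problems.append("socket_path needs tmux >= 2.2")
    return problems


# Usage (assumes tmux locally and the 'ara' ssh alias from above):
# import subprocess
# run = lambda args: subprocess.run(args, capture_output=True, text=True).stdout
# print(tmux_problems(run(["tmux", "-V"]), run(["ssh", "ara", "tmux", "-V"])))
```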
Emacs-jupyter aims to be an API for many functionalities of the jupyter project. The documentation can be found on GitHub.
- For the whole document: connect to a running jupyter instance
  - M-x jupyter-server-list-kernels
    - set server URL, e.g. http://localhost:8889
    - set websocket URL, e.g. http://localhost:8889
  - two possibilities:
    - kernel already exists → a list of kernels and their kernel-ID is displayed
    - kernel does not exist → a prompt asks if you want to start one → yes → type the kernel you want to start, e.g. Python 3
- In the subtree where you want to use jupyter-python blocks with org babel
  - set the :header-args:jupyter-python: property to :session /jpy:localhost#kernel:8889-ID
  - customize the output folder using the following org-mode variable:
    (setq org-babel-jupyter-resource-directory "./data/exp-test/plots")
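The kernel-ID that M-x jupyter-server-list-kernels displays can also be read from the Jupyter server's REST endpoint GET /api/kernels through the tunnel, e.g. to copy it into the :session header. A minimal sketch, assuming the server on port 8889 uses token authentication; the function names are hypothetical, and the token placeholder has to be replaced by the one jupyter lab prints at startup.

```python
import json
import urllib.request


def list_kernels(base_url="http://localhost:8889", token=None):
    """Query the Jupyter server REST API (GET /api/kernels) through the tunnel."""
    request = urllib.request.Request(f"{base_url}/api/kernels")
    if token:
        # Token as printed by `jupyter lab` at startup
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def summarize(kernels):
    """Return (kernel id, kernel name) pairs from the /api/kernels response."""
    return [(kernel["id"], kernel["name"]) for kernel in kernels]


# Usage while the tunnel is active (<your-token> is a placeholder):
# for kernel_id, name in summarize(list_kernels(token="<your-token>")):
#     print(name, kernel_id)
```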
- I am partial to the Twitter Bootstrap theme for HTML, since I like its simple design and clear structure with a nice table of contents at the side → the following org mode extension supports a seamless export to Twitter Bootstrap HTML: https://github.com/marsmining/ox-twbs
- when installed, the export can be triggered via the command (org-twbs-export-as-html) or via the export keyboard shortcut C-c C-e followed by w for Twitter bootstrap and h for saving the .html
- Things to configure:
- in general, there are multiple export options: https://orgmode.org/manual/Export-Settings.html
- e.g. I set two #+OPTIONS keywords at the beginning of the file, toc:4 and H:4, which make sure that the sidebar table of contents in my export shows numbered headings up to a depth of 4.
- I configured my code blocks so that they will not be evaluated when exporting (I would recommend this especially if you only export for archiving) and that both the code block and the output will be exported, using the keyword:
#+PROPERTY: header-args :eval never-export :exports both
- To distinguish between code blocks for different languages, I gave each of them a distinct colour using #+HTML_HEAD_EXTRA: <style... (see above)
- I had to configure a style for table:
- display: block; overflow-x: auto; restricts the table to the width of the text and activates scrolling if the table is wider
- white-space: nowrap; prevents wrapping inside a column, so the table might get wider, but stays more readable since you can scroll anyway
- Things to do before exporting / Troubleshooting while exporting:
- when using a dark theme for your Emacs, the export of the code blocks might show some ugly dark backgrounds from the theme. If this becomes an issue, change to a light theme for the export with M-x load-theme and choose solarized-light
- the git tags are only set in the data branch, after merging. If you want to show them here, execute the corresponding function in Git TAGs
- make sure your file links work properly! I recommend referencing your files relatively (e.g. [ [ f ile:./data/exp-XXXXXX-test/test.png]] without spaces). Otherwise there will be errors in your *Messages* buffer.
- There might be errors with your code blocks
- e.g. the export function expects you to assign a default value to the variables of your functions
- if you call a function via the #+CALL mechanism, it wants you to include two parentheses for the function, e.g. #+CALL: test()
- check indentation of code blocks inside lists
- add a details block around large output cells. This makes them expandable. I added some #+HTML_HEAD_EXTRA: <style... inspired by alhassy. This is what the details block looks like:
- If you reference a parameter with an underscore in the name, use the org markup for code (=...= or ~...~) to style it, otherwise the part after the underscore will be rendered as a subscript: =under_score= stays intact, while a bare under_score does not
- Things to do after exporting:
- In my workflow, the exported LabBook.html with the overview of all experiments is in the data folder. If you move the file, you will have to fix the file links for the new location, e.g. via “Find and replace” (M-%):
- if you move the org file → in the org file find [[file:./data/ and replace with [[file:./ → then export with C-c C-e w h
- if you export first with C-c C-e w h and move the html file to data → in the html file find ./data and replace with .
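Two of the manual steps in these archiving notes, styling underscore names as code before exporting and fixing the file links after moving a file into data, can be scripted. A minimal sketch with hypothetical function names; the underscore regex is naive and would also touch URLs, file links and src blocks, so it should only be applied to prose, and the link fixers are plain text replacements whose result is worth reviewing.

```python
import re
from pathlib import Path

# An identifier containing an underscore (learning_rate, _long, ...) that is
# not already inside org verbatim =...= or code ~...~ markup.
UNDERSCORE_WORD = re.compile(r"(?<![=~\w])(\w*_\w+)(?![=~\w])")


def protect_underscores(text):
    """Wrap underscore identifiers in =...= so the export shows no subscript."""
    return UNDERSCORE_WORD.sub(r"=\1=", text)


def org_links_for_data_folder(org_text):
    """Moving the org file into data/: [[file:./data/ becomes [[file:./"""
    return org_text.replace("[[file:./data/", "[[file:./")


def html_links_for_data_folder(html_text):
    """Moving the exported html into data/: ./data/ becomes ./"""
    return html_text.replace("./data/", "./")


def fix_file(path, fixer):
    """Apply one of the fixers above to a file in place."""
    path = Path(path)
    path.write_text(fixer(path.read_text(encoding="utf-8")), encoding="utf-8")


print(protect_underscores("the learning_rate and =batch_size= parameters"))
# the =learning_rate= and =batch_size= parameters
# e.g. fix_file("data/LabBook.html", html_links_for_data_folder)
```

Replacing ./data/ rather than ./data keeps paths such as ./database/ intact.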
- contains all the source code in folder **src/** which is used for experiments.
- contains the **LabBook.org** template
- contains setup- and metadata files such as **MLproject** or **conda.yaml**
- the log contains only lasting alterations on the folders and files mentioned above, which are e.g. used for conducting experiments or which introduce new features. Day-to-day changes in code
- when an experiment is conducted, the code and templates are branched out from main into an #experiment-name branch, where #experiment-name stands for some meaningful descriptor.
- all data generated during the experiment (e.g. .csv files, plots, images, etc.) is stored in a folder with the name **data/#experiment-name**, except machine learning-specific data and metadata from `mlflow` runs, which are saved under **data/mlruns** (this allows easily comparing machine learning runs with different experimental settings)
- The **LabBook.org** file is essential
- If possible, all code is executed from inside this file (meaning analysis scripts or calling the code from the **src/** directory).
- All other steps taken during an experiment are noted down, as well as conclusions or my thought process while conducting the experiment.
- Provenance data, such as Metadata about the environment the code was executed in, the command line output of the code, and some
- this is the branch I use for day to day work on features and exploration. All of my current activity can be followed here.
- contains a full chronicle of the whole research process
- all #experiment-name branches are merged here. Afterwards the original branch is deleted, and on the data branch a Git tag marks the merge commit, which makes accessing single experiments easy.
- the develop branch is merged here as well.
git push origin --tags
git tag -n1
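To look up an experiment's merge commit from a script, the output of git tag -n1 can be turned into a lookup table. A minimal sketch with a hypothetical function name; it relies on the fact that git tag names cannot contain spaces, so the first space on each line ends the name.

```python
def parse_tag_list(output):
    """Map tag names to the annotation line printed by `git tag -n1`."""
    tags = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Tag names cannot contain spaces, so the first space ends the name
        name, _, message = line.partition(" ")
        tags[name] = message.strip()
    return tags


# Usage, run inside a clone of the data branch:
# import subprocess
# out = subprocess.run(["git", "tag", "-n1"], capture_output=True, text=True).stdout
# for name, message in parse_tag_list(out).items():
#     print(name, "->", message)
```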
- cloned from dwaithe with refactoring for Python 3 compatibility
- Add #+HTML_HEAD_EXTRA: <style... for table to enable scrolling if the table overflows
- Add details blocks, the corresponding #+HTML_HEAD_EXTRA: <style... and documentation in Notes on archiving
- Rename master branch to main branch
- Add #+OPTIONS: H:4 and #+OPTIONS: toc:4 to show up to 4 levels of depth in the table of contents at the side of the html (twbs) export of this LabBook
- I added Notes on archiving
- update “jupyter scripts” in Template for data entry and setup notes for the new conda environment on the server (now conda activate tf-nightly)
- extend general documentation in README
- Add code block examples
- extend documentation on experiment workflow
- move setup notes from README to “Template for data entry and setup notes”
- remove emacs-lisp code for custom tmux block functions (not relevant enough)
- change the named block “jpt-tmux” from starting a jupyter notebook to starting jupyter lab. Load a conda environment instead of using Lmod’s module load
- extend documentation on git model
- extend documentation on jupyter setup
- added parts of README which describe the experimental process
- added templates for system metadata, tmux, jupyter setup
- added organization of code
- set up lab book and git repo according to the setup by Luka Stanisic et al.