This document describes how to install a working BlackLab server and client on a standalone machine running macos.
- Prerequisites
- Requirements
- Java
- Tomcat
- Run Tomcat
- Manage TomCat
- Set up managers
- BlackLab
- File organization
- Explanation
- Server
- Configure
- Example corpus
- Deploy within TomCat
- Test the query tool
- Client
- File organization
- macos 10.15.7 or higher (Catalina), earlier probably works as good
- Command line tools installed (part of XCode)
xcode-select --install
- HomeBrew (a macos package manager)
We use Homebrew to install a Java Development Kit (JDK), from an open source, using a mechanism of HomeBrew that is optimized for large binaries.
brew update
brew tap homebrew/cask
brew cask install adoptopenjdk
java -version
Results in:
openjdk version "15" 2020-09-15
OpenJDK Runtime Environment AdoptOpenJDK (build 15+36)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 15+36, mixed mode, sharing)
We install both TomCat and its native library, in order to get access to SSL.
brew install tomcat
brew install tomcat-native
We get a notification:
In order for tomcat's APR lifecycle listener to find this library, you'll
need to add it to java.library.path. This can be done by adding this line
to $CATALINA_HOME/bin/setenv.sh
CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"
If $CATALINA_HOME/bin/setenv.sh doesn't exist, create it and make it executable.
N.B This CATALINA
is a code name for TomCat and has nothing to do with the current
macos version 10.15, also named Catalina
.
OK, it appears that
CATALINA_HOME
is /usr/local/Cellar/tomcat/9.0.40/libexec
although there is no CATALINA_HOME
visible to the shell.
You can either set such a variable and use it in the commands below, or spell
the value out.
It seems that it is not needed to set this variable for TomCat to work.
Depending on whether you have chosen to set CATALINA_HOME
in your shell
say either
vim $CATALINA_HOME/bin/setenv.sh
or
vim /usr/local/Cellar/tomcat/9.0.40/libexec/bin/setenv.sh
and add the line
CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"
Then
either
chmod ugo+x $CATALINA_HOME/bin/setenv.sh
or
chmod ugo+x /usr/local/Cellar/tomcat/9.0.40/libexec/bin/setenv.sh
There are several ways to run TomCat:
As a service that starts when the Mac starts:
brew services start tomcat
Manually
catalina run
and stop it by Ctrl-C
.
As a background process:
catalina start
and stop it with
catalina stop
See also
catalina -h
In the browser, navigate to
You should see a page that says that you have successfully installed TomCat.
See the users in
cd /usr/local/Cellar/tomcat/9.0.40/libexec
vim conf/tomcat-users.xml
and add the lines
<role rolename="manager-gui"/>
<user username="dirk" password="dirk" roles="manager-gui"/>
where you can replace dirk
and dirk
by whatever you like.
To see what BlackLab is, see BlackLab intro.
A working BlackLab installation consists of a server, a client, and corpus data.
The following bit of file organization is not rigidly prescribed. You can also choose another organization. Whatever you do, it needs to be reflected in subsequent config files and shell commands.
This is an organization that I find convenient at this stage:
blacklab/
data/
incoming/
indexes/
program/
installation/
I have put it all under ~/local
i.e. my home directory and then
a subdirectory blacklab
.
The data
directory will receive corpus data.
The incoming
subdirectory receives downloaded data, the indexes
subdirectory is the destination of the BlackLab indexer.
The installation
directory receives the downloaded blacklab-server-2.1.0
war file.
This file is attached to a release of the BlackLab repo.
The releases are listed
here
and we pick release 2.1.0.
You see a file
blacklab-server-2.1.0.war
there, download it and place it in the installation
directory.
We will unzip it in place, and copy its WEB-INF/lib
directory to the
program
directory.
Over there, we move the blacklab-2.1.0.jar
file one level up, so that it is
directly beneath the program
dir.
When we cd
to the program dir, we can easily run the java program in the
BlackLab jar file, supported by the libraries in the jar files under the lib
subdirectory.
We'll need the BlackLab program soon: for indexing the first corpus.
We also need to download a front-end, a.k.a. client.
This is in the
INL/corpus-frontend repo.
Again, move to the
releases
page and there you find release 2.1.0.
You see a file
corpus-frontend-2.1.0.war
there, download it and place it in the installation
directory.
Set an environment variable to point to the BlackLab server config, do
this in your .zshrc
file.
Note that ~
will not work properly, so spell out the complete path
from the root of your system to the directory where your BlackLab config dir is:
BLACKLAB_CONFIG_DIR="/Users/dirk/local/blacklab"
export BLACKLAB_CONFIG_DIR
Then edit/create a server config file:
cd ~/local/blacklab
vim blacklab-server.yaml
and add the contents
---
configVersion: 2
# Where indexes can be found
# (list directories whose subdirectories are indexes, or directories containing a single index)
indexLocations:
- /Users/dirk/local/blacklab/data/indexes
N.B.
Note that in this config file you can not use the ~
abbreviation.
Deploying the BlackLab war now leads to a friendly message from BlackLab that there are no indexes. So, before we deploy, we create the indexes for an example corpus and put it in place.
We download the
Brown corpus,
a single XML file of 66 MB when unzipped, to be put in data/incoming
.
Run the BlackLab index tool by running the BlackLab jar:
cd ~/local/blacklab/program/
Then run the jar:
java -cp "blacklab-2.1.0.jar" nl.inl.blacklab.tools.IndexTool create ~/local/blacklab/data/indexes/brown ~/local/blacklab/data/incoming/brownCorpus.lemmatized.xml tei
In the terminal, give the command
catalina start
Then, in the browser navigate (again) to
Click the manage app and login with dirk
, dirk
(which is what have have put in the
TomCat config file for users, above).
In the list of applications, click the blacklab-server-2.1.0 entry
.
You should see something like:
<blacklabResponse>
<blacklabBuildTime>2020-06-22 16:01:41</blacklabBuildTime>
<blacklabVersion>2.1.0</blacklabVersion>
<indices>
<index name="brown">
<displayName>brown</displayName>
<description/>
<status>available</status>
<documentFormat>tei</documentFormat>
<timeModified>2020-11-24 11:47:37</timeModified>
<tokenCount>1008320</tokenCount>
</index>
</indices>
<user>
<loggedIn>false</loggedIn>
<canCreateIndex>false</canCreateIndex>
</user>
<helpPageUrl>/blacklab-server-2.1.0/help</helpPageUrl>
</blacklabResponse>
which means that all is well and that the Brown corpus indexes have been found.
Still in the program
directory you can run the query tool:
java -cp blacklab-2.1.0.jar nl.inl.blacklab.tools.QueryTool ~/local/blacklab/data/indexes/brown
You get a prompt CorpusQL>
.
Enter the query "egg"
followed by a newline.
You get results.
Exit by giving the command exit
.
CorpusQL> "egg"
1. [0106] is thick , much like an [egg] plant?s skin , so that poison
2. [0115] it with beaten yolk of [egg]
3. [0144] kitchen for coffee grounds and [egg] shells . All these materials and
4. [0303] On this , she builds an `` [egg] compartment ?? or `` egg cell ?? which
5. [0303] builds an `` egg compartment ?? or `` [egg] cell ?? which is filled with
6. [0303] the beebread loaf and the [egg] compartment is closed . The queen
7. [0303] retires to a life of [egg] laying . The first worker bees
8. [0303] in unobtrusively , to deposit an [egg] on a completed loaf of
9. [0303] before the bumblebees seal the [egg] compartment . The hosts never seem
10. [0303] which is provided with an [egg] plus a store of beebread
11. [0332] with the chicken and the [egg] . Which came first ? ? Is it
12. [0400] constructed a highboard around the [egg] case which he had placed
12 hits in 6 documents
105 ms elapsed
CorpusQL> exit
dirk:~/local/blacklab/program >
The main front-end for BlackLab is in a separate GitHub repo