About

This document describes how to install a working BlackLab server and client on a standalone machine running macOS.

TOC

  • Prerequisites
    • Requirements
    • Java
    • Tomcat
      • Run Tomcat
      • Manage Tomcat
        • Set up managers
  • BlackLab
    • File organization
      • Contents and downloads
    • Server
      • Configure
      • Example data
      • Deploy within Tomcat
      • Test the query tool
    • Client

Prerequisites

Requirements:

  • macOS 10.15.7 (Catalina) or higher; earlier versions probably work as well
  • Command line tools (part of Xcode), installed with xcode-select --install
  • Homebrew (a macOS package manager)
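A quick way to check these requirements from the terminal is sketched below. It is only a sketch: check_tool is a helper name made up for this example, and the script only reports what is present; it does not install anything.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper, not part of macOS or Homebrew.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# sw_vers prints the macOS version; it does not exist on other systems.
check_tool sw_vers && sw_vers -productVersion

# xcode-select -p prints the developer directory if the tools are installed.
if check_tool xcode-select; then
    xcode-select -p >/dev/null 2>&1 || echo "run: xcode-select --install"
fi

check_tool brew || echo "install Homebrew first"
```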

Java

We use Homebrew to install a Java Development Kit (JDK) from an open-source distribution (AdoptOpenJDK), using Homebrew's cask mechanism, which is meant for large binaries.

brew update
brew tap homebrew/cask
brew cask install adoptopenjdk
java -version

Results in:

openjdk version "15" 2020-09-15
OpenJDK Runtime Environment AdoptOpenJDK (build 15+36)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 15+36, mixed mode, sharing)

Tomcat

We install both Tomcat and its native library, in order to get access to SSL.

brew install tomcat
brew install tomcat-native

We get a notification:

In order for tomcat's APR lifecycle listener to find this library, you'll
need to add it to java.library.path. This can be done by adding this line
to $CATALINA_HOME/bin/setenv.sh

  CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"

If $CATALINA_HOME/bin/setenv.sh doesn't exist, create it and make it executable.

N.B. Catalina is the name of Tomcat's servlet container and has nothing to do with macOS 10.15, which is also named Catalina.

With this Homebrew install, CATALINA_HOME is /usr/local/Cellar/tomcat/9.0.40/libexec, although no CATALINA_HOME variable is visible to the shell. You can either set such a variable and use it in the commands below, or spell the value out.

Tomcat appears to work without this variable being set.

Depending on whether you have chosen to set CATALINA_HOME in your shell, run either

vim $CATALINA_HOME/bin/setenv.sh

or

vim /usr/local/Cellar/tomcat/9.0.40/libexec/bin/setenv.sh

and add the line

CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"

Then make the file executable with either

chmod ugo+x $CATALINA_HOME/bin/setenv.sh

or

chmod ugo+x /usr/local/Cellar/tomcat/9.0.40/libexec/bin/setenv.sh
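The edit and the chmod can also be done in one go. The following is a minimal sketch, assuming the Homebrew paths shown above; add_native_lib is a name made up for this example. It appends the line only if it is not there yet, so it is safe to run twice.

```shell
#!/bin/sh
# Append the java.library.path option to setenv.sh unless it is already
# there, then make the file executable. $1 is the Tomcat home directory.
# add_native_lib is a hypothetical helper name.
add_native_lib() {
    setenv="$1/bin/setenv.sh"
    line='CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"'
    touch "$setenv"
    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF "$line" "$setenv" || printf '%s\n' "$line" >> "$setenv"
    chmod ugo+x "$setenv"
}

# Use CATALINA_HOME if it is set, otherwise the Homebrew location.
tomcat_home="${CATALINA_HOME:-/usr/local/Cellar/tomcat/9.0.40/libexec}"
if [ -d "$tomcat_home/bin" ]; then
    add_native_lib "$tomcat_home"
else
    echo "no Tomcat found at $tomcat_home"
fi
```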

Run Tomcat

There are several ways to run Tomcat:

As a service that starts when the Mac starts:

brew services start tomcat

Manually, in the foreground:

catalina run

and stop it with Ctrl-C.

As a background process:

catalina start

and stop it with

catalina stop

See also

catalina -h

Manage Tomcat

In the browser, navigate to

http://localhost:8080

You should see a page that says that you have successfully installed Tomcat.
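The same check can be done from the terminal. Tomcat needs a few seconds to start, so the sketch below retries a couple of times. wait_for is a helper name made up for this example, and port 8080 is assumed to be the default used above.

```shell
#!/bin/sh
# Retry a command once per second, up to $1 times.
# Succeeds as soon as the command succeeds. wait_for is a hypothetical helper.
wait_for() {
    tries="$1"; shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# curl -s is silent, -f makes curl fail on HTTP error codes.
if wait_for 5 curl -sf http://localhost:8080; then
    echo "Tomcat is up"
else
    echo "Tomcat is not responding on port 8080"
fi
```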

Set up managers

Open the users file:

cd /usr/local/Cellar/tomcat/9.0.40/libexec 
vim conf/tomcat-users.xml

and add the following lines inside the <tomcat-users> element:

<role rolename="manager-gui"/>
<user username="dirk" password="dirk" roles="manager-gui"/> 

where you can replace the username and password (both dirk here) by whatever you like.

BlackLab

To see what BlackLab is, see BlackLab intro.

A working BlackLab installation consists of a server, a client, and corpus data.

File organization

This file organization is not rigidly prescribed; you can choose another one. Whatever you do, it needs to be reflected in the subsequent config files and shell commands.

This is an organization that I find convenient at this stage:

blacklab/
         data/
              incoming/
              indexes/
         program/
         installation/

I have put it all under ~/local in my home directory, so the top of this tree is ~/local/blacklab.
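Creating this tree takes one mkdir call. A small sketch, where make_layout and the BLACKLAB_ROOT variable are names made up for this example:

```shell
#!/bin/sh
# Create the directory tree under the given root.
# make_layout and BLACKLAB_ROOT are hypothetical names.
make_layout() {
    root="$1"
    mkdir -p "$root/data/incoming" "$root/data/indexes" \
             "$root/program" "$root/installation"
}

# Default to the location used in this document.
make_layout "${BLACKLAB_ROOT:-$HOME/local/blacklab}"
```

mkdir -p does not complain about directories that already exist, so running this again does no harm.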

Contents and downloads

The data directory will receive corpus data.

The incoming subdirectory receives downloaded data; the indexes subdirectory is the destination of the BlackLab indexer.

The installation directory receives the downloaded blacklab-server-2.1.0.war file.

This file is attached to a release of the BlackLab repo. The releases are listed here and we pick release 2.1.0. Download the file blacklab-server-2.1.0.war from there and place it in the installation directory.

We unzip it in place and copy its WEB-INF/lib directory to the program directory.

In the program directory, we move the blacklab-2.1.0.jar file one level up from lib, so that it sits directly beneath the program directory.

From the program directory, we can then easily run the Java program in the BlackLab jar file, supported by the libraries in the jar files under the lib subdirectory.
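The unzip, copy and move steps can be sketched as follows. This assumes the war file is already in the installation directory and that it is unzipped into a directory of the same name without the .war extension, which is my reading of "in place". unpack_server is a name made up for this example.

```shell
#!/bin/sh
# Unzip the server war next to itself, copy its libraries to the program
# directory and move the BlackLab jar one level up.
# $1 is the war file, $2 the program directory. unpack_server is hypothetical.
unpack_server() {
    war="$1"; program="$2"
    dest="${war%.war}"                 # assumed target: war name without .war
    unzip -q "$war" -d "$dest" || return 1
    mkdir -p "$program"
    cp -R "$dest/WEB-INF/lib" "$program/" || return 1
    mv "$program/lib/blacklab-2.1.0.jar" "$program/"
}

base="$HOME/local/blacklab"
war="$base/installation/blacklab-server-2.1.0.war"
if [ -f "$war" ]; then
    unpack_server "$war" "$base/program"
else
    echo "download $war first"
fi
```

Afterwards the program directory should contain blacklab-2.1.0.jar and a lib subdirectory with the remaining jar files.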

We'll need the BlackLab program soon: for indexing the first corpus.

We also need to download a front-end, a.k.a. client. This is in the INL/corpus-frontend repo. Again, go to the releases page and find release 2.1.0. Download the file corpus-frontend-2.1.0.war from there and place it in the installation directory.

Server

See BlackLab server overview

Configure

Set an environment variable that points to the BlackLab server config directory; do this in your .zshrc file.

Note that ~ will not work properly, so spell out the complete path from the root of your system to the directory where your BlackLab config dir is:

BLACKLAB_CONFIG_DIR="/Users/dirk/local/blacklab"
export BLACKLAB_CONFIG_DIR

Then edit/create a server config file:

cd ~/local/blacklab
vim blacklab-server.yaml

and add the contents

---
configVersion: 2

# Where indexes can be found
# (list directories whose subdirectories are indexes, or directories containing a single index)
indexLocations:
- /Users/dirk/local/blacklab/data/indexes

N.B. In this config file you cannot use the ~ abbreviation either.
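Both remarks about ~ can be checked mechanically. The sketch below is a crude check, not a YAML validator: it only tests that the config directory is an absolute path, that blacklab-server.yaml exists in it, and that the file contains no ~ character anywhere. check_blacklab_config is a name made up for this example.

```shell
#!/bin/sh
# Check the config directory given as $1.
# check_blacklab_config is a hypothetical helper name.
check_blacklab_config() {
    dir="$1"
    case "$dir" in
        /*) ;;                          # absolute path, fine
        *)  echo "not an absolute path: $dir"; return 1 ;;
    esac
    if [ ! -f "$dir/blacklab-server.yaml" ]; then
        echo "missing: $dir/blacklab-server.yaml"
        return 1
    fi
    if grep -q '~' "$dir/blacklab-server.yaml"; then
        echo "config contains ~, spell out the full path"
        return 1
    fi
    echo "ok: $dir"
}

# An unset variable is reported as "not an absolute path".
check_blacklab_config "${BLACKLAB_CONFIG_DIR:-}" || true
```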

Deploying the BlackLab war now leads to a friendly message from BlackLab that there are no indexes. So, before we deploy, we create the index for an example corpus and put it in place.

Example data

We download the Brown corpus, a single XML file of 66 MB when unzipped, to be put in data/incoming.

To run the BlackLab index tool, first change to the program directory:

cd ~/local/blacklab/program/

Then run the index tool from the jar:

java -cp "blacklab-2.1.0.jar" nl.inl.blacklab.tools.IndexTool create ~/local/blacklab/data/indexes/brown ~/local/blacklab/data/incoming/brownCorpus.lemmatized.xml tei
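The arguments after the class name are: the create command, the index directory to write, the input file, and the input format. To avoid repeating the long paths, the call can be wrapped as sketched below; index_corpus and the base variable are names made up for this example.

```shell
#!/bin/sh
base="$HOME/local/blacklab"

# Index one input file into data/indexes/<name>.
# Must be run from the program directory, where the jar and lib/ live.
# index_corpus is a hypothetical helper name.
index_corpus() {
    name="$1"; input="$2"; format="$3"
    [ -f "$input" ] || { echo "input not found: $input"; return 1; }
    java -cp "blacklab-2.1.0.jar" nl.inl.blacklab.tools.IndexTool \
        create "$base/data/indexes/$name" "$input" "$format"
}

if cd "$base/program" 2>/dev/null; then
    index_corpus brown "$base/data/incoming/brownCorpus.lemmatized.xml" tei \
        || echo "indexing failed"
else
    echo "program directory not found under $base"
fi
```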

Deploy within Tomcat

In the terminal, give the command

catalina start

Then, in the browser navigate (again) to

http://localhost:8080

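Click the Manager App button and log in with dirk, dirk (which is what we have put in the Tomcat users file, above).

If blacklab-server-2.1.0 is not yet in the list of applications, deploy the war file from the installation directory first, for example through the WAR file to deploy section of the manager page.

In the list of applications, click the blacklab-server-2.1.0 entry.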

You should see something like:

<blacklabResponse>
    <blacklabBuildTime>2020-06-22 16:01:41</blacklabBuildTime>
    <blacklabVersion>2.1.0</blacklabVersion>
    <indices>
        <index name="brown">
            <displayName>brown</displayName>
            <description/>
            <status>available</status>
            <documentFormat>tei</documentFormat>
            <timeModified>2020-11-24 11:47:37</timeModified>
            <tokenCount>1008320</tokenCount>
        </index>
    </indices>
    <user>
        <loggedIn>false</loggedIn>
        <canCreateIndex>false</canCreateIndex>
    </user>
    <helpPageUrl>/blacklab-server-2.1.0/help</helpPageUrl>
</blacklabResponse>

which means that all is well and that the Brown corpus indexes have been found.
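The same check can be scripted. This is a sketch under assumptions: the URL is derived from the helpPageUrl in the response above, outputformat=xml is passed to ask BlackLab Server for XML explicitly, and has_index is a helper name made up for this example that simply greps for the index element.

```shell
#!/bin/sh
# Succeed if the server info XML on stdin lists an index with the given name.
# has_index is a hypothetical helper name.
has_index() {
    grep -q "<index name=\"$1\">"
}

# Assumed URL: Tomcat on port 8080, war deployed as blacklab-server-2.1.0.
url="http://localhost:8080/blacklab-server-2.1.0/?outputformat=xml"
if curl -sf "$url" | has_index brown; then
    echo "brown index is available"
else
    echo "brown index not found (is Tomcat running and the war deployed?)"
fi
```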

Test the query tool

Still in the program directory, you can run the query tool:

java -cp blacklab-2.1.0.jar nl.inl.blacklab.tools.QueryTool ~/local/blacklab/data/indexes/brown

You get a prompt CorpusQL> . Enter the query "egg" followed by a newline.

You get results. Exit by giving the command exit.

CorpusQL> "egg"
   1. [0106]               is thick , much like an [egg] plant?s skin , so that poison
   2. [0115]                it with beaten yolk of [egg]
   3. [0144]        kitchen for coffee grounds and [egg] shells . All these materials and
   4. [0303]            On this , she builds an `` [egg] compartment ?? or `` egg cell ?? which
   5. [0303] builds an `` egg compartment ?? or `` [egg] cell ?? which is filled with
   6. [0303]             the beebread loaf and the [egg] compartment is closed . The queen
   7. [0303]                  retires to a life of [egg] laying . The first worker bees
   8. [0303]      in unobtrusively , to deposit an [egg] on a completed loaf of
   9. [0303]        before the bumblebees seal the [egg] compartment . The hosts never seem
  10. [0303]             which is provided with an [egg] plus a store of beebread
  11. [0332]              with the chicken and the [egg] . Which came first ? ? Is it
  12. [0400]    constructed a highboard around the [egg] case which he had placed
12 hits in 6 documents
105 ms elapsed
CorpusQL> exit
dirk:~/local/blacklab/program > 

Client

The main front-end for BlackLab is in a separate GitHub repo