This is a VERY minimal description of how to compile and run the
Cheshire II system. More complete documentation is being worked on
(but other things always seem to claim priority).
CONTENTS:
This archive file contains both the Cheshire II client and server
source code, some small sample databases and some sample GUI scripts
for X windows. There are also sample scripts for the "webcheshire"
combined client and server CGI engine (in the "scripts" directory).
Files and directories in the distribution are:
BerkeleyDB
    The BerkeleyDB software used in the Cheshire database and search engine.
Makefile
    Main makefile
Makefile.bak
    copy of main makefile
Makefile.solaris
    copy of makefile with typical Solaris configuration
MakefileDM
    copy of makefile with DMalloc error trapping (used only for development)
ac_list
    Generic list handling functions.
ac_utils
    Generic utility functions.
bin
    Location of executable programs
cheshire2
    Source code and test scripts for Cheshire clients (cheshire2, ztcl,
    webcheshire and staffcheshire)
client
    Support source code for the clients.
cmdparse
    Command parser for the clients.
config
    Configuration file parser and configuration management source code.
diagnostic
    Z39.50 Diagnostic handling.
doc
    Assorted documentation on cheshire and related programs.
fileio
    low-level database record storage and retrieval code.
header
    Include (.h) files for all source code.
index
    Source for index creation programs.
jmcd_logtool
    Source for tool to read and analyse client-side transaction logs
lib
    Location of library files generated during compilation.
marc2sgml
    Conversion routines for converting MARC records to SGML
marclib
    MARC parsing and handling routines.
scripts
    Sample Client, CGI, and utility scripts.
search
    Cheshire II search engine.
sgml2marc
    SGML to MARC conversion.
sgmlparse
    SGML Parser.
socket
    Low-level Z39.50 connection handling.
tclhash
    Generic Hash table handling (used throughout the system)
utils
    Utility programs for building, examining and counting databases.
wordnet
    Source code for linking WordNet (not currently supported)
zpdu
    Z39.50 V.3 Client and Server Libraries.
zserver
    Cheshire II server source code.
---------------------------------------------------------------------
Building Cheshire --
Modify the Makefile to point to Tcl and Tk (version 8.3 preferred, but
many earlier versions should still work as well), the location of X Windows
libraries and include files, and set appropriate flags (if using Solaris).
If you are building a version with support for external relational
databases (such as PostgreSQL), you will need to set the appropriate
definitions (e.g., DBMS_FLAG = "-DPOSTGRESQL") and the location of the
appropriate include files and libraries. Any external relational
database system that you are planning to link in will need to be built first.
Once the Makefile is set up, type:
    make newbin
If all goes well, the make process should end with building the zserver
and jserver programs and moving them to the bin directory.
The bin directory should contain the following programs:
buildassoc*     db_recover*     highpost*        sgml2marc*
cheshire2*      db_stat*        in_test*         staffcheshire*
countdb*        dtd_parser*     index_cheshire*  test_config*
db_archive*     dumpcomp*       index_clusters*  testsrch2*
db_checkpoint*  dumpdb*         jserver*         webcheshire*
db_deadlock*    dumppost*       marc2sgml*       zserver*
db_dump*        dumprecs*       opac*            ztcl*
db_load*        getnumrecs*     parser*
These are:
buildassoc
    A utility to build an associator file from an SGML file.
cheshire2
    Main X windows client program.
countdb
    A utility to count the number of items in an index and produce a
    frequency count.
db_archive
db_checkpoint
db_deadlock
db_dump
db_load
db_recover
db_stat
    Utility programs associated with the BerkeleyDB system used in
    Cheshire II indexes. See the manual pages in the doc directory.
dtd_parser
    An SGML parsing program for testing database data.
dumpcomp
    A program to dump information from component files.
dumpdb
    Utility program to print the contents of an index file.
dumppost
    Utility program to print the contents and postings of an index file.
dumprecs
    Utility program to print the contents of an SGML data file or a single
    record from the file.
getnumrecs
    Utility program to report the highest record id number in a data file.
highpost
    Utility program to print all entries in an index with more than some
    specified number of postings.
in_test
    Test version of the indexing program (with voluminous output, useful
    for tracking indexing data problems).
index_cheshire
    The main index creation program. It is suggested that the batch (-b)
    flag be used for best performance. (NOTE: use of the batch flag
    requires sufficient work space on the disks where the index will be
    located to hold the index contents TWICE, but indexing is MUCH faster
    than without the flag.)
index_clusters
    If cluster files are generated during the index_cheshire run, this
    program is used to finish generation of the cluster files and indexes.
jserver
    A version of the server designed WITHOUT Z39.50, that will listen on
    a specified socket and interact with the user/client program using a
    simple command language.
marc2sgml
    Conversion utility to convert MARC records to SGML (using the
    Berkeley DTD).
sgml2marc
    Conversion utility to convert SGML conforming to the Berkeley USMARC
    DTD to MARC records.
test_config
    A program to validate (and echo back) configuration files.
testsrch2
    Simple line-oriented command driven interface to the search engine.
    Useful for testing.
webcheshire
    Combined elements of the cheshire client and server, used as a
    scriptable CGI driver (see scripts directory for samples).
zserver
    The Z39.50 server. The server is configured by a combination of
    a "server.init" and the database configuration files for each
    database being served (see docs/configfiles.ps).
ztcl
    A version of the client software without X Window support. Can be
    used for utilities or as a line-by-line interface (for those that
    know a bit about Tcl/Tk).
Typing the name of any of the utility programs by itself, with no
arguments, will show the usage and required arguments for the command.
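As a quick check after "make newbin", something like the following could report which of the programs listed above are missing from the bin directory. This is only a sketch: the function name missing_programs is made up for illustration, and the program list is simply copied from this README.

```shell
# Print the name of every program from the README's list that is not
# present (or not executable) in the given bin directory.
# Usage: missing_programs /path/to/cheshire/bin
missing_programs() {
    bindir=$1
    for prog in buildassoc cheshire2 countdb db_archive db_checkpoint \
        db_deadlock db_dump db_load db_recover db_stat dtd_parser \
        dumpcomp dumpdb dumppost dumprecs getnumrecs highpost in_test \
        index_cheshire index_clusters jserver marc2sgml opac parser \
        sgml2marc staffcheshire test_config testsrch2 webcheshire \
        zserver ztcl
    do
        # [ -x ] succeeds only if the path exists and is executable
        # by the current user.
        [ -x "$bindir/$prog" ] || echo "$prog"
    done
}
```

An empty result means every listed program was found.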
----------------------------
SERVER
----------------------------
Once the system is built you will need to set up the server. This will
require root access to modify some system files.
A line like the following should be added to the /etc/services file
(the 2100 can be any port you want -- the official "well-known port"
for Z39.50 is 210):
cheshire 2100/tcp ir # Cheshire II server
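To see whether a service name already has an entry (and which port it uses), a small lookup along these lines may help. It is a sketch that assumes the usual "name port/protocol" layout of the services file; the function name service_port is hypothetical.

```shell
# Print the port registered for a service name in a services-style file.
# Exits non-zero if the name is not found.
# Usage: service_port cheshire /etc/services
service_port() {
    name=$1
    file=${2:-/etc/services}
    # Drop "#" comments, then match on the first field and split the
    # second field ("port/protocol") at the slash.
    sed 's/#.*//' "$file" | awk -v name="$name" '
        $1 == name { split($2, pp, "/"); print pp[1]; found = 1; exit }
        END { exit (found ? 0 : 1) }'
}
```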
Then a line like the following should be added to /etc/inetd.conf
#
# Z39.50 Information Retrieval access...
#
cheshire stream tcp nowait nobody /usr6/ray/Work/cheshire/zserver/zserver zserver -c /usr6/ray/Work/cheshire/zserver/server.init
#
This means that when the "cheshire" service on port 2100 is connected to,
inetd will start up a session as user "nobody" using the program at
/usr6/ray/Work/cheshire/zserver/zserver
The arguments passed to the zserver are what remains on the line above:
zserver -c /usr6/ray/Work/cheshire/zserver/server.init
Here "zserver" is argv[0] (the program name) and "-c" is argv[1], a flag
indicating that the following argument is the name of the initialization
file for the server. The -p option can also be used to specify a port number.
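To make the field layout of the inetd.conf entry explicit, the sketch below prints a line in the same form as the example above from its variable parts. The function name inetd_entry is hypothetical, and the fixed fields (stream, tcp, nowait) are simply copied from the example.

```shell
# Print an inetd.conf entry in the form shown in this README.
# Usage: inetd_entry SERVICE USER /full/path/to/zserver /full/path/to/server.init
inetd_entry() {
    service=$1
    user=$2
    zserver=$3
    initfile=$4
    # Fields: service name, socket type, protocol, wait flag, user,
    # program path, then the argument list starting with argv[0].
    printf '%s stream tcp nowait %s %s zserver -c %s\n' \
        "$service" "$user" "$zserver" "$initfile"
}
```

The output is meant to be reviewed and added to /etc/inetd.conf by hand (as root), not appended blindly.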
The initialization file contains the following sorts of information
(this example comes from the zserver/server.init file):
#
# Z3950 Server Initialization File.
# The name and value must be in one line. Double quote should be
# placed around a value if the value contains more than one word.
# anything following a # on a line is a comment
#
# Most field names are based on Z39.50 parameter values
#
# FIELD_NAME VALUE
PREFERRED_MESSAGE_SIZE 32768
MAXIMUM_RECORD_SIZE 131072
IMPLEMENTATION_ID "1997"
IMPLEMENTATION_NAME "UC Berkeley V3 ZServer"
IMPLEMENTATION_VERSION "1.0"
PROTOCOL_VERSION "111"
OPTIONS "111011111000001"
PORT "2222" # testing port number
# note that this port is NOT used unless the server is started from the
# command line.
# Database names are the databases served by this server
# NOTE: DATABASE_NAMES is entirely optional -- the actual information used
# is now SOLELY derived from the configuration files filenames and filetags
DATABASE_NAMES "bibfile diglib scimarc"
# directories where the corresponding names above are located
DATABASE_DIRECTORIES "/usr6/ray/Work/cheshire/index/TESTDATA /usr6/ray/Work/cheshire/index/TESTDL /usr6/SCI_MARC"
# Name of the database configuration files associated with the databases
# (can use pathnames relative to the corresponding database directories,
# or full paths)
# At least one CONFIGFILE name MUST be supplied
DATABASE_CONFIGFILES "testconfig.new CONFIG.DL DBCONFIGFILE DBCONFIGFILE DBCONFIGFILE CONFIG.CSMP testconfig.dbms CONFIG.OTA" # list of configfiles
SUPPORT_NAMED_RESULT_SET 1 # 1 means YES
SUPPORT_MULTIPLE_DATABASE_SEARCH 1 # 0 means NO
MAXIMUM_NUMBER_DATABASES 10
MAXIMUM_NUMBER_RESULT_SETS 100
TIMEOUT 120 # in seconds
LOG_FILE_NAME "zserver.log"
# This directory will contain the server session log and resultset files and
# must be writeable by user "nobody" (or whatever user the server runs as).
RESULT_SET_DIRECTORY "/usr/tmp"
# supported attribute sets
ATTRIBUTE_SET_ID "1.2.840.10003.3.1 1.2.840.10003.3.2 1.2.840.10003.3.5" # BIB-1, EXP-1, and GILS
#
#
SUPPORT_TYPE_0_QUERY 1
SUPPORT_TYPE_1_QUERY 1
SUPPORT_TYPE_2_QUERY 0
SUPPORT_TYPE_100_QUERY 0
SUPPORT_TYPE_101_QUERY 1
SUPPORT_TYPE_102_QUERY 1
#
#
SUPPORT_ELEMENT_SET_NAMES 1
SUPPORT_SINGLE_ELEMENT_SET_NAME 0
#
# End of The Server Initialization File.
#
It is not uncommon for server failure or indexing failure to be caused
by incorrect configuration files or server.init files. The first place to
look is in any log files (zserver.log) left in the RESULT_SET_DIRECTORY
location. All error messages from the server are sent to the log file
(unless the server is started from the command line).
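When hunting for the log, it can help to pull the relevant values out of server.init rather than reading them by eye. The sketch below assumes the one-line "FIELD_NAME VALUE" format described in the file's own comments, with optional double quotes and "#" comments. The function names init_value and server_log_path are hypothetical, and the second one assumes LOG_FILE_NAME is a plain file name relative to RESULT_SET_DIRECTORY.

```shell
# Print the value of one field from a server.init-style file.
# Usage: init_value FIELD_NAME /path/to/server.init
init_value() {
    field=$1
    file=$2
    # Strip "#" comments first, then keep everything after the field name.
    sed 's/#.*//' "$file" | awk -v field="$field" '
        $1 == field {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # remove the field name
            sub(/[ \t]+$/, "")                # remove trailing blanks
            gsub(/^"|"$/, "")                 # remove surrounding quotes
            print
            exit
        }'
}

# Print the expected log location: RESULT_SET_DIRECTORY/LOG_FILE_NAME.
# Usage: server_log_path /path/to/server.init
server_log_path() {
    echo "$(init_value RESULT_SET_DIRECTORY "$1")/$(init_value LOG_FILE_NAME "$1")"
}
```

With the sample file above, the log would be expected at /usr/tmp/zserver.log.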
The database configuration files, as mentioned above, have their own document
in docs that describes what the database configuration files contain and
the various options.
If you are linking in an external relational DBMS (such as PostgreSQL),
and running the server via inetd, then you will need to ensure that any
dynamically linkable libraries of the external DBMS are accessible. A simple
way to do this is to create a symbolic link for these libraries in /usr/lib.
Examples of configuration files are available in the index directory.
To index a database you need a valid database configuration file. For
testing purposes the config/cf_test program (cd to config and do
a "make all") can be used to print out what the configuration file parser
sees as the contents of the configuration file, and to report many possible
errors.
To index a complete database use the command:
index_cheshire -b DATABASE_CONFIG_FILE_NAME
A log file called INDEX_LOGFILE will be created in the current directory
that will contain error messages or processing information about the
indexing process.
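Since the messages go to INDEX_LOGFILE rather than the terminal, a small wrapper that runs the indexer and then shows the end of the log can save a step. This is a sketch only: the function name run_index_logged is hypothetical, and whether index_cheshire sets a non-zero exit status on failure is an assumption, so the log should be read either way.

```shell
# Run a batch index build, then show the last lines of INDEX_LOGFILE
# from the current directory. Returns the indexer's exit status.
# Usage: run_index_logged DATABASE_CONFIG_FILE_NAME
run_index_logged() {
    status=0
    index_cheshire -b "$1" || status=$?
    # The log is created in the directory the indexer was run from.
    if [ -f INDEX_LOGFILE ]; then
        tail -n 20 INDEX_LOGFILE
    fi
    return "$status"
}
```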
Note that starting with version 2.20 the system now needs to have a
DATABASE ENVIRONMENT area set aside for use by BerkeleyDB for handling
locking and disk-backed buffering of data. The advantage gained by
these changes is that it should now be safe to run index-building
updates while the server is actively searching the same database. The
database environment should be set up as an empty directory accessible
to the server. The same database environment may be shared among all
of the databases accessed by a server. You tell the system where the
database environment area is located in one of two ways:
1) Set the environment variable CHESHIRE_DB_HOME to the full pathname of the
database environment directory.
or
2) Add a <DBENV> tag immediately after the <DBCONFIG> and before the first
<FILEDEF> in the configuration file.
The environment variable has priority and will override the DBENV tag in a
config file.
If neither of these is done, the server or program (such as
index_cheshire, zserver, webcheshire, etc.) will quit with the message
"CHESHIRE_DB_HOME must be set OR config file <DBENV> set." sent to the
console or to the server log file.
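For programs run from a shell, the first option might be set up as in the sketch below. The function name use_db_env is hypothetical. Note that an exported variable only reaches programs started from that same shell, so for a server started by inetd the <DBENV> tag is probably the more practical route.

```shell
# Create the database environment directory if needed and point
# CHESHIRE_DB_HOME at its full pathname.
# Usage: use_db_env /path/to/dbenv
use_db_env() {
    mkdir -p "$1" || return 1
    # Resolve to an absolute path, since a full pathname is required.
    CHESHIRE_DB_HOME=$(cd "$1" && pwd) || return 1
    export CHESHIRE_DB_HOME
}
```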
When indexing is run (actually, when any program that accesses or
updates the indexes is run), three files are created in the database
environment directory (__db.001, __db.002, and __db.004). Any program
or user that accesses the database must be able to read and write these
files. The simplest way to handle this is to make the files world-readable
and world-writable. Alternatively, group write permission could be used, with
all users allowed to update and access the database (including the user ids
associated with the inetd-started servers) as members of the group.
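The simple (world read and write) option could be applied with something like the following sketch. The function name open_db_env_files is hypothetical; for the group-based alternative, the chmod mode would be g+rw instead, combined with a suitable chgrp.

```shell
# Make the __db.* files in a database environment directory readable
# and writable by everyone.
# Usage: open_db_env_files /path/to/dbenv
open_db_env_files() {
    for f in "$1"/__db.*; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        chmod a+rw "$f" || return 1
    done
}
```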
Once the indexing is completed, the configuration file, database name,
and database directory can be added to the server.init file and that
database will be available for Z39.50 searching via the Cheshire II server.
-----------------------------------
CLIENT
-----------------------------------
Help information on using the X Window-based Z39.50 client program is
available in docs as search.info, quickstart.help.txt,
syntax.error.help.txt, and cheshire2.info.
To set up the client, you need to edit two files (from one of the
client interface scripts in cheshire2/GUI?).
The first file, called "opac" needs to have the location of the
executable cheshire2 program substituted on the first line, and the
location of the ???/cheshire/GUI? directory substituted for the
"defaultPath" variable. The opac file can then be copied or moved to
any convenient location and marked as an executable file.
The second file, called tkinfo.tcl, needs to have the location of the
docs directory added to the "defInfoPath" list variable in order to
find the help files.
------------------------------------
------------------------------------
Ray Larson