-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathESSAY.txt
518 lines (430 loc) · 17.2 KB
/
ESSAY.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
Some thoughts about tsearch.
David Anderson
Updated 15 February 2014
Updated 24 June 2018 (fixing a typo and regularizing
line lengths).
Tsearch() was defined and implemented fairly early in the
history of UNIX, but I don't know exactly when. The earliest
documentation I have is from the 1991 edition of UNIX System
V Release 4 Programmer's Reference Manual.
It is useful because it enables one to make a searchable
tree of, well, any data one might have need to search.
Tsearch never needs to understand the format of the data.
The functions defined there are tsearch(), tfind(),
tdelete(), and twalk(). It requires careful attention
to understand it. Note the difference between
void *root and void**rootp !
The confusion is partly due to the interface definition.
Until the late 20th century only limited attention was given
to interface design. By the late 20th century it seems clear
that any newly designed tsearch-like functionality would
have a significantly different interface. An example of an
improved interface is the GNU/Linux function hsearch_r().
Making the return value a small integer defining what the
function actually did (and using other arguments to return
additional values as necessary) significantly simplifies
the discussion.
Here though we are talking about the old and standard
interface. We attempt to clarify the messy parts. See the
examples provided for actual code.
The functions and tables of tsearch are not thread safe.
Nor thread-aware. Any access to one of tables at the same
time the table is being updated will lead to chaos eventually.
Any table tsearch maintains consists of records of undefined
(meaning the definition is local to the internals) content
each record of which contains, as one element, a copy of a
KEY pointer.
In the following I freely use tsearch for dwarf_tsearch
etc. The functionality and interfaces are identical.
=========
Further reading
The canonical reference for binary trees and the algorithm
reference for most of the tsearch code is
Donald E Knuth, The Art of Computer Programming
Volume 3, Second Edition. Section 6.2.2 Binary
Tree Searching, page 426. Chapter 6.2.3 Balanced
Trees, page 458.
"Algorithms" Fourth Edition, Robert Sedgewick and Kevin Wayne,
is the basis for the red-black tree coding.
Wikipedia has quite a few entries of interest.
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
http://en.wikipedia.org/wiki/Tree_traversal
http://en.wikipedia.org/wiki/Red–black_tree
are just a few.
After implementing tsearch I looked briefly at
some other projects, but these references may be
out of date.
freecode.net: See the 'GNU libavl' project.
The libavl libary is a GNU project in C with each tree type
having a uniquely-named but consistent set of interfaces.
The interface design is much more sensible than
Unix TSEARCH. You will probably find it in most any
Linux distribution. The 2.0.3 tarfile source has no
configure step and the 2.0.3 Makefile failed for me.
ftp://ftp.gnu.org/pub/gnu/avl/avl-2.0.3.tar.gz
is rather old but shows internal evidence of
last being updated in 2007 (*.pdf and other file timestamps
in the tar file). The Makefile fails for me doing plain 'make'.
Doing 'make program' to just build the source works fine.
The interface definitions are more sensible than tsearch,
rb_delete returns the deleted item when it succeeds,
for example.
freecode.net: See the 'libredblack' project.
This project tarfile was last updated in 2003.
It is on sourceforge at
http://sourceforge.net/projects/redblacktree/
Implemented in C.
Configure worked (version 1.3) but the make step failed
running the python app ./rbgen due to the presence
of the string %prefix on line 9 of rbgen.
Removing that one line of python commentary from Eric Raymond
fixed the build!
There is a good amount of C #define magic making it harder to
read the source than one might expect.
The interface definitions are more sensible than tsearch,
rb_delete returns the deleted item when it succeeds, for example.
freecode.net: See the project 'Template based B+ Tree'.
Is in C++ and can be used for searches which are file or
memory based through use of C++ templates by
the implemenation. Have not attempted to build it.
=========
tsearch:
void *tsearch(const void *key, void **rootp,
int (*compar)(const void *l, const void *r));
Our terminology comes from the above declaration. tsearch()
returns a pointer. If tsearch fails due to an out of memory
or internal error it returns NULL. Otherwise it returns a
non-null pointer (see Return value, below).
KEY:
First we will define KEY as it is ordinarily used. KEY is a
pointer to an object you define and initialize. The object
must remain in stable storage for as long as the table has
a copy of the KEY pointing to the object. Example:
struct mystruct *key_a = malloc(sizeof(struct mystruct));
initialize(key_a, my data to fill in struct...);
void *r = tsearch(key_a,&treeroot,comparfunc);
ROOTP:
This must be the address of a void* datum. Before the first
call of tsearch the value of *rootp must be NULL (set by you
somehow). The contents are maintained by tsearch thereafter.
COMPAR:
A function you write whose argments (when tsearch internals
call it) are KEY pointers from two records (your records).
The function should return -1 if the record KEY l points to is
considered less than the recorda KEY r points to. Return zero
if the values are considered to match. Return 1 otherwise.
Example: int comparefunc(const void *l,const void *r) {
struct mystruct *lp = l;
struct mystruct *rp = r;
if(lp->myv < rp->myv) return -1;
if(lp->myv == rp->myv) return 0;
return 1; }
Of course the comparison need not be of simple values, it
could involve anything. This is just a simple sketch.
Return value:
If tsearch() returns NULL something went wrong.
There was insufficient memory
or an internal error of some kind.
Lets call the KEY passed in KEYa.
Otherwise tsearch() returns a pointer to a KEY for this object.
Dereference to get an actual KEY (a pointer to your object).
Lets call this dereferenced KEY key_deref.
struct mystruct *key_deref = *(struct mystruct *)r;
if KEYa == key_deref then:
KEYa was added to the tree.
Hence the tree now has a copy of
the value of key_a so we must not free it.
else:
KEYa matched a record in the tree and did
nothing so
you need to free any space you allocated
to build the KEYa for this tsearch call.
In our example:
free(key_a).
I hope the above clarifies the use of tsearch somewhat.
============
Tfind:
void *find(const void *key, void **rootp,
int (*compar)(const void *l, const void *r));
The interfaces are the same but the return value is (in
contrast with tsearch) reasonably clear:
A non-null return is a key (pointer).
It never adds a new record, it just tries to find a record.
A NULL return means either that the search-ed for record
did not exist or that
something internal went wrong or perhaps that something incorrect
was detected at runtime in the arguments passed in.
On return you must free the key you passed in.
============
Tdelete:
void *tdelete(const void *key, void **rootp,
int (*compar)(const void *l, const void *r));
The arguments are the same as tsearch/tfind,
but the return value is a bit odd.
If something goes wrong or if the record does not exist,
it returns NULL. That is straightforward.
If the record is deleted tdelete is supposed to return
a pointer to the parent of the deleted record.
The content of the deleted record is NOT deleted.
The assumption implicit here is that key is
the real record (as found by tfind, for example)
so by freeing key you complete the deletion.
A fake key that 'matches' a real key might
return a value, but you have no way to know
what the node pointer of the real key is
so cannot possibly free it.
If the record deleted was the last record in the
tree, a NULL is returned and *rootp is set to NULL.
Thus restoring initial conditions of an empty tree.
If the record deleted was the root (of the records
in the tree as of the call) then what is this supposed
to return? No available documentation suggests an
answer. The current dwarf_tdelete implementations
return some record or other in this case.
In the case of dwarf_tdelete() for a hash 'tree'
if the node was the last in a hash chain then
NULL is returned. (ugly, but it is difficult to
figure out what else to do).
All this means that it is a waste of time to
inspect the return value from tdelete.
So to do a tdelete and do any necessary
memory free do:
key = xxxx
t = tfind(key,&root,compar_func)
if (t) {
tdelete(t,&root,compar_func)
}
free key
free t
where what 'free key' means is to free whatever 'key = xxxx'
allocated. Which means do the same for 'free r'.
Of course if your keys point into to a static array of data
then free is unnecessary, but how often will the data be in
static memory? And if it is in static storage, why do any
free at all?
============
Tdestroy:
void *tdestroy((void * root,
void (*free_node)(void * key));
This frees all memory the tree system has and along the way
it calls 'free_node' for each node in the tree.
It empties the tree, but, oddly, the root argument
is a simple pointer to the root, not a pointer-to-pointer.
Hence this routine can free the tree, but cannot
assign a NULL to the root. So you should do
root = 0;
after calling tdestroy().
============
Twalk:
void twalk((const void * /*root*/,
void (* /*action*/)(const void * /*nodep*/,
const DW_VISIT /*which*/,
const int /*depth*/));
On each visit to a node the 'action' function
is called with a pointer to a node, a visit code
of preorder, postorder, endorder, or leaf,
and a depth (in the tree). Interior nodes will be
visited up to three times.
============
Tdump:
void tdump(const void *rootp,
char *(*keyprint)(const void *key),
const char * msg);
This is unique to the dwarf_tsearch library. It prints (to
stdout) a representation of the tree. The output is
indented one space per level and ordered such that turning
the output 90 degrees clockwise shows a kind of picture of the
tree structure in the way trees are normally shown in books.
It may be of slight interest for debugging trees if the trees
are not too large.
The 'msg' argument is just a character string of interest
to you, something identifying the output. It is printed once
and otherwise ignored.
The 'keyprint' argument is a pointer to a function you write.
The function is called for each node in turn. It should
return a pointer to a string with whatever data from the
passed-in-by-tdump key you wish to show. Normally the pointer
returned from keyprint should be pointing to a static area
so there is no issue with memory leakage to worry about.
tdump will print the returned string immediately and will
not refer to it again.
============
tsearch use using pointers (declarations left out):
The struct here is entirely ours: tsearch
neither knows nor cares
how it is laid out.
mt = struct example_tentry *mt =
(struct example_tentry *)calloc(
sizeof(struct example_tentry),1);
mt->mt_key = keyvalue;
mt->mt_data = datavalue;
errno = 0;
/* tsearch adds an entry if its not present already. */
retval = dwarf_tsearch(mt,tree, mt_compare_func );
if(retval == 0) {
printf("FAIL ENOMEM in search on %d, give up "
"insertrecsbypointer\n",i);
exit(1);
} else {
struct example_tentry *re = 0;
re = *(struct example_tentry **)retval;
if(re != mt) {
/* found existing entry. */
mt_free_func(mt);
} else {
/* New entry mt was added. */
}
}
In this case the mt_compare_func() might if using a
struct look like:
int
mt_compare_func(const void *l, const void *r)
{
const struct example_tentry *ml = l;
const struct example_tentry *mr = r;
/* If the key were a string, one could use strcmp()
instead of the comparisons here */
if(ml->mt_key < mr->mt_key) {
return -1;
}
if(ml->mt_key > mr->mt_key) {
return 1;
}
return 0;
}
============
tsearch use using values (declarations left out):
void *mt = (void *)key;
errno = 0;
/* tsearch adds an entry if its not
present already. */
retval = dwarf_tsearch(mt,tree, value_compare_func);
if(retval == 0) {
printf("FAIL ENOMEM in search on %d, "
"give up insertrecsbypointer\n",i);
exit(1);
} else {
/* successful search.mt might have been
added by the call,
or maybe it was already in the tree.
It is impossible to tell which
in this special use of 'mt' value. */
}
In this case the value_compare_func might look like:
int
value_compare_func(const void *l, const void *r)
{
VALTYPE lp = (VALTYPE)l;
VALTYPE rp = (VALTYPE)r;
if(lp < rp) {
return -1;
}
if(lp > rp) {
return 1;
}
return 0;
}
In this case the free_node function would look like
void free_node { /* Do nothing. */ }
In this case, there is never any free() calls you need to make
as the key is not a pointer to anything.
===================
If I were designing the interfaces in 2014 I might do
as follows.
I think the GNU hsearch_r interfaces are a fine
approach and could be extended to tsearch(). But
I suggest something slightly different.
tse*)_
#define TSE_OK 1
#define TSE_MEM 2
#define TSE_ERR 3
#define TSE_NOTFOUND 4
#define TSE_DUPLICATE 5
#define TSE_STOP 6
struct tsearch_base; /* opaque struct, content defined by
the search code, content not made public */
/* you define comparison function with this prototype */
int compare_func(void *l, void*r);
/* you define free function with this prototype */
int free_func(void *n);
None of the proposed interfaces touch errno.
All functions return TSE_OK or
some other TSE value depending on the call.
Here we present the compare func and free_func
just once (quite unlike normal tsearch,
and maybe not the best idea).
int tsecreate(struct tsearch_base **base,compare_func,free_func)
allocate a struct_tsearch_base record and puts a pointer to
it in *base. Records the compare and free_func pointers.
A call on this by uses is a requirement .
returns:
TSE_MEM if out of memory.
TSE_ERR if something wrong with an argument or an
internal problem.
TSE_OK if the base was created (allocated).
int tsesearch(void *key,struct tsearch_base *base);
returns:
TSE_MEM if out of memory.
TSE_ERR if something wrong with an argument or an
internal problem.
TSE_DUPLICATE if new record not added, that key in tree
already.
TSE_OK if new record added.
int tsefind(void *key,struct tsearch_base *base void **foundrec);
returns:
TSE_OK if record returned in *foundrec
(up to you to free key)
TSE_ERR if something wrong with an argument or an
internal problem.
TSE_NOTFOUND if not found.
(up to you to free key)
int tsedelete(void *key,struct tsearch_base *base,void **keyout);
If successful, tdelete deletes the record identified by key and
and returns the actual record's recorded key through *keyout.
So if a free()or other action is required you can take
that action.
If key is a malloc-ed instance (not a key in the tree) you will
clean up key and keyout.
If key is a literal instance (a key in the tree) then
key and keyout be the same pointer value, so clean up just one.
The 'base' struct tsearch_base record is NOT freed,
it remains, whether the tree has any records left or not.
see tdestroy() to clean up.
returns:
TSE_NOTFOUND if key not found.
TSE_OK if key found and the found key
is returned via keyout.
TSE_ERR if something wrong with an argument or an
internal problem.
int tsewalk(struct tsearch_base *base,
int (* /*action*/)(const void * /*nodep*/,
const DW_VISIT /*which*/,
const int /*depth*/)
The action function returns TSE_OK normally,
TSE_ERR if something is wrong, stop the walk
and return TSE_ERR. Or
TSE_STOP a soon as an action function returns
TSE_STOP.
twalk returns:
TSE_ERR if something wrong with an argument or an
internal problem.
TSE_OK if all is well.
TSE_STOP if an action function call returned TSE_STOP.
int tsesize(struct tsearch_base *base,unsigned long*count_out);
Stores a count of the records currently recorded
in the tree in *count_out.
TSE_ERR if something wrong with an argument or an
internal problem.
TSE_OK if all is well.
int tsedestroy(struct tsearch_base **base);
A call is a requirement on users -- to free up the tree space.
Calls free_func on each node in the tree and
frees all tree memory and does
*base = NULL
to reset your tree pointer.
returns:
TSE_ERR if something wrong with an argument or an
internal problem.
TSE_OK if the the tree is destroyed.
base is zeroed.
=================