-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGELOG
403 lines (373 loc) · 20.4 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
v0.99.??
- XML modes: use UTF-8 internally and native codepage externally
- don't choke on one-file-per-line mode in FSC if file doesn't exist
(closes ticket #33)
- parse character data including character references correctly in FSC
- activate all rules after lex-exhaustive parsing (not just syntactic rules)
for compatibility with LKB
- regular expression literals introduced with slashes in TDL
- assert that fss in unify(), unify_restrict() and unify_np() are valid
- added preliminary XML-RPC server mode
- added chart mapping engine:
* chart mapping rules are resource-sensitive rewrite rules
* chart mapping rules operate on a generalized chart
* two phases: token mapping and lexical filtering
- new generic instantiation mode `default-les=all': try to instantiate
all available generics for all available input items. selection should
then be accomplished by constraints on the input fs
- new tokenizer for the XML-based feature structure chart format (FSC)
- input fs can optionally be unified into a lexical item's fs
- input items can now be described by input feature structures
- case-sensitive string and YY tokenizer
- memory limit is respected during unpacking
- added unit tests for types
- added the ParseNodes class to be able to produce nice small parse trees as
in the online demo, and an item printer to test it
- added dag_get_path_value_check_dlist for cases where an empty diff list is
required to end the path search
- removed old path_to_lpath0 which produced memory leaks
- escape XML delimiters in native MRS output
- added make target "bindist" which builds a tarball with the binaries and all
dependent shared libraries
- native MRS code now uses logging and prints with iostreams
- made inflr_todo and part-of-speech tags part of tInputItem constructor
- normalize chart item printing for the different types of items: prefix with
"I", "L" or "P" (for input, lexical and phrasal items, respectively), print
id for each item, normalize start and end nodes and external positions
- stable type order: proper types and leaf types are ordered as defined in TDL
& grammar dump version increased to 17
- added option -sm
- clear memory and stats before rather than after analysis
- added dag_find_paths(), fs::find_paths()
- renamed pop_rest() to pop_first(), added pop_last()
- remerge of configuration branch
- crash when disabling rule filter fixed (morph.cpp)
- clean up of settings file name handling
- printing of most data structures now uses C++ streams
- more harmonization with chart-mapping branch:
* added dag_nth_element()
* implemented fs::fs(char *path, type_t type)
* added fs::fs(const list_int *path, type_t type)
* added fs::get_path_value(const list_int *path) const
* added fs::get_list() const
* renamed dag_get_path_value_l to dag_get_path_value
- made cheap/options.* less messy:
* added doxygen comments for options
* one option variable per line
* grouped options by function
* removed opt_fullform_morph
* renamed memlimit to opt_memlimit
- turn cheap.cpp/print-grammar and -pg into a useful function
- fix type addenda problem when using same coreference names (closes
ticket #16)
- fix wrong \returns keywords for doxygen
- rename lex-tdl.cpp/optional to consume_if
- minor code restructuring in parse-tdl.cpp
- remove as many #include from flop.h as possible
- found error concerning two or more corefs in one conjunction at type
definition (opened ticket #17)
- removed zillions of deprecation warnings concerning the use of
char * literals as non-const char *, although not all
- implemented new printing functions for items and dags that use visitor
objects (ItemPrinter and DagPrinter) and C++ streams instead of FILE *.
Old functions are still in-place and have to be removed. This is a
preliminary check-in.
- made get_surface_string() in cheap.cpp a method of class chart
- better integration of static and dynamic types:
* dynamic types are subtypes of BI_STRING
* dynamic and static types now use the same data structures for their
print and type names, namely std::vector<std::string>
* added std::string get_printname() and std::string get_typename()
* consistent naming scheme for 'type', 'static type', 'dynamic type'
renamed is_resident_type() to is_static_type(), ntypes to nstatictypes,
nleaftypes to nstaticleaftypes, last_dynamic to ntypes
* new naming scheme for a lookup without vs lookup with registering of
new dynamic types (the latter is called "retrieve" and only registers
an unknown type if DYNAMIC_SYMBOLS is defined)
renamed lookup_symbol() to retrieve_type(),
added retrieve_string_instance(std::string)
renamed lookup_unsigned_symbol() to retrieve_string_instance(int),
* added is_string_instance(type_t)
- moved several existing hash functions to hashing.h
- revamped build system (please run autoreconf -i after svn checkout/update)
* made `make distcheck' work again;
* listed common C/C++ sources with full path as _SOURCES;
* added m4 directory for local and 3rd party autoconf-macros;
* made doxygen documentation buildable in any build path;
* respect user-provided variables such as CXXFLAGS in ./configure;
* don't try to override predefined -g -O flags with AM_CXXFLAGS;
* removed all Makefile.base files;
* removed Jamfiles (one build system suffices)
- now compiles with ecl-0.9j
- moved tTokenizer definition from lingo-tokenizer.cpp to input-modules.cpp
- resolve escaped characters in TDL strings
- removed obsolete `nbest' mode and other obsolete code
- Lexparser.next_input now returns true if there's still input
- fixed lots of unclean allocations which where not or not properly free'd
- changed handling of lexical rules (affix and non-affix). Now closer to LKB.
No more infl-rule-status, only lex-rule. Affixation rules MUST have
lex-rule status.
- created new rulefilter class to encapsulate this functionality
- major restructuring of morph.cpp, inner classes now in new morph-inner.h
- replacement of orthographemics-cohesive-chains (kludge) by enhanced filter
for morphological processing a la LKB. Uses boost graph library, so this
is now required for cheap, too. (closes ticket #10)
- added native MRS implementation in C++ (preliminary)
- finish extended restrictor functionality
- bind reading of new input to the registered tokenizer to get a clean
interface that works for PIC and tsdb, too
- allow dumping of cyclic dags in flop
- fix problem of fspp/Makefile.am
- rmrs-list output is activated correctly again when -mrs=rmrx is used
- changed order in SUBDIRS so that libpreprocessor is build before cheap, if
enabled
- Remove the need for the QCCOMP #ifdefs and the corresponding switch in the
configuration using code unfolding a la C++
- Adding `:+' type addendum TDL syntax to flop (closing ticket #1) and doing
some cleanup in parse-tdl.cpp
- fixed wrong characterization introduced by a fix to another
characterization bug (ticket #9)
- fixed bug SMAF mode (provide built-in default if no config file provided)
- changed init() of tLexItem, such that characterization during exhaustive
unpacking works correctly
- Moved functionality of fresh_constraint_p into cached_constraint_of
- Default for memory allocation is now NO mmap, because of the notorious
problems with the ecl-based modules. Works fine for cheap and flop.
- re-ordered library configuration checks because of problems with current
version of ICU; fixed bug that caused wrong type fs inconsistencies in
flop when used without mmap (part of ticket #4)
- further SMAF XML support (Ersatz, OSCAR named entities)
- fixed creation of additional (wrong) coreferences with characterization
(ticket #6)
- change message "no lexicon entries for ..." to include POS tag info when
using '-default-les'
(minor `new' feature by oe)
- re-enable `-interactive-morphology' mode (requested by tim);
- add new feature: lexical type prediction for lexical gaps
- fix incompatible blocking application of non-inflectional lexical rules on
phrasal items (resolves issue of ticket #2 by Francis)
- fix broken handling of file input when using PIC XML input format
- remove all (currently) unused options from configure.ac
add morphology computation for generic entries, too (requested by
Berthold): this requires proper blocking of these rules in the grammar,
in case it is not wanted.
updated README file to reflect current version (almost)
- Adapted tsdb dump code to match new version; surface string printed on
error, too, as well as relations file. (changeset 291)
- efficiency improvement for selective unpacking by i) avoid propagating
hypotheses that failed to instantiate ii) better recording of
seen indices for decompositions (changeset 290)
- fixed error which caused cheap crash with SIGFPE under certain conditions
in XML reader code. Copying of uninitialized double values have lead to
underflow exception in FPU (changeset 287)
- problem with map not being included in grammar.h (ticket 3)
wrong test in configure.ac concering xerces fixed (changeset 284)
- major cleanup of header files, remove ubiquitous pet-system.h
changes in configure.ac: path separator char detection, default system
locations handled by empty variables, support for Mac OS X, better handling
of "rpath" linker options, removed obsolete headers (changesets 275-276)
(two changes by oe)
- repair damage to feature extraction in tSM::score_hypothesis() that i had
introduced when trying to suppress signed vs. unsigned comparison compiler
warnings;
- pass .readings. vector into unpacking routines by reference, so that the
accumulation of results actually becomes visible to the caller.
(small fix etc. bk)
- take identity2-id change back and replace it with a more direct access (for
compatibility)
- use ICU lowercase handling in lingo-tokenizer
- extract exhaustive/selective unpacking functions from collect_readings for
better readability.
- adapt configure.ac for (non-working) ecl 0.9i header file locations. Still
have to use 0.9h because of segfaults in 0.9i with fedora 5
(a couple of minor changes by oe)
- allow selective unpacking by default if `-packing' option is used;
- ignore nsolutions limit in forest construction phase when packing is on;
the rationale here is that (a) forest construction is cheap and (b) we need
to have the full forest available for selective unpacking to compute the
correct sequence of n-best results;
- fix bug in YY tokenizer: sort tokens by start position (possibly this could
be considered a bug in the lexparser neighborhood, actually: it generally
seems okay with tokens coming in out-of-order, but for multi-word lexical
entries, lexical lookup goes wrong);
- make SM reader robust on all current *feature- parameter values;
- ditch legacy tLexItem::identity2() method and use id() instead;
- update options `help' summary to reflect latest developments;
- eliminate compiler warning (on signed vs. unsigned integer comparison)
v0.99.13
- oe integrated a change for the tsdb to escape the yield strings correctly
- fixed bug 'assertion: is_type() failed' which occured with packing and
dynamic symbols
- updated option description to contain the -tok=fsr option
- some reformatting
- added "EOM" so that Uli can properly handle MRS and partial output in HoG
- fixed syntax error in lingo-tokenizer.h which was detected by gcc 4.1.0
- SMAF XML input (incomplete)
v0.99.12
- remove borland directory from configure/make process
- updated TODO list according to Jeres meeting
- integration of x-preprocessor code: code split for ecl/mrs/preprocessor
and change of library build, command line option -tok=fsr now activates
preprocessor code (if compiled in)
- changed mrs.system such that it puts the libmrs.a and mrs.h into the
cheap subdirectory
- make item trait filtering work correctly (with oe in Jerez), which means
that main is now doing as well as the oe branch
- some code reformatting and code doc adaptation
- chart item copy and assignment constructor inhibition went into a macro
- fixed ecl_decode_string, which in some conditions returned improperly
terminated C strings
- generalized the find_file ugliness in cppbridge.cpp, sm.cpp and
settings.cpp and added some nice pathname functionality in utility.cpp
v0.99.11
- fixed error with incomplete characterization (hopefully)
- removed warnings on 64 bit machines in dag-tomabechi.h (only for gcc 3.4.4)
v0.99.10
- renamed config.h(.in) to pet-config.h(.in) (avoid ecl like problems)
- removed problematic sub-shell executions of AC_CHECK_HEADERS
- added position-mapper.h to cheap_SOURCES in cheap/Makefile.am
v0.99.9
- work on lexical psql database continued (unfinished)
- changed characterization to make it work with unfilling
- fixed performance problem in boost version of flop
- moved propagate_status from flop.cpp to hierarchy.cpp where it logically
belongs
- fixed wrong dumping of inflection rules in some situations with many
leaf types
- added morph.cpp patches provided by oe (oe@csli.stanford.edu) for escaping
certain characters in morph rules and applying irregular entries on every
level of analysis
- added some more comments
- added some virtural destructors gcc4.0 complains about
- added fs modifications for stem and form for Berthold
- fixed wrong chart position mapping. Now put into its own class and file
position-mapper.h again
- corrected behaviour of dependency filter in case lex_exhaustive is used
- added printing of feature structures for fegramed
v0.99.8
- patches provided by Eric Nichols <eric-n@is.naist.jp> (thanks a lot)
, fixing the following problems:
* flop bug on 2.6 kernels due to mmap problem
* correct creation of mrs.h file in the autoconf/make scripts (no longer
distributed by LKB)
* fix problems with kernel/architecture specific subdirectories in LKB
- patch provided by Francis Bond (thanks) concerning tsdb MWE's output
included
- in all chart items that replay infl rules, it is attempted to unify either
the base or the surface form into "orth-path"."path-to-inflpos". The base
form is taken for all but the last of these, the surface form for the one
that satisfies the last infl rule
- added function dag_collect_symbols to dag_tomabechi.cpp to collect a type
symbol table for a given dag
- only tLexItems copy the _inflrs_todo list (saves a bit space)
- fixed filtering of duplicate morphology analyses
- moved code to collect readings from the chart into its own function
- cleaned up handling of global variables in chunk allocator
v0.99.7
- merge in oe branch: pass_through_comments, LUI dumping, orthographemic-...
- minor code cleanup
- bug fix in derivation printer concerning multi word entries
v0.99.6
- small fix in common/hashing.h and configure.ac to be able to configure
properly with cygwin
- optional application of the rule filter during morphological analysis
and fixed code to restrict the number of spelling rule applications
- bug fix: blanks were not stripped correctly on the right side of
orthographemic rules
- bug fix in code assigning input items chart positions derived from external
positions
v0.99.5
- dump chart in jxchg format even if an error occured, e.g., exhaustion of
passive edges (only if enabled)
- deleted inactive code in grammar.cpp (punctuationp, translate_iso_chars)
- fixed wrong is_stem_token predicate in item.h
- fixed wrong string decoding in XML mode
- XML tokenizing works from stream as well as from file, DTD is now optional
- flop returns different exit codes depending on the error
v0.99.4
- bugfixes and additions for XML input mode: missing infl tag, translate_iso,
state names are static members of the state classes, state factory now
creates new states, exception are correctly thrown and catched and are
decorated with the error location ; read XML from stdin directly rather
than from file.
- the isomorphix translation has been put completely into the tokenizers
- temporarily removed the error "Duplicate failure path"; it is triggered
when the german grammar with new restrictors is used to compute unification
quickcheck paths (during computation of the rule filter)
- chart::shortest_path is now a template function that gets an additional
weight function object as argument
v0.99.3
- restrictor can specify paths mixed with features (which are paths of length
one anyway)
v0.99.2
- build process with automake && autoconfig && configure for flop and cheap,
boost and dumpgram binaries may be revived if used by anybody
- adaptation of source files to use automatically build config.h
- make the use of ICU really optional, i.e. make cheap work without ICU
altogether.
- corpus approximation (jxchg) output format provided anew
- separate switches for unification and subsumption quickcheck computation
- changed version mechanism to be supported by the auto... tools
v0.99.1
- make XML input mode really optional, fixes in Makefile and cheap.cpp
v0.99.0
Since this is the first entry, change descriptions are quite coarse
grained. This should maybe change in the future.
- Added doxygen compatible comments to most of the .h files, and some
comments to source files too.
- New input/lexical processing stage to allow more modularization and
flexible exchange of tokenization, morphology, etc.
- "japanese multiword bug" fixed
- Application of inflection and lexical rules can now be completed before any
syntactical processing takes place (which might be beneficial for
chart dependencies in german)
- fixed bug in acyclic transitive reduction in the boost version of flop
- expansion failures (flop) now report the failure path
- XML input mode (also as an replacement for the whiteboard version)
- first version of fragmentary results in case of parse failure, maybe needs
more flexibility for better heuristics.
- activation of packing without restrictor setting does no longer lead to a
segmentation fault; packing is simply not activated.
- translation of iso chars to isomorphix in YY input mode
- [incr tsdb()] file dump mode
- version string now included in flop and cheap binaries. version number is
printed with usage information
- printer for hierarchies in VCG tool style, can be used in cheap and flop
- support for dynamic symbols
- dag_expand now does the job correctly using a scheme similar to delta
expansion.
- moved the whole agenda code into the .h file with the hope of some positive
inlining effects (and, besides, to get rid of another file).
- more flexible restrictor functionality
- lots of minor cleanup issues
- first attempt to CHANGELOG, TODO, BUGS, version.h
Done previously (from old ToDo file, partially redundant)
+ XML input mode
+ complete DTD specification (Uli S. and bk did this)
+ build SAX parser
+ supersedes integration of bernd's (whiteboard) version
+ fragmentary results in case of parse failure (v1.0)
Fragmentsuche/-ausgabe im Falle von Parse-failures
+ integrate ecls LISP (seems to work now)
+ unfilling in PET leads to wrong / incomplete results (German grammar)
re-expansion (dag_expand) gefixt
+ packing/unpacking does characterization too
+ leda -> boost migration done and checked
+ CFROM/CTO fix: toplevel errors
+ bei packing ohne packing-restrictor: segfault, jetzt: Warning & disable
+ Nullfehler bei MRS muss Ausgabe produzieren
+ YY-mode macht kein translate-iso-chars
+ Schreiben von TSDB-Tabellen aus PET
+ Erzeugen von item, parse und result tabellen, wenn PET in der HOG laeuft.
yy.cpp ausschlachten: TSDBFILEAPI !!
+ Optionsbeschreibung einbauen
+ Counts fuer lexikalische Ambiguitaet
+ correct sorting of results according to score
+ -results=n option to get only the best n results
+ fullform-morphology gibt beim Drucken den Stem mit raus
+ Restricting the number of inflection rule applications
+ positions and counts for YY and XML tokenizer
+ japanese multiword bug (requires input chart redesign)
+? implement mrs/rmrs code - processor interface ?Is this implemented or not?
+ yy_tokenizer removed from yy.cpp
+ runtime selection of online-morphology vs full-forms