4.6 (2020.09.02)
=================
New features:
- CUDA support for DNN computation
- build by nvcc to enable it.
- detailed parameter can be given by .dnnconf option "cuda_mode=...". See Sample.dnnconf for details.
- tested on Linux with CUDA 8.0, 9.0 and 10.2.
- 1-pass grammar recognition
- Per-grammar basis: can be enabled for each grammar only when it has additional ".dfa.forward" file.
- The ".dfa.forward" file is generated by recent versions of "mkdfa". Keep it to enable 1-pass recognition, or delete it to make Julius behave as in previous versions.
- Support non-log10nized state priors in DNN model
- New .dnnconf option "state_prior_log10nize=yes/no" to switch the behavior
- Feature normalization pattern added: mean = input self, variance = static
- New option "-cvnstatic" to choose this behavior
- See the updated doc "doc/Normalize.md" to know how to set feature normalization in Julius.
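As an illustration of what "log10nizing" state priors means, here is a minimal Python sketch that converts raw prior probabilities to log10 values. It is not Julius code: the function name and the "floor" guard are hypothetical, and how Julius itself applies "state_prior_log10nize" is not described here.

```python
import math

def log10nize_priors(priors, floor=1e-30):
    """Convert raw state prior probabilities to log10 values.

    `floor` is a hypothetical guard against log10(0); it is not a Julius setting.
    """
    return [math.log10(max(p, floor)) for p in priors]

# Raw (non-log) priors such as those a DNN model file might carry.
raw_priors = [0.5, 0.25, 0.25]
log_priors = log10nize_priors(raw_priors)
```

A model that already stores log10 priors would skip this step, which is presumably what the yes/no switch selects.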
Updates:
- Now delivered under simplified BSD License
- Added Python version of "mkdfa.py"
- Update build for Visual Studio 2017, support building more tools.
- Rewrote the documentation in Markdown format under "doc" (WIP)
- Place README.md in each directory, remove *.txt instead
- mkdfa (mkfa): now outputs detailed error message (line num etc.)
Bug fix:
- "mkbingram" ignores charset conversion options, performs no conversion
4.5 (2019.01.02)
=================
New features:
- Improve voice detection by integrating "libfvad", a voice activity
detection library based on WebRTC's VAD engine [https://github.com/dpirch/libfvad]
- Now Julius has dual-mode VAD:
- old module (input level and zero-cross based)
- new module (libfvad = model based)
- both run in parallel: both modules process an audio input stream
concurrently
- detect speech only when **both modules trigger**!
- new module is disabled by default
- apply "-fvad arg" option to enable
- arg is a switch, "-fvad 0" for moderate mode and "-fvad 3" for aggressive mode.
- new module is available on all audio modules
- julius
- adinrec
- adintool
- adintool-gui
- typical usage:
- "-fvad -1" to use old VAD only (same as older versions)
- "-fvad 3 -lv 1" to use new VAD only. "-lv 1" forces the old VAD
to "always triggering", thus the final VAD result fully depends on
the new module.
- New multi-threaded DNN computation
- added "num_threads" option to dnnconf to specify number of CPU
threads to be used on DNN computation.
- default number of threads is 2.
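The dual-mode VAD rule above (speech only when both modules trigger, with "-fvad -1" falling back to the old module alone) can be sketched in Python as follows. This is an illustrative model only, not Julius code; the function and parameter names are hypothetical.

```python
def combined_vad(level_triggered, fvad_triggered, fvad_mode):
    """Return True when the input should be treated as speech.

    level_triggered: decision of the old level / zero-cross module
    fvad_triggered:  decision of the libfvad module (ignored when disabled)
    fvad_mode:       value given to "-fvad"; -1 means the new module is off
    """
    if fvad_mode == -1:
        # New module disabled: behave like older versions.
        return level_triggered
    # Both modules must agree before speech is detected.
    return level_triggered and fvad_triggered
```

With "-lv 1" the old module is described as always triggering, so in this model level_triggered is always True and the result follows the libfvad decision.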
Modified:
- module output now performs XML Escape
- characters <, >, ", & and ' in output string are now escaped to
&lt;, &gt;, &quot;, &amp; and &apos;. This escaping is enabled by
default from this version, however you can switch off this
escaping and keep old behavior by applying "-noxmlescape" option.
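For client authors, the following Python sketch shows the kind of escaping described above and how a client might undo it. It is not the Julius implementation; the table and function names are hypothetical, and the entity names used are the standard XML ones (&lt;, &gt;, &quot;, &amp;, &apos;).

```python
import html

# Standard XML entity for each of the five special characters.
XML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}

def xml_escape(text):
    """Escape the five XML special characters, one character at a time."""
    return "".join(XML_ESCAPES.get(ch, ch) for ch in text)

def xml_unescape(text):
    """Client-side reverse step; html.unescape knows all five entities."""
    return html.unescape(text)
```

Escaping character by character avoids double-escaping the "&" that the other entities introduce.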
Bug fixes:
- fix Makefile for parallel build (make -j N)
- fix occasional segfault of adintool-gui when given arguments
- fix several build warnings
- fix several memory leaks
- fix mis-compilation on some OS
[New run-time options]
[-fvad mode] set libfvad mode. "mode" is an integer value from -1 to 3, -1 to disable,
0 for moderate detection, 3 for aggressive detection (more likely to
drop speech-like noises). Default value is -1 (disabled)
[-fvad_param nFrame thres] set libfvad detailed parameter. "nFrame"
is the number of smoothing frame. "thres" is the threshold to
detect speech trigger [0.0-1.0]. Default values are 5 and 0.5
respectively.
[New configure options]
- "--disable-libfvad": disable libfvad integration
[Document update in progress]
- adinrec,adintool README.md README.ja.md
- julius README.md, Options.md
4.4.2.1 (2016.12.20)
====================
- Small fixes for Android and iOS.
- Clean up msvc dir.
4.4.2 (2016.09.12)
===================
- Improved handling of file paths in dnnconf, now correctly handled as relative to the dnnconf file.
- Improved DNN decoding, which sometimes became too slow and got stuck on the 2nd pass.
- Fix segfault on old non-AVX Intel CPU with DNN.
- Fix errors in build process on ARM and VisualStudio.
4.4.1 (2016.09.07)
===================
- more stable and fast SIMD code: SSE, FMA and ARM_NEON
- automatically select suitable SIMD code at run time for DNN computation
- msvc support updated: PortAudio and zlib sources are now included in dist.
- fix incorrect reading of binary hmmlist made by "mkbinhmmlist"
- fix SDL detection in adintool-gui
- "INSTALL.txt" to share how to build Julius on various platforms.
- pkg-config support
- other fixes
4.4 (2016.08.30)
=================
- DNN-HMM computation support
- "adintool-gui": adintool with input monitoring (see adintool/README-GUI.txt)
- "binlm2arpa": convert binary LM to ARPA format
- "mkbingram" now can convert text encoding of an LM by "-c" option
- fix not to exit at disconnection on module mode, wait for next instead.
- fix compilation errors in some recent OS
- fix memory leaks
- work on autoconf >=2.6
- added README.md, CONTRIBUTING.md and other files for GitHub hosting
- added document to use Julius with DNN-HMM AM: "00readme-DNN.txt"
- update support for VS2013
4.3.1 (2014.01.15)
===================
Fixed bugs:
- Compilation error on OS X.
- Unnecessary debug messages in adintool.
- Several bugs around reading / applying "-cmnload".
4.3 (2013.12.25)
=================
New features:
- FBANK and MELSPEC support.
- Network-based feature vector and outprob vector input.
- Static mean/variance for cepstral mean/variance normalization.
- State output probability (i.e. outprob) vector input for DNN-HMM decoding.
- State ID "<SID>" extension of hmmdefs for DNN-HMM decoding.
- Real-time feature extraction and network transmission by 'adintool'.
Modified:
- "mkbinhmm" now keeps the state order and id of the original hmmdefs.
- For portaudio, pause / resume operation synced between engine and audio I/O
- Load / save cepstral mean/variance of CMN/CVN in HTK text format.
New options:
[-input vecnet] read feature / outprob vectors from network
[-input outprob] read outprob vectors from HTK parameter file
[-outprobout [file]] save computed outprob vectors to HTK file (for debug)
4.2.3 (2013.06.30)
==================
New features:
- Add function "j_reload_adddict()" to reload dictionaries.
- Add option "-lvscale factor" and func "j_adin_change_input_scaling_factor()"
to scale the amplitude of captured audio by the factor.
- Add option "-rejectlong msec" to reject too long input.
- Add minimum bayes risk decoding, contributed by H. Nanjo and R. Furutani
- Support binary N-gram symbol charset conversion by "mkbingram".
Fixes:
- Fix sending audio stream via network with incorrect byte order at
big-endian machines.
- Fix occasional failure of closing audio device at j_close_stream().
- Fix segfault when reading binary hmm created at 64bit env. with embedded parameters.
- Fix memory leak when failed to read an N-gram file.
- Fix memory leak when input length overflow is detected.
- Fix unable to load feature vector plugin.
- Update microphone input code for recent MacOSX.
4.2.2 (2012.08.01)
==================
Fixes:
- Now can be compiled without flex library
- Fix failure of reading binary N-gram when compiled with "--enable-words-int"
- Fix incorrect handling of file paths with backslash in jconf file at Windows
- Fix segfault when reading an erroneous word dictionary.
- Fix occasional segfault which may occur during search.
4.2.1 (2011.12.25)
===================
New features:
- Add support for per-word insertion penalty setting at grammar
recognition. You can set different word insertion score for each word
entry at .dict file. For example, if you have an entry
15 [a] a
in .dict file and want to assign word insertion score of "-2.0" to
this word, you can write like this:
15 @-2.0 15 [a] a
The figure after "@" is the insertion penalty. The third
element should be the same as the first element.
- New option "-chunk_size" can specify the audio fragment size in
number of samples. The default value is 1000.
- At "adintool", enable input detection by default for standard input.
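The per-word insertion penalty syntax described above ("15 @-2.0 15 [a] a") can be read with a small parser like the Python sketch below. It is not Julius code. The field meanings (first field, bracketed output string, phone sequence) are inferred from the single example entry, and the function name and returned keys are hypothetical.

```python
def parse_dict_entry(line):
    """Parse one .dict line, with or without an "@penalty" field.

    Returns a dict with the first field, the output string, the phone
    list, and the insertion penalty (None when not given).
    """
    fields = line.split()
    penalty = None
    if len(fields) > 2 and fields[1].startswith("@"):
        penalty = float(fields[1][1:])
        # The third element must repeat the first one.
        if fields[2] != fields[0]:
            raise ValueError("third element must match the first: " + line)
        fields = [fields[0]] + fields[3:]
    return {
        "word": fields[0],
        "output": fields[1],
        "phones": fields[2:],
        "penalty": penalty,
    }
```

For example, parse_dict_entry("15 @-2.0 15 [a] a") yields a penalty of -2.0, while the plain entry "15 [a] a" yields None.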
Fixed bugs:
- (IMPORTANT) CMN is not performed for C0 coef. This bug exists in
the versions from 4.1.3 to 4.2.
- "-forcedict" won't work for additional dictionaries given by "-adddict".
- Corrupted header of recorded WAV file when interrupted by CTRL+C.
- Occasional segfault when reading a wrongly formatted dictionary.
- Won't compile with configure option "--enable-word-graph".
- Segfault of "mkbingram" and "generate-ngram" at cygwin.
4.2 (2011.05.01)
=================
New features:
- Additional score-based pruning at the 1st pass. It is disabled by
default, you can enable by using an option "-bs arg". The argument
is score range.
- New support for PulseAudio (--with-mictype=pulseaudio)
- New Option "-adddict", "-addword" to read additional dictionaries / words.
- Portaudio library updated to V19. Audio capture device can be
changed by env. "PORTAUDIO_DEV_NUM". The device list will be
output at start up.
Changed behavior:
- "mkbinhmmlist" now saves pseudo phone list extracted from AM for
faster start up. The output should be used with the same AM
specified at generation. Note that the converted binhmmlist file
can not be used with older Julius.
- Audio library linking was modified at configure script.
When "--with-mictype=..." is explicitly specified, Julius will link
ONLY the audio library. If not specified, Julius will link all the
audio devices whose development file was detected by the configure.
Library functions:
- j_config_load_string_new(char *str): like j_config_load_file(), but
parse the given string to set parameters.
- add_dict(), add_word(): the same as "-adddict" and "-addword".
(They should be called at start up before starting engine)
- (portaudio/Windows) j_open_stream(recog, NUMSTR) to choose device NUM.
ex. 'j_open_stream(recog, "1")' will open device number one.
- (portaudio/Windows) get_device_list(): obtain list of available devices.
Fixes:
- Improved tree lexicon structure for better memory management.
- Reduce malloc calls at reading N-gram.
- Eliminated memory leaks using Valgrind.
- Workarounds to avoid crash with j_close_stream().
- Now allow "-iwsp" only with multi-path acoustic model.
4.1.5.1 (2010.12.25)
=====================
Modified:
- Fixed problem related to the license.
4.1.5 (2010.06.04)
===================
Bug fixes:
- Language model / decoding (these bugs may affect the ASR performance):
- Several cases of wrong word insertion penalty handling in grammars
were found and fixed.
- Now correctly add the prob. of the first word at the second pass.
- MFCC computation:
- Support MFCC computation when liftering parameter (CEPLIFTER) = 0.
- Compilation:
- Fixes to build Julius on cygwin and MSVC.
- Supports "gcc -mno-cygwin" on cygwin.
- Compilation error with configure "--disable-plugin"
- Module mode:
- Unable to send grammar from jcontrol.
- Not working "DELPROCESS" command when SR and LM have different names.
- Other fixed bugs:
- wrong parsing of "-mapunk" option.
- "-htkconf" in a jconf file now correctly handles the file path as
relative to the jconf file.
- "-input stdin" now supports WAV format.
- not working "-plugin DIRNAME" on Win32/MSVC.
4.1.4 (2009.12.25)
===================
New feature:
- added function to choose input audio device on MSVC compiled Julius,
by specifying a device ID with env. var. "PORTAUDIO_DEV_NUM".
The available device IDs will be listed in the system log at start up.
- You can now set a locale for a LM in Julius.cpp.
Bug fixes:
- now can be compiled on Mac OS X (OS X 10.6 SDK).
- fixes around portaudio for smaller latency and compatibility (Windows).
4.1.3 (2009.11.02)
===================
New features:
- new MSVC support: please read "msvc/00README.txt"
- extended N-gram to support arbitrary N
- portaudio external library (V19) can be used instead of internal V18.
When configure detects portaudio library installed in your system,
Julius will use it instead of internal V18. You can also choose
input device by "PORTAUDIO_DEV" env. var. with the V19 library. See the
log text at start up to know how to set it.
- allow word alignment output (-walign) in module mode
Modified:
- ! now Julius does not perform CMN on 0'th cepstral coefficients,
which is the same as the old 4.0.x versions.
- j_get_current_filename() added on JuliusLib
- improved "--enable-wpair" handling
Bug fixes:
- many bugs around audio open/close API on JuliusLib
- fail to do make in julius-simple
- unable to record inputs at cygwin
- segfault on adintool with "-server"
- occasional segfault at grammar recognition
4.1.2 (2009.02.12)
===================
[SRILM support]
- Added swapping "<s>" and "</s>" when reading BACKWARD ARPA file
trained by SRILM. It will be automatically detected. If detection
fails, you can specify an option "-swap" in mkbingram to do that.
- Internally modify the unigram probability of "<s>" or "</s>", since
they may be set to "-99" in SRILM model. The same value as
opposite will be assigned.
[N-gram]
- Size limit extended from 2GB to 4GB for big N-gram.
- "<unk>" and "<UNK>" can be changed by "-mapunk".
- More strict check for unknown words: Julius now terminates with
error when dictionary has OOV words and N-gram is not open (no unk word).
[Improvements]
- Faster successor list building algorithm
- Update yomi2voca.pl to cover more minor Japanese pronunciation.
- Workaround for audio buffer overrun in ALSA
[JuliusLib]
- Added API function "j_close_stream()" to exit main recognition loop.
[Bug Fixes]
- Fixed segfault on adintool when specifying multiple servers.
- Fixed compilation error on cygwin (libesd)
- Fixed segfault when not specifying "-input" option.
4.1.1 (2008.12.13)
===================
Bug fixes:
[N-gram]
- sometimes could not read an ARPA N-gram file trained by SRILM.
[A/D-in]
- "-input stdin" does not work.
- "SOURCERATE" at "-htkconf" is ignored.
[Forced alignments]
- now can be used in isolated word recognition and with "-1pass".
- "-palign", "-walign" and "-salign" can not be run together at a time.
[Module mode]
- freezes when a grammar is specified by its ID number.
- wrong grammar ID in recognition result (GRAM=.. always 0)
- "SYNCGRAM" will cause crash at isolated word recognition.
- unable to receive/activate/deactivate on isolated word recognition.
[Others]
- fails to compile on several OS (needs "-ldl").
- does not handle backslash escaping correctly in Jconf file.
- does not output the 1st pass result as a final result with "-1pass".
[Tools]
Jcontrol
- does not support "graminfo" command.
- can not send a dictionary to Julius running isolated word recognition.
mkdfa
- segfault on mkfa
- fails to read a grammar file on DOS format.
adintool
- wrong behavior when splitting a long audio file.
- now output time of each segment.
4.1 (2008.10.3)
================
New plugin extension:
- supported types:
- A/D-in plugin
- feature vector input plugin
- audio input monitor / postprocess plugin
- feature vector monitor / postprocess plugin
- result plugin
- can add arbitrary JuliusLib callback via plugin
- sample code is included, with full documentation of function spec.
- run on Linux, Windows and other unix variants with dlopen() capability
Newly supported features:
- multi-stream feature input
- MSD-HMM (compatible with "HTS" toolkit)
- CVN
- frequency warping for VTLN (no estimation yet)
- "-input alsa", "-input oss" and "-input esd"
- perl version of jcontrol client "jclient-perl"
Modified:
- Restrict option orders when multiple instances defined (-AM, -LM, -SR):
- Option should be just after the corresponding instance declaration.
(ex. LM options should be placed after "-LM" and before other
instance declaration.)
- Global option should be before any instance declaration, or
just after "-GLOBAL" option.
This new restriction can be removed by "-nosectioncheck" option.
Fixed bugs:
- "-record" fails to record the first silence part!
- Not working "-multigramout"
- environment variable expansion sometimes fails within jconf file.
- limits extended:
maximum HMM name length = 256 char, Number of HMM states unlimited.
- Module mode error message on grammar command.
Documents:
- Alpha version of "Juliusbook" (contains only manuals at this time)
- Unix manuals are moved to "man" directory.
4.0.2 (2008.5.27)
==================
New features:
- New option "-fallback1pass" will output 1st pass result as final result
when the 2nd pass fails.
- Added support for "USEPOWER=T" on feature extraction.
Modified:
- "-AM_GMM" becomes optional: GMM will share AM params if not specified.
Fixed:
- GMM rejection does not work (since 4.0.1)
- Cannot specify other A/D device on Linux/ALSA correctly.
- Sometimes fails to read a big N-gram.
- Sometimes crashes with "-record" option.
- Callback timing modified on real-time input with sp-segment/GMM/VAD.
- Other minor fixes.
4.0.1 (2008.3.12)
==================
New features:
A/D-in
- ALSA now becomes default on Linux instead of OSS.
Module mode
- "ACTIVATEGRAM", "DEACTIVATEGRAM" and "DELGRAM" now accept
grammar name as arguments in addition to grammar ID number.
- new command "GRAMINFO" to get list of current grammars.
Fixed bugs:
A/D-in
- ALSA codes updated to work on 1.x drivers.
- segfault with "-48".
- segfault on MFCC input with zero frames with "-spsegment".
VAD
- CMN not working on spsegment/GMM-VAD/decoder-VAD with microphone input.
Acoustic model
- Error when no short-pause model defined in multi-path mode.
N-gram
- incorrect 2-gram prob on 1st pass with backward N-gram only.
- incorrect 1-gram prob for unknown words.
- fail to read some ARPA files with no back-off compaction.
- read failure or segfault on big N-gram with over 24bit entries.
- redundant index for back-off weights in some case.
Word recognition
- incorrect N-best output with "-output N" on word recognition.
Installation
- "make install" fails on cygwin.
Source code
- Static variables in functions that are not meant to be static
are made local.
- Global variables in search are moved to StackDecode.
4.0 (2007.12.19)
=================
For more detail about new features in 4.0, please see other document.
- Re-constructed all data structures and re-organize source code.
- Core engine now becomes a library called JuliusLib, with API and callbacks.
- Multi-model decoding now available.
- Modularize language model handling, and merge Julian to JuliusLib.
- Support longer N-gram (N > 3).
- User-defined LM function support.
- Handy isolated word recognition mode.
- Confusion network output.
- Improvements in short-pause segmentation, especially for live input.
- GMM-based VAD.
- Decoder-based VAD.
- Integrated many compile-time options.
- Reduce memory usage.
- Sample application to use the JuliusLib is included: "julius-simple".
- Update tools:
- "adintool" supports multi-server mode.
- "generate-ngram" newly added to generate sentences from N-gram
3.5.3 (2006.12.29)
===================
o Improved Performance:
- acoustic computation optimized: now becomes 20%-40% faster!
- optimize memory access: re-use work area of deleted hypothesis
in the 2nd pass.
- some memory allocation improvement on dictionary and word trellis.
o New Grammar Tools:
- "dfa_minimize", "dfa_determinize" will minimize/determinize DFA.
mkdfa.pl now calls dfa_minimize in it.
- "slf2dfa": a toolkit to convert HTK slf to Julian dfa (separate kit)
o Embedding HTK Acoustic Parameters:
- add option to load HTK Config file to set correct acoustic parameter
configuration at recognition time.
- the acoustic parameter configuration can be embedded into
header of a binary HMM file.
o Improved Word Graph:
- add an option to completely separate graph words: words with
different phone contexts can be output separately by
"-graphrange -1".
o Support for online energy normalization:
- Preliminary support for live recognition using acoustic model with
energy normalization. (approximate with maximum energy of last input)
o Code refinements:
- re-organize libsent/src/wav2mfcc.
- modularize acoustic parameter (Value) handling.
- output compile-time configuration of libsent with "--setting" option.
- Doxygen 1.5.0 support.
- "julius-info@lists.sourceforge.jp" becomes the official contact address.
- fixed typo on copyright notice.
o Fixed bugs:
- sometimes unable to read a binary LM on "--enable-words-int".
- memory leaks around option handling, global variables and local buffers.
- segmentation fault on very long input.
- doubly counted initial state of DFA.
- mkdfa.pl: unable to find mkfa on some OS.
- adintool: makes empty output file on termination.
- adintool: miss last inputs when killed.
- other small changes.
3.5.2 (2006.07.31)
===================
o Speed-up and improvement on Windows console:
- Support DirectSound for better input handling
- Support input threading utilizing callback API on portaudio.
- Support newest MinGW (tested on 5.0.2)
o More accurate word graph output:
- Add option to cut the resulting graph by its depth
(option -graphcut, and enabled by default!)
- Set limit for post-processing loop to avoid infinite loop
(option -graphboundloop, and set by default)
- Refine graph generation algorithm concerning dynamic word merging
and search termination on the second pass.
o Add capability to output word graph instead of trellis on 1st pass:
- 1st pass generates word graph instead of word trellis as
intermediate result by specifying "--enable-word-graph".
In that case, the 2nd pass will be restricted on the graph, not
on the whole trellis.
- With "--enable-word-graph" and "--enable-wpair" option, the
first pass of Julius can perform 1-pass graph generation based
on 2-gram with basically the same algorithm as other popular
word graph based decoders.
o Bug fixes:
- configure script did not work on Solaris 8/9
- "-gprune none" did not work on tied-mixture AM
- Incorrect error message for AM with duration header other than "NULLD"
- Always warns about zero frame stripping upon MFCC
o Implementation improvements:
- bmalloc2-based AM memory management
3.5.1 (2006.03.31)
===================
o Wider MFCC types support:
- Added extraction of acceleration coefficients (_A). Now you
can recognize waveform or microphone input with AM trained with _A.
- Support all MFCC qualifiers (_0, _E, _N, _D, _A, _N, _Z) and their
combination
- Support for any vector length (will be guessed from AM header)
- New option: "-accwin"
- New option "-zmeanframe": frame-wise DC offset removal, like HTK
- New options to specify detailed analysis parameters (see manual):
-preemph, -fbank, -ceplif, -rawe / -norawe,
-enormal / -noenormal, -escale, -silfloor
o Improved microphone / network recognition by MAP-CMN:
- New option "-cmnmapweight" to change MAP weight
- Option "-cmnload" can be used to specify the initial cepstral
mean at startup
- Cepstral mean of last 5 second input is used as an initial mean
for each input. You can inhibit updating of the initial mean
and keep the value loaded by "-cmnload" by option "-cmnnoupdate".
o Module issue:
- Julius now outputs "<STARTPROC/>" when recognition starts, and
"<STOPPROC/>" after recognition stopped by module command.
Use this for safer server-client synchronization.
- now can specify grammar name from client by specifying a name
after a command like "ADDGRAM name" or "CHANGEGRAM name".
o Bug fixes:
- Sometimes segfault on pause/resume command on module mode while input.
- Can not read N-gram with tuples > 2^24.
- Can not read HMM with 3-state (1 output state) model on multi-path.
- Sometimes omit the last transition definition in DFA file.
- Sometimes fails to compile the gramtools on MacOSX.
3.5 (2005.11.11)
=================
o New features:
- Input verification / rejection using GMM (-gmm, -gmmnum, -gmmreject)
- Word graph output (--enable-graphout, --enable-graphout-nbest)
- Pruning on 2nd pass based on local posterior CM (--enable-cmthres)
- Multiple/per-grammar recognition (-gram, -gramlist, -multigramout)
- Can specify multiple grammars at startup: "-gram prefix1,prefix2,..."
or "-gramlist listfile" where listfile contains list of prefixes.
- General output character set conversion "-charconv from to"
based on iconv (Linux) or Win32API+libjcode (Windows)
o Improved audio inputs on Linux:
- ALSA-1.x support. (--with-mictype=alsa)
- EsounD daemon input support. (--with-mictype=esd)
- Fixed some bugs on USB audio input.
- Audio capturing device can be specified via env. "AUDIODEV".
- Extra microphone API support using portaudio and spLib API.
o Performance improvements:
- Reduced memory size for beam operation on the 1st pass.
- Slightly optimized tree lexicon by removing redundant data.
- Reduced size of word N-gram index (reduced from 32 bit to 24 bit).
o Fixed bugs:
- Not working spectral subtraction.
- Memory leak when stack exhausted ("stack empty") on 2nd pass.
- Segmentation fault on a very short input of 1 to 4 frames.
- AM trained with no CMN cannot be used with waveform/mic input.
- Wrong short-pause word handling on successive decoding mode.
(--enable-sp-segment)
- No output of "maxcodebooksize" at startup.
- No output of the number of sentences found when stack exhausted.
- No output of "-separatescore" on module mode.
- Beam width is not adjusted when grammar has been changed and
the full beam option (-b 0) is specified in Julian.
- Wrong update of category-aware cross-word triphones when
dynamically switching grammar on Julian.
- No output of grammar to stdout on multiple grammar mode.
- Unable to send/receive audio data between different endian machines.
- (Linux) crash when compiled with icc.
- (Linux) some strange behavior on USB audio.
- (Windows) confuse with CR/LF newline inputs in several text inputs.
- (Windows) mkdfa.pl could not work on cygwin.
- (Windows) sometimes fails to read a file when not using zlib.
- (Windows) wrong file suffix when recording with "-record" (.raw->.wav)
o Unified source code:
- Linux and Windows version are integrated into one source.
- Multi-path version has been integrated with the normal version
into one source. The multi-path version of Julius/Julian, that
allows any transitions of HMMs including model skip transition,
can be compiled by "--enable-multipath" option. The part of
source codes for the multi-path version can be identified
by the definition "MULTIPATH_VERSION".
o Other improvements:
- Now can be compiled on MinGW/MSYS on Windows
- Totally rewritten comments in entire source in Doxygen format.
You can generate fully browsable source documents in English.
Try "make doxygen" at the top directory (you need doxygen installed)
- Install additional executables of julius/julian with version and setting
names like "julius-3.5-fast" when "make install" is invoked.
- Updated LICENSE.txt with English translation for reference.
o Changed behaviors:
- Binary N-gram file format has been changed for smaller size.
The old files can still be read directly by julius, in which
case on-line conversion will be performed at startup.
You can convert the old files (3.4.2 and earlier) to the new
format with the new mkbingram by invoking the command below:
"mkbingram -d oldbinary newbinary"
Please note that since mkbingram now output the new format
file, it can not be read by older Julius.
The binary N-gram file version can be detected by the first 17
bytes of the file: old format should be "julius_bingram_v3" and
new format should be "julius_bingram_v4".
- Byte order of audio stream via tcpip fixed to LITTLE ENDIAN.
- Now use built-in zlib by default for compressed files. This may
make the engine startup slower, and if you prefer, you can still
use the previous method using external gzip command by specifying
"--disable-zlib".
- (Windows) Changed the compilation procedure on VC++. You can build
Julian by only specifying "-DBUILD_JULIAN" at compiler option,
and do not need to alter "julius.h".
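The binary N-gram version check described under "Changed behaviors" for 3.5 (first 17 bytes are "julius_bingram_v3" or "julius_bingram_v4") can be sketched in Python as below. This is an illustrative helper, not part of Julius: the function names are hypothetical, and it assumes the file is stored uncompressed.

```python
OLD_MAGIC = b"julius_bingram_v3"  # 3.4.2 and earlier
NEW_MAGIC = b"julius_bingram_v4"  # written by the new mkbingram

def bingram_format(header):
    """Classify a binary N-gram by the first 17 bytes of its content."""
    magic = header[:17]
    if magic == OLD_MAGIC:
        return "old"
    if magic == NEW_MAGIC:
        return "new"
    return "unknown"

def bingram_format_of_file(path):
    """Read only the 17-byte header of an uncompressed file and classify it."""
    with open(path, "rb") as f:
        return bingram_format(f.read(17))
```

A file reported as "old" is one that could be converted with the "mkbingram -d oldbinary newbinary" command mentioned above.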
3.4.2 (2004.03.31)
===================
- New option "-rejectshort msec" to reject short input.
- More stable PAUSE/RESUME on module mode with adinnet input.
- Bug fixes:
- Memory leak on very short input.
- Missing Nth result when small vocabulary is used.
- Hang up of "generate" on small grammar.
- Cosmetic changes:
- Clean up code to conform to 'gcc -Wall'.
- Update of config.guess and config.sub.
- Update of copyright to 2004.
3.4.1 (2004.02.25)
===================
- AM and LM computation method is slightly modified to improve search
stability of 2nd pass. These modifications are enabled by default, and
MAY IMPROVE THE RECOGNITION ACCURACY as compared with older versions.
- fixed overcounting of LM score for the expanded word.
- new inter-word triphone approximation (-iwcd1 best #) on 1st
pass. This new algorithm now becomes default.
- Newly supports binary HMM (original format, not compatible with HTK).
A tool "mkbinhmm" converts a hmmdefs(ascii) file to the binary format.
- MFCC computation becomes faster by sin/cos table lookup.
- Bugs below have been fixed:
- (-input adinnet) recognition does not start immediately after speech
inputs begin when using adinnet client.
- (-input adinnet) together with module mode, speech input cannot
stop by pause/terminate command.
- (-input adinnet) unnecessary fork when connecting with adinnet client.
- (-input rawfile) error in reading wave files created by Windows
sound recorder.
- (CMN) CMN was always applied, even when the acoustic model does not require it.
- (AM) numerous messages in case of missing triphone errors at startup.
- (adintool) immediately exit after single file input.
- (sp-segment) fixed many bugs relating to short pause word and LM
- (sp-segment) now it works with microphone input.
- (-[wps]align) memory leak on continuous input.
- Add option to remove DC offset from speech input (option -zmean).
- (-module) new output message:
'<INPUTPARAM FRAMES="input_frame_length" MSEC="length_in_msec">'
- Optional feature "Search Space Visualization" is added (--enable-visualize)
- HTML documentation in doc greatly revised.
New arguments: "-iwcd1 best #", "-zmean"
New configure options: "--disable-lmfix", "--enable-visualize"
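A client receiving the new module-mode message listed above could extract the two values roughly as follows. This is a sketch in Python under stated assumptions: the function name parse_inputparam is hypothetical, the attribute order is taken from the message shown in these notes, and both values are assumed to be non-negative integers. An optional trailing "/" before ">" is tolerated as a precaution.

```python
import re

# Pattern for the message shown in the notes:
#   <INPUTPARAM FRAMES="input_frame_length" MSEC="length_in_msec">
_INPUTPARAM_RE = re.compile(
    r'<INPUTPARAM\s+FRAMES="(\d+)"\s+MSEC="(\d+)"\s*/?>'
)


def parse_inputparam(line):
    """Return (frames, msec) as integers, or None if the line does not match."""
    m = _INPUTPARAM_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

Lines that do not contain an INPUTPARAM message yield None, so the function can be applied to every line read from the module connection.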
3.4 (2003.10.01)
===================
- Confidence measure support
- New parameter "-cmalpha" as smoothing coef.
- New command "-outcode C" to output CM in module output
- Can be disabled by configure option "--disable-cm"
- Can use an alternate CM algorithm by configure option "--enable-cm-nbest"
- Class N-gram support
- Can be disabled by configure option "--disable-class-ngram"
- Factoring basis changed from N-gram entry to dictionary word
- WAV format recording in "adinrec", "adintool" and "-record" option
- Modified output message
startup messages,
engine configuration message in --version and --help,
- Fixes:
some outputs in module mode,
bug with input of only a few frames (realtime-1stpass.c),
long silence at end of segmented speech,
miscompilation with NetAudio,
word size check in binary N-gram,
bug in acoustic computation (gprune_none.c).
"-version" -> "-setting", "-hipass" -> "-hifreq", "-lopass" -> "-lofreq"
3.3p4 (2003.05.06)
===================
- Fixes for audio input:
- Fix segfault/hangup with continuous microphone input.
- Fix client hangup when input speech is too long in module mode.
(now sends a buffer overflow message to the client)
- Fix audio input buffering for very short input (<1000 samples).
- Fix blocking handling in tcpip adin.
- Some cosmetic changes (jcontrol, LOG_TEN, etc.)
3.3p3 (2003.01.08)
===================
- New inter-word short pause handling:
- [Julius] New option added for short pause handling. Specifying
"-iwspword" adds a short-pause word entry, namely "<UNK> [sp] sp sp",
to the dictionary. The entry content can be changed by using "-iwspentry".
- [multi-path] Supports inter-word context-free short pause handling.
"-iwsp" option automatically appends a skippable short pause model at
every word end. The added model will also be ignored in context
modeling. The short pause model to be appended by "-iwsp" can be
specified with the "-spmodel" option. See the documentation for details.
- Fixes for audio input:
- Input delay improved: the initial response to mic input now
becomes much faster than previous versions (200ms -> 50ms approx.).
- No longer blocks when another process is using the audio device, but
just outputs an error and exits.
- Update support for libsndfile-1.0.x.
- Update support for ALSA-0.9.x
(to use this, add "--with-mictype=alsa" to configure option.)
3.3p2 (2002.11.18)
===================
- [multi-path version] Supports model-skip transition. As of
this version, you can use "any" type of state transition in an HTK
format acoustic model.
- New feature: "-record dir" records speech inputs successively
into the specified directory with time-stamp file names.
- fix segfault on Solaris with "-input mfcfile".
- fix blocking command input when using module mode and adinnet together.
- modified the output flush timing to make sure the last recognition
result will be output immediately.
3.3p1 (2002.10.15)
===================
The following bugs are fixed:
- Fixed incorrect default value of language weights for second pass (-lmp2).
- Fixed occasional read failure of dictionary file (double space enabled).
- Fixed wrong output of "-separatescore" together with monophone model.
3.3 (2002.09.12)
==================
The updates and new features from rev.3.2 are shown below.
- New features added:
- Server module mode - control Julius (input on/off, grammar switching)
from other client process via network.
- Online grammar changing and multi-grammar recognition supported.
- Noise robustness:
- Spectral subtraction incorporated.
- Support more variety of acoustic models:
- "multi-path version" is available that allows any transition
including loop, skip and parallel transition.
- Slight improvement of recognition performance through bug fixes
- Other minor extensions (CMN parameter saving, etc.)
- Many bug fixes
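The spectral subtraction listed under noise robustness above can be illustrated with a generic textbook form of the technique. This Python sketch is not taken from the Julius source: these notes do not state which variant Julius uses, and the function name, the scaling factor alpha and the flooring factor floor are all illustrative assumptions.

```python
def spectral_subtract(power_spectrum, noise_spectrum, alpha=2.0, floor=0.5):
    """Subtract a scaled noise estimate from each spectral bin.

    Each output bin is max(p - alpha * n, floor * p), so a bin never
    drops below a fraction of its original value. alpha and floor are
    assumed, illustrative parameters.
    """
    if len(power_spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for p, n in zip(power_spectrum, noise_spectrum):
        cleaned.append(max(p - alpha * n, floor * p))
    return cleaned
```

In this form the noise spectrum would typically be estimated from a non-speech portion of the input; how Julius obtains that estimate is not described in these notes.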
English documents are available in
o online manuals (will be installed by default), and
o Translated full documentation in PDF format: Julius-3.2-book-e.pdf.
We are sorry that the current release contains only documentation for the old rev.3.2.
We are now working on updating it to catch up with the current rev.3.3 version.