Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: spelling correction #228

Merged
merged 25 commits into from
Dec 14, 2018
Merged

feat: spelling correction #228

merged 25 commits into from
Dec 14, 2018

Conversation

nameoverflow
Copy link
Member

@nameoverflow nameoverflow commented Nov 15, 2018

  • Search for similar spellings when building syllable graph (by radius nearest neighbour search and symmetric deleted search).
  • Show limited correction candidates.
  • Only work on script translator and standard QWERTY keyboard layout for now.
  • Set translator/enable_correction: true in schema to enable.

@lotem
Copy link
Member

lotem commented Nov 15, 2018

有實力啊

src/rime/algo/corrector.cc Outdated Show resolved Hide resolved
thirdparty/include/darts.h Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
@lotem
Copy link
Member

lotem commented Nov 19, 2018

有實力啊,

@nameoverflow
Copy link
Member Author

CI 不支持 C++ 17 ?

@nameoverflow nameoverflow requested review from Prcuvu and lotem November 27, 2018 15:25
Copy link
Member

@lotem lotem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to thank you for what you have done. Research, design and coding.
That's a lot of output and I haven't thoroughly read all the code. Send some early feedback for your consideration.

// let end_vertex_type be the best (smaller) type of spelling
// that ends at the vertex
if (end_vertex_type > props.type) {
if (end_vertex_type > props.type && !props.is_correction) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a position can only be reached via correction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be kept as normal spelling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. End vertex of a correction edge should be marked as normal spelling even there is a worse typed edge overlapped.

Copy link
Member Author

@nameoverflow nameoverflow Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the worse overlapped edge should be deleted?
OK. It should keep the behavior as there isn't a correction.

@@ -9,8 +9,11 @@
#include <boost/range/adaptor/reversed.hpp>
#include <rime/dict/prism.h>
#include <rime/algo/syllabifier.h>
#include <rime/gear/corrector.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependency graph of possible split modules:

core <-- {algo, dict} <-- {gear, lever}

gear depends on algo, what happens here is a circular dependency.
Is it easy to move corrector to algo?

Copy link
Member Author

@nameoverflow nameoverflow Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was moved to gear because it is registered as a component in gears_module.cc and there is not a algo_module.cc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving corrector to src/rime/dict, and register the component in dict_module.cc ?


if (current_pos > farthest)
farthest = current_pos;
DLOG(INFO) << "current_pos: " << current_pos;

// see where we can go by advancing a syllable
vector<Prism::Match> matches;
prism.CommonPrefixSearch(input.substr(current_pos), &matches);
set<SyllableId> matches_set;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name it match_set

matches_set.insert(m.value);
}
if (enable_correction_) {
// NearSearchCorrector corrector;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete it if the code is obsolete.

continue;
}
end_vertices[end_pos].swap(spellings);
// end_vertices[end_pos].swap(spellings);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this line if it's obsolete.


auto file_name = resolver_->ResolvePath(prism_name).string();

auto edCorrector = New<EditDistanceCorrector>(file_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistent style, name it ed_corrector

#include <rime/service.h>
#include <rime/ticket.h>
#include <rime/schema.h>
#include "rime/gear/corrector.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it doesn't depend on anything in rime/gear and can live in rime/algo or rime/dict.
I see the use of Prism here. So probably the latter.

} while ( // limit the number of correction candidates
enable_correction_ && is_correction &&
corrections_count_ > max_corrections_);
if (is_correction) corrections_count_++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use prefix ++

DictEntryCollector::reverse_iterator phrase_iter_;
UserDictEntryCollector::reverse_iterator user_phrase_iter_;
size_t user_phrase_index_ = 0;

size_t max_corrections_ = 4;
size_t corrections_count_ = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name it correction_count_

is_correction = syllabifier_->IsCandidateCorrection(*candidate_);
}
} while ( // limit the number of correction candidates
enable_correction_ && is_correction &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: break into two lines
to make it more obvious how many conditions are chained, use different lines for every item if one line cannot contain all items in the same level.

@nameoverflow
Copy link
Member Author

I think ojbk

@lotem lotem merged commit ad3638a into rime:master Dec 14, 2018
@osfans
Copy link
Contributor

osfans commented Dec 17, 2018

看來九宮格也有希望了?

osfans added a commit to osfans/trime that referenced this pull request Dec 17, 2018
@nameoverflow
Copy link
Member Author

九宫格要重定义 keyboard_map,可能之后可以改成能自定义的?

@lotem
Copy link
Member

lotem commented Dec 17, 2018

@osfans @nameoverflow 九宮格誤觸的可能性不高吧?似乎不太需要糾錯這個功能?

@osfans
Copy link
Contributor

osfans commented Dec 19, 2018

嗯,誤觸的可能性不大。
不過九宮格可以用來模糊編碼,缺的是顯示所有可能的正確編碼,供進一步篩選。
那可能就需要增加接口,訪問CandidateSyllableList之類的。

@cuichungman
Copy link

請問「拼寫糾錯」功能是否可應用於中文拼音(Jyutping)?

@Jreen
Copy link

Jreen commented Feb 15, 2019

你好,我发现当我开启拼写纠错的时候,比如说我要打hun,给候选栏给出了jin这些字的参考。因为使用太不方便了,我只好暂时注释了 "translator/enable_correction": true。我很好奇的是,在什么情况在会出现这样的情况呢?

@lotem
Copy link
Member

lotem commented Feb 15, 2019

你好,我发现当我开启拼写纠错的时候,比如说我要打hun,给候选栏给出了jin这些字的参考。

@nameoverflow 可以再調低些糾錯候選的權重。糾錯最有用的時候是準確匹配的音節或詞組缺失。否則可以讓糾錯結果排在後面一些些,作爲補充。

我做了一項重構,現在權重用log計算了。有空可以調調參數。

@jimmy54
Copy link

jimmy54 commented Apr 26, 2019

是否可以不依赖script_translator .独立一下translator出来呢?

@nameoverflow
Copy link
Member Author

现在每种 translator 都有自己写死的分词方式,要么都写一遍要么重构 translator 的分词。不过别的 translator 对这个功能真的有需求?

@jimmy54
Copy link

jimmy54 commented May 3, 2019

五笔等形码方案都不使用script_tran的。所以基本形码都用不了。

@jimmy54
Copy link

jimmy54 commented May 3, 2019

或者在table_translator实现一次逻辑,也是可以满足的。不过,是否有更好扩展的方案呢。这样就一劳永逸。

@zoenglinghou
Copy link

自动纠错感觉还是太敏感了,间接影响到了emoji的输入

@cuichungman
Copy link

cuichungman commented Feb 24, 2020 via email

@zoenglinghou
Copy link

@cuichungman 多謝邀請,不過輸入法始終都係敏感地帶,我希望可以用我可以信任嘅輸入法😂️

@ayaka14732
Copy link
Collaborator

It is not uncommon that correct words are mistakenly corrected into wrong words. Maybe this is a bug.

@lotem
Copy link
Member

lotem commented Sep 8, 2020

It is not uncommon that correct words are mistakenly corrected into wrong words. Maybe this is a bug.

如果候選包含「叫絕」且排在「交界」等糾錯結果後,應該屬於糾錯算法的問題。否則可能只是因爲詞典缺失了「叫絕」這個詞語。

@ayaka14732
Copy link
Collaborator

It is not uncommon that correct words are mistakenly corrected into wrong words. Maybe this is a bug.

如果候選包含「叫絕」且排在「交界」等糾錯結果後,應該屬於糾錯算法的問題。否則可能只是因爲詞典缺失了「叫絕」這個詞語。

經過檢查詞典中確有「叫絕」一詞

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Feb 20, 2021
Changelog:
## [1.7.2](rime/librime@1.7.1...1.7.2) (2021-02-07)


### Bug Fixes

* **chord_composer:** should clear raw input after committing text ([79b34ab](rime/librime@79b34ab))



## [1.7.1](rime/librime@1.7.0...1.7.1) (2021-02-06)


### Bug Fixes

* **chord_composer:** press Return key to commit raw key sequence ([2b25861](rime/librime@2b25861))



# [1.7.0](rime/librime@1.6.1...1.7.0) (2021-01-17)


### Bug Fixes

* **chord_composer:** more safely handle the placeholder ZWSP ([025d9fb](rime/librime@025d9fb))
* **cmake:** use full paths defined by GNUInstallDirs ([bb8c263](rime/librime@bb8c263)), closes [#424](rime/librime#424)
* **opencc:** update submodule to fix [#425](rime/librime#425) ([3fa1571](rime/librime@3fa1571))
* **script_translator:** always_show_comments also applies to phrases ([440a97c](rime/librime@440a97c)), closes [#272](rime/librime#272) [#419](rime/librime#419)
* **table_translator:** index out of bound access in string ([ff7acdc](rime/librime@ff7acdc))


### Features

* **chareset_filter:** add CJK Compatibility Ideographs in is_extended_cjk() ([3cb1128](rime/librime@3cb1128)), closes [#305](rime/librime#305)
* **setup:** find and load external RIME plugins as shared libs [#431](rime/librime#431) ([b2abd09](rime/librime@b2abd09))



## [1.6.1](rime/librime@1.6.0...1.6.1) (2020-09-21)


### Bug Fixes

* **rime_api.cc:** dangling pointer returned from RimeGetSharedDataDir ([78abaa8](rime/librime@78abaa8))



# [1.6.0](rime/librime@1.5.3...1.6.0) (2020-09-20)


### Bug Fixes

* **ascii_composer:** do not comsume Shift key release ([debc2c0](rime/librime@debc2c0))
* **ascii_composer:** first read ascii_composer/good_old_caps_lock from schema config ([3fc56c4](rime/librime@3fc56c4))
* **chord_composer:** commit raw input with uppercase letters ([cc983d5](rime/librime@cc983d5))
* **CMakeLists.txt:** ensure paths in pkgconfig file are absolute ([0e96e51](rime/librime@0e96e51))
* **CMakeLists.txt:** would not use signals v1 due to a typo ([6662a28](rime/librime@6662a28)), closes [#225](rime/librime#225)
* **custom_settings:** accept "*.schema" as config id ([604da0b](rime/librime@604da0b))
* **dict:** issues with user db recovery ([0f3d0df](rime/librime@0f3d0df))
* **dict_compiler:** build prism with loaded syllabary when not rebuilding primary table ([93fe827](rime/librime@93fe827))
* **plugins/CMakeLists.txt:** avoid rime_library linking to itself via rime_plugins_deps ([fe744db](rime/librime@fe744db))
* **rime_api.cc:** check struct has member of non-pointer type ([090dfa4](rime/librime@090dfa4))
* **rime_api.cc:** using unchecked fields introduced an ABI breakage ([62bbead](rime/librime@62bbead)), closes [/github.com/rime/librime/pull/328#pullrequestreview-335125464](https://github.com//github.com/rime/librime/pull/328/issues/pullrequestreview-335125464)
* **rime_test:** set data directories to working directory using rime::SetupDeployer API ([7c08a90](rime/librime@7c08a90))
* **simplifier:** opencc::DictEntry::Values() type change in opencc 1.1.0 ([beae5b1](rime/librime@beae5b1)), closes [#367](rime/librime#367)
* **user_db:** pointer cast error caused by multiple inheritance ([2ed780b](rime/librime@2ed780b))
* use official emoji 12.0 data ([#304](rime/librime#304)) ([75a60dc](rime/librime@75a60dc))


### Features

* **api:** implement capnproto api ([873f648](rime/librime@873f648))
* **api:** include candidate labels in proto message ([aae7a0c](rime/librime@aae7a0c))
* **charset_filter:** support charset options with emoji ([#293](rime/librime#293)) ([943c95b](rime/librime@943c95b))
* **charset_filter:** support CJK Unified Ideographs Extension G ([#393](rime/librime#393)) ([0a1573d](rime/librime@0a1573d))
* **chord_composer:** support chording with Shift keys ([94cf479](rime/librime@94cf479))
* **chord_composer:** use Control, Alt, Shift to input chord ([f3a2ad0](rime/librime@f3a2ad0))
* **dictionary:** packs extends the dictionary with extra binary table files ([930074c](rime/librime@930074c))
* **key_binder:** bind key to a key sequence ([3b5dbf6](rime/librime@3b5dbf6)), closes [#301](rime/librime#301)
* **logging:** setup min log level, log dir and set file mode to log files ([90839b0](rime/librime@90839b0))
* **selector:** support 4 combinations of horizontal/vertical text orientation and stacked/linear candidate list layout ([c498f71](rime/librime@c498f71))
* **selector:** support vertical UI ([dbb35c6](rime/librime@dbb35c6))
* **switcher:** enable schema in cases where conditions are met ([217c72b](rime/librime@217c72b))
* **tools/rime_proto_console:** demo for proto api ([d88ef9f](rime/librime@d88ef9f))


### Performance Improvements

* **poet:** optimize for performance in making sentences (~40% faster) ([0853465](rime/librime@0853465))



## [1.5.3](rime/librime@1.5.2...1.5.3) (2019-06-22)


### Bug Fixes

* **cmake, xcode.mk:** find optional dependency icu, while building xcode/release-with-icu target  [skip appveyor] ([17a80f8](rime/librime@17a80f8))
* **single_char_filter:** broken in librime 1.5.2 ([6948a62](rime/librime@6948a62))


### Features

* **appveyor:** build variant "rime-with-plugins" for tagged commits  [skip travis] ([eef8c30](rime/librime@eef8c30))
* **travis-ci:** build variant "rime-with-plugins" for tagged commits  [skip appveyor] ([cf11c27](rime/librime@cf11c27))
* **travis-ci:** deploy artifacts for macOS to GitHub releases  [skip appveyor] ([3f03784](rime/librime@3f03784))



## [1.5.2](rime/librime@1.5.1...1.5.2) (2019-06-17)


### Bug Fixes

* **user_dictionary, contextual_translation:** fix user phrase quality; order contextual suggestions by type ([69d5c32](rime/librime@69d5c32))



## [1.5.1](rime/librime@1.5.0...1.5.1) (2019-06-16)


### Bug Fixes

* **user_dictionary:** make user phrases comparable in weight to system words ([982f69d](rime/librime@982f69d))



# [1.5.0](rime/librime@1.4.0...1.5.0) (2019-06-06)


### Bug Fixes

* **ci:** update build script ([84a1a1b](rime/librime@84a1a1b))
* **ci:** use submodules in AppVeyor CI build script ([7b515b4](rime/librime@7b515b4))
* **cmake:** libboost Windows XP compatibility fix ([#270](rime/librime#270)) ([fecfe39](rime/librime@fecfe39)), closes [rime/weasel#337](rime/weasel#337)
* **CMakeLists.txt:** install header files in all platforms ([821d563](rime/librime@821d563))
* **CMakeLists.txt:** set "-std=c++11" in CMAKE_CXX_FLAGS ([5d8a836](rime/librime@5d8a836))
* **config/plugins.h:** memory leak caused by non-virtual destructor ([316a659](rime/librime@316a659)), closes [#259](rime/librime#259)
* **deploy:** treat schema dependencies as optional; do not report errors if missing ([ff3d5e9](rime/librime@ff3d5e9))
* **engine:** schema doesn't match the one used by switcher ([e41bb63](rime/librime@e41bb63)), closes [#269](rime/librime#269)
* **rime_levers_api.h:** customize_bool() misused `bool` type ([42bacc5](rime/librime@42bacc5))
* **syllabifier:** enable_completion not working ([2714131](rime/librime@2714131)), closes [#343](rime/librime#343)
* **table_translator:** null pointer exception when dict entries are filtered ([77438a9](rime/librime@77438a9))
* **test:** compile error in unit test ([7076d9e](rime/librime@7076d9e))
* **travis-install.sh:** working directory ([97220ce](rime/librime@97220ce))


### Features

* **appveyor:** install RIME_PLUGINS  [skip travis] ([c7ce66f](rime/librime@c7ce66f))
* **CMakeList.txt:** add plugin build support ([#257](rime/librime#257)) ([dfa341b](rime/librime@dfa341b))
* **contextual_translation:** weight and re-order phrases by context ([2390da3](rime/librime@2390da3))
* **dict:** specify vocabulary db name in dict settings ([dcdc301](rime/librime@dcdc301))
* **grammar:** compare homophones/homographs in sentence ([9248a6b](rime/librime@9248a6b))
* **install-plugins.sh:** git-clone or update plugins ([70483b4](rime/librime@70483b4))
* **poet:** find best sentence candidates ([b3f4005](rime/librime@b3f4005))
* **rime_api:** get candidate list from index ([c587900](rime/librime@c587900))
* **translator:** contextual suggestions in partially selected sentence ([12a7501](rime/librime@12a7501))
* **translator:** look at preceding text when making sentence ([6ae34de](rime/librime@6ae34de))
* **travis-ci:** install plugins specified in envvar RIME_PLUGINS ([c857639](rime/librime@c857639))


### Performance Improvements

* **dictionary:** refactor DictEntryIterator and do partial sort ([0258c7f](rime/librime@0258c7f))


### BREAKING CHANGES

* **rime_levers_api.h:** in signature of C API function `customize_bool()`,
change type `bool` to `Bool` (alias of `int`).

Impact: the changed function is not in use by any first party code,
known to be in use by osfans/trime.



# [1.4.0](rime/librime@1.3.2...1.4.0) (2019-01-16)


### Bug Fixes

* **config:** user_config should not fall back to shared data ([68c8a34](rime/librime@68c8a34)), closes [#271](rime/librime#271)
* **SymlinkingPrebuiltDictionaries:** remove dangling symlinks ([5ad333d](rime/librime@5ad333d)), closes [#241](rime/librime#241)
* **SymlinkingPrebuiltDictionaries:** remove dangling symlinks ([f8e4ebf](rime/librime@f8e4ebf)), closes [#241](rime/librime#241)


### Features

* spelling correction ([#228](rime/librime#228)) ([ad3638a](rime/librime@ad3638a))
* **Dockerfile:** for build ([#246](rime/librime#246)) ([cafd0d5](rime/librime@cafd0d5))



## [1.3.2](rime/librime@1.3.1...1.3.2) (2018-11-12)


### Bug Fixes

* **CMakeLists.txt:** do not link binaries when building static library ([99573e3](rime/librime@99573e3))
* **CMakeLists.txt:** do not require boost::signals, which will be deprecated in Boost 1.69 ([8a9ef3b](rime/librime@8a9ef3b)), closes [#225](rime/librime#225)
* **config_compiler:** ambiguous operator overload with cmake option ENABLE_LOGGING=OFF ([b86b647](rime/librime@b86b647)), closes [#211](rime/librime#211)
* **config_compiler:** support creating list in-place by __patch and __merge ([0784eb0](rime/librime@0784eb0))
* **table_translator:** enable encoding uniquified commit history ([74e31bc](rime/librime@74e31bc))


### Features

* **language:** shared user dictionary per language (Closes [#184](rime/librime#184)) ([#214](rime/librime#214)) ([9f774e7](rime/librime@9f774e7))
* always_show_comments option ([#220](rime/librime#220)) ([19cea07](rime/librime@19cea07))



## [1.3.1](rime/librime@1.3.0...1.3.1) (2018-04-01)


### Bug Fixes

* **config_file_update:** clean up deprecated user copy ([#193](rime/librime#193)) ([8d8d2e6](rime/librime@8d8d2e6))
* **thirdparty/src/leveldb:** do not link to snappy library ([6f6056a](rime/librime@6f6056a))



# 1.3.0 (2018-03-09)


### Bug Fixes

* **CMakeLists.txt, build.bat:** install header files (public API) ([06c9e86](rime/librime@06c9e86))
* **config_compiler:** "/" mistaken as path separator in merged map key ([#192](rime/librime#192)) ([831ffba](rime/librime@831ffba)), closes [#190](rime/librime#190)
* **ConfigFileUpdate:** no need to create user build if shared build is up-to-date ([cafd5c4](rime/librime@cafd5c4))
* **SchemaUpdate:** read compiled schema from shared build if there is no user build ([45a04dd](rime/librime@45a04dd))
* **simplifier:** fix typo ([9e1114e](rime/librime@9e1114e)), closes [#183](rime/librime#183)
* **user_db:** unwanted implicit instantiation of UserDbFormat template ([3cbc9cb](rime/librime@3cbc9cb)), closes [#188](rime/librime#188)


### Chores

* **release tag:** deprecating tag name prefix 'rime-' in favor of semver 'X.Y.Z'


### BREAKING CHANGES

* **release tag:** After 1.3.0 release, we'll no longer be creating tags in the format 'rime-X.Y.Z'. Downstream packagers please change automated scripts accordingly.



## 1.2.10 (2018-02-21)


### Bug Fixes

* **config_compiler:** linking failure on blocking root node of a dependency resource ([ecf3397](rime/librime@ecf3397))
* table_translator not making sentence if table entry is hidden by charset filter. ([77eb12e](rime/librime@77eb12e))
* **appveyor.install.bat:** switch to a more stable download server for libboost ([bcc4d10](rime/librime@bcc4d10))
* **appveyor.yml:** archive header files ([c8b1e67](rime/librime@c8b1e67))
* **ascii_composer:** support key binding Shift+space in ascii mode ([7077389](rime/librime@7077389))
* **build.bat:** fix build errors with VS2015 build tools ([ec940c6](rime/librime@ec940c6))
* **calculus, recognizer:** memory leak due to unchecked regex error ([19ddc1e](rime/librime@19ddc1e)), closes [#171](rime/librime#171)
* **chord_composer:** allow editor to define BackSpace key behavior ([7f41f65](rime/librime@7f41f65))
* **chord_composer:** letters with modifier keys should not be committed by a following enter key ([aab5eb8](rime/librime@aab5eb8))
* **ci:** call cmake under /usr/local with sudo by passing $PATH environment variable ([a0e6d2f](rime/librime@a0e6d2f))
* **cmake:** fix build break for mingw ([939893c](rime/librime@939893c))
* **config:** auto save modified config data; fixes [#144](rime/librime#144) ([2736f4b](rime/librime@2736f4b))
* **config:** treat "@" as map key rather than list index ([a1df9c5](rime/librime@a1df9c5))
* **config_compiler:** duplicate PendingChild dependencies happen from multiple commands on the same node ([25c28f8](rime/librime@25c28f8))
* **config_compiler:** enforce dependency priorities ([69a6f3e](rime/librime@69a6f3e))
* **config_compiler:** null value should not overwrite a normal key in a merged tree ([4ecae44](rime/librime@4ecae44))
* **config_compiler:** template operator overload had compile error with NDK ([71817a0](rime/librime@71817a0))
* **config/build_info_plugin:** referenced but unavailable resources should also be recorded ([cd46f7a](rime/librime@cd46f7a))
* **ConfigFileUpdate:** should succeed if shared copy does not exist ([8a3e25c](rime/librime@8a3e25c))
* **custom_settings:** fall back to $shared_data_dir/build when loading config ([caf8ebb](rime/librime@caf8ebb))
* **custom_settings:** load built settings from $user_data_dir/build directory ([463dc09](rime/librime@463dc09))
* **deployment_tasks:** symbols.yaml is no longer a build target ([f920e4f](rime/librime@f920e4f))
* **dict_compiler:** prism should load compiled schema ([c2fd0cf](rime/librime@c2fd0cf)), closes [#176](rime/librime#176)
* **key_event:** KeySequence::repr() prefer unescaped punctuation characters ([aa43e5e](rime/librime@aa43e5e))
* **levers:** update deployment tasks for copy-free resource resolution ([1f86413](rime/librime@1f86413))
* **Makefile:** make install-debug; do return error code on mac ([1177142](rime/librime@1177142))
* **rime_api:** use user_config_open() to access user.yaml ([4e4a491](rime/librime@4e4a491))
* **rime_console:** not showing switcher's context ([632cf4b](rime/librime@632cf4b))
* **schema:** create a "schema" component that opens Config by schema_id ([555f990](rime/librime@555f990))
* **simplifier:** fix crash if no opencc file ([091cb9d](rime/librime@091cb9d))
* **simplifier:** tips option for show_in_comment simplifier ([e7bb757](rime/librime@e7bb757))
* **uniquifier:** half of the duplicate candidates remain after dedup [Closes [#114](rime/librime#114)] ([2ab76bc](rime/librime@2ab76bc))


### Features

* **build.bat:** customize build settings via environment variables ([#178](rime/librime#178)) ([1678b75](rime/librime@1678b75))
* **chord_composer:** accept escaped chording keys ([79a32b2](rime/librime@79a32b2))
* **chord_composer:** support chording with function keys ([48424d3](rime/librime@48424d3))
* **config:** add config compiler plugin that includes default:/menu into schema ([b51dda8](rime/librime@b51dda8))
* **config:** best effort resolution for circurlar dependencies ([2e52d54](rime/librime@2e52d54))
* **config:** build config files if source files changed ([0d79712](rime/librime@0d79712))
* **config:** config compiler plugins that port legacy features to the new YAML syntax ([a7d253e](rime/librime@a7d253e))
* **config:** config_builder saves output to $rime_user_dir/build/ ([e596155](rime/librime@e596155))
* **config:** references to optional config resources, ending with "?" ([14ec858](rime/librime@14ec858))
* **config:** save __build_info in compiled config ([45a7337](rime/librime@45a7337))
* **config:** separate out config_builder and user_config components ([9e9493b](rime/librime@9e9493b))
* **config:** support append and merge syntax ([04dcf42](rime/librime@04dcf42))
* **customizer:** disable saving patched config files ([88f5a0c](rime/librime@88f5a0c))
* **detect_modifications:** quick test based on last write time of files ([285fbcc](rime/librime@285fbcc))
* **dict:** no conditional compilation on arm ([85b945f](rime/librime@85b945f))
* **dict:** relocate binary files to $user_data_dir/build ([bc66a47](rime/librime@bc66a47))
* **dict:** use resource resolver to find dictionary files ([8ea08b3](rime/librime@8ea08b3))
* add property notifier ([fa7b5a5](rime/librime@fa7b5a5))
* **resource_resolver:** add class and unit test ([03ee8b4](rime/librime@03ee8b4))
* **resource_resolver:** fallback root path ([02151da](rime/librime@02151da))
* **translator:** add history_translator ([#115](rime/librime#115)) ([ae13354](rime/librime@ae13354))



## 1.2.9 (2014-12-14)

* **rime_api.h:** add `RIME_MODULE_LIST`, `RIME_REGISTER_MODULE_GROUP`.
* **Makefile:** add make targets `thirdparty/*` to build individual libraries.
* **legacy/src/legacy_module.cc:** plugin module `rime-legacy` for GPL code,
  providing component `legacy_userdb` for user dictionary upgrade.
* **src/setup.cc:** define module groups `"default"` and `"deployer"`, to avoid
	naming a list of built-bin modules in `RimeTraits::modules`.
* **test/table_test.cc:** fix random segment faults when run shuffled.
* **thirdparty/src/leveldb:** new dependency LevelDB, replacing Kyoto Cabinet.
* **dict/level_db:** userdb implementation based on LevelDB, replacing treeDb.
* **dict/tree_db:** moved to `legacy/src/`.
* **dict/user_db:** refactored and modularized to ease adding implementations.
* **gear/cjk_minifier:** support CJK Extension E.
* **gear/memory:** save cached phrases as soon as the next composition begins.
* **gear/recognizer:** match space iff set `recognizer/use_space: true`.
* **gear/simplifier:** catch and log OpenCC exceptions when loading.
* **gear/single_char_filter:** bring single character candidates to the front.
* **gear/simplifier:** adapt to OpenCC 1.0 API.
* **thirdparty/src/opencc:** update OpenCC to v1.0.2 (incompatible with v0.4).
* **lever/deployment_tasks:** update and rename task `user_dict_upgrade`.



## 1.2 (2014-07-15)

* **rime_api:** add API functions to access complex structures in config;
  add API to get the raw input and cursor position, or to select a candidate.
* **config:** support references to list elements in key paths.
  eg. `schema_list/@0/schema` is the id of the first schema in schema list.
* **switcher:** enable folding IME options in the switcher menu.
* **dict_compiler:** also detect changes in essay when updating a dictionary;
  support updating prism without the source file of the dictionary.
* **preset_vocabulary:** load `essay.txt` instead of `essay.kct`.
* **reverse_lookup_dictionary:** adopt a new file format with 50% space saving.
* **table:** add support for a new binary format with 20% space saving;
  fix alignment on ARM.
* **ascii_composer:** do not toggle IME states when long pressing `Shift` key;
  support discarding unfinished input when switching to ASCII mode.
* **affix_segmentor:** fix issues with selecting a partial-match candidate.
* **chord_composer:** commit raw input composed with original key strokes.
* **cjk_minifier:** a filter to hide characters in CJK extension set, works
  with `script_translator`.
* **navigator:** do not use `BackSpace` to revert selecting a candidate but to
  edit the input after moving the cursor left or right.
* **punctuator:** support `ascii_punct` option for switching between Chinese and
  Western (ASCII) punctuations.
* **speller:** auto-select candidates by pattern matching against the code;
  fix issues to cooperate with punctuator.
* **CMakeLists.txt:** add options `ENABLE_LOGGING` and `BOOST_USE_CXX11`;
  introduce a new dependency: `libmarisa`.
* **cmake/FindYamlCpp.cmake:** check the availability of the new (v0.5) API.
* **sample:** the directory containing a sample plug-in module.
* **tools/rime_patch.cc:** a command line tool to create patches.
* **thirdparty:** include source code of third-party libraries to ease
  building librime on Windows and Mac.



## 1.1 (2013-12-26)

* **new build dependency:** compiler with C++11 support.
  tested with GCC 4.8.2, Apple LLVM version 5.0, MSVC 12 (2013).
* **encoder:** disable warnings for phrase encode failures in log output;
  limit the number of results in encoding a phrase with multiple solutions.
* **punctuator:** fixed a bug in matching nested "pairs of 'symbols'".
* **speller:** better support for auto-committing, allowing users of table
  based input schema to omit explicitly selecting candidates in many cases.
* **schema_list_translator:** option for static schema list order.
* **table_translator:** fixed the range of CJK-D in charset filter.
@xiaoxiyao
Copy link

现在纠错权重还是太高了,基本没法用

FrancescoCiulla pushed a commit to FrancescoCiulla/trime that referenced this pull request Jul 16, 2022
@iDvel
Copy link

iDvel commented Sep 30, 2022

这个功能在手机+全拼+全键盘上很好用。
但就是打单字的时候很困扰,如果单字时能暂时失效就好了。

@faywong
Copy link

faywong commented May 9, 2024

现在纠错权重还是太高了,基本没法用

+1,请教下各位专业人士。这个纠错的权重如何调整?现在开启后,非准确匹配的候选项跑到了准确匹配候选项的前边去了。这不是预期中的行为。如果能得到大佬们的指导,我愿意参与改进这块。

我的想法:是不是音节数超过 2 个 时才启用纠错功能,同时把以下置信值调低点?

const double kCorrectionCredibility = -4.605170185988091;  // log(0.01)

@faywong
Copy link

faywong commented Jul 8, 2024

你好,我发现当我开启拼写纠错的时候,比如说我要打hun,给候选栏给出了jin这些字的参考。

@nameoverflow 可以再調低些糾錯候選的權重。糾錯最有用的時候是準確匹配的音節或詞組缺失。否則可以讓糾錯結果排在後面一些些,作爲補充。

我做了一項重構,現在權重用log計算了。有空可以調調參數。

非常认可,纠错功能应该是在默认检索不到足够的候选词时启动。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.