Accuracy of pinyin #15

Open
gexijin opened this issue Feb 12, 2020 · 3 comments

@gexijin

gexijin commented Feb 12, 2020

Thanks for developing such a useful package!
However, I found that "广西" is converted to "ānxī",
and "鸟" is converted to "diǎo".

library(pinyin)
mypy <- pydic()
py("广西", sep = "", dic = mypy) # convert

广西
"ānxī"

py("春眠不觉晓,处处闻啼鸟", dic = mypy) # convert

春眠不觉晓,处处闻啼鸟
"chūn_mián_bú_jiào_xiǎo_,_chǔ_chǔ_wén_tí_diǎo"

I am not sure whether this is due to my Windows locale:
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sp_1.3-2 shinyBS_0.61 shiny_1.4.0 maps_3.3.0 chinamap_0.2.0 plotly_4.9.1 lubridate_1.7.4
[8] forecast_8.11 forcats_0.4.0 ggrepel_0.8.1 tidyr_1.0.2 dplyr_0.8.4 nCov2019_0.0.6 pinyin_1.1.7
[15] Hmisc_4.3-1 ggplot2_3.2.1 Formula_1.2-3 survival_3.1-8 lattice_0.20-38

loaded via a namespace (and not attached):
[1] nlme_3.1-142 sf_0.8-1 xts_0.12-0 RColorBrewer_1.1-2 httr_1.4.1
[6] tools_3.6.2 backports_1.1.5 R6_2.4.1 rpart_4.1-15 KernSmooth_2.23-16
[11] DBI_1.1.0 splitstackshape_1.4.8 lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12
[16] withr_2.1.2 tidyselect_1.0.0 gridExtra_2.3 curl_4.3 compiler_3.6.2
[21] htmlTable_1.13.3 labeling_0.3 tseries_0.10-47 scales_1.1.0 checkmate_2.0.0
[26] lmtest_0.9-37 fracdiff_1.5-1 classInt_0.4-2 quadprog_1.5-8 stringr_1.4.0
[31] digest_0.6.23 foreign_0.8-72 base64enc_0.1-3 jpeg_0.1-8.1 pkgconfig_2.0.3
[36] htmltools_0.4.0 fastmap_1.0.1 htmlwidgets_1.5.1 rlang_0.4.4 TTR_0.23-6
[41] rstudioapi_0.11 quantmod_0.4-15 farver_2.0.3 zoo_1.8-7 jsonlite_1.6.1
[46] acepack_1.4.1 magrittr_1.5 Matrix_1.2-18 Rcpp_1.0.3 munsell_0.5.0
[51] lifecycle_0.1.0 stringi_1.4.5 grid_3.6.2 parallel_3.6.2 promises_1.1.0
[56] crayon_1.3.4 splines_3.6.2 knitr_1.28 pillar_1.4.3 urca_1.3-0
[61] glue_1.3.1 latticeExtra_0.6-29 remotes_2.1.0 data.table_1.12.8 png_0.1-7
[66] vctrs_0.2.2 httpuv_1.5.2 gtable_0.3.0 purrr_0.3.3 assertthat_0.2.1
[71] xfun_0.12 mime_0.9 xtable_1.8-4 e1071_1.7-3 later_1.0.0
[76] class_7.3-15 viridisLite_0.3.0 timeDate_3043.102 tibble_2.1.3 units_0.6-5
[81] cluster_2.1.0 ellipsis_0.3.0

@pzhaonet
Owner

Thanks for your feedback. It is not due to your Windows locale. It is due to the default dictionary. Please use this:

library(pinyin)
mypy <- pydic(dic = 'pinyin2')
py("广西", sep = "", dic = mypy)
py("春眠不觉晓,处处闻啼鸟", dic = mypy)

@gexijin
Author

gexijin commented Feb 14, 2020 via email

@cskemp

cskemp commented Dec 8, 2022

I've also found issues with the default dictionary: e.g.

library(pinyin)
mypy <- pydic()
py("个", dic = mypy)


"ɡàn"

py("有", dic=mypy)


"wěi"

I tried using the pinyin2 dictionary but couldn't get it to support quanpin. Is this possible? Here's what I tried:

library(pinyin)
mypy <- pydic(method = 'quanpin', dic = 'pinyin2')
py("西", dic = mypy)

西
"xi1"
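
A possible workaround, sketched in base R: if the pinyin2 dictionary only yields numbered syllables such as "xi1", the tone digits can be turned into tone marks after the fact. The helpers `number_to_mark()` and `numbers_to_marks()` below are hypothetical and not part of the pinyin package. They assume each syllable ends in a tone digit 1 to 5 (5 meaning neutral tone), that "v" may stand in for "ü", and that the session can handle UTF-8 strings. The mark is placed using the standard pinyin rule: on "a" or "e" if present, otherwise on the "o" of "ou", otherwise on the last vowel.

```r
# Hypothetical helper (not from the pinyin package): convert one numbered
# syllable such as "xi1" into its tone-marked form.
number_to_mark <- function(syllable) {
  marks <- list(
    a = c("ā", "á", "ǎ", "à"),
    e = c("ē", "é", "ě", "è"),
    i = c("ī", "í", "ǐ", "ì"),
    o = c("ō", "ó", "ǒ", "ò"),
    u = c("ū", "ú", "ǔ", "ù"),
    "ü" = c("ǖ", "ǘ", "ǚ", "ǜ")
  )

  # Leave anything without a trailing tone digit (e.g. punctuation) untouched.
  if (!grepl("[1-5]$", syllable)) {
    return(syllable)
  }

  n <- nchar(syllable)
  tone <- as.integer(substr(syllable, n, n))
  base <- gsub("v", "ü", substr(syllable, 1, n - 1), fixed = TRUE)

  # Tone 5 is the neutral tone and carries no mark.
  if (tone == 5L) {
    return(base)
  }

  chars <- strsplit(base, "")[[1]]
  is_vowel <- chars %in% names(marks)
  if (!any(is_vowel)) {
    return(base)
  }

  # Placement rule: "a" or "e" first, then the "o" in "ou", else the last vowel.
  if ("a" %in% chars) {
    pos <- match("a", chars)
  } else if ("e" %in% chars) {
    pos <- match("e", chars)
  } else if (grepl("ou", base, fixed = TRUE)) {
    pos <- match("o", chars)
  } else {
    pos <- max(which(is_vowel))
  }

  chars[pos] <- marks[[chars[pos]]][tone]
  paste(chars, collapse = "")
}

# Apply the conversion to a whole string of syllables joined by `sep`.
numbers_to_marks <- function(x, sep = "_") {
  syllables <- strsplit(x, sep, fixed = TRUE)[[1]]
  paste(vapply(syllables, number_to_mark, character(1)), collapse = sep)
}

number_to_mark("xi1")
```

The last line returns "xī". The same post-processing could be applied to the result of `py()`, for example `numbers_to_marks(py("广西", dic = mypy))`, provided the separator passed to `numbers_to_marks()` matches the `sep` used in `py()`.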
