Use code to generate Unicode-LaTeX character mapping table #223

nanxstats · 2024-05-28T03:04:08Z

Fixes #218

This PR adds an internal function in R/utils.R that generates the mapping table and writes it to R/unicode_latex.R.

This removes the need for the binary file sysdata.rda and is friendlier to version control, since changes to the table now show up as plain-text diffs.

The new, code-generated data frame is bitwise identical to the version saved in sysdata.rda, except that the int column is of class integer, not numeric.
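To illustrate that one difference, here is a minimal sketch with made-up rows (only the column names follow the table). identical() treats an integer column and a double column as different even when the values match. all.equal() compares values only.

```r
# Made-up rows; only the column names mirror the mapping table.
old <- data.frame(unicode = c("#", "&"), latex = c("\\#", "\\&"), int = c(35, 38))
new <- data.frame(unicode = c("#", "&"), latex = c("\\#", "\\&"), int = c(35L, 38L))

identical(old, new)   # FALSE: int is double in old, integer in new
class(old$int)        # "numeric"
class(new$int)        # "integer"
all.equal(old, new)   # TRUE: all.equal() compares values, not storage type

# After coercing the column, the two objects are bitwise identical
old$int <- as.integer(old$int)
identical(old, new)   # TRUE
```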

Data ingestion issue worth following up

You might want to check the data ingestion logic. I couldn't find any record of how the previous version was constructed. I used some ad hoc logic to reproduce an identical table, but it would be good to verify that the data in the previous version is reasonable, or to find out which filters were applied. For example, at the very first step, calling read.table() without quote = "" gives:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

This yields only 1740 rows, versus 2757 rows with quote = "", which also avoids the warning.
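A minimal reproduction of this kind of failure, assuming tab-separated input. The sample rows below are made up and are not the actual source file. Under the default quote = "\"'", a lone apostrophe in a field is read as an opening quote. With quote = "", it is treated as plain data. The sketch also sets comment.char = "", because the default "#" would otherwise cut lines at the # character. The exact condition and row count under default quoting depend on the input, so the sketch only captures whatever read.table() signals.

```r
# Made-up tab-separated sample: code point, character, LaTeX command.
# The third row's character field is a lone apostrophe.
txt <- c(
  "0023\t#\t\\#",
  "0026\t&\t\\&",
  "0027\t'\t\\textquotesingle",
  "0041\tA\tA"
)

# Default quoting: the ' opens a quoted field that never closes.
# Capture the warning or error message instead of stopping.
default_quote <- tryCatch(
  read.table(text = txt, sep = "\t", colClasses = "character", comment.char = ""),
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(default_quote)

# quote = "" turns quote handling off, so every line becomes one row
no_quote <- read.table(
  text = txt, sep = "\t", quote = "", colClasses = "character", comment.char = ""
)
nrow(no_quote)  # 4
```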

nanxstats · 2024-05-28T03:08:58Z

@yihui in case you got a minute to review

yihui

First, I'd prefer using a matrix to write the data, which is a little more compact than the data frame.
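For reference, here is a sketch of the row-wise matrix layout with made-up sample rows. Each table row sits on one line of a character matrix, and the integer column is restored with as.integer() when converting to a data frame.

```r
# One table row per line; every element is a string in the matrix
m <- matrix(c(
  "#", "\\#", "35",
  "%", "\\%", "37",
  "&", "\\&", "38"
), ncol = 3, byrow = TRUE)

# Convert to a data frame, restoring the integer column
tbl <- data.frame(unicode = m[, 1], latex = m[, 2], int = as.integer(m[, 3]))
str(tbl)
```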

Second, I wonder if it's worth the effort to make the file R/unicode_latex.R human-readable. If not, we could consider just dump() the data frame in update_unicode_latex().
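Here is a sketch of the dump() alternative, using a two-row stand-in for the real table. dump() writes R code that recreates the object, so the file stays plain text. However, it is laid out as a single structure() call rather than row by row. Passing envir = environment() is my own choice, so that the call also works inside a function.

```r
# Two-row stand-in for the real table
unicode_latex <- data.frame(
  unicode = c("#", "&"),
  latex = c("\\#", "\\&"),
  int = c(35L, 38L)
)

# Write R code that recreates the object, then show it
f <- tempfile(fileext = ".R")
dump("unicode_latex", file = f, envir = environment())
cat(readLines(f), sep = "\n")

# Sourcing the dumped file recreates an identical object
env <- new.env()
sys.source(f, envir = env)
identical(env$unicode_latex, unicode_latex)  # TRUE
```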

I don't have a strong opinion on either point. It's fine to merge the current PR as is.

R/utils.R

Co-authored-by: Yihui Xie <xie@yihui.name>

nanxstats · 2024-05-28T05:51:13Z

> First, I'd prefer using a matrix to write the data, which is a little more compact than the data frame.
>
> Second, I wonder if it's worth the effort to make the file R/unicode_latex.R human-readable. If not, we could consider just dump() the data frame in update_unicode_latex().
>
> I don't have a strong opinion on either point. It's fine to merge the current PR as is.

Great, thanks! I've applied the changes and updated the table. The matrix version is exactly what we needed to make this less tedious. I do wish base R had a row-wise data frame constructor. 😂

Making it human-readable seems to be manageable in this case, so let's just keep it that way.

yihui · 2024-05-28T14:41:27Z

R/utils.R

+  rows <- paste(
+    sprintf(
+      '"%s", "%s", %d',
+      tbl$unicode,
+      gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
+      tbl$int
+    ),
+    sep = ", "
+  )


paste() is unnecessary here: the commas are already part of the sprintf() format string, and paste() with a single vector argument returns it unchanged.

Suggested change

-  rows <- paste(
-    sprintf(
-      '"%s", "%s", %d',
-      tbl$unicode,
-      gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
-      tbl$int
-    ),
-    sep = ", "
-  )
+  rows <- sprintf(
+    '"%s", "%s", %d',
+    tbl$unicode,
+    gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
+    tbl$int
+  )
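Here is a quick check of this point using a made-up two-row table with the PR's column names. sprintf() is vectorized and already returns one complete row string per element, so wrapping it in paste(..., sep = ", ") returns the same vector.

```r
# Made-up two-row table with the PR's column names
tbl <- data.frame(
  unicode = c("#", "&"),
  latex = c("\\#", "\\&"),
  int = c(35L, 38L)
)

# sprintf() is vectorized: one string per row, commas from the format string.
# Backslashes are doubled so each LaTeX command is a valid R string literal
# when the rows are written into the generated file.
rows <- sprintf(
  '"%s", "%s", %d',
  tbl$unicode,
  gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
  tbl$int
)
cat(rows, sep = "\n")

# With a single vector argument, paste() only converts to character,
# so sep is never used and the result is unchanged.
identical(paste(rows, sep = ", "), rows)  # TRUE
```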

Yes! Patched in another PR: #224

elong0527

LGTM, thanks for improving the transparency of the source data!

nanxstats added 4 commits May 27, 2024 22:29

Add Unicode-LaTeX mapping table updater

b71fe56

Run updater to generate unicode_latex.R

3b17419

Remove old unicode_latex artifacts

7c26cec

Add copyright text and update Authors@R

eef2356

nanxstats requested a review from elong0527 May 28, 2024 03:04

yihui approved these changes May 28, 2024


nanxstats and others added 2 commits May 28, 2024 01:30

Apply suggestions from code review

c78fd09

Co-authored-by: Yihui Xie <xie@yihui.name>

Fix matrix code and run updater

42b7a15

Call sprintf() only once

a9eb789

yihui approved these changes May 28, 2024


elong0527 approved these changes May 28, 2024


nanxstats merged commit 1373f0d into master May 28, 2024
8 checks passed

nanxstats deleted the sysdata branch May 28, 2024 20:00

nanxstats mentioned this pull request May 28, 2024

Remove unnecessary paste() call #224

Merged
