tdhock · Nov 1, 2023 · f85776e
1 parent 7a023b6
commit f85776e
Showing 1 changed file with 23 additions and 21 deletions.
diff --git a/README.org b/README.org
@@ -41,7 +41,6 @@ Please read and cite my related R Journal papers, if you use this code!
   #> 4:  setosa  Petal  Width   0.2
   nc::capture_melt_multiple(one.iris, part=".*", "[.]", column=".*")
   #>    Species   part Length Width
-  #>     <fctr> <char>  <num> <num>
   #> 1:  setosa  Petal    1.4   0.2
   #> 2:  setosa  Sepal    5.1   3.5
   nc::capture_melt_multiple(one.iris, column=".*", "[.]", dim=".*")
@@ -83,7 +82,7 @@ The main functions provided in nc are:
   strings/files, using data.table =by= syntax.
 - [[https://cloud.r-project.org/web/packages/nc/vignettes/v3-capture-melt.html][Vignette 3]] discusses =capture_melt_single= and
   =capture_melt_multiple= which match a regex to the column names of a
-  wide data frame, then melt the matching columns. These functions are
+  wide data frame, then melt/reshape the matching columns. These functions are
   especially useful when more than one separate piece of information
   can be captured from each column name, e.g. the iris column names
   =Petal.Width=, =Sepal.Width=, etc each have two pieces of
@@ -126,17 +125,15 @@ an older package that provides [[https://cloud.r-project.org/web/packages/namedC
 | str_match_all_variable | capture_all_str   |
 | df_match_variable      | capture_first_df  |
 
-For an overview of these functions, see my
-[[https://github.com/tdhock/namedCapture-article][R journal paper
-about namedCapture]] for a usage explanation, and a detailed
-comparison with other R regex packages. The main differences between
-the functions in =nc= and =namedCapture= are:
+For an overview of these functions, and a detailed comparison with
+other R regex packages, see my [[https://github.com/tdhock/namedCapture-article][R journal (2019) paper about
+namedCapture]]. The main differences between the functions in =nc= and
+=namedCapture= are:
 - Main =nc= functions all have the =capture_= prefix for easy auto-completion.
-- Internally =nc= uses un-named capture groups, whereas =namedCapture=
-  uses named capture groups. This allows =nc= to support the ICU
-  engine in addition to PCRE and RE2.
 - Output in =nc= is always a data.table (=namedCapture= functions
   output either a character matrix or a data.frame).
+- Subject names and the capture group named =name= are not treated
+  specially (in =namedCapture= they are used for rownames of output).
 - =nc::capture_first_df= does not prefix subject column names to
   capture group column names, whereas
   =namedCapture::df_match_variable= does.
@@ -146,31 +143,36 @@ the functions in =nc= and =namedCapture= are:
 - By default the =nc::capture_first_vec= stops with an error if any
   subjects do not match, whereas =namedCapture::str_match_variable=
   returns NA/missing rows.
-- Subject names and the capture group named =name= are not treated
-  specially (in =namedCapture= they are used for rownames of output).
 - =nc::capture_all_str= only supports capturing multiple matches in a
   single subject, whereas =namedCapture::str_match_all_named= supports
   multiple subjects. 
-  For multiple subjects, use =DT[, nc::capture_all_str(subject), by]=
+  For handling multiple subjects using =nc=,
+  use =DT[, nc::capture_all_str(subject), by]=
   (see [[https://cloud.r-project.org/web/packages/nc/vignettes/v2-capture-all.html][vignette 2]] for more info).
 
-There are some new functions in =nc= which are not present in
+There are several new functions in =nc= which are not present in
 =namedCapture=:
-- =nc::capture_melt_single= inputs a data.frame, tries to match a
-  regex to its column names, then melts matching input column names to
-  a single output column.
-- =nc::capture_melt_multiple= inputs a data.frame, tries to
-  match a regex to its column names, then melts matching input columns
-  to several output columns of different types.
+- =nc::capture_melt_single= and =nc::capture_melt_multiple= use regex
+  for wide-to-tall data reshaping, see [[https://cloud.r-project.org/web/packages/nc/vignettes/v3-capture-melt.html][Vignette 3]] and my 
+  [[https://journal.r-project.org/archive/2021/RJ-2021-029/index.html][R Journal (2021)]] paper for more info.
+- =nc::capture_first_glob= is for reading several regularly named
+  files into R, see its =help()= page for more info.
+- Helper function =nc::measure= can be used to create the
+  =measure.vars= argument of =data.table::melt=, and
+  =nc::capture_longer_spec= can be used to create the =spec= argument
+  of =tidyr::pivot_longer=. See their =help()= pages for more info.
 - Helper function =nc::field= is provided for defining patterns (with
   no repetition) that match subjects like variable=value, and create a
   column/group named variable. 
   See [[https://cloud.r-project.org/web/packages/nc/vignettes/v2-capture-all.html][vignette 2]] for more info.
+- Helper function =nc::alternatives_with_shared_groups= is provided
+  for defining a pattern containing alternatives with shared
+  groups. See [[https://cloud.r-project.org/web/packages/nc/vignettes/v5-helpers.html][vignette 5]] for more info.
 
 The new reshaping functions provide functionality similar to packages
 tidyr, stats, data.table, reshape, reshape2, cdata, utils, etc. The
 main difference is that =nc::capture_melt_*= support named capture
 regular expressions with type conversion, which (1) makes it easier to
 create/maintain a complex regex, and (2) results in less repetition in
-user code. For a detailed comparison see [[https://github.com/tdhock/nc-article][my paper about nc]].
+user code. For a detailed comparison see [[https://github.com/tdhock/nc-article][my R Journal (2021) paper about nc]].
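
Note on the reshaping text above: the difference between =capture_melt_single= and =capture_melt_multiple= may be easier to see side by side. The following is a minimal sketch, not part of the commit. It assumes the =nc= and =data.table= packages are installed, and it builds =one.iris= from the first row of the built-in =iris= data (an assumption about how the README defines that object). The two calls reuse the patterns shown in the first hunk.

```r
library(data.table)

# Assumption: one.iris is the first row of the built-in iris data.
one.iris <- data.table(iris[1, ])

# The pattern is (part)[.](dim). Species has no dot, so it does not
# match and is kept as an id column.
# Single: every matching column is melted into one value column,
# with one output row per matching input column.
tall.single <- nc::capture_melt_single(
  one.iris, part=".*", "[.]", dim=".*")

# Multiple: the group named "column" supplies the output column names
# (Length, Width), so each output row holds one part (Petal or Sepal).
tall.multiple <- nc::capture_melt_multiple(
  one.iris, part=".*", "[.]", column=".*")

print(tall.single)
print(tall.multiple)
```

With one input row, the single variant should return four rows (one per measurement column) and the multiple variant two rows (one per flower part), consistent with the output fragments quoted in the first hunk.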