diff --git a/README.org b/README.org
index 176252f..0c1809d 100644
--- a/README.org
+++ b/README.org
@@ -41,7 +41,6 @@ Please read and cite my related R Journal papers, if you use this code!
   #> 4: setosa Petal Width 0.2
   nc::capture_melt_multiple(one.iris, part=".*", "[.]", column=".*")
   #> Species part Length Width
-  #>
   #> 1: setosa Petal 1.4 0.2
   #> 2: setosa Sepal 5.1 3.5
   nc::capture_melt_multiple(one.iris, column=".*", "[.]", dim=".*")
@@ -83,7 +82,7 @@ The main functions provided in nc are:
   strings/files, using data.table =by= syntax.
 - [[https://cloud.r-project.org/web/packages/nc/vignettes/v3-capture-melt.html][Vignette 3]] discusses =capture_melt_single= and
   =capture_melt_multiple= which match a regex to the column names of a
-  wide data frame, then melt the matching columns. These functions are
+  wide data frame, then melt/reshape the matching columns. These functions are
   especially useful when more than one separate piece of information
   can be captured from each column name, e.g. the iris column names
   =Petal.Width=, =Sepal.Width=, etc each have two pieces of
@@ -126,17 +125,15 @@ an older package that provides [[https://cloud.r-project.org/web/packages/namedC
 | str_match_all_variable | capture_all_str  |
 | df_match_variable      | capture_first_df |
 
-For an overview of these functions, see my
-[[https://github.com/tdhock/namedCapture-article][R journal paper
-about namedCapture]] for a usage explanation, and a detailed
-comparison with other R regex packages. The main differences between
-the functions in =nc= and =namedCapture= are:
+For an overview of these functions, and a detailed comparison with
+other R regex packages, see my [[https://github.com/tdhock/namedCapture-article][R Journal (2019) paper about
+namedCapture]]. The main differences between the functions in =nc= and
+=namedCapture= are:
 - Main =nc= functions all have the =capture_= prefix for easy auto-completion.
-- Internally =nc= uses un-named capture groups, whereas =namedCapture=
-  uses named capture groups. This allows =nc= to support the ICU
-  engine in addition to PCRE and RE2.
 - Output in =nc= is always a data.table (=namedCapture= functions
   output either a character matrix or a data.frame).
+- Subject names and the capture group named =name= are not treated
+  specially (in =namedCapture= they are used for rownames of output).
 - =nc::capture_first_df= does not prefix subject column names to
   capture group column names, whereas =namedCapture::df_match_variable=
   does.
@@ -146,31 +143,36 @@ the functions in =nc= and =namedCapture= are:
 - By default the =nc::capture_first_vec= stops with an error if any
   subjects do not match, whereas =namedCapture::str_match_variable=
   returns NA/missing rows.
-- Subject names and the capture group named =name= are not treated
-  specially (in =namedCapture= they are used for rownames of output).
 - =nc::capture_all_str= only supports capturing multiple matches in a
   single subject, whereas =namedCapture::str_match_all_named=
   supports multiple subjects.
-  For multiple subjects, use =DT[, nc::capture_all_str(subject), by]=
+  For handling multiple subjects using =nc=,
+  use =DT[, nc::capture_all_str(subject), by]=
   (see [[https://cloud.r-project.org/web/packages/nc/vignettes/v2-capture-all.html][vignette 2]] for more info).
 
-There are some new functions in =nc= which are not present in
+There are several new functions in =nc= which are not present in
 =namedCapture=:
-- =nc::capture_melt_single= inputs a data.frame, tries to match a
-  regex to its column names, then melts matching input column names to
-  a single output column.
-- =nc::capture_melt_multiple= inputs a data.frame, tries to
-  match a regex to its column names, then melts matching input columns
-  to several output columns of different types.
+- =nc::capture_melt_single= and =nc::capture_melt_multiple= use regex
+  for wide-to-tall data reshaping. See [[https://cloud.r-project.org/web/packages/nc/vignettes/v3-capture-melt.html][Vignette 3]] and my
+  [[https://journal.r-project.org/archive/2021/RJ-2021-029/index.html][R Journal (2021)]] paper for more info.
+- =nc::capture_first_glob= is for reading several regularly named
+  files into R. See its =help()= page for more info.
+- Helper function =nc::measure= can be used to create the
+  =measure.vars= argument of =data.table::melt=, and
+  =nc::capture_longer_spec= can be used to create the =spec= argument
+  of =tidyr::pivot_longer_spec=. See their =help()= pages for more info.
 - Helper function =nc::field= is provided for defining patterns (with
   no repetition) that match subjects like variable=value, and create a
   column/group named variable. See [[https://cloud.r-project.org/web/packages/nc/vignettes/v2-capture-all.html][vignette 2]] for more info.
+- Helper function =nc::alternatives_with_shared_groups= is provided
+  for defining a pattern containing alternatives with shared
+  groups. See [[https://cloud.r-project.org/web/packages/nc/vignettes/v5-helpers.html][vignette 5]] for more info.
 
 The new reshaping functions provide functionality similar to packages
 tidyr, stats, data.table, reshape, reshape2, cdata, utils, etc. The main
 difference is that =nc::capture_melt_*= support named capture regular
 expressions with type conversion, which (1) makes it easier to
 create/maintain a complex regex, and (2) results in less repetition in
-user code. For a detailed comparison see [[https://github.com/tdhock/nc-article][my paper about nc]].
+user code. For a detailed comparison see [[https://github.com/tdhock/nc-article][my R Journal (2021) paper about nc]].