tdhock/nc-article

Paper

Title: Wide-to-tall data reshaping using regular expressions and the nc package.

Published in the R Journal.

Abstract: Regular expressions are powerful tools for extracting tables from non-tabular text data. Capturing regular expressions that describe information to extract from column names can be especially useful when reshaping a data table from wide (few rows with many regularly named columns) to tall (fewer columns with more rows). We present the R package nc (short for named capture), which provides functions for wide-to-tall data reshaping using regular expressions. We describe the main new ideas of nc, and provide detailed comparisons with related R packages (stats, utils, data.table, tidyr, tidyfast, tidyfst, reshape2, cdata).
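As a rough illustration of the idea in the abstract, the sketch below reshapes the built-in iris data from wide to tall, using a capturing regular expression on the column names. It uses only base R (utils::strcapture plus rbind) and is not the nc implementation; the output column names part, dim and value are assumptions chosen for this example.

```r
# Wide input: the built-in iris data, whose four measurement columns
# are named like Sepal.Length (flower part, then dimension).
measure.cols <- setdiff(names(iris), "Species")

# Capture two groups from each column name with a regular expression.
name.parts <- utils::strcapture(
  "^(.*)[.](.*)$", measure.cols,
  proto = data.frame(
    part = character(), dim = character(), stringsAsFactors = FALSE))

# Build one block of tall rows per wide column, then stack the blocks.
tall.list <- lapply(seq_along(measure.cols), function(i) {
  data.frame(
    Species = iris$Species,
    part = name.parts$part[i],
    dim = name.parts$dim[i],
    value = iris[[measure.cols[i]]],
    stringsAsFactors = FALSE)
})
iris.tall <- do.call(rbind, tall.list)
str(iris.tall)
```

The wide table has 150 rows and 4 measurement columns; the tall table has one row per measurement, with the information captured from the column names stored in the part and dim columns.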

TODOs

Compare with tidyfst::longer_dt? It should behave the same as data.table::melt. See https://hope-data-science.github.io/tidyfst/articles/example3_reshape.html

8 Nov 2023

figures-iris-dt contains figures to explain melt, for LatinR data.table tutorial.

11 Oct 2020

figure-who-cols-new-data.R runs new timings, and figure-who-cols-new.R makes the new figure:

figure-who-cols-new.png

5 Oct 2020

figure-who-rows-dt-data.R and figure-iris-rows-dt-data.R compute timings; figure-who-rows-dt.R plots them:

figure-who-rows-dt.png

figure-who-cols-dt-data.R computes timings; figure-who-cols-dt.R plots them:

figure-who-cols-dt.png

figure-iris-cols-dt-valgrind.R was run under valgrind, which reported no memory problems.

figure-iris-cols-dt-data.R computes timings of new data table methods; figure-iris-cols-dt.R makes

figure-iris-cols-dt.png

17 May 2020

Maybe add a comparison with tidyfast::dt_pivot_longer?

29 Oct 2019

figure-iris-cols-new.R makes a new figure based on timings computed using updated R packages.

figure-iris-cols-new.png

28 Oct 2019

figure-iris-cols.R makes a figure, based on timings computed by figure-iris-cols-data.R, which shows that wide-to-tall data reshaping using either the data.table or nc package is much faster than using other packages (cdata, stats, tidyr). This experiment uses inputs with a fixed number of rows and a variable number of input reshape columns. Each function in the experiment outputs a table with multiple (2) reshape columns. The figure shows that the quadratic time complexity of cdata, stats, and tidyr results in significant slowdowns when there are at least 10,000 input reshape columns.
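The experiment design described above (fixed number of rows, variable number of input reshape columns, two output reshape columns) can be sketched as follows. This is a minimal base R illustration, not the code in figure-iris-cols-data.R: the helper names make_wide, reshape_two_columns and time_reshape are hypothetical, the column sizes are much smaller than in the real experiment, and the choice of Sepal and Petal as the two output columns is an assumption.

```r
# Hypothetical helper: build a wide table with 150 rows and 4 * n.copies
# measurement columns, by copying the iris columns with a numeric suffix.
make_wide <- function(n.copies) {
  block.list <- lapply(seq_len(n.copies), function(i) {
    block <- iris[, 1:4]
    names(block) <- paste0(names(block), ".", i)
    block
  })
  data.frame(
    Species = iris$Species,
    do.call(cbind, block.list),
    check.names = FALSE)
}

# Hypothetical helper: reshape to a tall table with two output reshape
# columns (Sepal and Petal), using stats::reshape.
reshape_two_columns <- function(wide) {
  sepal.cols <- grep("^Sepal[.]", names(wide), value = TRUE)
  petal.cols <- grep("^Petal[.]", names(wide), value = TRUE)
  stats::reshape(
    wide, direction = "long",
    varying = list(sepal.cols, petal.cols),
    v.names = c("Sepal", "Petal"),
    timevar = "variable",
    times = sub("^Sepal[.]", "", sepal.cols))
}

# Hypothetical helper: time one reshape for a given input size.
time_reshape <- function(n.copies) {
  wide <- make_wide(n.copies)
  elapsed <- system.time(tall <- reshape_two_columns(wide))[["elapsed"]]
  data.frame(
    n.input.cols = 4 * n.copies,
    n.output.rows = nrow(tall),
    seconds = elapsed)
}

timings <- do.call(rbind, lapply(c(1, 5, 25), time_reshape))
print(timings)
```

Plotting seconds against n.input.cols on log-log axes, for several reshaping functions and much larger sizes, is one way to compare how each function scales with the number of input columns.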

figure-iris-cols.png

In contrast, every method in the figure below appears to be linear in the number of input columns when the output has only a single reshape column:

figure-who-cols-minimal.png

source: figure, timings.

Note that stats::reshape is missing from the second plot above, but its result for a smaller N.col size can be seen at https://github.com/tdhock/nc-article/blob/master/figure-who-cols.png
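For reference, a reshape with a single output reshape column stacks all of the input reshape columns into one value column. The sketch below shows this with utils::stack on the iris data; it only illustrates the shape of the output and is not necessarily the code used for the timings above.

```r
# Stack the four iris measurement columns into a single value column.
measure.cols <- setdiff(names(iris), "Species")
iris.single <- utils::stack(iris[, measure.cols])

# The result has one reshape column (values) and one column that
# records which input column each value came from (ind).
str(iris.single)
```

With a single reshape column, every input column contributes one block of rows to the same output column, so no grouping of input columns into several output columns is needed.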

25 Oct 2019

figure-who-both-rows.R makes

figure-who-both-rows.png

24 Oct 2019

figure-who-complex-rows.R makes

figure-who-complex-rows.png

23 Oct 2019

figure-who-rows.R makes

figure-who-rows.png

figure-who-cols.R makes

figure-who-cols.png
