Releases: easystats/datawizard

datawizard 1.0.1

07 Mar 10:19
5b7f717

BUG FIXES

  • Fixed issue in data_arrange() for data frames that only had one column.
    Formerly, the data frame was coerced into a vector; now the data frame class
    is preserved.

  • Fixed issue in R-devel (4.5.0) due to a change in how grep() handles logical
    arguments with missing values (#588).
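
A minimal sketch of the data_arrange() fix, using a made-up one-column data frame (the data is an assumption for illustration, not taken from the issue report). With datawizard 1.0.1 or later, the sorted result should still be a data frame:

```r
library(datawizard)

# Toy data frame with a single column (hypothetical example data)
d <- data.frame(x = c(3, 1, 2))

# Sort by the only column; the data frame class should be preserved
out <- data_arrange(d, "x")

is.data.frame(out)
out$x
```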

datawizard 1.0.0

10 Jan 10:05

BREAKING CHANGES AND DEPRECATIONS

  • datawizard now requires R >= 4.0 (#515).

  • Argument drop_na in data_match() is deprecated now. Please use
    remove_na instead (#556).

  • In data_rename() (#567):

    • argument pattern is deprecated. Use select instead.
    • argument safe is deprecated. The function now errors when select
      contains unknown column names.
    • when replacement is NULL, an error is now thrown (previously, column
      indices were used as new names).
    • if select (previously pattern) is a named vector, then all elements
      must be named, e.g. c(length = "Sepal.Length", "Sepal.Width") errors.
  • Order of arguments by and probability_weights in rescale_weights() has
    changed, because for method = "kish", the by argument is optional (#575).

  • The rescaled weights variables returned by rescale_weights() have been
    renamed. pweights_a and pweights_b are now named rescaled_weights_a
    and rescaled_weights_b (#575).

  • print() methods for data_tabulate() with multiple sub-tables (i.e. when
    length of by was > 1) were revised. Now, an integrated table instead of
    multiple tables is returned. Furthermore, print_html() did not work; this
    is now fixed as well (#577).

  • demean() (and degroup()) gets an append argument that defaults to TRUE,
    to append the centered variables to the original data frame, instead of
    returning the de- and group-meaned variables only. Use append = FALSE
    for the previous default behaviour (i.e. only returning the newly created
    variables) (#579).
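
The following sketch illustrates two of the breaking changes above, assuming datawizard >= 1.0.0 and the built-in iris data. The new column names ("length", "width") are arbitrary choices for illustration:

```r
library(datawizard)

# data_rename(): use `select` (formerly `pattern`) together with `replacement`
renamed <- data_rename(
  iris,
  select = c("Sepal.Length", "Sepal.Width"),
  replacement = c("length", "width")
)
names(renamed)

# A fully named vector in `select` also works: names are the new column names
renamed2 <- data_rename(
  iris,
  select = c(length = "Sepal.Length", width = "Sepal.Width")
)

# demean(): append = FALSE restores the previous behaviour and returns
# only the newly created between- and within-group variables
centered <- demean(iris, select = "Sepal.Length", by = "Species", append = FALSE)
names(centered)
```

With the default append = TRUE, the same call should return the original columns plus the newly created variables.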

CHANGES

  • rescale_weights() gets a method argument, to choose the method used to
    rescale weights. Options are "carle" (the default) and "kish" (#575).

  • The select argument, which is available in different functions to select
    variables, can now also be a character vector with quoted variable names,
    including a colon to indicate a range of several variables (e.g. "cyl:gear")
    (#551).

  • New function row_sums(), to calculate row sums (optionally with a minimum
    number of valid values), as complement to row_means() (#552).

  • New function row_count(), to count specific values row-wise (#553).

  • data_read() no longer shows warning about forthcoming breaking changes
    in upstream packages when reading .RData files (#557).

  • data_modify() now recognizes n(), for example to create an index for data
    groups with 1:n() (#535).

  • The replacement argument in data_rename() now supports glue-styled
    tokens (#563).

  • data_summary() also accepts the results of bayestestR::ci() as summary
    function (#483).

  • ranktransform() has a new argument zeros to determine how zeros should be
    handled when sign = TRUE (#573).
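
A short sketch of three of the additions above: the quoted range in select, row_sums() and row_count(). The toy data frame d is an assumption for illustration; mtcars ships with R:

```r
library(datawizard)

# Quoted range in `select`: all columns from cyl to gear
sel <- data_select(mtcars, "cyl:gear")
names(sel)

# Toy data with missing values (hypothetical example data)
d <- data.frame(
  a = c(1, 2, NA),
  b = c(4, NA, NA),
  c = c(7, 8, 9)
)

# Row sums, but only for rows with at least two non-missing values;
# rows with fewer valid values are set to NA
sums <- row_sums(d, min_valid = 2)
sums

# Count how often a specific value (here: NA) occurs in each row
n_missing <- row_count(d, count = NA)
n_missing
```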

BUG FIXES

  • describe_distribution() no longer errors if the sample was too sparse to compute
    CIs. Instead, it warns the user and returns NA (#550).

  • data_read() preserves variable types when importing files from rds or
    rdata format (#558).

datawizard 0.13.0

06 Oct 10:46

BREAKING CHANGES

  • data_rename() now errors when the replacement argument contains NA values
    or empty strings (#539).

  • Removed deprecated functions get_columns(), data_find(), format_text() (#546).

  • Removed deprecated arguments group and na.rm in multiple functions. Use by and remove_na instead (#546).

  • The default value for the argument dummy_factors in to_numeric() has
    changed from TRUE to FALSE (#544).
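
To illustrate the changed default of dummy_factors, here is a hedged sketch using the built-in ToothGrowth data, whose supp column is a factor. Passing dummy_factors = TRUE explicitly should reproduce the former default:

```r
library(datawizard)

# New default (dummy_factors = FALSE): the factor is converted to a
# single numeric column
as_codes <- to_numeric(ToothGrowth)
str(as_codes)

# Former default: factors are expanded into dummy-coded columns
as_dummies <- to_numeric(ToothGrowth, dummy_factors = TRUE)
str(as_dummies)
```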

CHANGES

  • The pattern argument in data_rename() can also be a named vector. In this
    case, names are used as values for the replacement argument (i.e. pattern
    can be a character vector using <new name> = "<old name>").

  • categorize() gains a new breaks argument, to decide whether breaks are
    inclusive or exclusive (#548).

  • The labels argument in categorize() gets two new options, "range" and
    "observed", to use the range of categorized values as labels (i.e. factor
    levels) (#548).

  • Minor additions to reshape_ci() to work with forthcoming changes in the
    {bayestestR} package.
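
A small sketch of the new categorize() options, assuming datawizard >= 0.13.0. The choice of split = "quantile" with three groups is arbitrary and only serves as illustration:

```r
library(datawizard)

# Split mpg into three quantile-based groups, treat breaks as inclusive,
# and label the groups by the range of the categorized values
mpg_groups <- categorize(
  mtcars$mpg,
  split = "quantile",
  n_groups = 3,
  breaks = "inclusive",
  labels = "range"
)
table(mpg_groups)

# labels = "observed" labels the groups by the observed values instead
mpg_observed <- categorize(
  mtcars$mpg,
  split = "quantile",
  n_groups = 3,
  labels = "observed"
)
levels(mpg_observed)
```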

datawizard 0.12.3

02 Sep 12:25

CHANGES

  • demean() (and degroup()) now also work for nested designs, if argument
    nested = TRUE and by specifies more than one variable (#533).

  • Vignettes are no longer provided in the package; they are now only available
    on the website. The package ships a single "Overview" vignette, which
    contains links to the other vignettes on the website. This is because there
    are CRAN errors occurring when building vignettes on macOS and we couldn't
    determine the cause after multiple patch releases (#534).

datawizard 0.12.2

21 Jul 07:50
389738d

  • Removed htmltools from Suggests in an attempt to fix an error in CRAN
    checks caused by failures to build a vignette (#528).

datawizard 0.12.0

11 Jul 12:30

BREAKING CHANGES

  • The argument include_na in data_tabulate() and data_summary() has been
    renamed to remove_na. Consequently, to mimic former behaviour, FALSE and
    TRUE need to be switched (i.e. remove_na = TRUE is equivalent to the former
    include_na = FALSE).

  • Class names for objects returned by data_tabulate() have been changed to
    datawizard_table and datawizard_crosstable (resp. the plural forms,
    *_tables), to provide a clearer and more consistent naming scheme.

CHANGES

  • data_select() can directly rename selected variables when a named vector
    is provided in select, e.g. data_select(mtcars, c(new1 = "mpg", new2 = "cyl")).

  • data_tabulate() gains an as.data.frame() method, to return the frequency
    table as a data frame. The structure of the returned object is a nested data
    frame, where the first column contains the name of the variable for which
    frequencies were calculated, and the second column contains the frequency table.

  • demean() (and degroup()) now also work for cross-classified designs, or
    more generally, for data with multiple grouping or cluster variables (i.e.
    by can now specify more than one variable).
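
A brief sketch of the renaming feature in data_select() and of the renamed remove_na argument, assuming datawizard >= 0.12.0 and the built-in mtcars data:

```r
library(datawizard)

# Select two columns and rename them in one step via a named vector
out <- data_select(mtcars, c(new1 = "mpg", new2 = "cyl"))
names(out)

# remove_na = TRUE corresponds to the former include_na = FALSE:
# missing values are left out of the frequency table
tab <- data_tabulate(mtcars$cyl, remove_na = TRUE)
tab
```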

datawizard 0.11.0

05 Jun 19:41

BREAKING CHANGES

  • Arguments named group or group_by are deprecated and will be removed
    in a future release. Please use by instead. This affects the following
    functions in datawizard (#502):

    • data_partition()
    • demean() and degroup()
    • means_by_group()
    • rescale_weights()
  • The following aliases are deprecated and will be removed in a future release (#504):

    • get_columns(), use data_select() instead.
    • data_find() and find_columns(), use extract_column_names() instead.
    • format_text(), use text_format() instead.

CHANGES

  • recode_into() is more relaxed regarding checking the type of NA values.
    If you recode into a numeric variable, and one of the recode values is NA,
    you no longer need to use NA_real_ for numeric NA values.

  • Improved documentation for some functions.

BUG FIXES

  • data_to_long() did not work for data frames where columns had attributes
    (like labelled data).

datawizard 0.10.0

26 Mar 14:28
bf51817

BREAKING CHANGES

  • The following arguments were deprecated in 0.5.0 and are now removed:

    • in data_to_wide(): colnames_from, rows_from, sep
    • in data_to_long(): colnames_to
    • in data_partition(): training_proportion

NEW FUNCTIONS

  • data_summary(), to compute summary statistics of (grouped) data frames.

  • data_replicate(), to expand a data frame by replicating rows based on another
    variable that contains the counts of replications per row.

CHANGES

  • data_modify() gets three new arguments, .at, .if and .modify, to modify
    variables at specific positions or based on logical conditions.

  • data_tabulate() was revised and gets several new arguments: a weights
    argument, to compute weighted frequency tables, and include_na, to include
    or omit missing values from the table. Furthermore, a by argument was added,
    to compute crosstables (#479, #481).
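
The two new functions can be sketched as follows. The summary names MW and SD and the toy data frame d are arbitrary choices for illustration:

```r
library(datawizard)

# data_summary(): summary statistics of mpg, grouped by am
smry <- data_summary(mtcars, MW = mean(mpg), SD = sd(mpg), by = "am")
smry

# data_replicate(): repeat each row as often as given in column n
# (hypothetical example data)
d <- data.frame(id = c("a", "b"), n = c(2, 3))
expanded <- data_replicate(d, expand = "n")
expanded
```

Here, the row with id "a" should appear twice and the row with id "b" three times.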

datawizard 0.9.1

21 Dec 17:19

CHANGES

  • rescale() gains multiply and add arguments, to expand ranges by a given
    factor or value.

  • to_factor() and to_numeric() now support class haven_labelled.

BUG FIXES

  • to_numeric() now correctly deals with inversed factor levels when
    preserve_levels = TRUE.

  • Fixed issue where to_numeric() inverted the order of value labels when
    dummy_factors = FALSE.

  • convert_to_na() now preserves attributes for factors when drop_levels = TRUE.

datawizard 0.9.0

15 Sep 10:46

NEW FUNCTIONS

  • row_means(), to compute row means, optionally only for the rows with at
    least min_valid non-missing values.

  • contr.deviation() for sum-deviation contrast coding of factors.

  • means_by_group(), to compute mean values of variables, grouped by levels
    of specified factors.

  • data_seek(), to search for variables in a data frame, based on their
    column names, variable labels, value labels or factor levels. Searching for
    labels only works for "labelled" data, i.e. when variables have a label or
    labels attribute.

CHANGES

  • recode_into() gains an overwrite argument to skip overwriting already
    recoded cases when multiple recode patterns apply to the same case.

  • recode_into() gains a preserve_na argument to preserve NA values
    when recoding.

  • data_read() now passes the encoding argument to data.table::fread().
    This allows reading files with non-ASCII characters.

  • datawizard moves from the GPL-3 license to the MIT license.

  • unnormalize() and unstandardize() now work with grouped data (#415).

  • unnormalize() now errors instead of emitting a warning if it doesn't have the
    necessary info (#415).
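
A hedged sketch of the overwrite argument in recode_into(), using a made-up numeric vector and overlapping recode patterns. verbose = FALSE is set only to silence the warning about overlapping patterns:

```r
library(datawizard)

x <- 1:10

# Both patterns apply to the values 9 and 10. With overwrite = FALSE,
# cases already recoded by an earlier pattern are left untouched
res <- recode_into(
  x > 5 ~ "high",
  x > 8 ~ "very high",
  default = "low",
  overwrite = FALSE,
  verbose = FALSE
)
res
```

With the default overwrite = TRUE, later patterns should instead replace the values set by earlier ones.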

BUG FIXES

  • Fixed issue in labels_to_levels() when values of labels were not in sorted
    order and values were not sequentially numbered.

  • Fixed issues in data_write() when writing labelled data into SPSS format
    and vectors were of a different type than their value labels.

  • Fixed issue in recode_into() where a possibly wrong case number was printed
    in the warning when several recode patterns matched one case.

  • Fixed issue in recode_into() when original data contained NA values and
    NA was not included in the recode pattern.

  • Fixed issue in data_filter() where functions containing a = (e.g. when
    naming arguments, like grepl(pattern, x = a)) were mistakenly seen as
    faulty syntax.

  • Fixed issue in empty_column() for strings with invalid multibyte strings.
    For such data frames or files, empty_column() or data_read() no longer
    fails.