From f2fb8fffae91c0a909fe219b7948f3dd0e73db83 Mon Sep 17 00:00:00 2001 From: Josh Soref <2119212+jsoref@users.noreply.github.com> Date: Thu, 7 Dec 2023 12:18:30 -0500 Subject: [PATCH] GH-38928: [R] Fix spelling (#38929) ### Rationale for this change ### What changes are included in this PR? Spelling fixes to r/ ### Are these changes tested? ### Are there any user-facing changes? * Closes: #38928 Authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com> Signed-off-by: Jacob Wujciak-Jens --- r/NEWS.md | 10 +++++----- r/R/arrow-object.R | 2 +- r/R/arrow-package.R | 2 +- r/R/compression.R | 2 +- r/R/config.R | 2 +- r/R/csv.R | 2 +- r/R/dataset.R | 2 +- r/R/dplyr-count.R | 2 +- r/R/dplyr-filter.R | 10 +++++----- r/R/dplyr-funcs-augmented.R | 2 +- r/R/dplyr-funcs-conditional.R | 2 +- r/R/dplyr-funcs-datetime.R | 4 ++-- r/R/dplyr-funcs-string.R | 2 +- r/R/dplyr-funcs-type.R | 4 ++-- r/R/duckdb.R | 2 +- r/R/extension.R | 6 +++--- r/R/feather.R | 4 ++-- r/R/filesystem.R | 2 +- r/R/parquet.R | 2 +- r/R/udf.R | 2 +- r/configure | 2 +- r/man/ExtensionType.Rd | 2 +- r/man/FileSystem.Rd | 2 +- r/man/add_filename.Rd | 2 +- r/man/codec_is_available.Rd | 2 +- r/man/io_thread_count.Rd | 2 +- r/man/new_extension_type.Rd | 2 +- r/man/open_dataset.Rd | 2 +- r/man/read_delim_arrow.Rd | 2 +- r/man/write_feather.Rd | 2 +- r/man/write_parquet.Rd | 2 +- r/src/altrep.cpp | 2 +- r/src/safe-call-into-r.h | 6 +++--- r/tests/testthat/helper-arrow.R | 2 +- r/tests/testthat/helper-skip.R | 4 ++-- r/tests/testthat/test-Array.R | 6 +++--- r/tests/testthat/test-backwards-compatibility.R | 2 +- r/tests/testthat/test-dataset-write.R | 4 ++-- r/tests/testthat/test-dplyr-funcs-datetime.R | 12 ++++++------ r/tests/testthat/test-dplyr-summarize.R | 6 +++--- r/tests/testthat/test-extension.R | 4 ++-- r/tools/nixlibs.R | 4 ++-- r/tools/update-checksums.R | 2 +- r/vignettes/arrow.Rmd | 2 +- r/vignettes/data_objects.Rmd | 2 +- r/vignettes/data_types.Rmd | 2 +- r/vignettes/data_wrangling.Rmd | 2 +- r/vignettes/developers/setup.Rmd | 6 +++--- r/vignettes/fs.Rmd | 4 ++-- r/vignettes/install.Rmd | 6 +++--- r/vignettes/read_write.Rmd | 2 +- 51 files changed, 84 insertions(+), 84 deletions(-) diff --git a/r/NEWS.md b/r/NEWS.md index 8c8852e9c86b9..8515facdff871 100644 --- a/r/NEWS.md +++ b/r/NEWS.md @@ -80,10 +80,10 @@ ## Installation -* MacOS builds now use the same installation pathway as on Linux (@assignUser, +* macOS builds now use the same installation pathway as on Linux (@assignUser, #37684). * A warning message is now issued on package load when running under emulation - on MacOS (i.e., use of x86 installation of R on M1/aarch64; #37777). + on macOS (i.e., use of x86 installation of R on M1/aarch64; #37777). * R scripts that run during configuration and installation are now run using the correct R interpreter (@meztez, #37225). * Failed libarrow builds now return more detailed output (@amoeba, #37727). @@ -416,7 +416,7 @@ As of version 10.0.0, `arrow` requires C++17 to build. This means that: * The `arrow.dev_repo` for nightly builds of the R package and prebuilt libarrow binaries is now . -* Brotli and BZ2 are shipped with MacOS binaries. BZ2 is shipped with Windows binaries. (#13484) +* Brotli and BZ2 are shipped with macOS binaries. BZ2 is shipped with Windows binaries. (#13484) # arrow 8.0.0 @@ -549,7 +549,7 @@ Arrow arrays and tables can be easily concatenated: ## Other improvements and fixes * Many of the vignettes have been reorganized, restructured and expanded to improve their usefulness and clarity. 
-* Code to generate schemas (and individual data type specficiations) are accessible with the `$code()` method on a `schema` or `type`. This allows you to easily get the code needed to create a schema from an object that already has one. +* Code to generate schemas (and individual data type specifications) are accessible with the `$code()` method on a `schema` or `type`. This allows you to easily get the code needed to create a schema from an object that already has one. * Arrow `Duration` type has been mapped to R's `difftime` class. * The `decimal256()` type is supported. The `decimal()` function has been revised to call either `decimal256()` or `decimal128()` based on the value of the `precision` argument. * `write_parquet()` uses a reasonable guess at `chunk_size` instead of always writing a single chunk. This improves the speed of reading and writing large Parquet files. @@ -824,7 +824,7 @@ to send and receive data. See `vignette("flight", package = "arrow")` for an ove * `arrow` now depends on [`cpp11`](https://cpp11.r-lib.org/), which brings more robust UTF-8 handling and faster compilation * The Linux build script now succeeds on older versions of R -* MacOS binary packages now ship with zstandard compression enabled +* macOS binary packages now ship with zstandard compression enabled ## Bug fixes and other enhancements diff --git a/r/R/arrow-object.R b/r/R/arrow-object.R index 5c2cf4691fc9c..b66c39dce957e 100644 --- a/r/R/arrow-object.R +++ b/r/R/arrow-object.R @@ -56,7 +56,7 @@ ArrowObject <- R6Class("ArrowObject", # Return NULL, because keeping this R6 object in scope is not a good idea. # This syntax would allow the rare use that has to actually do this to # do `object <- object$.unsafe_delete()` and reduce the chance that an - # IDE like RStudio will try try to call other methods which will error + # IDE like RStudio will try to call other methods which will error invisible(NULL) } ) diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R index 1f39a50744abc..54e237192e080 100644 --- a/r/R/arrow-package.R +++ b/r/R/arrow-package.R @@ -183,7 +183,7 @@ configure_tzdb <- function() { # Just to be extra safe, let's wrap this in a try(); # we don't want a failed startup message to prevent the package from loading try({ - # On MacOS only, Check if we are running in under emulation, and warn this will not work + # On macOS only, Check if we are running in under emulation, and warn this will not work if (on_rosetta()) { packageStartupMessage( paste( diff --git a/r/R/compression.R b/r/R/compression.R index 8d28fbefd7b3d..3fe00a756987c 100644 --- a/r/R/compression.R +++ b/r/R/compression.R @@ -61,7 +61,7 @@ Codec$create <- function(type = "gzip", compression_level = NA) { #' the Arrow C++ library. This function lets you know which are available for #' use. #' @param type A string, one of "uncompressed", "snappy", "gzip", "brotli", -#' "zstd", "lz4", "lzo", or "bz2", case insensitive. +#' "zstd", "lz4", "lzo", or "bz2", case-insensitive. #' @return Logical: is `type` available? #' @export #' @examples diff --git a/r/R/config.R b/r/R/config.R index bd00afe1be631..941d74e59a90d 100644 --- a/r/R/config.R +++ b/r/R/config.R @@ -40,7 +40,7 @@ io_thread_count <- function() { #' @rdname io_thread_count #' @param num_threads integer: New number of threads for thread pool. At least -#' two threads are reccomended to support all operations in the arrow +#' two threads are recommended to support all operations in the arrow #' package. 
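The compression.R and config.R hunks above touch the user-facing docs for `codec_is_available()` and `set_io_thread_count()`. A minimal sketch of how those two APIs are exercised, with illustrative values (not tuning advice):

```r
library(arrow)

# Compression support is decided when libarrow is built; the type
# string is matched case-insensitively
codec_is_available("zstd")
codec_is_available("ZSTD")  # equivalent

# Inspect and resize the IO thread pool; at least two threads are
# recommended so all operations in the arrow package can make progress
io_thread_count()
set_io_thread_count(4)
```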
#' @export set_io_thread_count <- function(num_threads) { diff --git a/r/R/csv.R b/r/R/csv.R index a024c4531e748..03540006ca0a2 100644 --- a/r/R/csv.R +++ b/r/R/csv.R @@ -76,7 +76,7 @@ #' #' Note that if you are specifying column names, whether by `schema` or #' `col_names`, and the CSV file has a header row that would otherwise be used -#' to idenfity column names, you'll need to add `skip = 1` to skip that row. +#' to identify column names, you'll need to add `skip = 1` to skip that row. #' #' @param file A character file name or URI, literal data (either a single string or a [raw] vector), #' an Arrow input stream, or a `FileSystem` with path (`SubTreeFileSystem`). diff --git a/r/R/dataset.R b/r/R/dataset.R index 682f6c1481b4f..08189f1b290a2 100644 --- a/r/R/dataset.R +++ b/r/R/dataset.R @@ -46,7 +46,7 @@ #' #' The default behavior in `open_dataset()` is to inspect the file paths #' contained in the provided directory, and if they look like Hive-style, parse -#' them as Hive. If your dataset has Hive-style partioning in the file paths, +#' them as Hive. If your dataset has Hive-style partitioning in the file paths, #' you do not need to provide anything in the `partitioning` argument to #' `open_dataset()` to use them. If you do provide a character vector of #' partition column names, they will be ignored if they match what is detected, diff --git a/r/R/dplyr-count.R b/r/R/dplyr-count.R index ee713030b262e..df585a6cf0111 100644 --- a/r/R/dplyr-count.R +++ b/r/R/dplyr-count.R @@ -56,7 +56,7 @@ tally.arrow_dplyr_query <- function(x, wt = NULL, sort = FALSE, name = NULL) { tally.Dataset <- tally.ArrowTabular <- tally.RecordBatchReader <- tally.arrow_dplyr_query -# we don't want to depend on dplyr, but we refrence these above +# we don't want to depend on dplyr, but we reference these above utils::globalVariables(c("n", "desc")) check_n_name <- function(name, diff --git a/r/R/dplyr-filter.R b/r/R/dplyr-filter.R index c14c67e70168c..d85fa16af2e71 100644 --- a/r/R/dplyr-filter.R +++ b/r/R/dplyr-filter.R @@ -28,20 +28,20 @@ filter.arrow_dplyr_query <- function(.data, ..., .by = NULL, .preserve = FALSE) out$group_by_vars <- by$names } - filts <- expand_across(out, quos(...)) - if (length(filts) == 0) { + expanded_filters <- expand_across(out, quos(...)) + if (length(expanded_filters) == 0) { # Nothing to do return(as_adq(.data)) } # tidy-eval the filter expressions inside an Arrow data_mask - filters <- lapply(filts, arrow_eval, arrow_mask(out)) + filters <- lapply(expanded_filters, arrow_eval, arrow_mask(out)) bad_filters <- map_lgl(filters, ~ inherits(., "try-error")) if (any(bad_filters)) { # This is similar to abandon_ship() except that the filter eval is # vectorized, and we apply filters that _did_ work before abandoning ship # with the rest - expr_labs <- map_chr(filts[bad_filters], format_expr) + expr_labs <- map_chr(expanded_filters[bad_filters], format_expr) if (query_on_dataset(out)) { # Abort. We don't want to auto-collect if this is a Dataset because that # could blow up, too big. 
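A hedged sketch of the `filter()` translation path this hunk manages, using an invented in-memory table: predicates Arrow can translate run as C++ expressions, and on non-Dataset inputs the untranslatable ones fall back to dplyr in R, as the surrounding code describes.

```r
library(arrow)
library(dplyr)

tbl <- arrow_table(x = 1:5, y = c("a", "b", "c", "d", "e"))

# Fully translatable: evaluated by Arrow, nothing is pulled into R
# until collect()
tbl %>%
  filter(x > 2) %>%
  collect()
```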
@@ -71,7 +71,7 @@ filter.arrow_dplyr_query <- function(.data, ..., .by = NULL, .preserve = FALSE) if (by$from_by) { out <- dplyr::ungroup(out) } - return(dplyr::filter(out, !!!filts[bad_filters], .by = {{ .by }})) + return(dplyr::filter(out, !!!expanded_filters[bad_filters], .by = {{ .by }})) } } diff --git a/r/R/dplyr-funcs-augmented.R b/r/R/dplyr-funcs-augmented.R index 116248d2dd92a..dca5ca16fa437 100644 --- a/r/R/dplyr-funcs-augmented.R +++ b/r/R/dplyr-funcs-augmented.R @@ -18,7 +18,7 @@ #' Add the data filename as a column #' #' This function only exists inside `arrow` `dplyr` queries, and it only is -#' valid when quering on a `FileSystemDataset`. +#' valid when querying on a `FileSystemDataset`. #' #' To use filenames generated by this function in subsequent pipeline steps, you #' must either call \code{\link[dplyr:compute]{compute()}} or diff --git a/r/R/dplyr-funcs-conditional.R b/r/R/dplyr-funcs-conditional.R index cd0245eeee182..b9639f00295ce 100644 --- a/r/R/dplyr-funcs-conditional.R +++ b/r/R/dplyr-funcs-conditional.R @@ -55,7 +55,7 @@ register_bindings_conditional <- function() { } if (last_arg && arg$type_id() %in% TYPES_WITH_NAN) { - # store the NA_real_ in the same type as arg to avoid avoid casting + # store the NA_real_ in the same type as arg to avoid casting # smaller float types to larger float types NA_expr <- Expression$scalar(Scalar$create(NA_real_, type = arg$type())) Expression$create("if_else", Expression$create("is_nan", arg), NA_expr, arg) diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R index 5b6e16d376554..440210afd630c 100644 --- a/r/R/dplyr-funcs-datetime.R +++ b/r/R/dplyr-funcs-datetime.R @@ -459,7 +459,7 @@ register_bindings_datetime_timezone <- function() { roll_dst[1], "error" = 0L, "boundary" = 2L, - arrow_not_supported("`roll_dst` value must be 'error' or 'boundary' for non-existent times; other values") + arrow_not_supported("`roll_dst` value must be 'error' or 'boundary' for nonexistent times; other values") ) ambiguous <- switch( @@ -467,7 +467,7 @@ register_bindings_datetime_timezone <- function() { "error" = 0L, "pre" = 1L, "post" = 2L, - arrow_not_supported("`roll_dst` value must be 'error', 'pre', or 'post' for non-existent times") + arrow_not_supported("`roll_dst` value must be 'error', 'pre', or 'post' for nonexistent times") ) if (identical(tzone, "")) { diff --git a/r/R/dplyr-funcs-string.R b/r/R/dplyr-funcs-string.R index 3cd8f94476e5e..9f3220e557f08 100644 --- a/r/R/dplyr-funcs-string.R +++ b/r/R/dplyr-funcs-string.R @@ -516,7 +516,7 @@ register_bindings_string_other <- function() { msg = "`stop` must be length 1 - other lengths are not supported in Arrow" ) - # substr treats values as if they're on a continous number line, so values + # substr treats values as if they're on a continuous number line, so values # 0 are effectively blank characters - set `start` to 1 here so Arrow mimics # this behavior if (start <= 0) { diff --git a/r/R/dplyr-funcs-type.R b/r/R/dplyr-funcs-type.R index 0bd340d4be2dd..f244682737cb4 100644 --- a/r/R/dplyr-funcs-type.R +++ b/r/R/dplyr-funcs-type.R @@ -158,8 +158,8 @@ register_bindings_type_cast <- function() { if (identical(fix.empty.names, TRUE)) { names(args) <- make.names(names(args), unique = TRUE) } else { - name_emtpy <- names(args) == "" - names(args)[!name_emtpy] <- make.names(names(args)[!name_emtpy], unique = TRUE) + name_empty <- names(args) == "" + names(args)[!name_empty] <- make.names(names(args)[!name_empty], unique = TRUE) } } diff --git a/r/R/duckdb.R b/r/R/duckdb.R index 
bf3a57daf2f1e..9632e9bad1984 100644 --- a/r/R/duckdb.R +++ b/r/R/duckdb.R @@ -89,7 +89,7 @@ arrow_duck_connection <- function() { # but if we don't explicitly run dbDisconnect() the user gets a warning # that they may not expect (since they did not open a duckdb connection). # This bit of code will run when the package namespace is cleaned up (i.e., - # at exit). This is more reliable than .onUnload() or .onDetatch(), which + # at exit). This is more reliable than .onUnload() or .onDetach(), which # don't necessarily run on exit. reg.finalizer(arrow_duck_finalizer, function(...) { con <- getOption("arrow_duck_con") diff --git a/r/R/extension.R b/r/R/extension.R index 4419c8ba01642..59a02121fd18c 100644 --- a/r/R/extension.R +++ b/r/R/extension.R @@ -83,7 +83,7 @@ ExtensionArray$create <- function(x, type) { #' - `$WrapArray(array)`: Wraps a storage [Array] into an [ExtensionArray] #' with this extension type. #' -#' In addition, subclasses may override the following methos to customize +#' In addition, subclasses may override the following methods to customize #' the behaviour of extension classes. #' #' - `$deserialize_instance()`: This method is called when a new [ExtensionType] @@ -184,7 +184,7 @@ ExtensionType <- R6Class("ExtensionType", }, ToString = function() { # metadata is probably valid UTF-8 (e.g., JSON), but might not be - # and it's confusing to error when printing the object. This herustic + # and it's confusing to error when printing the object. This heuristic # isn't perfect (but subclasses should override this method anyway) metadata_raw <- self$extension_metadata() @@ -286,7 +286,7 @@ ExtensionType$create <- function(storage_type, #' "dot" syntax (i.e., "some_package.some_type"). The namespace "arrow" #' is reserved for extension types defined by the Apache Arrow libraries. #' @param extension_metadata A [raw()] or [character()] vector containing the -#' serialized version of the type. Chatacter vectors must be length 1 and +#' serialized version of the type. Character vectors must be length 1 and #' are converted to UTF-8 before converting to [raw()]. #' @param type_class An [R6::R6Class] whose `$new()` class method will be #' used to construct a new instance of the type. diff --git a/r/R/feather.R b/r/R/feather.R index 3e390018c825f..474fc6118e44f 100644 --- a/r/R/feather.R +++ b/r/R/feather.R @@ -24,7 +24,7 @@ #' a legacy version available starting in 2016, and the Version 2 (V2), #' which is the Apache Arrow IPC file format. #' The default version is V2. -#' V1 files are distinct from Arrow IPC files and lack many feathures, +#' V1 files are distinct from Arrow IPC files and lack many features, #' such as the ability to store all Arrow data tyeps, and compression support. #' [write_ipc_file()] can only write V2 files. #' @@ -91,7 +91,7 @@ write_feather <- function(x, } } if (is.null(compression_level)) { - # Use -1 as sentinal for "default" + # Use -1 as sentinel for "default" compression_level <- -1L } compression_level <- as.integer(compression_level) diff --git a/r/R/filesystem.R b/r/R/filesystem.R index e0f370ad601b3..c6f92cba1932c 100644 --- a/r/R/filesystem.R +++ b/r/R/filesystem.R @@ -156,7 +156,7 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F #' buckets if `$CreateDir()` is called on the bucket level (default `FALSE`). #' - `allow_bucket_deletion`: logical, if TRUE, the filesystem will delete #' buckets if`$DeleteDir()` is called on the bucket level (default `FALSE`). 
-#' - `request_timeout`: Socket read time on Windows and MacOS in seconds. If +#' - `request_timeout`: Socket read time on Windows and macOS in seconds. If #' negative, the AWS SDK default (typically 3 seconds). #' - `connect_timeout`: Socket connection timeout in seconds. If negative, AWS #' SDK default is used (typically 1 second). diff --git a/r/R/parquet.R b/r/R/parquet.R index 74f51767a29c4..d92e913cb5db3 100644 --- a/r/R/parquet.R +++ b/r/R/parquet.R @@ -128,7 +128,7 @@ read_parquet <- function(file, #' - A named vector, to specify the value for the named columns, the default #' value for the setting is used when not supplied #' -#' The `compression` argument can be any of the following (case insensitive): +#' The `compression` argument can be any of the following (case-insensitive): #' "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2". #' Only "uncompressed" is guaranteed to be available, but "snappy" and "gzip" #' are almost always included. See [codec_is_available()]. diff --git a/r/R/udf.R b/r/R/udf.R index fe08f02812fd9..922095cceba6a 100644 --- a/r/R/udf.R +++ b/r/R/udf.R @@ -154,7 +154,7 @@ arrow_scalar_function <- function(fun, in_type, out_type, auto_convert = FALSE) sprintf( paste0( "Expected `fun` to accept %d argument(s)\n", - "but found a function that acccepts %d argument(s)\n", + "but found a function that accepts %d argument(s)\n", "Did you forget to include `context` as the first argument?" ), expected_n_args, diff --git a/r/configure b/r/configure index 96238f0b9a37e..029fc004dfc4c 100755 --- a/r/configure +++ b/r/configure @@ -62,7 +62,7 @@ PKG_CONFIG_NAME="arrow" PKG_BREW_NAME="apache-arrow" PKG_TEST_HEADER="" -# Some env vars that control the build (all logical, case insensitive) +# Some env vars that control the build (all logical, case-insensitive) # Development mode, also increases verbosity in the bundled build ARROW_R_DEV=`echo $ARROW_R_DEV | tr '[:upper:]' '[:lower:]'` # The bundled build compiles arrow C++ from source; FORCE ensures we don't pick up diff --git a/r/man/ExtensionType.Rd b/r/man/ExtensionType.Rd index 032a4a76bf80b..aef4d01d7539e 100644 --- a/r/man/ExtensionType.Rd +++ b/r/man/ExtensionType.Rd @@ -26,7 +26,7 @@ extension metadata as a UTF-8 encoded string. with this extension type. } -In addition, subclasses may override the following methos to customize +In addition, subclasses may override the following methods to customize the behaviour of extension classes. \itemize{ \item \verb{$deserialize_instance()}: This method is called when a new \link{ExtensionType} diff --git a/r/man/FileSystem.Rd b/r/man/FileSystem.Rd index b71d95f423ee3..dbf89ef1387ac 100644 --- a/r/man/FileSystem.Rd +++ b/r/man/FileSystem.Rd @@ -57,7 +57,7 @@ in the background, without blocking (default \code{TRUE}) buckets if \verb{$CreateDir()} is called on the bucket level (default \code{FALSE}). \item \code{allow_bucket_deletion}: logical, if TRUE, the filesystem will delete buckets if\verb{$DeleteDir()} is called on the bucket level (default \code{FALSE}). -\item \code{request_timeout}: Socket read time on Windows and MacOS in seconds. If +\item \code{request_timeout}: Socket read time on Windows and macOS in seconds. If negative, the AWS SDK default (typically 3 seconds). \item \code{connect_timeout}: Socket connection timeout in seconds. If negative, AWS SDK default is used (typically 1 second). 
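The `request_timeout` and `connect_timeout` options documented in this hunk are accepted by `S3FileSystem$create()`; a minimal sketch under that assumption, with illustrative values and a public bucket name borrowed from the arrow docs:

```r
library(arrow)

# Anonymous S3 access with explicit socket timeouts (in seconds)
fs <- S3FileSystem$create(
  anonymous = TRUE,
  request_timeout = 10,
  connect_timeout = 5
)
fs$GetFileInfo("voltrondata-labs-datasets")
```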
diff --git a/r/man/add_filename.Rd b/r/man/add_filename.Rd index 93718435a2042..1fe10ea4f8f26 100644 --- a/r/man/add_filename.Rd +++ b/r/man/add_filename.Rd @@ -12,7 +12,7 @@ augmented column. } \description{ This function only exists inside \code{arrow} \code{dplyr} queries, and it only is -valid when quering on a \code{FileSystemDataset}. +valid when querying on a \code{FileSystemDataset}. } \details{ To use filenames generated by this function in subsequent pipeline steps, you diff --git a/r/man/codec_is_available.Rd b/r/man/codec_is_available.Rd index 5cda813f41673..e79b5724b8b17 100644 --- a/r/man/codec_is_available.Rd +++ b/r/man/codec_is_available.Rd @@ -8,7 +8,7 @@ codec_is_available(type) } \arguments{ \item{type}{A string, one of "uncompressed", "snappy", "gzip", "brotli", -"zstd", "lz4", "lzo", or "bz2", case insensitive.} +"zstd", "lz4", "lzo", or "bz2", case-insensitive.} } \value{ Logical: is \code{type} available? diff --git a/r/man/io_thread_count.Rd b/r/man/io_thread_count.Rd index 6cd44e1f6ea94..ae9297bb57761 100644 --- a/r/man/io_thread_count.Rd +++ b/r/man/io_thread_count.Rd @@ -11,7 +11,7 @@ set_io_thread_count(num_threads) } \arguments{ \item{num_threads}{integer: New number of threads for thread pool. At least -two threads are reccomended to support all operations in the arrow +two threads are recommended to support all operations in the arrow package.} } \description{ diff --git a/r/man/new_extension_type.Rd b/r/man/new_extension_type.Rd index 6d0f27c321991..a7307e538b940 100644 --- a/r/man/new_extension_type.Rd +++ b/r/man/new_extension_type.Rd @@ -32,7 +32,7 @@ array.} is reserved for extension types defined by the Apache Arrow libraries.} \item{extension_metadata}{A \code{\link[=raw]{raw()}} or \code{\link[=character]{character()}} vector containing the -serialized version of the type. Chatacter vectors must be length 1 and +serialized version of the type. Character vectors must be length 1 and are converted to UTF-8 before converting to \code{\link[=raw]{raw()}}.} \item{type_class}{An \link[R6:R6Class]{R6::R6Class} whose \verb{$new()} class method will be diff --git a/r/man/open_dataset.Rd b/r/man/open_dataset.Rd index 7c3d32289f73e..7028f38467303 100644 --- a/r/man/open_dataset.Rd +++ b/r/man/open_dataset.Rd @@ -142,7 +142,7 @@ what names to give the virtual columns that come from the path segments. The default behavior in \code{open_dataset()} is to inspect the file paths contained in the provided directory, and if they look like Hive-style, parse -them as Hive. If your dataset has Hive-style partioning in the file paths, +them as Hive. If your dataset has Hive-style partitioning in the file paths, you do not need to provide anything in the \code{partitioning} argument to \code{open_dataset()} to use them. If you do provide a character vector of partition column names, they will be ignored if they match what is detected, diff --git a/r/man/read_delim_arrow.Rd b/r/man/read_delim_arrow.Rd index 999f2d265b7fd..b56d445c9e2e3 100644 --- a/r/man/read_delim_arrow.Rd +++ b/r/man/read_delim_arrow.Rd @@ -230,7 +230,7 @@ be dropped. Note that if you are specifying column names, whether by \code{schema} or \code{col_names}, and the CSV file has a header row that would otherwise be used -to idenfity column names, you'll need to add \code{skip = 1} to skip that row. +to identify column names, you'll need to add \code{skip = 1} to skip that row. 
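The `skip = 1` advice in the `read_delim_arrow()` docs above, as a runnable sketch (file contents and column names invented):

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b"), tf)

# The file has a header row; since we supply our own column names,
# skip that row so it is not read as data
read_csv_arrow(tf, col_names = c("col_one", "col_two"), skip = 1)
```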
} \examples{ diff --git a/r/man/write_feather.Rd b/r/man/write_feather.Rd index 78cf60b67477f..0d3a7da3b90b4 100644 --- a/r/man/write_feather.Rd +++ b/r/man/write_feather.Rd @@ -59,7 +59,7 @@ and to make sharing data across data analysis languages easy. a legacy version available starting in 2016, and the Version 2 (V2), which is the Apache Arrow IPC file format. The default version is V2. -V1 files are distinct from Arrow IPC files and lack many feathures, +V1 files are distinct from Arrow IPC files and lack many features, such as the ability to store all Arrow data tyeps, and compression support. \code{\link[=write_ipc_file]{write_ipc_file()}} can only write V2 files. } diff --git a/r/man/write_parquet.Rd b/r/man/write_parquet.Rd index af976b1aabf81..480abb12fcf4a 100644 --- a/r/man/write_parquet.Rd +++ b/r/man/write_parquet.Rd @@ -86,7 +86,7 @@ value for each column, in positional order value for the setting is used when not supplied } -The \code{compression} argument can be any of the following (case insensitive): +The \code{compression} argument can be any of the following (case-insensitive): "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2". Only "uncompressed" is guaranteed to be available, but "snappy" and "gzip" are almost always included. See \code{\link[=codec_is_available]{codec_is_available()}}. diff --git a/r/src/altrep.cpp b/r/src/altrep.cpp index 9bacf07d1840e..9745393d01bbc 100644 --- a/r/src/altrep.cpp +++ b/r/src/altrep.cpp @@ -747,7 +747,7 @@ struct AltrepVectorString : public AltrepVectorBase> { // Helper class to convert to R strings. We declare one of these for the // class to avoid having to stack-allocate one for every STRING_ELT call. // This class does not own a reference to any arrays: it is the caller's - // responsibility to ensure the Array lifetime exeeds that of the viewer. + // responsibility to ensure the Array lifetime exceeds that of the viewer. struct RStringViewer { RStringViewer() : strip_out_nuls_(false), nul_was_stripped_(false) {} diff --git a/r/src/safe-call-into-r.h b/r/src/safe-call-into-r.h index 319d46d11f0d6..0ffd1d16dca01 100644 --- a/r/src/safe-call-into-r.h +++ b/r/src/safe-call-into-r.h @@ -141,15 +141,15 @@ class MainRThread { MainRThread() : initialized_(false), executor_(nullptr), stop_source_(nullptr) {} }; -// This object is used to ensure that signal hanlders are registered when +// This object is used to ensure that signal handlers are registered when // RunWithCapturedR launches its background thread to call Arrow and is // cleaned up however this exits. Note that the lifecycle of the StopSource, // which is registered at package load, is not necessarily tied to the // lifecycle of the signal handlers. The general approach is to register // the signal handlers only when we are evaluating code outside the R thread // (when we are evaluating code *on* the R thread, R's signal handlers are -// sufficient and will signal an interupt condition that will propagate -// via a cpp11::unwind_excpetion). +// sufficient and will signal an interrupt condition that will propagate +// via a cpp11::unwind_exception). 
class WithSignalHandlerContext { public: WithSignalHandlerContext() : signal_handler_registered_(false) { diff --git a/r/tests/testthat/helper-arrow.R b/r/tests/testthat/helper-arrow.R index 8d39f7252ee21..e277c645d456e 100644 --- a/r/tests/testthat/helper-arrow.R +++ b/r/tests/testthat/helper-arrow.R @@ -37,7 +37,7 @@ with_language <- function(lang, expr) { skip_on_cran() old <- Sys.getenv("LANGUAGE") # Check what this message is before changing languages; this will - # trigger caching the transations if the OS does that (some do). + # trigger caching the translations if the OS does that (some do). # If the OS does cache, then we can't test changing languages safely. before <- i18ize_error_messages() Sys.setenv(LANGUAGE = lang) diff --git a/r/tests/testthat/helper-skip.R b/r/tests/testthat/helper-skip.R index 3d68dac5af69b..bd29080848184 100644 --- a/r/tests/testthat/helper-skip.R +++ b/r/tests/testthat/helper-skip.R @@ -38,11 +38,11 @@ skip_if_not_available <- function(feature) { skip_on_linux_devel() } - # curl/ssl on MacOS is too old to support S3 filesystems without + # curl/ssl on macOS is too old to support S3 filesystems without # crashing when the process exits. if (feature == "s3") { if (on_macos_10_13_or_lower()) { - skip("curl/ssl runtime on MacOS 10.13 is too old") + skip("curl/ssl runtime on macOS 10.13 is too old") } } diff --git a/r/tests/testthat/test-Array.R b/r/tests/testthat/test-Array.R index b29c1f4e09dde..bb005605de318 100644 --- a/r/tests/testthat/test-Array.R +++ b/r/tests/testthat/test-Array.R @@ -371,19 +371,19 @@ test_that("support for NaN (ARROW-3615)", { expect_equal(y$null_count, 1L) }) -test_that("is.nan() evalutes to FALSE on NA (for consistency with base R)", { +test_that("is.nan() evaluates to FALSE on NA (for consistency with base R)", { x <- c(1.0, NA, NaN, -1.0) compare_expression(is.nan(.input), x) }) -test_that("is.nan() evalutes to FALSE on non-floats (for consistency with base R)", { +test_that("is.nan() evaluates to FALSE on non-floats (for consistency with base R)", { x <- c(1L, 2L, 3L) y <- c("foo", "bar") compare_expression(is.nan(.input), x) compare_expression(is.nan(.input), y) }) -test_that("is.na() evalutes to TRUE on NaN (for consistency with base R)", { +test_that("is.na() evaluates to TRUE on NaN (for consistency with base R)", { x <- c(1, NA, NaN, -1) compare_expression(is.na(.input), x) }) diff --git a/r/tests/testthat/test-backwards-compatibility.R b/r/tests/testthat/test-backwards-compatibility.R index 8210bd2e78fd8..5f804b02dcee7 100644 --- a/r/tests/testthat/test-backwards-compatibility.R +++ b/r/tests/testthat/test-backwards-compatibility.R @@ -22,7 +22,7 @@ # To write a new version of a test file for an old version, use docker(-compose) # to setup a linux distribution and use RStudio's public package manager binary # repo to install the old version. The following commands should be run at the -# root of the arrow repo directory and might need slight adjusments. +# root of the arrow repo directory and might need slight adjustments. 
# R_ORG=rstudio R_IMAGE=r-base R_TAG=4.0-focal docker-compose build --no-cache r # R_ORG=rstudio R_IMAGE=r-base R_TAG=4.0-focal docker-compose run r /bin/bash # R diff --git a/r/tests/testthat/test-dataset-write.R b/r/tests/testthat/test-dataset-write.R index 28ff308747584..9f69380c55b3b 100644 --- a/r/tests/testthat/test-dataset-write.R +++ b/r/tests/testthat/test-dataset-write.R @@ -139,7 +139,7 @@ test_that("Writing a dataset: Parquet->Parquet (default)", { ) }) -test_that("Writing a dataset: `basename_template` default behavier", { +test_that("Writing a dataset: `basename_template` default behavior", { ds <- open_dataset(csv_dir, partitioning = "part", format = "csv") dst_dir <- make_temp_dir() @@ -840,7 +840,7 @@ test_that("Writing a dataset to text files with wrapper functions.", { expect_equal(new_ds %>% collect(), df) }) -test_that("Writing a flat file dataset: `basename_template` default behavier", { +test_that("Writing a flat file dataset: `basename_template` default behavior", { ds <- open_dataset(csv_dir, partitioning = "part", format = "csv") dst_dir <- make_temp_dir() diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R b/r/tests/testthat/test-dplyr-funcs-datetime.R index e707a194a3626..4d3226798d3ff 100644 --- a/r/tests/testthat/test-dplyr-funcs-datetime.R +++ b/r/tests/testthat/test-dplyr-funcs-datetime.R @@ -1550,7 +1550,7 @@ test_that("as.difftime()", { ) # only integer (or integer-like) -> duration conversion supported in Arrow. - # double -> duration not supported. we're not testing the content of the + # double -> duration not supported. We aren't testing the content of the # error message as it is being generated in the C++ code and it might change, # but we want to make sure that this error is raised in our binding implementation expect_error( @@ -1961,7 +1961,7 @@ test_that("`as.Date()` and `as_date()`", { # `as.Date()` ignores the `tzone` attribute and uses the value of the `tz` arg # to `as.Date()` # `as_date()` does the opposite: uses the tzone attribute of the POSIXct object - # passsed if`tz` is NULL + # passed if`tz` is NULL compare_dplyr_binding( .input %>% transmute( @@ -2831,7 +2831,7 @@ test_that("parse_date_time with truncated formats", { }) test_that("parse_date_time with `locale != NULL` not supported", { - # parse_date_time currently doesn't take locale paramete which will be + # parse_date_time currently doesn't take locale parameter which will be # addressed in https://issues.apache.org/jira/browse/ARROW-17147 skip_if_not_available("re2") @@ -3038,7 +3038,7 @@ test_that("build_formats() and build_format_from_order()", { # an "easy" date to avoid conflating tests of different things (i.e., it's # UTC time, and not one of the edge cases on or extremely close to the -# rounding boundaty) +# rounding boundary) easy_date <- as.POSIXct("2022-10-11 12:00:00", tz = "UTC") easy_df <- tibble::tibble(datetime = easy_date) @@ -3703,7 +3703,7 @@ test_that("with_tz() and force_tz() works", { roll_dst = "post") ) %>% collect(), - "roll_dst` value must be 'error' or 'boundary' for non-existent times" + "roll_dst` value must be 'error' or 'boundary' for nonexistent times" ) expect_warning( @@ -3716,7 +3716,7 @@ test_that("with_tz() and force_tz() works", { ) ) %>% collect(), - "`roll_dst` value must be 'error', 'pre', or 'post' for non-existent times" + "`roll_dst` value must be 'error', 'pre', or 'post' for nonexistent times" ) # Raise error when the timezone falls into the DST-break diff --git a/r/tests/testthat/test-dplyr-summarize.R 
b/r/tests/testthat/test-dplyr-summarize.R index d39c800f3ff0c..b2b2a9e54695d 100644 --- a/r/tests/testthat/test-dplyr-summarize.R +++ b/r/tests/testthat/test-dplyr-summarize.R @@ -355,7 +355,7 @@ test_that("Functions that take ... but we only accept a single arg", { test_that("median()", { # When medians are integer-valued, stats::median() sometimes returns output of - # type integer, whereas whereas the Arrow approx_median kernels always return + # type integer, whereas the Arrow approx_median kernels always return # output of type float64. The calls to median(int, ...) in the tests below # are enclosed in as.double() to work around this known difference. @@ -434,7 +434,7 @@ test_that("quantile()", { # returned by Arrow. # When quantiles are integer-valued, stats::quantile() sometimes returns - # output of type integer, whereas whereas the Arrow tdigest kernels always + # output of type integer, whereas the Arrow tdigest kernels always # return output of type float64. The calls to quantile(int, ...) in the tests # below are enclosed in as.double() to work around this known difference. @@ -841,7 +841,7 @@ test_that("Expressions on aggregations", { ) ) - # Check aggregates on aggeregates with more complex calls + # Check aggregates on aggregates with more complex calls expect_warning( record_batch(tbl) %>% summarise(any(any(!lgl))), paste( diff --git a/r/tests/testthat/test-extension.R b/r/tests/testthat/test-extension.R index 55a1f8d21eedb..8b3d7d8aaa902 100644 --- a/r/tests/testthat/test-extension.R +++ b/r/tests/testthat/test-extension.R @@ -256,7 +256,7 @@ test_that("RecordBatch can roundtrip extension types", { ) # check both column orders, since column order should stay in the same - # order whether the colunns are are extension types or not + # order whether the columns are extension types or not mixed_record_batch2 <- record_batch( normal = normal_vctr, custom = custom_array @@ -296,7 +296,7 @@ test_that("Table can roundtrip extension types", { ) # check both column orders, since column order should stay in the same - # order whether the colunns are are extension types or not + # order whether the columns are extension types or not mixed_table2 <- arrow_table( normal = normal_vctr, custom = custom_array diff --git a/r/tools/nixlibs.R b/r/tools/nixlibs.R index f4ae7312d3757..1794acee70d22 100644 --- a/r/tools/nixlibs.R +++ b/r/tools/nixlibs.R @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. 
-#### Fuctions #### check end of file for main logic +#### Functions #### check end of file for main logic env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value) # Log messages in the style of the configure script @@ -896,7 +896,7 @@ download_libarrow_ok <- download_ok && !env_is("LIBARROW_DOWNLOAD", "false") thirdparty_dependency_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR", "tools/thirdparty_dependencies") arrow_versioned <- paste0("arrow-", VERSION) -# configure.win uses a different libarrow dir and and the zip is already nested +# configure.win uses a different libarrow dir and the zip is already nested if (on_windows) { lib_dir <- "windows" dst_dir <- lib_dir diff --git a/r/tools/update-checksums.R b/r/tools/update-checksums.R index 8b9f1e6959cfd..f41652e87849e 100644 --- a/r/tools/update-checksums.R +++ b/r/tools/update-checksums.R @@ -49,7 +49,7 @@ binary_paths <- readLines(tasks_yml) |> artifactory_root <- "https://apache.jfrog.io/artifactory/arrow/r/%s/libarrow/bin/%s" -# Get the checksuym file from the artifactory +# Get the checksum file from the artifactory for (path in binary_paths) { sha_path <- paste0(path, ".sha512") file <- file.path("tools/checksums", sha_path) diff --git a/r/vignettes/arrow.Rmd b/r/vignettes/arrow.Rmd index c218b08ede77b..50329334ce8b0 100644 --- a/r/vignettes/arrow.Rmd +++ b/r/vignettes/arrow.Rmd @@ -66,7 +66,7 @@ as.data.frame(dat) When this coercion takes place, each of the columns in the original Arrow Table must be converted to native R data objects. In the `dat` Table, for instance, `dat$x` is stored as the Arrow data type int32 inherited from C++, which becomes an R integer type when `as.data.frame()` is called. -It is possible to exercise fine grained control over this conversion process. To learn more about the different types and how they are converted, see the [data types](./data_types.html) article. +It is possible to exercise fine-grained control over this conversion process. To learn more about the different types and how they are converted, see the [data types](./data_types.html) article. ## Reading and writing data diff --git a/r/vignettes/data_objects.Rmd b/r/vignettes/data_objects.Rmd index 7fcef8e6e78c6..065745182df04 100644 --- a/r/vignettes/data_objects.Rmd +++ b/r/vignettes/data_objects.Rmd @@ -259,7 +259,7 @@ write_parquet(df_b, file.path(ds_dir_b, "part-0.parquet")) write_parquet(df_c, file.path(ds_dir_c, "part-0.parquet")) ``` -If we had wanted to, we could have further subdivided the dataset. A folder could contain multiple files (`part-0.parquet`, `part-1.parquet`, etc) if we wanted it to. Similarly, there is no particular reason to name the files `part-0.parquet` this way at all: it would have been fine to call these files `subset-a.parquet`, `subset-b.parquet`, and `subset-c.parquet` if we had wished. We could have written other file formats if we wanted, and we don't necessarily have to use Hive-style folders. You can learn more about the supported formats by reading the help documentation for `open_dataset()`, and learn about how to exercise fine grained control with `help("Dataset", package = "arrow")`. +If we had wanted to, we could have further subdivided the dataset. A folder could contain multiple files (`part-0.parquet`, `part-1.parquet`, etc) if we wanted it to. Similarly, there is no particular reason to name the files `part-0.parquet` this way at all: it would have been fine to call these files `subset-a.parquet`, `subset-b.parquet`, and `subset-c.parquet` if we had wished. 
We could have written other file formats if we wanted, and we don't necessarily have to use Hive-style folders. You can learn more about the supported formats by reading the help documentation for `open_dataset()`, and learn about how to exercise fine-grained control with `help("Dataset", package = "arrow")`. In any case, we have created an on-disk parquet Dataset using Hive-style partitioning. Our Dataset is defined by these files: diff --git a/r/vignettes/data_types.Rmd b/r/vignettes/data_types.Rmd index 6cbe7c72e6809..4b5ee01b6ab83 100644 --- a/r/vignettes/data_types.Rmd +++ b/r/vignettes/data_types.Rmd @@ -34,7 +34,7 @@ When the arrow package converts between R data and Arrow data, it will first che knitr::include_graphics("./data_types.png") ``` -In this image, black boxes refer to R data types and light blue boxes refer to Arrow data types. Directional arrows specify conversions (e.g., the bidirectional arrow between the logical R type and the boolean Arrow type means that R logicals convert to Arrow booleans and vice versa). Solid lines indicate that the this conversion rule is always the default; dashed lines mean that it only sometimes applies (the rules and special cases are described below). +In this image, black boxes refer to R data types and light blue boxes refer to Arrow data types. Directional arrows specify conversions (e.g., the bidirectional arrow between the logical R type and the boolean Arrow type means that R logicals convert to Arrow booleans and vice versa). Solid lines indicate that this conversion rule is always the default; dashed lines mean that it only sometimes applies (the rules and special cases are described below). ## Logical/boolean types diff --git a/r/vignettes/data_wrangling.Rmd b/r/vignettes/data_wrangling.Rmd index e3d5b306f3e71..305a91c156eb1 100644 --- a/r/vignettes/data_wrangling.Rmd +++ b/r/vignettes/data_wrangling.Rmd @@ -165,7 +165,7 @@ sw2 %>% transmute(name, height, mass, res = residuals(lm(mass ~ height))) ``` -Because window functions are not supported, computing an aggregation like `mean()` on a grouped table or within a rowwise opertation like `filter()` is not supported: +Because window functions are not supported, computing an aggregation like `mean()` on a grouped table or within a rowwise operation like `filter()` is not supported: ```{r} sw %>% diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index de33e72407792..8e7cff7410473 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -46,18 +46,18 @@ not possible to link to a system version of libarrow during development). ## Option 1: Using nightly libarrow binaries -On Linux, MacOS, and Windows you can use the same workflow you might use for another +On Linux, macOS, and Windows you can use the same workflow you might use for another package that contains compiled code (e.g., `R CMD INSTALL .` from a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from RStudio). If the `arrow/r/libarrow` directory is not populated, the configure script will attempt to download the latest nightly libarrow binary, extract it to the -`arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows` +`arrow/r/libarrow` directory (macOS, Linux) or `arrow/r/windows` directory (Windows), and continue building the R package as usual.
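Looking back at the data_types.Rmd hunk above, a minimal sketch of the logical-to-boolean conversion it describes (values invented):

```r
library(arrow)

a <- Array$create(c(TRUE, FALSE, NA))
a$type        # Arrow boolean
as.vector(a)  # round-trips to an R logical vector, NA preserved
```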
Most of the time, you won't need to update your version of libarrow because the R package rarely changes with updates to the C++ library; however, if you start to get errors when rebuilding the R package, you may have to remove the -`libarrow` directory (MacOS, Linux) or `windows` directory (Windows) +`libarrow` directory (macOS, Linux) or `windows` directory (Windows) and do a "clean" rebuild. You can do this from a terminal with `R CMD INSTALL . --preclean`, from RStudio using the "Clean and Install" option from "Build" tab, or using `make clean` if you are using the `Makefile` diff --git a/r/vignettes/fs.Rmd b/r/vignettes/fs.Rmd index a21a7864f7d73..50278af25bd1b 100644 --- a/r/vignettes/fs.Rmd +++ b/r/vignettes/fs.Rmd @@ -14,7 +14,7 @@ This article provides an overview of working with both S3 and GCS data using the ## S3 and GCS support on Linux -Before you start, make sure that your arrow install has support for S3 and/or GCS enabled. For most users this will be true by default, because the Windows and MacOS binary packages hosted on CRAN include S3 and GCS support. You can check whether support is enabled via helper functions: +Before you start, make sure that your arrow install has support for S3 and/or GCS enabled. For most users this will be true by default, because the Windows and macOS binary packages hosted on CRAN include S3 and GCS support. You can check whether support is enabled via helper functions: ```r arrow_with_s3() @@ -307,7 +307,7 @@ Sys.unsetenv("AWS_S3_ENDPOINT") ``` By default, the AWS SDK tries to retrieve metadata about user configuration, -which can cause conficts when passing in connection details via URI (for example +which can cause conflicts when passing in connection details via URI (for example when accessing a MINIO bucket). To disable the use of AWS environment variables, you can set environment variable `AWS_EC2_METADATA_DISABLED` to `TRUE`. diff --git a/r/vignettes/install.Rmd b/r/vignettes/install.Rmd index 10155e3a8cd5b..df43a9de36fc2 100644 --- a/r/vignettes/install.Rmd +++ b/r/vignettes/install.Rmd @@ -10,9 +10,9 @@ In most cases, `install.packages("arrow")` should just work. There are things yo ## Background -The Apache Arrow project is implemented in multiple languages, and the R package depends on the Arrow C++ library (referred to from here on as libarrow). This means that when you install arrow, you need both the R and C++ versions. If you install arrow from CRAN on a machine running Windows or MacOS, when you call `install.packages("arrow")`, a precompiled binary containing both the R package and libarrow will be downloaded. However, CRAN does not host R package binaries for Linux, and so you must choose from one of the alternative approaches. +The Apache Arrow project is implemented in multiple languages, and the R package depends on the Arrow C++ library (referred to from here on as libarrow). This means that when you install arrow, you need both the R and C++ versions. If you install arrow from CRAN on a machine running Windows or macOS, when you call `install.packages("arrow")`, a precompiled binary containing both the R package and libarrow will be downloaded. However, CRAN does not host R package binaries for Linux, and so you must choose from one of the alternative approaches. -This article outlines the recommend approaches to installing arrow on Linux, starting from the simplest and least customizable to the most complex but with more flexbility to customize your installation. 
+This article outlines the recommended approaches to installing arrow on Linux, starting from the simplest and least customizable to the most complex but with more flexibility to customize your installation. The primary audience for this document is arrow R package _users_ on Linux, and not Arrow _developers_. Additional resources for developers are listed at the end of this article. @@ -225,7 +225,7 @@ already present (when set to `AUTO`, the default). These dependencies vary by platform; however, if you wish to install these yourself prior to libarrow installation, we recommend that you take a look at the [docker file for whichever of our CI builds](https://github.com/apache/arrow/tree/main/ci/docker) -(the ones ending in "cpp" are for building Arrow's C++ libaries, aka libarrow) +(the ones ending in "cpp" are for building Arrow's C++ libraries, aka libarrow) corresponds most closely to your setup. This will contain the most up-to-date information about dependencies and minimum versions. diff --git a/r/vignettes/read_write.Rmd b/r/vignettes/read_write.Rmd index 15b2392b8ee5c..0ee695a6f4907 100644 --- a/r/vignettes/read_write.Rmd +++ b/r/vignettes/read_write.Rmd @@ -140,7 +140,7 @@ write_csv_arrow(mtcars, file_path) read_csv_arrow(file_path, col_select = starts_with("d")) ``` -In addition to the options provided by the readr-style arguments (`delim`, `quote`, `escape_doubple`, `escape_backslash`, etc), you can use the `schema` argument to specify column types: see `schema()` help for details. There is also the option of using `parse_options`, `convert_options`, and `read_options` to exercise fine-grained control over the arrow csv reader: see `help("CsvReadOptions", package = "arrow")` for details. +In addition to the options provided by the readr-style arguments (`delim`, `quote`, `escape_double`, `escape_backslash`, etc), you can use the `schema` argument to specify column types: see `schema()` help for details. There is also the option of using `parse_options`, `convert_options`, and `read_options` to exercise fine-grained control over the arrow csv reader: see `help("CsvReadOptions", package = "arrow")` for details. ## JSON format
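A hedged sketch of the newline-delimited JSON reading that the `## JSON format` section goes on to cover (file contents invented):

```r
library(arrow)

tf <- tempfile(fileext = ".json")
writeLines(c('{"x": 1, "y": "a"}', '{"x": 2, "y": "b"}'), tf)
read_json_arrow(tf)
```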