✨ Initial commit for branch "dev_theil" (ndi v0.1.6.9006) (#23)

* Added `theil()` function the aspatial racial or ethnic Entropy (*H*) based on Theil (1972; ISBN:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/110.1080/0022250X.1971.9989795)
idblr · Aug 24, 2024 · ddbac71 · ddbac71
1 parent bd42baa
commit ddbac71
Show file tree

Hide file tree

Showing 31 changed files with 1,151 additions and 170 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: ndi
 Title: Neighborhood Deprivation Indices
-Version: 0.1.6.9005
+Version: 0.1.6.9006
 Date: 2024-08-24
 Authors@R:
     c(person(given = "Ian D.",
@@ -45,11 +45,13 @@ Description: Computes various metrics of socio-economic deprivation and disparit
              based on Hoover (1941) <doi:10.1017/S0022050700052980> and Duncan et al. (1961; 
              LC:60007089), (11) an index of spatial proximity (SP) based on White (1986)
              <doi:10.2307/3644339> and Blau (1977; ISBN-13:978-0-029-03660-0), (12) the 
-             aspatial racial or ethnic Isolatoin Index (xPx*) based on Lieberson (1981; 
+             aspatial racial or ethnic Isolation Index (xPx*) based on Lieberson (1981; 
              ISBN-13:978-1-032-53884-6) and Bell (1954) <doi:10.2307/2574118>, (13) the
              aspatial racial or ethnic Gini Index (G) based Gini (1921) <doi:10.2307/2223319>, 
-             and (14) the aspatial racial or ethnic Dissimilarity Index (D) based on James & 
-             Taeuber (1985) <doi:10.2307/270845>. Also using data from the ACS-5 (2005-2009 
+             (14) the aspatial racial or ethnic Dissimilarity Index (D) based on James & 
+             Taeuber (1985) <doi:10.2307/270845>, and (15) the aspatial racial or ethnic 
+             Entropy (H) based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza
+             (1971) <doi:110.1080/0022250X.1971.9989795>. Also using data from the ACS-5 (2005-2009 
              onward), the package can retrieve the aspatial income Gini Index (G) based on 
              Gini (1921) <doi:10.2307/2223319>.
 License: Apache License (>= 2.0)

diff --git a/NAMESPACE b/NAMESPACE
@@ -14,6 +14,7 @@ export(lieberson)
 export(messer)
 export(powell_wiley)
 export(sudano)
+export(theil)
 export(white)
 export(white_blau)
 import(dplyr)

diff --git a/NEWS.md b/NEWS.md
@@ -1,12 +1,13 @@
 # ndi (development version)
 
-## ndi v0.1.6.9005
+## ndi v0.1.6.9006
 
 ### New Features
 * Added `hoover()` function to compute the aspatial racial or ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089)
 * Added `white_blau()` function to compute an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0)
 * Added `lieberson()` function to compute the aspatial racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and and [Bell (1954)](https://doi.org/10.2307/2574118)
 * Added `james_taeuber()` function to compute the aspatial racial or ethnic Dissimilarity Index (*D*) based on [James & Taeuber (1985)](https://doi.org/10.2307/270845)
+* Added `theil()` function the aspatial racial or ethnic Entropy (*H*) based on Theil (1972; ISBN:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/110.1080/0022250X.1971.9989795)
 * Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `lieberson()`, `sudano()`, and `white()`, `white_blau()` functions.
 * Thank you for the feature suggestions, [Symielle Gaston](https://orcid.org/0000-0001-9495-1592)
 * Added `holder` argument to `atkinson()` function to toggle the computation with or without the Hölder mean. The function can now compute *A* without the Hölder mean. The default is `holder = FALSE`.
@@ -15,13 +16,13 @@
 ### Updates
 * `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout
 * Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` functions when a smaller geography contains n=0 total population, will assign a value of zero (0) in the internal calculation instead of NA
-* Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `ai`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively.
+* Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `ai`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `ddd_fun()` and `xpy_star_fun()`, respectively.
 * `tigris` and `units` are now Imports
 * 'package.R' deprecated. Replaced with 'ndi-package.R'
 * Re-formatted code and documentation throughout for consistent readability
 * Renamed 'race/ethnicity' or 'racial/ethnic' to 'race or ethnicity' or 'racial or ethnic' throughout documentation to use more modern, inclusive, and appropriate language
 * Updated documentation about value range of *V* (White) from `{0 to 1}` to `{-Inf to Inf}`
-* Added examples for `gini()`, `james_taeuber()`, `lieberson()`, `hoover()` and `white_blau()` functions in vignette and README
+* Added examples for `gini()`, `james_taeuber()`, `lieberson()`, `hoover()`, `theil()`, and `white_blau()` functions in vignette and README
 * Added example for `holder` argument in `atkinson()` function in README
 * Reformatted functions for consistent internal structure
 * Updated examples in vignette to showcase a larger variety of U.S. states

diff --git a/R/atkinson.R b/R/atkinson.R
@@ -40,9 +40,9 @@
 #'
 #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output.
 #'
-#' \emph{A} is a measure of the evenness of residential inequality (e.g., racial or ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation).
+#' \emph{A} is a measure of the evenness of residential inequality (e.g., racial or ethnic segregation) when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation).
 #'
-#' The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}.
+#' The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among units of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}.
 #'
 #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{A} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{A} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{A} computation.
 #' 

diff --git a/R/bemanian_beyer.R b/R/bemanian_beyer.R
@@ -39,9 +39,9 @@
 #'
 #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output.
 #'
-#' \emph{LEx/Is} is a measure of the probability that two individuals living within a specific smaller geography (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial or ethnic subgroup(s) will interact, assuming that individuals within a smaller geography are randomly mixed. \emph{LEx/Is} is standardized with a logit transformation and centered against an expected case that all races or ethnicities are evenly distributed across a larger geography. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.)
+#' \emph{LEx/Is} is a measure of the probability that two individuals living within a specific smaller geographical unit (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial or ethnic subgroup(s) will interact, assuming that individuals within a smaller geographical unit are randomly mixed. \emph{LEx/Is} is standardized with a logit transformation and centered against an expected case that all races or ethnicities are evenly distributed across a larger geographical unit. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.)
 #'
-#' \emph{LEx/Is} can range from negative infinity to infinity. If \emph{LEx/Is} is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geography is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geography. If \emph{LEx/Is} is greater than zero then the interaction is more likely to occur within the smaller geography than in the larger geography, and if \emph{LEx/Is} is less than zero then the interaction is less likely to occur within the smaller geography than in the larger geography. Note: the exponentiation of each \emph{LEx/Is} metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geography relative to the larger geography.
+#' \emph{LEx/Is} can range from negative infinity to infinity. If \emph{LEx/Is} is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geographical unit is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geographical unit. If \emph{LEx/Is} is greater than zero then the interaction is more likely to occur within the smaller geographical unit than in the larger geographical unit, and if \emph{LEx/Is} is less than zero then the interaction is less likely to occur within the smaller geographical unit than in the larger geographical unit. Note: the exponentiation of each \emph{LEx/Is} metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geographical unit relative to the larger geographical unit.
 #'
 #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{LEx/Is} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{LEx/Is} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{LEx/Is} computation.
 #'

diff --git a/R/duncan.R b/R/duncan.R
@@ -39,14 +39,14 @@
 #'
 #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output.
 #'
-#' \emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation.
+#' \emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation.
 #'
 #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{D} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{D} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{D} computation.
 #' 
 #' @return An object of class 'list'. This is a named list with the following components:
 #'
 #' \describe{
-#' \item{\code{di}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.}
+#' \item{\code{d}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.}
 #' \item{\code{d_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.}
 #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{D}.}
 #' }
@@ -315,11 +315,11 @@ duncan <- function(geo_large = 'county',
     ## From Duncan & Duncan (1955) https://doi.org/10.2307/2088328
     ## D_{jt} = 1/2 \sum_{i=1}^{k} | \frac{x_{ijt}}{X_{jt}}-\frac{y_{ijt}}{Y_{jt}}|
     ## Where for k smaller geographies:
-    ## D_{jt} denotes the D of larger geography j at time t
-    ## x_{ijt} denotes the racial or ethnic subgroup population of smaller geography i within larger geography j at time t
-    ## X_{jt} denotes the racial or ethnic subgroup population of larger geography j at time t
-    ## y_{ijt} denotes the racial or ethnic referent subgroup population of smaller geography i within larger geography j at time t
-    ## Y_{jt} denotes the racial or ethnic referent subgroup population of larger geography j at time t
+    ## D_{jt} denotes the D of larger geographical unit j at time t
+    ## x_{ijt} denotes the racial or ethnic subgroup population of smaller geographical unit i within larger geographical unit j at time t
+    ## X_{jt} denotes the racial or ethnic subgroup population of larger geographical unit j at time t
+    ## y_{ijt} denotes the racial or ethnic referent subgroup population of smaller geographical unit i within larger geographical unit j at time t
+    ## Y_{jt} denotes the racial or ethnic referent subgroup population of larger geographical unit j at time t
 
     ## Compute
     out_tmp <- out_dat %>%

diff --git a/R/gini.R b/R/gini.R
@@ -278,8 +278,8 @@ gini <- function(geo_large = 'county',
     ## t_{j} is the total population of area j
     ## p_{i} is the proportion of the subgroup population of area i
     ## p_{j} is the proportion of the subgroup population of area j
-    ## T is the total population of all areas
-    ## P is the proportion of the subgroup population of all areas
+    ## T is the total population of all smaller geographical units
+    ## P is the proportion of the subgroup population of all smaller geographical units
 
     ## Compute
     out_tmp <- out_dat %>%

diff --git a/R/globals.R b/R/globals.R
@@ -260,6 +260,7 @@ globalVariables(
     'G_incE',
     'G_re', 
     'xPx_star', 
-    'xPy_star'
+    'xPy_star',
+    'H'
   )
 )