Add some data for tests, and add_weights function #9

Merged: 3 commits, Jun 19, 2023

AbrahamAz
Contributor

No description provided.

@AbrahamAz
Contributor Author

add weights

R/add_weights.R (outdated, resolved)
    stop("Cannot find the defined strata column in the provided sample frame.")
  if (!strata_column_dataset %in% names(.dataset))
    stop("Cannot find the defined strata column in the provided dataset.")

Contributor

Perhaps we can add:

  1. An error message if not all the strata from the dataset are available in the sample frame.
  2. A warning message if not all the strata from the sample frame are available in the dataset. [I am suggesting a warning because it is sometimes normal to have population data for the whole country, including areas where we have access issues.]
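
A minimal sketch of the two checks suggested above, in base R. The helper name `check_strata()` and the argument name `sample_data` are hypothetical and only for illustration; the column arguments mirror the ones used in this thread. This is not the code that was merged.

```r
# Hypothetical helper (not the package's actual code): compares the strata
# present in the dataset with those listed in the sample frame.
check_strata <- function(.dataset, sample_data,
                         strata_column_dataset, strata_column_sample) {
  dataset_strata <- unique(.dataset[[strata_column_dataset]])
  frame_strata <- unique(sample_data[[strata_column_sample]])

  # Strata that were surveyed but are absent from the frame cannot be weighted.
  missing_in_frame <- setdiff(dataset_strata, frame_strata)
  if (length(missing_in_frame) > 0) {
    stop("Strata in the dataset missing from the sample frame: ",
         paste(missing_in_frame, collapse = ", "))
  }

  # Strata in the frame with no interviews may be legitimate (e.g. access issues).
  missing_in_dataset <- setdiff(frame_strata, dataset_strata)
  if (length(missing_in_dataset) > 0) {
    warning("Strata in the sample frame missing from the dataset: ",
            paste(missing_in_dataset, collapse = ", "))
  }

  invisible(TRUE)
}
```

Using `setdiff()` in both directions keeps the two cases separate, so each one can be raised as an error or a warning independently.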

Contributor Author

done

Collaborator

  • If a stratum is missing from the dataset, the weights df will contain only NA.
  • If you add na.rm = TRUE, the weights will include the population of the missing stratum, and hence will not be correct.
  • I am not really sure what Mehedi wanted to achieve here. We can change the warning to an error until he comes back.

# Assumes add_weights() and the %>% pipe (e.g. via dplyr) are already loaded.

# Missing strata in dataset: does not run
set.seed(2133)
my_data <- data.frame(aa = runif(100),
                      strata = sample(LETTERS[1:5], size = 100, replace = TRUE))
my_sf <- data.frame(strata = LETTERS[1:6],
                    pop = c(10000, 10000, 20000, 30000, 5000, 5000))
add_weights(my_data,
            my_sf,
            strata_column_dataset = "strata",
            strata_column_sample = "strata",
            population_column = "pop") %>%
  dplyr::summarise(sum(weight))

# Missing strata in sampling frame: ok
set.seed(2133)
my_data <- data.frame(aa = runif(100),
                      strata = sample(LETTERS[1:6], size = 100, replace = TRUE))
my_sf <- data.frame(strata = LETTERS[1:5],
                    pop = c(10000, 10000, 20000, 30000, 5000))
add_weights(my_data,
            my_sf,
            strata_column_dataset = "strata",
            strata_column_sample = "strata",
            population_column = "pop") %>%
  dplyr::summarise(sum(weight))
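
To illustrate the first two points with small numbers, here is a base R sketch. It assumes a weighting formula of (stratum population / total population) / (stratum sample / total sample); the thread does not confirm that `add_weights` is implemented this way, so treat the formula and the toy data as assumptions.

```r
# Assumed weighting formula (not confirmed by the thread):
# weight = (stratum pop / total pop) / (stratum sample / total sample)
toy_sf <- data.frame(strata = LETTERS[1:3], pop = c(100, 200, 700))
toy_counts <- data.frame(strata = c("A", "B"), n = c(10, 10))  # "C" not surveyed

# Keep every sample-frame stratum, so stratum "C" gets n = NA.
w <- merge(toy_sf, toy_counts, by = "strata", all.x = TRUE)

# Without na.rm, the NA count propagates into the sample total,
# so every weight becomes NA.
w$weight <- (w$pop / sum(w$pop)) / (w$n / sum(w$n))
all(is.na(w$weight))

# With na.rm = TRUE the weights are computed, but the population total still
# includes stratum "C", so the weighted sample falls short of the 20 interviews.
w$weight_narm <- (w$pop / sum(w$pop)) / (w$n / sum(w$n, na.rm = TRUE))
sum(w$weight_narm * w$n, na.rm = TRUE)
```

Under this assumed formula, the weights only sum back to the sample size when the population total is restricted to the strata that were actually surveyed.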

Contributor Author

For the third point, I think what Mehedi meant is that the sampling frame was drawn up, and data collection then happened in fewer locations than planned for field reasons. As I understand it, this requires changing the whole sample frame document to match the collected data, since filtering out of the sample frame the strata that were not collected will not resolve the issue.
I have changed it to an error for the time being, as mentioned, but we can look at it again.

R/add_weights.R (outdated, resolved)
tests/testthat/test-add_weights.R (resolved)
R/add_weights.R (outdated, resolved)
R/add_weights.R (outdated, resolved)
yannsay-impact merged commit 3419d64 into impact-initiatives:main on Jun 19, 2023
3 participants