stratas not present in the dataset #10

Open
yannsay-impact opened this issue Jun 19, 2023 · 1 comment
@yannsay-impact
Collaborator

To be checked when Mehedi is back. If strata are not present in the dataset, the weights will be calculated over the total population, which is incorrect. What was the idea behind having a warning rather than an error? For now, we have set it as an error.

R/add_weights.R
stop("Cannot find the defined strata column in the provided sample frame.")
if(!strata_column_dataset %in% names(.dataset))
stop("Cannot find the defined strata column in the provided dataset.")

Member
@mhkhan27 mhkhan27 4 days ago
Perhaps we can add:

An error message if not all the strata from the dataset are available in the sample frame.
A warning message if not all the strata from the sample frame are available in the dataset. [I am suggesting a warning because sometimes it's normal to have population data across the country, including the areas where we are having access issues.]
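The two checks suggested above could be sketched roughly as follows. This is only an illustration: `check_strata()` is a hypothetical helper and its argument names are assumptions, not the code in R/add_weights.R.

```r
# Hypothetical helper, not part of the package: compares the strata found in
# the dataset with the strata listed in the sample frame.
check_strata <- function(dataset, sample_frame,
                         strata_column_dataset, strata_column_sample) {
  dataset_strata <- unique(dataset[[strata_column_dataset]])
  frame_strata <- unique(sample_frame[[strata_column_sample]])

  # Strata observed in the data but absent from the sample frame: there is
  # no population figure for them, so this is treated as an error.
  missing_in_frame <- setdiff(dataset_strata, frame_strata)
  if (length(missing_in_frame) > 0) {
    stop("Strata in the dataset but not in the sample frame: ",
         paste(missing_in_frame, collapse = ", "))
  }

  # Strata listed in the sample frame but never surveyed (e.g. access
  # issues): only a warning in this sketch.
  missing_in_dataset <- setdiff(frame_strata, dataset_strata)
  if (length(missing_in_dataset) > 0) {
    warning("Strata in the sample frame but not in the dataset: ",
            paste(missing_in_dataset, collapse = ", "))
  }

  invisible(list(missing_in_frame = missing_in_frame,
                 missing_in_dataset = missing_in_dataset))
}
```

Whether the second case should stay a warning or become an error is exactly the open question in this thread; switching it only means replacing `warning()` with `stop()`.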
Member
Author
@AbrahamAz AbrahamAz 4 days ago
done

Member
@yannsay-impact yannsay-impact 4 days ago
If a stratum is missing from the dataset, the weights df will return only NA.
If you add na.rm = TRUE, the weights will include the population of the missing strata, and hence will not be correct.
I am not really sure what Mehedi wanted to achieve here. We can change the warning to an error until he comes back.
# Stratum missing from the dataset (F exists only in the sample frame): does not run
library(magrittr) # provides %>%
set.seed(2133)
my_data <- data.frame(aa = runif(100),
                      strata = sample(LETTERS[1:5], size = 100, replace = TRUE))
my_sf <- data.frame(strata = LETTERS[1:6],
                    pop = c(10000, 10000, 20000, 30000, 5000, 5000))
add_weights(my_data,
            my_sf,
            strata_column_dataset = "strata",
            strata_column_sample = "strata",
            population_column = "pop") %>%
  dplyr::summarise(sum(weight))

# Stratum missing from the sampling frame (F exists only in the dataset): ok
set.seed(2133)
my_data <- data.frame(aa = runif(100),
                      strata = sample(LETTERS[1:6], size = 100, replace = TRUE))
my_sf <- data.frame(strata = LETTERS[1:5],
                    pop = c(10000, 10000, 20000, 30000, 5000))
add_weights(my_data,
            my_sf,
            strata_column_dataset = "strata",
            strata_column_sample = "strata",
            population_column = "pop") %>%
  dplyr::summarise(sum(weight))
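To make the NA and na.rm points concrete, here is a small hand-computed sketch in base R. The weight formula used (population share divided by sample share) is an assumption for illustration and may not match what add_weights() does internally; the data are made up.

```r
# Illustration only: made-up data and an assumed weight formula.
sample_frame <- data.frame(strata = c("A", "B", "C"),
                           pop = c(1000, 3000, 6000))
# Stratum C was never surveyed, so it has no interviews in the dataset.
counts <- data.frame(strata = c("A", "B"), n = c(50, 50))

# Left join on the sample frame: C gets an NA interview count.
merged <- merge(sample_frame, counts, by = "strata", all.x = TRUE)

# Without na.rm, the NA count for C propagates through the total,
# so every weight becomes NA.
total_n <- sum(merged$n)
weights_na <- (merged$pop / sum(merged$pop)) / (merged$n / total_n)

# With na.rm = TRUE the calculation runs, but the population share is still
# taken over all 10000 people, including the 6000 in the unsurveyed stratum.
total_n <- sum(merged$n, na.rm = TRUE)
weights_all_pop <- (merged$pop / sum(merged$pop)) / (merged$n / total_n)

# Average weight per interview; it is no longer 1 because part of the
# population has no respondents to carry it.
mean_weight <- sum(weights_all_pop * merged$n, na.rm = TRUE) / total_n
```

In this toy case the weights for A and B keep the same ratio to each other, but they are scaled against a population that includes a stratum nobody was interviewed in.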

Member
Author
@AbrahamAz AbrahamAz 3 days ago
For the third point, I think what Mehedi meant is that the sampling frame was done, and then data collection happened in fewer locations than planned because of field reasons. As I understand it, this then requires changing the whole sample frame document to match the collected data, since filtering the uncollected strata out of the sample frame will not resolve the issue.
I have changed it to an error for the time being, as mentioned, but we can look at it again.
