Dataset updates #30

briatte · 2021-01-30T11:57:25Z

Closes #21, #22 and #23 (copied below), #27.

Update from 2023

Stop updating the data, really.

'Freeze' as it is ~~(except for ESS, perhaps)~~
Archive the original frozen datasets/codebooks in data-raw/
Update srqm_data to use data-raw/
Slightly improve the _readme documents
- Document freezes
- Document codebook issues, e.g. QOG 2020: make sure GDP documentation has been corrected #27
- Ideally, this would be in the Stata Guide…
Add WEP? Country-level data: World Economics and Politics Dataverse #24

Detailed notes

QOG: ~~qog2023~~ -- since QOG 2023 is out
- freeze: qog2019
- updating would require rewriting code and looking at less clear results… see code at end of section
- only advantage of updating would be a smaller codebook → just downsample the 2019 one, it only loses the intra-doc links
- note the codebook issue! QOG 2020: make sure GDP documentation has been corrected #27
- Perhaps simply drop the eu_* variables
GSS: ~~gss7221~~ -- since GSS has updated too
- freeze: gss7616 (but see below)
- not fun to keep only one year: keep ~~older years~~ one old year too
- ~~possibly break down single data into yearly ones?~~ restrict to 1976 and 2016
  - would solve "max 2,048 vars" issue from Compatibility with different versions of Stata #28
  - ~~raises question as to how to zip it all (currently uses gss7616* to match files)~~
ESS: ~~ess2008~~ -- in order to continue using torture question?
- freeze: ess0816, or ess2008 and ess2016 (different codebooks, so it's fine)
- keep using Round 4 for both the torture example and the health services ones (results are not as clear-cut with Round 8)
- keep Round 8 to cover e.g. climate change
- problem: DTA file is too large -- divide, to avoid _merge problem
- document the existence of ess2016 even though it is not used anywhere in the course do-files
WVS: wvs9904 -- keep old version for sharia law question
- update to last version, check encoding
- possibly also include a more recent wave? (raises same question as ess2016)
NHIS: update to ~~nhis202* recent year~~ nhis1020?
- check if sampling frame and variables have changed first
- see below on how URL structure for fetching has changed

Note on QOG: in 2023, it offers only the following as a replacement, which is not ideal:

// school life expectancy
sc wdi_fertility wef_lse, ms(i) mlab(ccodealp) || lfit wdi_fertility wef_lse, ///
	name(g1, replace)
// linear fit + SSA data points only, underpredicted
sc wdi_fertility wef_lse if ht_region == 4, ms(i) mlab(ccodealp) || ///
	lfit wdi_fertility wef_lse, ///
	name(g2, replace)
// all regions
forv i = 1/10 {
	sc wdi_fertility wef_lse if ht_region == `i', ms(i) mlab(ccodealp) || ///
	lfit wdi_fertility wef_lse, ///
	name("region`i'", replace)
}

The plan for 2021:

Additional things to consider:

Dataset names

I like the initial "acronym + year" convention, but it produces strange names for multiple-year survey datasets:

ess1214 (not used) and ess0816
wvs9904 (unavoidable)
nhis1017 (unavoidable, unless we use a single year, but that removes any demo of keep if year)
gss7616 (unavoidable, unless we separate the years)

Merged datasets

Is it still a good idea to merge years for e.g. ESS? Probably not, esp. if we need to limit datasets to 2,048 variables for Stata/IC.

Keep NHIS with multiple years. Use it to demo keep if year.
Keep WVS with multiple years (country-dependent).
Break down GSS.
Break down ESS.
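
To decide which files actually need breaking down, one option is to compare each file's variable count against the ceiling of the running Stata. This is a minimal sketch, not tested against the course files: the file names are taken from the dataset names in this issue, and their location in the working directory is an assumption.

```stata
* Sketch: flag datasets with more variables than this Stata allows.
* File names and locations are assumptions; adjust to the actual layout.
foreach f in ess0816 gss7616 nhis1017 wvs9904 {
	capture confirm file "`f'.dta"
	if _rc {
		display as text "`f'.dta not found, skipping"
		continue
	}
	* -describe using- describes a file on disk without loading it
	quietly describe using "`f'.dta", short
	display as text "`f': " as result r(k) ///
		as text " variables (limit here: " c(maxvar) ")"
	if r(k) > c(maxvar) {
		display as error "`f' exceeds the variable limit of this Stata"
	}
}
```

`c(maxvar)` reports the limit of whichever Stata is running, so the check is only informative for Stata/IC users when run under Stata/IC (or with the 2,048 figure hard-coded instead).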

Both WVS and ESS are used to demo keep if inlist(country, …), the other subset we want to show.
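
The two subsetting demos mentioned above could look roughly like this. The variable names (`year`, `cntry`), the year kept, and the country codes are assumptions for illustration, not the actual course code.

```stata
* Sketch only: variable names and values below are assumptions.

* Subset by year (NHIS demo)
use nhis1017, clear
keep if year == 2017

* Subset by country (ESS demo); assumes a string country code
use ess0816, clear
keep if inlist(cntry, "FR", "DE", "GB")
```

If the country identifier is numeric with value labels rather than a string, the `inlist()` arguments would be the numeric codes instead.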

Additional datasets

It would make a lot of sense to have more datasets for the students to use than those used in the do-files.

Currently, the do-files are selective anyway: we provide ESS 2016 (Round 8) but do not use the data, even though the dependent variable also exists in that round.

GSS has a single codebook, so bundling many years would duplicate the codebook in the ZIP archives. Not ideal.
ESS could be broken down to Rounds 4 (2008), 8 (2016) and 9 (2018).


briatte added enhancement datasets labels Jan 30, 2021

briatte self-assigned this Jan 30, 2021

This was referenced Jan 30, 2021

Update to GSS 2018 #23

Closed

Update code to new survey datasets #22

Closed

Update to QOG January 2020 #21

Closed

Ten years after — v2.0 #31

Open

briatte mentioned this issue May 11, 2023

Compatibility with different versions of Stata #28

Open
