regional_data tutorial #177

Closed
antagomir opened this issue Feb 11, 2020 · 10 comments

Comments

@antagomir
Member

The regional data tutorial seems great. Some of the tables are very extensive; would it make sense to show only the first few lines with head(), and/or use knitr::kable to make the tables more html/markdown friendly in this online version?
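For illustration, a minimal sketch of what this could look like in a vignette chunk. The data frame `regional_df` and its contents are made up for the example and are not taken from the tutorial.

```r
# Hypothetical stand-in for one of the long regional tables in the vignette
regional_df <- data.frame(
  geo    = c("AT11", "AT12", "AT13", "AT21", "AT22", "AT31", "AT32"),
  time   = rep(2016L, 7),
  values = c(1.2, 3.4, 5.6, 7.8, 9.1, 2.3, 4.5),
  stringsAsFactors = FALSE
)

# Keep only the first rows so the online version stays readable
preview <- head(regional_df, 5)

# Render as a markdown-style table if knitr is installed, otherwise print as-is
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(preview))
} else {
  print(preview)
}
```

Inside an Rmd chunk the `print()` call is not needed; `knitr::kable(head(regional_df, 5))` as the last expression should be enough.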

@antaldaniel
Contributor

Yes, certainly, I will change it this afternoon.

@antaldaniel
Contributor

I wanted to make a suggestion about my last major release addition on regional statistics. I have been working on this issue ever since, and I have become convinced that this functionality should be moved to a separate CRAN package, while maintaining cross-functionality and documentation links in the vignettes.

In 2019, both Eurostat and the OECD took major steps to formalize sub-national statistics, including not only regional ones but also metropolitan area / city statistics. Unlike national boundaries, which are relatively stable, such sub-national divisions are extremely numerous and change every year.

My original solution was triggered by the way Eurostat converted its regional statistics from the NUTS2013 definition to NUTS2016, and I wrote a converter for this change, keeping some NUTS2010 codes as exceptions. But NUTS2021 is already agreed on, and occasionally I find data that is coded by NUTS2008, etc. All in all, I went back to 1999 (the last year in which EU regions were not standardized) and created new functions that can convert from any year to any year with precision (e.g. NUTS2008 to NUTS2021, or NUTS2016 to NUTS2013). OECD uses the EU NUTS in its statistics, too, but it incorporates advanced statistical systems from the EU, Canada and Australia. My new package will take care of these changes, too.
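To make the idea of converting between typology years concrete, here is a minimal sketch using a lookup table. It is not the code from the package: the helper `recode_regions()`, the column names and the region codes are all invented for illustration, and it only covers simple one-to-one renames, whereas real boundary changes can also involve splits and merges.

```r
# Hypothetical correspondence table between two typology vintages;
# the codes are invented for illustration, not real NUTS changes
correspondence <- data.frame(
  code_2013 = c("XX11", "XX12", "XX13"),
  code_2016 = c("XX11", "XX21", "XX22"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode a vector of region codes from one vintage to another.
# Codes that are missing from the lookup come back as NA so they can be reviewed.
recode_regions <- function(codes, lookup, from, to) {
  idx <- match(codes, lookup[[from]])
  lookup[[to]][idx]
}

dat <- data.frame(
  geo    = c("XX12", "XX11", "ZZ99"),
  values = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Forward conversion (2013 -> 2016)
dat$geo_2016 <- recode_regions(dat$geo, correspondence, "code_2013", "code_2016")

# The reverse direction reuses the same table with the columns swapped
dat$geo_back <- recode_regions(dat$geo_2016, correspondence, "code_2016", "code_2013")
```

Returning `NA` for unknown codes, rather than passing them through unchanged, is a deliberate choice in this sketch so that invalid or differently coded rows stand out before any imputation step.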

However, recoding the regions is only the first step. When working with time-series or panel data, you would like to impute missing or erroneously coded data in tables. Given that my coding is getting very solid and general, this is possible, but there are so many possibilities that it will probably take a year or two for my solutions to mature. So it would be very impractical to keep them in the Eurostat package.

The way I am planning the change is the following:

  • Soon I will release the first version of ‘regions’ on CRAN, if possible within rOpenGov, with a detailed vignette that can be put into both ‘Eurostat’ and ‘regions’, about how to validate, convert and impute regional Eurostat statistics.
  • I will also heavily review the mapping functions of Eurostat, because I think that they were not originally designed for the newer sub-national maps. Since we will see new metropolitan area and other statistics appearing soon, I will keep making sure, within the Eurostat package, that mapping works correctly.
  • This problematic vignette will be completely new, so I will not update it in this form.
  • The already existing functions will give warnings that they were moved to the new package, and will be treated as wrappers of the new (and more general) versions.
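The last point could be sketched roughly as follows. Both function names are hypothetical placeholders, not the actual functions in either package; the only real API used is base R's `.Deprecated()`.

```r
# Hypothetical new, more general function (stands in for a function in 'regions')
convert_codes_new <- function(codes) {
  toupper(codes)
}

# Hypothetical old function, kept as a thin wrapper that warns and delegates
convert_codes_old <- function(codes) {
  .Deprecated(
    new = "convert_codes_new",
    package = "regions",
    msg = "convert_codes_old() has moved; please use convert_codes_new() from 'regions' instead."
  )
  convert_codes_new(codes)
}

# Existing user code keeps working, but now emits a deprecation warning
result <- suppressWarnings(convert_codes_old("at11"))
```

Keeping the old function as a wrapper means existing scripts do not break immediately, while the warning points users to the new package.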

I think that even my early releases will ensure that regional Eurostat statistical products can be enhanced greatly.

@antagomir
Member Author

This sounds like a very good plan, and I agree that splitting packages like this can be well justified. We would be very happy to maintain this as part of rOpenGov. There has also been some planning with @muuankarski to better synchronize the data part of the eurostat pkg and the geospatial part (now in the eurostat_geodata pkg). All three packages are complementary, and it is useful to develop them jointly.

I can grant you access to create the necessary repository under rOpenGov if you do not have those permissions already, and then we can see how to best contribute.

@antagomir
Member Author

Oh - one more comment - would it be useful to have eurostat somehow in the pkg name?

@antaldaniel
Contributor

Hi, more than happy to bring it into rOpenGov. I had a lot of little things to sort out with it, and eventually sent it to CRAN as 0.1.0 to see if it goes through. If it passes CRAN, I am planning a very quick follow-up that brings the two packages in sync.

Originally I was thinking about referring to eurostat in the name, but eventually this package went a lot further, and it will handle the US, Australia, Japan, and, besides the NUTS typologies, those of the OECD and some other countries' individual ISO 3166-2 codes. So that would not make a lot of sense. In fact, I hope that I will be able to make a programmatic connection between the OECD and the Eurostat package.

Here is the new package: https://github.com/antaldaniel/regions
Let me know how exactly I can bring it into rOpenGov! (Sorry if I am asking something very trivial; I actually did this earlier with iotables, which I will now bring a bit forward, too.)

@antagomir
Member Author

Ah, great. OK!

Perhaps there are other ways but I now just forked this under rOpenGov and gave you the full admin permissions to this repo. In addition, I gave maintainer-level permissions for the eurostat devel team.

You might be able to create new repositories under rOpenGov now as you have full admin rights to this pkg. But I am not sure, we can try next time when there is a need.

Shall we use the rOpenGov repo as the main branch from now on? If you like, we could do a small pkg review and possibly propose some improvements (if any - it looks very good already..).

As I mentioned, we will need to see if we can sync with eurostat_geodata pkg with @muuankarski ; and later on it might make a lot of sense to write a short report to R journal for instance (or another suitable one).

@antaldaniel
Contributor

Thank you, @antagomir, what you propose is very good. I submitted it to CRAN before I went to bed, actually fearing some hiccups, and I woke up to the news that regions is on CRAN.

  1. I will update the URLs in the package in the coming days, and of course,
  2. I will create a new vignette for eurostat and cross-reference the two packages.
  3. I'll also make a blogpost that could be on rOpenGov, too. It would be nice to connect it to Rbloggers. I am not familiar with Jekyll, but I use hugo and I think that the content can be taken over with very minimal changes.
  4. I indeed would like to submit something to the Journal of Statistical Software as soon as there are enough experiences / use cases with the package. I'd gladly incorporate anybody in this project, too. I'm working with a collaborator on a very exciting addition - a NUTS-Google typology correspondence table that will allow matching Eurostat regional data with Covid regional data, and hopefully Google Trends data. I think that it would create interest.
    @muuankarski: I may be just clumsy, but I cannot find the eurostat_geodata package repo; I'd gladly take a look. Not to mention the tidymetadata package, because I have been working on something similar for Eurobarometer and EVS surveys. I think that whatever can be generalized could go to tidymetadata (similarly to regions, which is not eurostat-specific) and I would then focus in my eurobarometer package on GESIS/Eurobarometer specific issues. [My package is on github only in skeleton form; I have been working on it for years now, but I was lacking the generalization idea that your tidymetadata has.]

I am not extremely good with git, but I see that there are a lot of new project / team features available for free that rOpenGov could use, too. For new packages, it may be a good idea to figure these out.

@antagomir
Member Author

Perfect with CRAN.

1-2. Good, let us know if you need any help.

  1. We have linked the rOpenGov blog to R-bloggers, and any posts tagged with R should appear there if it still works. Any post that is written in Rmd and is reproducible should be directly transferable. We have some bugs to fix with our blog but can try to fix these at the same time.

  2. The geodata package is here: https://github.com/rOpenGov/eurostat_geodata. The plan overall sounds very good and aligned with what we are after. Let's see how to develop.

  3. JSS is a good one; in our experience there would also need to be statistical content, so a plain interface / data handling package is not sufficient. Might be something to consider.

  4. New features emerge all the time and let us try to keep track and take advantage.

@antaldaniel
Contributor

Issue 186 is connected to this; as we start working on the regions package, these issues should be resolved.

@antaldaniel
Contributor

@antagomir Please review my latest pull request to devel. It has a new vignette, deprecates the old (and not correctly working) functions, and gives a warning to the user to use regions instead.
