Problem with week time code #200
No, it doesn't. I think weekly data is a relatively new addition to Eurostat. I thought that it would be easily fixed, but
However, there seems to be an ISOweek package: https://cran.r-project.org/web/packages/ISOweek/, which I guess gives the right dates. Alternatively, we could use the UK week definition (there is some difference in the starting week). There also seems to be a week W99. How is that supposed to be treated?
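ISOweek is an R package; as a rough Python analogue (an assumption, not what {eurostat} does internally), the standard library's `date.fromisocalendar` (Python 3.8+) maps an ISO 8601 year and week to a date. The sketch assumes weekly time codes look like `2020W15` and returns the Monday of that week; the helper name `iso_week_to_date` is hypothetical.

```python
import re
from datetime import date

# Hypothetical helper: assumes weekly codes of the form "YYYYWww", e.g. "2020W15".
_WEEK_CODE = re.compile(r"^(\d{4})W(\d{2})$")


def iso_week_to_date(code: str) -> date:
    """Return the Monday (ISO weekday 1) of the ISO 8601 week given in `code`."""
    match = _WEEK_CODE.match(code)
    if match is None:
        raise ValueError(f"not a weekly time code: {code!r}")
    year, week = int(match.group(1)), int(match.group(2))
    # fromisocalendar raises ValueError for weeks that do not exist
    # in that ISO year, so "W99" fails here instead of producing a date.
    return date.fromisocalendar(year, week, 1)


print(iso_week_to_date("2021W01"))  # 2021-01-04: ISO week 1 starts after 1 January
print(iso_week_to_date("2020W53"))  # 2020-12-28: 2020 has 53 ISO weeks
```

The two example codes show why a naive "week number times seven days" conversion goes wrong: ISO week 1 does not necessarily start on 1 January, and some years have a week 53.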
Yes, I personally decided to interpret 99 as meaning that the week is not known (to cite the same source: "W99 means ‘unknown week’.").
Since it is converted to a Date, to which date should W99 be converted? The last day of the last week?
Very good question. Definitely not the last week, as that would imply that all people with an unknown date of death died in the last week, i.e. they would be pooled together with those who actually died in the last week. I don't know whether it breaks any consistency within
But then we would lose the year information. My idea was that the last week would carry information on two dates: the dated values on the first day, as normal, and the unknown-week values on the last day.
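A minimal Python sketch of the proposed placement, assuming ISO weeks: 28 December always lies in the last ISO week of its year, which gives the number of weeks, and W99 is then put on the Sunday of that week. The function names are hypothetical and this is only the idea discussed here, not an implemented feature.

```python
from datetime import date


def last_iso_week(year: int) -> int:
    """Number of ISO weeks in `year` (52 or 53).

    28 December always falls in the last ISO week of its year.
    """
    return date(year, 12, 28).isocalendar()[1]


def unknown_week_date(year: int) -> date:
    """Hypothetical placement of W99: the Sunday (weekday 7) of the last ISO week."""
    return date.fromisocalendar(year, last_iso_week(year), 7)


for year in (2019, 2020):
    print(year, last_iso_week(year), unknown_week_date(year))
```

Note that for 2020 the result is 2021-01-03, so the calendar year of the date alone no longer identifies the ISO year; the year information is only preserved if it is kept alongside the date.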
Ah, I forgot that; you're completely correct. I am no expert in designing such things, but what you outlined seems to be a possible solution. However, the user would have to be very clearly informed about what exactly those dates mean (and, more generally, that although there is a concrete date, the data pertain to a week).
FWIW, I think {ISOweek} is now the correct solution: I just ended up using it on the same data (national, not Eurostat, but produced to the same standard). tidyverse/lubridate#506 (comment) may also be helpful. And thanks for {eurostat}, very helpful!
My solution is to filter out the W99 data. That is definitely not a clean solution, but given that it only affects Hungary, Latvia and Sweden, it's a workaround. W99 values by geo and year:
So right now I have this code working fine:
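Purely as an illustration of the same workaround (not the commenter's code), here is a Python sketch that drops W99 rows and also totals what is being dropped per geo and year. The record layout with `geo`, `time` and `values` fields and the toy numbers are assumptions, and both function names are hypothetical.

```python
# Toy rows for illustration only; the field names "geo", "time" and "values"
# are assumptions about how the retrieved data is laid out.
rows = [
    {"geo": "AA", "time": "2020W01", "values": 10},
    {"geo": "AA", "time": "2020W99", "values": 3},
    {"geo": "BB", "time": "2020W01", "values": 7},
]


def drop_unknown_weeks(records):
    """Keep only records whose week code is not W99 ("unknown week")."""
    return [r for r in records if not r["time"].endswith("W99")]


def unknown_week_totals(records):
    """Sum of W99 values per (geo, year), to see what dropping them discards."""
    totals = {}
    for r in records:
        if r["time"].endswith("W99"):
            key = (r["geo"], r["time"][:4])
            totals[key] = totals.get(key, 0) + r["values"]
    return totals


print(len(drop_unknown_weeks(rows)))  # 2
print(unknown_week_totals(rows))      # {('AA', '2020'): 3}
```

Reporting the dropped totals alongside the filtered data makes it easy to check whether W99 is small enough to ignore for a given country and year.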
The best way would be for Eurostat to divide the W99 values and assign them to each week of the year according to the known weekly values as "weights". If anybody works with countries that have W99 data, I would suggest doing this manually.
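Doing this manually could look like the following Python sketch, under the assumption that the known weekly values of one country and year are available as a mapping from week to value. Each week gets a share of the W99 total proportional to its own value, so the yearly total is preserved. The function name and the toy numbers are hypothetical.

```python
def distribute_unknown_week(weekly, unknown):
    """Spread `unknown` (the W99 total) over the known weeks.

    Each week receives a share proportional to its own value, so the
    weekly values act as the weights and the grand total is preserved.
    """
    known_total = sum(weekly.values())
    if known_total == 0:
        raise ValueError("cannot derive weights from an all-zero year")
    return {
        week: value + unknown * value / known_total
        for week, value in weekly.items()
    }


# Toy numbers for illustration only.
weekly = {"W01": 30, "W02": 50, "W03": 20}
adjusted = distribute_unknown_week(weekly, unknown=10)
print(adjusted)  # {'W01': 33.0, 'W02': 55.0, 'W03': 22.0}
```

The results are generally not whole numbers, so for count data such as deaths one would still have to decide whether and how to round.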
If this is a common need, would it be feasible to have an additional enrichment function that could be run after data retrieval?
"The best way would be if Eurostat would divide the W99 values and assign them to each week of the year accordingly to known values week 'weights'. If anybody works with countries, that have W99 data, then I would suggest to do this manually." I completely agree. As a minimum solution, proportionally increasing all values would work in my opinion (at least if the W99 value is small compared to the yearly total).
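Distributing W99 in proportion to the weekly values and proportionally increasing all values are in fact the same operation. With notation introduced here ($v_w$ the reported value for week $w$, $S = \sum_k v_k$ the total over the known weeks, $U$ the W99 value):

```latex
v'_w = v_w + U \frac{v_w}{S} = v_w \left( 1 + \frac{U}{S} \right),
\qquad
\sum_w v'_w = S + U .
```

So every known week is scaled by the same factor $1 + U/S$, and the adjusted weeks add up to the full yearly total including W99.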
I did some testing with the dataset mentioned here, and I have to say that fixing this weekly data issue was easier than figuring out how to efficiently handle a dataset with 110 million rows (after pivot_longer). Apparently 16 GB of RAM wasn't enough the way it was done before. The results are in commit cfdaf37 of the v4-dev branch (version 4.0.0.9002). Based on the discussion here, I couldn't figure out a sensible solution for W99 values. Drop them? Assign them to the last day of the year? Distribute the values evenly over the whole year? In my solution I coerced W99 to the first day of the first week of the year, and the function prints a warning message for the user, suggesting to use
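A Python sketch of the behaviour described above, assuming ISO weeks and codes of the form `2020W15`: regular weeks map to their Monday, and W99 is coerced to the first day of week 1 of the same year with a warning. This only mimics the described behaviour; it is not the package's implementation, and `week_code_to_date` is a hypothetical name.

```python
import re
import warnings
from datetime import date

_WEEK_CODE = re.compile(r"^(\d{4})W(\d{2})$")


def week_code_to_date(code: str) -> date:
    """Convert "YYYYWww" to the Monday of that ISO week.

    W99 ("unknown week") is coerced to the first day of ISO week 1
    of the same year, and a warning is issued.
    """
    match = _WEEK_CODE.match(code)
    if match is None:
        raise ValueError(f"not a weekly time code: {code!r}")
    year, week = int(match.group(1)), int(match.group(2))
    if week == 99:
        warnings.warn(
            f"{code}: unknown week (W99) coerced to the first day of week 1",
            stacklevel=2,
        )
        week = 1
    return date.fromisocalendar(year, week, 1)


print(week_code_to_date("2020W15"))  # 2020-04-06
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(week_code_to_date("2020W99"))  # 2019-12-30
    print(len(caught))  # 1
```

With ISO weeks the first day of week 1 can fall in the previous calendar year (2019-12-30 for 2020), which is worth keeping in mind when W99 rows are later grouped or filtered by year.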
Closed with the CRAN release of package version 4.0.0
It seems eurostat (more specifically, eurotime2date) can't handle weekly data: