Text in a double is creating NA accident_index values #231

Open
tra6sdc opened this issue Nov 14, 2023 · 3 comments

Comments

@tra6sdc

tra6sdc commented Nov 14, 2023

Hello,

accident_index is a concatenation of accident_year and accident_reference.
accident_reference is read as character, but accident_index is read as double.
As a result, accident_index values that contain a letter can fail to parse and become NA.

> casualty_2018<-get_stats19(year = 2018, type = "casualty", format = TRUE)
Files identified: dft-road-casualty-statistics-casualty-2018.csv

   https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2018.csv
Data already exists in data_dir, not downloading

-- Column specification -----------------------------------------------------------------------------------------
cols(
  accident_index = col_double(),
  accident_year = col_double(),
  accident_reference = col_character(),
  vehicle_reference = col_double(),
  casualty_reference = col_double(),
  casualty_class = col_double(),
  sex_of_casualty = col_double(),
  age_of_casualty = col_double(),
  age_band_of_casualty = col_double(),
  casualty_severity = col_double(),
  pedestrian_location = col_double(),
  pedestrian_movement = col_double(),
  car_passenger = col_double(),
  bus_or_coach_passenger = col_double(),
  pedestrian_road_maintenance_worker = col_double(),
  casualty_type = col_double(),
  casualty_home_area_type = col_double(),
  casualty_imd_decile = col_double(),
  lsoa_of_casualty = col_character()
)

Warning: 22715 parsing failures.
  row            col               expected        actual                                                                                                  file
30320 accident_index no trailing characters 201801T266389 'C:\Users\tra6sdc\AppData\Local\Temp\RtmpQZxJ35/dft-road-casualty-statistics-casualty-2018.csv'
30321 accident_index no trailing characters 201801T271905 'C:\Users\tra6sdc\AppData\Local\Temp\RtmpQZxJ35/dft-road-casualty-statistics-casualty-2018.csv'
30322 accident_index no trailing characters 201801T274868 'C:\Users\tra6sdc\AppData\Local\Temp\RtmpQZxJ35/dft-road-casualty-statistics-casualty-2018.csv'
30323 accident_index no trailing characters 201801T274868 'C:\Users\tra6sdc\AppData\Local\Temp\RtmpQZxJ35/dft-road-casualty-statistics-casualty-2018.csv'
30324 accident_index no trailing characters 201801T278015 'C:\Users\tra6sdc\AppData\Local\Temp\RtmpQZxJ35/dft-road-casualty-statistics-casualty-2018.csv'
..... .............. ...................... .. [... truncated]
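
Since accident_year and accident_reference both parse cleanly in the column specification above, one possible stopgap is to rebuild accident_index as a string from those two columns. This is only a sketch in base R, not something stats19 does: the data frame `casualties` is a hypothetical stand-in for the parsed data, and the second row is made up for illustration.

```r
# Hypothetical stand-in for the parsed casualty data.
# Row 1 mirrors a failed parse from the warnings (index became NA);
# row 2 is an invented all-numeric reference.
casualties <- data.frame(
  accident_index = c(NA, 2018010080971),
  accident_year = c(2018, 2018),
  accident_reference = c("01T266389", "010080971"),
  stringsAsFactors = FALSE
)

# Rebuild the index as text: year followed by the character reference.
casualties$accident_index <- paste0(
  casualties$accident_year,
  casualties$accident_reference
)

print(casualties$accident_index)
```

Keeping the rebuilt column as character avoids both the NA values and any numeric reinterpretation of the letters.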
@tra6sdc
Author

tra6sdc commented Nov 14, 2023

Additionally, it sometimes produces a hugely inflated accident_index:

       accident_index accident_year accident_reference vehicle_reference               vehicle_type
78577    2.018135e+12          2018          1352F0005                 1                        Car
78720    2.018135e+61          2018          1352L0054                 1                        Car
78721    2.018135e+61          2018          1352L0054                 2                        Car
82295    2.018136e+63          2018          1358F0056                 1 Motorcycle 125cc and under
127865   2.018340e+67          2018          340D00061                 1                        Car
77910    2.018135e+72          2018          1351F0065                 1                Pedal cycle
77911    2.018135e+72          2018          1351F0065                 2                        Car
81982    2.018136e+79          2018          1357S0072                 1      Taxi/Private hire car
214387  2.018630e+123          2018          63D000118                 1                        Car
214388  2.018630e+123          2018          63D000118                 2       Agricultural vehicle
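
One reading that is consistent with the rows above, though not confirmed anywhere in this thread, is that the letter in the index is being treated as an exponent marker, so that a string such as 20181352L0054 is read as 20181352 × 10^54. The base R sketch below only checks that arithmetic. The index strings are assumed to be accident_year pasted onto accident_reference, and no claim is made about how the parser actually works internally.

```r
# Index strings assumed to be accident_year followed by accident_reference,
# using rows from the table above.
index_strings <- c(
  "20181352F0005",
  "20181352L0054",
  "2018340D00061",
  "201863D000118"
)

# Digits before the letter, and digits after it.
mantissa <- as.numeric(sub("[A-Za-z].*$", "", index_strings))
exponent <- as.numeric(sub("^.*[A-Za-z]", "", index_strings))

# Value obtained if the letter were read as "times ten to the power of".
as_if_exponent <- mantissa * 10^exponent

inflated <- formatC(as_if_exponent, format = "e", digits = 6)
print(inflated)
# "2.018135e+12" "2.018135e+61" "2.018340e+67" "2.018630e+123"
```

These match the accident_index values printed for the same accident_reference rows. That would also explain why references containing T give NA rather than an inflated number.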

@Robinlovelace
Member

Thanks for raising the issue. Any thoughts on the underlying cause and solution?

My thinking: we can set the type with cols(): https://readr.tidyverse.org/reference/cols.html
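
A minimal sketch of that idea, assuming readr 2.0 or later (for `I()` literal input) and using a toy CSV instead of the real DfT file. The second index value is assumed to be accident_year pasted onto accident_reference. How stats19 would pass `col_types` internally is not shown here.

```r
library(readr)

# Toy CSV standing in for the downloaded file (illustrative rows only).
csv_text <- paste(
  "accident_index,accident_year,accident_reference",
  "201801T266389,2018,01T266389",
  "20181352L0054,2018,1352L0054",
  sep = "\n"
)

# Declare the types up front instead of letting readr guess them.
toy <- read_csv(
  I(csv_text),
  col_types = cols(
    accident_index = col_character(),
    accident_year = col_integer(),
    accident_reference = col_character()
  )
)

print(toy$accident_index)
```

With accident_index declared as character, the values come through as text, so there is nothing for a numeric parser to reject or reinterpret.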

@tra6sdc
Author

tra6sdc commented Nov 14, 2023

Yep, explicitly specify the column data types rather than allowing them to be guessed. The inflated accident_index values may be a hexadecimal thing (although 'L' isn't a hexadecimal digit), and the above solution might deal with this issue too. I believe that this isn't something I can do.
