How to debug a data loader #853

Closed
espinielli opened this issue Feb 17, 2024 · 11 comments · Fixed by #996
Labels
documentation Improvements or additions to documentation

Comments

@espinielli

I have tried to implement an R data loader.
It works as a script but it errors in Observable Framework.
I looked in the documentation for suggestions on how to debug data loaders but found none.
Maybe it could be a useful addition...

mbostock added the documentation label Feb 18, 2024
@Fil
Contributor

Fil commented Feb 18, 2024

Was there anything in the logs of the preview (or build) script?

@mbostock
Member

In what way did it fail? How did you run it locally? Did you try running it with Rscript in the same directory? (That’s all that Framework does, so it’s surprising that it works outside of Framework but not inside it.)

@espinielli
Author

espinielli commented Feb 18, 2024

Well, things failed because I was reading a local configuration file.

I had to discover which directory Rscript was being executed from in order to specify the correct pathnames.
By trial and error I was able to print out pieces of information and write a working example, as detailed below.
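
The trial and error described above can be shortened by having the loader report its own working directory. The following is a minimal sketch, not something taken from the Framework documentation: it assumes that the loader's stderr is passed through to the terminal running the preview or build, while stdout is captured as the loader's output. The variable name `cwd` is only illustrative.

```r
# Diagnostics go to stderr (message() writes there), so they do not
# end up in the CSV that the loader writes to stdout.
cwd <- getwd()
message("working directory: ", cwd)
message("entries here: ", paste(list.files(cwd), collapse = ", "))
```

Placing these lines at the top of a loader should show where Rscript was started from before any file is read.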

I now have a working data loader that loads a subset of the airports in OurAirports, as in the code snippet below.

The file
docs/data/monitored-airports.csv
contains the ICAO codes of the airports I am interested in:

icao
EBBR
EGLL

The relevant data loader is
docs/data/airports.csv.R

library(dplyr)
library(readr)
library(here)

# requires a marker file `.here` in the root of the Framework project
moni <- readr::read_csv(here::here("docs", "data", "monitored-airports.csv")) |>
  dplyr::pull(icao)

apts_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/airports.csv'
ctrs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/countries.csv'
regs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/regions.csv'

ctrs <- readr::read_csv(ctrs_url, na = c(""))
regs <- readr::read_csv(regs_url, na = c(""))

apts <- readr::read_csv(apts_url, na = c("")) |>
  dplyr::filter(ident %in% moni) |>
  dplyr::left_join(ctrs, by = c("iso_country" = "code"), suffix = c("", "_country")) |>
  dplyr::left_join(regs, by = c("iso_region" = "code"), suffix = c("", "_region")) |>
  dplyr::select(
    id,
    ident,
    code_icao = ident,
    iata_code,
    type,
    name,
    latitude = latitude_deg,
    longitude = longitude_deg,
    elevation = elevation_ft,
    iso_country,
    name_country,
    iso_region,
    name_region,
    continent,
  ) |>
  dplyr::mutate(
    name_continent = dplyr::case_when(
      continent == "AF" ~ "Africa",
      continent == "AN" ~ "Antarctica",
      continent == "AS" ~ "Asia",
      continent == "EU" ~ "Europe",
      continent == "NA" ~ "North America",
      continent == "OC" ~ "Oceania",
      continent == "SA" ~ "South America",
      .default = NA_character_
    )
  )

apts |>
  readr::write_csv(stdout())

@espinielli
Author

A simpler R data loader, which retrieves Italian medium and large airports, is the following:

library(dplyr)
library(readr)



apts_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/airports.csv'
ctrs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/countries.csv'
regs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/regions.csv'

ctrs <- readr::read_csv(ctrs_url, na = c(""))
regs <- readr::read_csv(regs_url, na = c(""))

apts <- readr::read_csv(apts_url, na = c("")) |>
  dplyr::left_join(ctrs, by = c("iso_country" = "code"), suffix = c("", "_country")) |>
  dplyr::left_join(regs, by = c("iso_region" = "code"), suffix = c("", "_region")) |>
  dplyr::filter(iso_country == "IT", type %in% c("medium_airport", "large_airport")) |>
  dplyr::select(
    id,
    ident,
    code_icao = ident,
    iata_code,
    type,
    name,
    latitude = latitude_deg,
    longitude = longitude_deg,
    elevation = elevation_ft,
    iso_country,
    name_country,
    iso_region,
    name_region,
    continent,
  ) |>
  dplyr::mutate(
    name_continent = dplyr::case_when(
      continent == "AF" ~ "Africa",
      continent == "AN" ~ "Antarctica",
      continent == "AS" ~ "Asia",
      continent == "EU" ~ "Europe",
      continent == "NA" ~ "North America",
      continent == "OC" ~ "Oceania",
      continent == "SA" ~ "South America",
      .default = NA_character_
    )
  )

apts |>
  readr::write_csv(stdout())

@espinielli
Author

The critical info for the documentation, at least for an R data loader, is where Rscript will be invoked from.

@mbostock
Member

Thanks for all the additional context! Where do you expect (or want) Rscript to be invoked from?

@espinielli
Author

The way it is done currently is ok, I think it should just be documented.

Data loader implementers will need to have this knowledge to eventually navigate their filesystem and retrieve complementary data as in my first example (reading a local file).

@Fil
Contributor

Fil commented Feb 19, 2024

It might make more sense to cd to the docs root rather than stay in the code root?

@espinielli
Author

My first example, which uses the here package in R and adds a dummy .here file in <observable framework project root>, does exactly that:
it marks the Observable Framework project root as the root from which the here() function resolves file paths.
So

here("docs", "data", "monitored-airports.csv")

picks
<framework project root>/docs/data/monitored-airports.csv
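
For loaders that should not depend on the here package, the same lookup can be sketched in base R. This assumes, as the thread suggests but does not state outright, that Rscript is started from the project root. `resolve_input` is a hypothetical helper name, not part of Framework or of any package.

```r
# Hypothetical helper: build a path relative to the current working
# directory and fail with a descriptive error if the file is missing.
resolve_input <- function(...) {
  path <- file.path(getwd(), ...)
  if (!file.exists(path)) {
    stop("input not found: ", path, " (working directory: ", getwd(), ")")
  }
  path
}

# Intended use inside a loader (left as a comment so the sketch runs anywhere):
# moni_path <- resolve_input("docs", "data", "monitored-airports.csv")
```

The error message includes the working directory, so a wrong assumption about where the loader runs shows up immediately rather than as an opaque read failure.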

@mbostock
Member

> It might make more sense to cd to the docs root rather than stay in the code root?

No, I don’t think we should do that. (By analogy with JavaScript, dependencies are installed into node_modules at the project root, not within the source root.)

@timbrock

> The way it is done currently is ok, I think it should just be documented.
>
> Data loader implementers will need to have this knowledge to eventually navigate their filesystem and retrieve complementary data as in my first example (reading a local file).

I've been having exactly the same issue this morning. So +1 for documentation of this :).

Fil added a commit that referenced this issue Mar 6, 2024
mbostock added a commit that referenced this issue Mar 6, 2024
* document data loaders' cwd and import.meta.url

closes #853

* format code

* Update loaders.md

---------

Co-authored-by: Mike Bostock <mbostock@gmail.com>