How to debug a data loader #853

espinielli · 2024-02-17T22:52:07Z

I have implemented an R data loader.
It works when run as a standalone script, but it fails in Observable Framework.
I looked in the documentation for suggestions on how to debug data loaders but couldn't find any.
Maybe that would be a useful addition...

Fil · 2024-02-18T09:52:31Z

Was there anything in the logs of the preview (or build) script?

mbostock · 2024-02-18T14:36:40Z

In what way did it fail? How did you run it locally? Did you try running it with Rscript in the same directory? (That’s all that Framework does, so it’s surprising that it works outside of Framework but not inside it.)

espinielli · 2024-02-18T15:58:10Z

Well, it failed because the loader reads a local configuration file.

I had to find out which directory Rscript was being executed from in order to specify the correct paths.
By trial and error, printing out bits of information, I managed to write the working example detailed below.
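A minimal sketch of that kind of probing, assuming (as the loaders in this thread do) that whatever the loader writes to standard output becomes the generated file, so diagnostics have to go to standard error instead. R's message() writes to stderr; the helper name log_debug is hypothetical.

```r
# Hypothetical debugging helper for an R data loader.
# The loader's stdout is the generated data file, so diagnostics must not go
# there; message() writes to stderr instead.
log_debug <- function(...) {
  message("[loader] ", ...)
}

# Report where Rscript was started from and whether a relative path resolves.
log_debug("working directory: ", getwd())
csv_path <- file.path("docs", "data", "monitored-airports.csv")
log_debug(csv_path, if (file.exists(csv_path)) " found" else " NOT found")
```

The stderr output should then show up in the terminal running the preview or build command, next to any error the loader raises.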

I now have a working data loader for a subset of the airports in OurAirports, shown below.

The file docs/data/monitored-airports.csv contains the ICAO codes of the airports I am interested in:

icao
EBBR
EGLL

The corresponding data loader is docs/data/airports.csv.R:

library(dplyr)
library(readr)
library(here)

# requires an empty marker file `.here` in the root of the Framework project
moni <- readr::read_csv(here::here("docs", "data", "monitored-airports.csv")) |>
  dplyr::pull(icao)

apts_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/airports.csv'
ctrs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/countries.csv'
regs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/regions.csv'

ctrs <- readr::read_csv(ctrs_url, na = c(""))
regs <- readr::read_csv(regs_url, na = c(""))

apts <- readr::read_csv(apts_url, na = c("")) |>
  dplyr::filter(ident %in% moni) |>
  dplyr::left_join(ctrs, by = c("iso_country" = "code"), suffix = c("", "_country")) |>
  dplyr::left_join(regs, by = c("iso_region" = "code"), suffix = c("", "_region")) |>
  dplyr::select(
    id,
    ident,
    code_icao = ident,
    iata_code,
    type,
    name,
    latitude = latitude_deg,
    longitude = longitude_deg,
    elevation = elevation_ft,
    iso_country,
    name_country,
    iso_region,
    name_region,
    continent,
  ) |>
  dplyr::mutate(
    name_continent = dplyr::case_when(
      continent == "AF" ~ "Africa",
      continent == "AN" ~ "Antarctica",
      continent == "AS" ~ "Asia",
      continent == "EU" ~ "Europe",
      continent == "NA" ~ "North America",
      continent == "OC" ~ "Oceania",
      continent == "SA" ~ "South America",
      .default = NA_character_
    )
  )

apts |>
  readr::write_csv(stdout())

espinielli · 2024-02-18T16:14:51Z

Here is a simpler R data loader that retrieves Italian medium and large airports:

library(dplyr)
library(readr)

apts_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/airports.csv'
ctrs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/countries.csv'
regs_url <- 'https://raw.githubusercontent.com/davidmegginson/ourairports-data/main/regions.csv'

ctrs <- readr::read_csv(ctrs_url, na = c(""))
regs <- readr::read_csv(regs_url, na = c(""))

apts <- readr::read_csv(apts_url, na = c("")) |>
  dplyr::left_join(ctrs, by = c("iso_country" = "code"), suffix = c("", "_country")) |>
  dplyr::left_join(regs, by = c("iso_region" = "code"), suffix = c("", "_region")) |>
  dplyr::filter(iso_country == "IT", type %in% c("medium_airport", "large_airport")) |>
  dplyr::select(
    id,
    ident,
    code_icao = ident,
    iata_code,
    type,
    name,
    latitude = latitude_deg,
    longitude = longitude_deg,
    elevation = elevation_ft,
    iso_country,
    name_country,
    iso_region,
    name_region,
    continent,
  ) |>
  dplyr::mutate(
    name_continent = dplyr::case_when(
      continent == "AF" ~ "Africa",
      continent == "AN" ~ "Antarctica",
      continent == "AS" ~ "Asia",
      continent == "EU" ~ "Europe",
      continent == "NA" ~ "North America",
      continent == "OC" ~ "Oceania",
      continent == "SA" ~ "South America",
      .default = NA_character_
    )
  )

apts |>
  readr::write_csv(stdout())

espinielli · 2024-02-18T16:17:02Z

The critical piece of information for the documentation, at least for R data loaders, is the working directory from which Rscript is invoked.

mbostock · 2024-02-18T19:39:59Z

Thanks for all the additional context! Where do you expect (or want) Rscript to be invoked from?

espinielli · 2024-02-19T08:07:33Z

The current behavior is fine; I think it just needs to be documented.

Data loader implementers need this knowledge to navigate the filesystem and read complementary data, as in my first example (which reads a local file).

Fil · 2024-02-19T08:17:43Z

It might make more sense to cd to the docs root rather than stay in the code root?

espinielli · 2024-02-19T08:34:53Z

My first example does exactly that by using the R here package together with an empty .here marker file in <Observable Framework project root>:
the marker makes the Observable Framework project root the base directory from which here() resolves paths.
So

here("docs", "data", "monitored-airports.csv")

resolves to
<Observable Framework project root>/docs/data/monitored-airports.csv

mbostock · 2024-02-19T08:57:03Z

> It might make more sense to cd to the docs root rather than stay in the code root?

No, I don’t think we should do that. (By analogy with JavaScript, dependencies are installed into node_modules at the project root, not within the source root.)
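Since loaders keep running from the project root, a loader can also build its paths relative to that root with base R's file.path(), without the here package. A minimal sketch under that assumption; read_monitored is a hypothetical helper, and it uses utils::read.csv rather than readr only to stay dependency-free.

```r
# Hypothetical helper: read the monitored ICAO codes using a path relative to
# the working directory. Assumption from this thread: Framework runs Rscript
# from the project root, so the default path needs no .here marker file.
read_monitored <- function(path = file.path("docs", "data", "monitored-airports.csv")) {
  if (!file.exists(path)) {
    # Fail with a message that shows where the loader was actually started.
    stop("cannot find '", path, "' (working directory: '", getwd(), "')",
         call. = FALSE)
  }
  # utils::read.csv keeps the sketch dependency-free; readr::read_csv also works.
  utils::read.csv(path, stringsAsFactors = FALSE)$icao
}
```

In the first loader, moni <- read_monitored() would then take the place of the here::here() call; if the working directory is not what you expect, the error message says so directly.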

timbrock · 2024-02-19T12:21:04Z

> The way it is done currently is ok, I think it should just be documented.
>
> Data loader implementers will need to have this knowledge to eventually navigate their filesystem and retrieve complementary data as in my first example (reading a local file).

I ran into exactly the same issue this morning, so +1 for documenting this :).

mbostock added the documentation label (Improvements or additions to documentation) Feb 18, 2024

Fil added a commit that referenced this issue Mar 6, 2024

document data loaders' cwd and import.meta.url

125798f

closes #853

Fil mentioned this issue Mar 6, 2024

document data loaders' cwd and import.meta.url #996

Merged

mbostock closed this as completed in #996 Mar 6, 2024

mbostock added a commit that referenced this issue Mar 6, 2024

document data loaders' cwd and import.meta.url (#996)

69eed90

* document data loaders' cwd and import.meta.url closes #853 * format code * Update loaders.md --------- Co-authored-by: Mike Bostock <mbostock@gmail.com>
