feat: add `get_log(job)` #356

toph-allen · 2024-12-30T21:26:52Z

Intent

The initial intent of this PR was to move get_job() to use the v1 API.

However, that endpoint returns the exact same data as get_jobs(), so it seems kinda redundant. Is there any point at which you'll have a job key where you haven't called get_jobs()? Not right now, I don't think.

The old unversioned get_job() function also returned the log for the job, and this seems to be the main use case for the singular function, so I instead added a new function, get_log(), which gets the job log.

Fixes #341

Approach

~~The new function is get_log(content_item, key). It's a little clunky to use, because the API requires the content GUID as well as the job key.~~

An alternate approach would be to add a Job R6 class. This would probably also require a different method / signature for things like terminate_jobs(), because you'd wanna keep the existing implementation but also allow you to call terminate_jobs(job_class) (although that plural function name suggests that it takes a list of jobs, so maybe it would just call for a terminate_job(job_class_object) function).

- Adds a get_job_list(content) function which returns a list of jobs for a content item. Each job is just a list containing the API output, augmented with the app_guid (which is required for subsequent job requests but not contained in the data returned) and the Connect client.
- Adds a get_log(job) function which returns the job log for the job object as represented by that list item.

client <- connect()
content <- content_item(client, CONTENT_GUID)
jobs <- get_job_list(content)
log <- get_log(jobs[[1]])
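
To illustrate the augmentation described above, here is a minimal sketch in base R that needs no Connect server. Everything in it is an assumption made for illustration: `raw_jobs`, `mock_client`, `augment_jobs()`, and the field names are hypothetical stand-ins, not the package's actual implementation or the real API schema.

```r
# Hypothetical stand-in for the parsed API response: one list per job.
# Field names are illustrative, not the actual Connect schema.
raw_jobs <- list(
  list(key = "abc123", status = 0L),
  list(key = "def456", status = 2L)
)

# Hypothetical stand-ins for the content GUID and the client object.
content_guid <- "00000000-0000-0000-0000-000000000000"
mock_client <- list(server = "https://connect.example.com")

# Attach the two fields each job needs for follow-up requests:
# the content GUID (not included in the API output) and the client.
augment_jobs <- function(jobs, guid, client) {
  lapply(jobs, function(job) {
    job$app_guid <- guid
    job$client <- client
    job
  })
}

jobs <- augment_jobs(raw_jobs, content_guid, mock_client)
names(jobs[[1]])
```

Because each element carries its own GUID and client, a function that takes a single job can make its request without any other arguments, which is what lets the signature be just get_log(job).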

Checklist

Does this change update NEWS.md (referencing the connected issue if necessary)?
Does this change need documentation? Have you run devtools::document()?

jonkeane

Thanks, a few questions inline.

An alternate approach would be to add a Job R6 class. This would probably also require a different method / signature for things like terminate_jobs(), because you'd wanna keep the existing implementation but also allow you to call terminate_jobs(job_class) (although that plural function name suggests that it takes a list of jobs, so maybe it would just call for a terminate_job(job_class_object) function).

Could you say more about this? What would the benefits of this approach be? What about the downsides? It might also be helpful here to write the code that someone using this would write if this were the case and compare it to the code that one would write with this PR as it stands.

R/parse.R

R/content.R

toph-allen · 2025-01-02T16:18:20Z

@jonkeane Can I get a re-review when you have a sec?

R/content.R

jonkeane

Thanks for this. The list-based approach looks interesting and a promising direction to try out.

One question I have is how much we should flag this tension / split to folks, especially on this point (also commented below at the relevant place in the code):

Is one able to use get_job_log() with the output of get_jobs()? I think the answer is no — that's ok, but we should probably flag that.

Having two separate paths (a list-based one and a data.frame-based one) is totally fine, but if we do that we should embrace it and document it.

tests/testthat/test-content.R

jonkeane · 2025-01-10T21:32:07Z

R/content.R

+#' @return
+#'
+#' - `get_jobs()`: A data frame with a row representing each job.
+#' - `get_job_list()`: A list with each element representing a job.


If we include this, we should do it elsewhere, but should we talk a little bit about this difference in the documentation? Where to use one or the other?

It would be reasonable to talk about this elsewhere. This might be a good place to talk about it, or it might make sense in the "details" section of this documentation.

I think we also might want to add a terminate_job() function that takes a single job object, analogous to get_job_log(). The current terminate_jobs() function takes a content item and an optional job key.

Aside — the reason I haven't used get_log() for this function is because Connect has multiple types of logs (e.g. audit, server) and

I wrote a paragraph in the Details section of this function's documentation to get at the distinction. When you say "we should do it elsewhere" did you mean "outside of the function documentation" or "outside of the @return section"?

I want to add a terminate_job() function in a small follow-on PR that also takes job objects. That's another example to add to this documentation that helps to get at the distinction. The current terminate_jobs() function is built more for the data frame paradigm.

jonkeane · 2025-01-10T21:33:34Z

R/content.R

+#' \dontrun{
+#' client <- connect()
+#' item <- content_item(client, "951bf3ad-82d0-4bca-bba8-9b27e35c49fa")
+#' jobs <- get_job_list(item)
+#' log <- get_job_log(jobs[[1]])
+#' }


Is one able to use get_job_log() with the output of get_jobs()? I think the answer is no — that's ok, but we should probably flag that.

The docs already mention this, in the text for the job param to get_job_log(). You commented here. Does that indicate you think there should be, like, a comment in the code block here?

Ah no, I see it now. I missed it on first read. Maybe that means it should be highlighted more (especially given the intentional split we are making here), or maybe it means I need better glasses. I'll leave it up to you if you want to call that out more highly (like in the body text of help).

FWIW, I commented here because this is where it occurred to me while looking at the examples; I missed it in the argument documentation, which would have been a better place to comment. I might be reading slightly too much into word choice, but I agree with the (subtle) skepticism I'm reading about using a comment in the example as the way to highlight that.

I made a minor change to the documentation to mention this in more places and make it slightly more prominent. I believe these were the only issues you had in your review — as such I requested re-review.

But I think this gets at the major sticking point for me with the "it's just a list" approach — which is that there is no good way to go from the data frame returned by get_jobs() to the object needed by get_job_log(). Without this, I think it's likely that we haven't fully thought through our users' workflows, and are building something that looks good in minimal examples but doesn't provide good ergonomics beyond that.

I'm fine merging this PR as is — documenting that limitation — I think it's best to move forward here. But this is something that I've been thinking about e.g. in #305.

One thought I had as a stepping stone in that direction, which almost works with what connectapi currently has, is a job() function that takes a job key and returns the server data (the stand-in for a "job" class).

The workflow I have in mind is:

client <- connect()
client |>
  content_item(GUID) |>
  get_jobs() |> # data frame
  slice_max(end_date) |> # get the most recent
  pull(key) |>
  job() |> # construct a job object (in this case just a list)
  get_job_log()

But job() would need the client, and that's so far always the first argument, so this workflow would need to break that paradigm. This is why I've been thinking about #359 (default client as a trailing arg). This would require a surplus HTTP request but that seems fine.

Another option that might actually work is just piping a single-row data frame to as.list() — except that the data frame lacks a list-column of client objects, i.e. it's not an exact match for the data from get_job_list().

client <- connect()
client |>
  content_item(GUID) |>
  get_jobs() |> # data frame
  slice_max(end_date) |> # get the most recent
  as.list() |> # wouldn't actually work right now
  get_job_log()
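
The mismatch described above can be seen with a small base R sketch. The data frame, its column names, and `mock_client` are hypothetical stand-ins chosen for illustration; in particular, whether the real get_jobs() output includes an app_guid column is an assumption here.

```r
# Hypothetical one-row slice of a jobs data frame.
# Column names are illustrative; app_guid being present is an assumption.
jobs_df <- data.frame(
  key = "abc123",
  app_guid = "00000000-0000-0000-0000-000000000000",
  stringsAsFactors = FALSE
)

# as.list() on a data frame returns one element per column.
job <- as.list(jobs_df[1, , drop = FALSE])
names(job)

# No column carries the client, so this element is simply absent.
has_client <- !is.null(job$client)

# One possible bridge (an assumption, not something the PR implements):
# attach the client explicitly before handing the list on.
mock_client <- list(server = "https://connect.example.com")
job_with_client <- c(job, list(client = mock_client))
names(job_with_client)
```

This only shows the shape of the gap: the row converts to a list cleanly, but the client has to come from somewhere else, which is why the pipeline above would not work as written.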

R/content.R

toph-allen · 2025-01-12T17:04:58Z

R/content.R

@@ -747,6 +769,51 @@ terminate_jobs <- function(content, keys = NULL) {
  res_df
 }

+#' @rdname get_jobs
+get_job_list <- function(content) {


@jonkeane Thinking it would be better to just call this jobs(), which would align it with #305.

In other words, if a function returning a list-like collection of Connect server objects is an evolutionary transition fossil on the way to an API like that described in #305, we should name it like that.

That sounds like a great idea

I actually misremembered the proposal when I was writing that comment on Sunday. The actual resource class is not the thing which could be accessed in list-like ways. That proposal uses a function get_all() (called on the resource class) to get a list of objects.

I think I'd actually want to avoid using a jobs() function name to leave it unoccupied for a resource class, should one be implemented. I was trying to think about whether it's reasonable to have a resource class that could be addressed like a list, but that's out of scope.

In practice, having a function called jobs() that returns a list and a function called get_jobs() that returns a data frame is less intuitive than frustratingly wordy function names.

toph-allen · 2025-01-13T23:53:42Z

@jonkeane I made some minor documentation changes responding to your review, but git pull/push operations are not working (on any repo) and I'm going down a rabbit hole trying to fix. :|

[edit] https://www.githubstatus.com/incidents/qd96yfgvmcf9

toph-allen · 2025-01-14T21:12:41Z

Merging without approval after checking in with @jonkeane.

toph-allen added 6 commits December 18, 2024 16:31

wip

2c4ff43

Merge branch 'main' into toph/get-job-singular-v1

148ea78

update timestamp parsing to handle fractional seconds

6c00d3c

add get_job_log() function

43f7de5

add documentation

6d2e23d

update NEWS

7bbaf91

toph-allen requested review from schloerke and jonkeane and removed request for schloerke December 30, 2024 21:26

fix silly copy-paste typo

95c178e

jonkeane reviewed Dec 31, 2024

R/parse.R (resolved)

R/content.R (outdated, resolved)

add examples for jobs functions

5655e57

toph-allen requested a review from jonkeane December 31, 2024 16:23

add support for max log lines query string param

911b076

toph-allen requested a review from schloerke January 2, 2025 16:13

jonkeane reviewed Jan 2, 2025

R/content.R (outdated, resolved)

add app_guid to jobs results

b8f3be0

toph-allen mentioned this pull request Jan 7, 2025

Proposal: Make client optional, with an implicit or explicitly-set default value. #359

Open

add test for get_job_list()

2cd864b

toph-allen requested a review from jonkeane January 9, 2025 22:35

update news and documentation

d0e5eec

jonkeane reviewed Jan 10, 2025

R/content.R (outdated, resolved)

change

c30338b

jonkeane reviewed Jan 10, 2025

toph-allen commented Jan 12, 2025

toph-allen mentioned this pull request Jan 13, 2025

RFC: A new user experience for connectapi #305

Open

respond to comments

32b9fc9

toph-allen requested a review from jonkeane January 14, 2025 14:06

rename get_job_log to get_log

46951ad

toph-allen changed the title ~~feat: add get_job_log()~~ feat: add get_log(job) Jan 14, 2025

toph-allen merged commit 5bf7f59 into main Jan 14, 2025
19 checks passed

toph-allen deleted the toph/get-job-singular-v1 branch January 14, 2025 21:13
