-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GET/PUT file ID/path handlers #12
Conversation
Codecov ReportBase: 81.98% // Head: 77.53% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #12 +/- ##
==========================================
- Coverage 81.98% 77.53% -4.46%
==========================================
Files 4 5 +1
Lines 272 316 +44
Branches 41 45 +4
==========================================
+ Hits 223 245 +22
- Misses 37 59 +22
Partials 12 12
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
jupyter_server_fileid/handlers.py
Outdated
@web.authenticated | ||
@authorized | ||
async def get(self, path): | ||
idx = self.settings["file_id_manager"].index(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be:
idx = self.settings["file_id_manager"].index(path) | |
idx = self.settings["file_id_manager"].get_id(path) |
otherwise, index()
will add the path (assuming it can be stat
'd) and return the newly created ID.
Are you planning on supporting the ability to fetch a file's path given its ID? This will be necessary for external references (e.g., comments) to locate their parent/host file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that get_id
doesn't index the file automatically, while index
does. So I guess we need the latter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning on supporting the ability to fetch a file's path given its ID?
Maybe in another PR, but it's not needed for now in JupyterLab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that get_id doesn't index the file automatically, while index does. So I guess we need the latter?
I think so - otherwise you're potentially performing a transaction on a fetch request - which is contrary to REST best practices (AFAIU).
Maybe in another PR, but it's not needed for now in JupyterLab.
Ok - this should be fine. You won't be able to use a similar format of endpoint in that case as it will be ambiguous so I wasn't sure if supporting path-fetching will affect this endpoint. You could probably take an approach similar to kernel interrupt and restarts with an endpoint definition of something like api/fileid/{id,path}/(.*)
although I'm sure there are more elegant approaches. My comment was more about not impacting what is essentially "set in stone" when fetching paths is required - which will be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so - otherwise you're potentially performing a transaction on a fetch request - which is contrary to REST best practices (AFAIU).
I think you misunderstood me, we need index
which might perform a transaction.
with an endpoint definition of something like
api/fileid/{id,path}/(.*)
I think you're right, we need api/fileid/id/(.*)
here, and we'll need api/fileid/path/(.*)
when we want to get the path corresponding to a given ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we will need both the path and id variant. @dlqqq can you help us understand which underlying methods to call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Kevin's concern is that a GET request shouldn't be creating a new resource under RESTful semantics. Indexing creates a file ID "resource" and is also an idempotent operation, so under RESTful semantics, it should be called via PUT.
An API that conforms strictly to RESTful semantics might look something like this:
GET /api/fileid/files/{path or id} => calls get_path() if given ID, or get_id() if given path
PUT /api/fileid/ids => index a file at a given path (provided in request payload or URI query string)
An API compatible with @davidbrochart's suggestion might look something like this:
GET /api/fileid/{path or id} => index a file if given path, or get_path() if given ID
Personally, I'm inclined towards Kevin's suggestion as it allows REST API users to distinguish between calling get_id()
and index()
, which is useful in scenarios where a user wants to check whether a file was indexed previously but doesn't want to index it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The obvious issue with putting the path or ID in the request URL rather than as a query string is that it's not possible for the backend to distinguish whether you're talking about a file with the name of a UUID located in the server root or if you're talking about a file ID. So regardless of which high-level API design we settle on, we should probably modify the GET requests to explicitly specify whether they're passing in a path or an ID via ?path=...
or ?id=...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, from the client point of view, GETting a file ID seems like the right HTTP verb. The server creating a new resource is an implementation detail. For instance, if all files under a root directory were indexed at startup and then tracked for any change (renaming, etc.), then there would be no need for indexing in this GET request. The client doesn't ask for a new resource, it asks for a property of a file.
I don't have any preference as to where to specify what we request (an ID for a path or a path for an ID). It could be in the URL path:
GET /api/fileid/id/{path}: get an ID for a path
GET /api/fileid/path/{id}: get a path for an ID
or as a query parameter:
GET /api/fileid?path={path}: get an ID for a path
GET /api/fileid?id={id}: get a path for an ID
The latter is slightly more complicated as we need to check that only one parameter is passed, and the returned schema could theoretically be different for a path or an ID, but it's not a big deal.
@dlqqq is correct regarding my concern about RESTful API semantics. As David Q points out, users should be able to determine that a given path is not currently indexed, so, in those cases, Regarding the endpoints, I believe Jupyter, thus far, has tried to not introduce parameters and instead used "determinators" embedded into the endpoint. This would argue for If there's a desire to keep the implementation to a single handler, you could then have |
I have difficulty understanding why they would want to know that, and if it should be allowed. For me an ID is like a property of the file, just like its size.
I'm fine with that too. |
Great discussion! The only other point is that the most important property
of a GET call is that it is idempotent. I believe we could have GET create
the id the first time it is called for a path and continue to return that
same id in future calls while having that property. The only case it
doesn't cover is if the client wants to know if the path has been assigned
an id or not. If we believe that a client will need that information, then
the dual GET/PUT API makes sense.
…On Fri, Oct 14, 2022 at 8:48 AM David Brochart ***@***.***> wrote:
users should be able to determine that a given path is not currently
indexed
I have difficulty understanding why they would want to know that, and if
it should be allowed. For me an ID is like a property of the file, just
like its size.
This would argue for api/fileid/{id,path}/{path-value, id-value} where
the "determinator" of id indicates that the target "resource" is an ID
and the remaining portion of the endpoint is the path
I'm fine with that too.
—
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAGXUFWWZZYOOCUFOFXL6DWDF6E7ANCNFSM6AAAAAARECUJ6Y>
.
You are receiving this because you commented.Message ID:
***@***.***>
--
Brian E. Granger
Senior Principal Technologist, AWS AI/ML ***@***.***)
On Leave - Professor of Physics and Data Science, Cal Poly
@ellisonbg on GitHub
|
If we're ever going to want to allow BYOID (which UUIDs enable - in addition to global uniqueness) we need However, getting down to brass tacks, |
That is a subtle point about us being able to use `PUT` for everything and
indicate the additional information using status codes. Very nice - I am
completely on board with that.
…On Fri, Oct 14, 2022 at 10:48 AM Kevin Bates ***@***.***> wrote:
The only case it
doesn't cover is if the client wants to know if the path has been assigned
an id or not. If we believe that a client will need that information, then
the dual GET/PUT API makes sense.
If we're ever going to want to allow BYOID (which UUIDs enable - in
addition to global uniqueness) we need GET to indicate a given path is
not currently indexed. PUT would then be used for cases where the user
needs the ID and doesn't want to provide it (i.e., let the service generate
the ID), while POST could be used to associate a *given* ID to a path.
However, getting down to brass tacks
<https://www.rfc-editor.org/rfc/rfc9110#name-methods>, GET
<https://www.rfc-editor.org/rfc/rfc9110#name-get> is defined to be both
*idempotent*
<https://www.rfc-editor.org/rfc/rfc9110#name-idempotent-methods> and
*safe* <https://www.rfc-editor.org/rfc/rfc9110#name-safe-methods> and if
GET requests can change the state of the server, they are not "safe".
Using PUT <https://www.rfc-editor.org/rfc/rfc9110#name-put> we'd return a
201 when the index is created and 200 when it already exists - just as
the spec indicates (and 404 if the path doesn't exist). This is the
precise behavior David B wants and we maintain appropriate semantics - win,
win.
—
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAGXUH6RWO5FF5EKYDKSZDWDGMGBANCNFSM6AAAAAARECUJ6Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Brian E. Granger
Senior Principal Technologist, AWS AI/ML ***@***.***)
On Leave - Professor of Physics and Data Science, Cal Poly
@ellisonbg on GitHub
|
It seems strange to me to use |
for more information, see https://pre-commit.ci
89165cc
to
16cd786
Compare
4e47dcd
to
bdb0593
Compare
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good to me.
Should extensions specify/expose their own Open API (swagger) docs? If so, that can be added later - although I'm not really finding precedent for this at the moment.
Question, why `PUT` and not `POST? Isn't `POST` used in situation where the
server is responsible for assigning an identifier?
…On Mon, Oct 17, 2022 at 9:37 AM Kevin Bates ***@***.***> wrote:
***@***.**** approved this pull request.
These changes look good to me.
Should extensions specify/expose their own Open API (swagger) docs? If so,
that can be added later - although I'm not really finding precedent for
this at the moment.
—
Reply to this email directly, view it on GitHub
<#12 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAGXUDPERUWN6FW466RXOLWDV6GNANCNFSM6AAAAAARECUJ6Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***
com>
--
Brian E. Granger
Senior Principal Technologist, AWS AI/ML ***@***.***)
On Leave - Professor of Physics and Data Science, Cal Poly
@ellisonbg on GitHub
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevin-bates @davidbrochart Thank you two for resolving your concerns together while I was away! I love this new REST API. It balances readability with usability.
It seems strange to me to use PUT to actually get something, but I'm definitely not an expert in requests semantics, so I trust you guys.
Again, the concern is that index()
has the potential to write to the database, which a GET request shouldn't do. In a way, all requests are "getting" something from the backend, be it a JSON payload or status code. GET should be reserved for requests that involve exclusively database reads, when we're talking about REST APIs for backends backed by a database. Of course the semantics here get muddled if every request writes telemetry/logging data, but generally we don't count that. 😉
Question, why
PUT
and notPOST? Isn't
POST` used in situation where the
server is responsible for assigning an identifier?
@ellisonbg PUT is essentially POST with an idempotency constraint. PUT is to POST as "upsert" is to "insert". For more, see: https://www.rfc-editor.org/rfc/rfc2616#section-9.1.2
@davidbrochart Feel free to merge if this PR is complete. |
Thanks! I wasn't aware of the idempotency constraint of PUT.
…On Mon, Oct 17, 2022 at 10:13 AM david qiu ***@***.***> wrote:
@davidbrochart <https://github.com/davidbrochart> Feel free to merge if
this PR is complete.
—
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAGXUDTE7URZCPLSZRS6BLWDWCNLANCNFSM6AAAAAARECUJ6Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Brian E. Granger
Senior Principal Technologist, AWS AI/ML ***@***.***)
On Leave - Professor of Physics and Data Science, Cal Poly
@ellisonbg on GitHub
|
As I said, it's an implementation detail. When I implement this feature in jupyverse, I will probably use watchfiles to detect file changes and automatically index them before any request is made. So I could have used a |
I totally see where you're coming from David. This discussion is primarily about the discipline required to adhere to a spec otherwise, what good is a spec? I like to think of REST verbs mapped to CRUD operations and Regardless of the implementation, clients wanting the ID of a given path need to first attempt |
Thanks everyone, merging! |
@dlqqq Would you mind adding |
Added both @davidbrochart and @Zsailer as maintainers. |
Thanks! |
No description provided.