Add GET/PUT file ID/path handlers #12

davidbrochart · 2022-10-13T09:35:55Z

No description provided.

codecov-commenter · 2022-10-13T09:38:49Z

Codecov Report

Base: 81.98% // Head: 77.53% // Decreases project coverage by -4.45% ⚠️

Coverage data is based on head (4e47dcd) compared to base (0db749a).
Patch coverage: 50.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #12      +/-   ##
==========================================
- Coverage   81.98%   77.53%   -4.46%     
==========================================
  Files           4        5       +1     
  Lines         272      316      +44     
  Branches       41       45       +4     
==========================================
+ Hits          223      245      +22     
- Misses         37       59      +22     
  Partials       12       12

Impacted Files	Coverage Δ
jupyter_server_fileid/handlers.py	`48.78% <48.78%> (ø)`
jupyter_server_fileid/extension.py	`50.00% <66.66%> (+2.38%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

kevin-bates · 2022-10-13T14:13:14Z

jupyter_server_fileid/handlers.py

+    @web.authenticated
+    @authorized
+    async def get(self, path):
+        idx = self.settings["file_id_manager"].index(path)


Shouldn't this be:

Suggested change

idx = self.settings["file_id_manager"].index(path)

idx = self.settings["file_id_manager"].get_id(path)

otherwise, index() will add the path (assuming it can be stat'd) and return the newly created ID.

Are you planning on supporting the ability to fetch a file's path given its ID? This will be necessary for external references (e.g., comments) to locate their parent/host file.

It seems that get_id doesn't index the file automatically, while index does. So I guess we need the latter?

Are you planning on supporting the ability to fetch a file's path given its ID?

Maybe in another PR, but it's not needed for now in JupyterLab.

It seems that get_id doesn't index the file automatically, while index does. So I guess we need the latter?

I think so - otherwise you're potentially performing a transaction on a fetch request - which is contrary to REST best practices (AFAIU).

Maybe in another PR, but it's not needed for now in JupyterLab.

Ok - this should be fine. You won't be able to use a similar format of endpoint in that case as it will be ambiguous so I wasn't sure if supporting path-fetching will affect this endpoint. You could probably take an approach similar to kernel interrupt and restarts with an endpoint definition of something like api/fileid/{id,path}/(.*) although I'm sure there are more elegant approaches. My comment was more about not impacting what is essentially "set in stone" when fetching paths is required - which will be the case.

I think so - otherwise you're potentially performing a transaction on a fetch request - which is contrary to REST best practices (AFAIU).

I think you misunderstood me, we need index which might perform a transaction.

with an endpoint definition of something like api/fileid/{id,path}/(.*)

I think you're right, we need api/fileid/id/(.*) here, and we'll need api/fileid/path/(.*) when we want to get the path corresponding to a given ID.

Agree that we will need both the path and id variant. @dlqqq can you help us understand which underlying methods to call?

I think Kevin's concern is that a GET request shouldn't be creating a new resource under RESTful semantics. Indexing creates a file ID "resource" and is also an idempotent operation, so under RESTful semantics, it should be called via PUT.

An API that conforms strictly to RESTful semantics might look something like this:

GET /api/fileid/files/{path or id} => calls get_path() if given ID, or get_id() if given path PUT /api/fileid/ids => index a file at a given path (provided in request payload or URI query string)

An API compatible with @davidbrochart's suggestion might look something like this:

GET /api/fileid/{path or id} => index a file if given path, or get_path() if given ID

Personally, I'm inclined towards Kevin's suggestion as it allows REST API users to distinguish between calling get_id() and index(), which is useful in scenarios where a user wants to check whether a file was indexed previously but doesn't want to index it now.

The obvious issue with putting the path or ID in the request URL rather than as a query string is that it's not possible for the backend to distinguish whether you're talking about a file with the name of a UUID located in the server root or if you're talking about a file ID. So regardless of which high-level API design we settle on, we should probably modify the GET requests to explicitly specify whether they're passing in a path or an ID via ?path=... or ?id=....

On the other hand, from the client point of view, GETting a file ID seems like the right HTTP verb. The server creating a new resource is an implementation detail. For instance, if all files under a root directory were indexed at startup and then tracked for any change (renaming, etc.), then there would be no need for indexing in this GET request. The client doesn't ask for a new resource, it asks for a property of a file.
I don't have any preference as to where to specify what we request (an ID for a path or a path for an ID). It could be in the URL path:

GET /api/fileid/id/{path}: get an ID for a path GET /api/fileid/path/{id}: get a path for an ID

or as a query parameter:

GET /api/fileid?path={path}: get an ID for a path GET /api/fileid?id={id}: get a path for an ID

The latter is slightly more complicated as we need to check that only one parameter is passed, and the returned schema could theoretically be different for a path or an ID, but it's not a big deal.

kevin-bates · 2022-10-14T14:58:57Z

@dlqqq is correct regarding my concern about RESTful API semantics. As David Q points out, users should be able to determine that a given path is not currently indexed, so, in those cases, GET should return some indication (404 perhaps?) if the path is not indexed. Users would then turn around and issue a PUT if they wanted to create the index (which should probably return a 404 if the path doesn't exist). Since David B knows he wants to index the path if it's not indexed already, then PUT (with its idempotent semantics) is the correct approach for that (IMO).

Regarding the endpoints, I believe Jupyter, thus far, has tried to not introduce parameters and instead used "determinators" embedded into the endpoint. This would argue for api/fileid/{id,path}/{path-value, id-value} where the "determinator" of id indicates that the target "resource" is an ID and the remaining portion of the endpoint is the path, while a "determinator" of path indicates the opposite (resource is a PATH with remaining portion the ID).

If there's a desire to keep the implementation to a single handler, you could then have PUT api/fileid/id/my/path/to/my/file indicate that an ID is created relative to my/path/to/my/file and, due to its idempotent semantic, will return the existing ID should it already be indexed. The analogous PUT api/fileid/path/<uuid> would probably return a 501 (not implemented) since the FileID service doesn't allow for BYOID (currently). (If using UUID as a PK, we could support this endpoint, which would require a body to provide the file path.)

davidbrochart · 2022-10-14T15:48:20Z

users should be able to determine that a given path is not currently indexed

I have difficulty understanding why they would want to know that, and if it should be allowed. For me an ID is like a property of the file, just like its size.

This would argue for api/fileid/{id,path}/{path-value, id-value} where the "determinator" of id indicates that the target "resource" is an ID and the remaining portion of the endpoint is the path

I'm fine with that too.

ellisonbg · 2022-10-14T15:57:07Z

Great discussion! The only other point is that the most important property of a GET call is that it is idempotent. I believe we could have GET create the id the first time it is called for a path and continue to return that same id in future calls while having that property. The only case it doesn't cover is if the client wants to know if the path has been assigned an id or not. If we believe that a client will need that information, then the dual GET/PUT API makes sense.

…

On Fri, Oct 14, 2022 at 8:48 AM David Brochart ***@***.***> wrote: users should be able to determine that a given path is not currently indexed I have difficulty understanding why they would want to know that, and if it should be allowed. For me an ID is like a property of the file, just like its size. This would argue for api/fileid/{id,path}/{path-value, id-value} where the "determinator" of id indicates that the target "resource" is an ID and the remaining portion of the endpoint is the path I'm fine with that too. — Reply to this email directly, view it on GitHub <#12 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAGXUFWWZZYOOCUFOFXL6DWDF6E7ANCNFSM6AAAAAARECUJ6Y> . You are receiving this because you commented.Message ID: ***@***.***>

-- Brian E. Granger Senior Principal Technologist, AWS AI/ML ***@***.***) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

kevin-bates · 2022-10-14T17:48:05Z

The only case it
doesn't cover is if the client wants to know if the path has been assigned
an id or not. If we believe that a client will need that information, then
the dual GET/PUT API makes sense.

If we're ever going to want to allow BYOID (which UUIDs enable - in addition to global uniqueness) we need GET to indicate a given path is not currently indexed. PUT would then be used for cases where the user needs the ID and doesn't want to provide it (i.e., let the service generate the ID), while POST could be used to associate a given ID to a path.

However, getting down to brass tacks, GET is defined to be both idempotent and safe and if GET requests can change the state of the server, they are not "safe". Using PUT we'd return a 201 when the index is created and 200 when it already exists - just as the spec indicates (and 404 if the path doesn't exist). This is the precise behavior David B wants and we maintain appropriate semantics - win, win.

ellisonbg · 2022-10-14T17:56:21Z

That is a subtle point about us being able to use `PUT` for everything and indicate the additional information using status codes. Very nice - I am completely on board with that.

…

On Fri, Oct 14, 2022 at 10:48 AM Kevin Bates ***@***.***> wrote: The only case it doesn't cover is if the client wants to know if the path has been assigned an id or not. If we believe that a client will need that information, then the dual GET/PUT API makes sense. If we're ever going to want to allow BYOID (which UUIDs enable - in addition to global uniqueness) we need GET to indicate a given path is not currently indexed. PUT would then be used for cases where the user needs the ID and doesn't want to provide it (i.e., let the service generate the ID), while POST could be used to associate a *given* ID to a path. However, getting down to brass tacks <https://www.rfc-editor.org/rfc/rfc9110#name-methods>, GET <https://www.rfc-editor.org/rfc/rfc9110#name-get> is defined to be both *idempotent* <https://www.rfc-editor.org/rfc/rfc9110#name-idempotent-methods> and *safe* <https://www.rfc-editor.org/rfc/rfc9110#name-safe-methods> and if GET requests can change the state of the server, they are not "safe". Using PUT <https://www.rfc-editor.org/rfc/rfc9110#name-put> we'd return a 201 when the index is created and 200 when it already exists - just as the spec indicates (and 404 if the path doesn't exist). This is the precise behavior David B wants and we maintain appropriate semantics - win, win. — Reply to this email directly, view it on GitHub <#12 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAGXUH6RWO5FF5EKYDKSZDWDGMGBANCNFSM6AAAAAARECUJ6Y> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- Brian E. Granger Senior Principal Technologist, AWS AI/ML ***@***.***) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

davidbrochart · 2022-10-14T19:29:34Z

It seems strange to me to use PUT to actually get something, but I'm definitely not an expert in requests semantics, so I trust you guys.

for more information, see https://pre-commit.ci

kevin-bates

These changes look good to me.

Should extensions specify/expose their own Open API (swagger) docs? If so, that can be added later - although I'm not really finding precedent for this at the moment.

ellisonbg · 2022-10-17T17:04:28Z

Question, why `PUT` and not `POST? Isn't `POST` used in situation where the server is responsible for assigning an identifier?

…

On Mon, Oct 17, 2022 at 9:37 AM Kevin Bates ***@***.***> wrote: ***@***.**** approved this pull request. These changes look good to me. Should extensions specify/expose their own Open API (swagger) docs? If so, that can be added later - although I'm not really finding precedent for this at the moment. — Reply to this email directly, view it on GitHub <#12 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAGXUDPERUWN6FW466RXOLWDV6GNANCNFSM6AAAAAARECUJ6Y> . You are receiving this because you were mentioned.Message ID: ***@***.*** com>

-- Brian E. Granger Senior Principal Technologist, AWS AI/ML ***@***.***) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

dlqqq

@kevin-bates @davidbrochart Thank you two for resolving your concerns together while I was away! I love this new REST API. It balances readability with usability.

It seems strange to me to use PUT to actually get something, but I'm definitely not an expert in requests semantics, so I trust you guys.

Again, the concern is that index() has the potential to write to the database, which a GET request shouldn't do. In a way, all requests are "getting" something from the backend, be it a JSON payload or status code. GET should be reserved for requests that involve exclusively database reads, when we're talking about REST APIs for backends backed by a database. Of course the semantics here get muddled if every request writes telemetry/logging data, but generally we don't count that. 😉

Question, why PUT and not POST? Isn't POST` used in situation where the
server is responsible for assigning an identifier?

@ellisonbg PUT is essentially POST with an idempotency constraint. PUT is to POST as "upsert" is to "insert". For more, see: https://www.rfc-editor.org/rfc/rfc2616#section-9.1.2

dlqqq · 2022-10-17T17:13:47Z

@davidbrochart Feel free to merge if this PR is complete.

ellisonbg · 2022-10-17T17:19:28Z

Thanks! I wasn't aware of the idempotency constraint of PUT.

…

On Mon, Oct 17, 2022 at 10:13 AM david qiu ***@***.***> wrote: @davidbrochart <https://github.com/davidbrochart> Feel free to merge if this PR is complete. — Reply to this email directly, view it on GitHub <#12 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAGXUDTE7URZCPLSZRS6BLWDWCNLANCNFSM6AAAAAARECUJ6Y> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- Brian E. Granger Senior Principal Technologist, AWS AI/ML ***@***.***) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

davidbrochart · 2022-10-17T17:55:05Z

Again, the concern is that index() has the potential to write to the database, which a GET request shouldn't do.

As I said, it's an implementation detail. When I implement this feature in jupyverse, I will probably use watchfiles to detect file changes and automatically index them before any request is made. So I could have used a GET because the file will always be indexed before getting its ID, hence no write to the database inside the GET. That is, as Kevin said, if we don't allow BYOID.
You could argue that there might be a small amount of time when the client requests an ID for a file that "is going to be indexed", because the GET request and the file change detection are parallel and independent tasks. But why would the client ask an ID for a file that it doesn't even know exists (because not detected yet) in the first place? You could say that it's because the client thought it existed and the file was swapped with another one before the GET request. Fair enough, but then the client has a problem, because it's going to open the wrong file. Maybe we should just return a 404 when the file is not yet indexed.
Anyway, all this seems quite arbitrary to me, but if everyone is fine with this API, I'll merge 😄

kevin-bates · 2022-10-17T18:39:11Z

I totally see where you're coming from David. This discussion is primarily about the discipline required to adhere to a spec otherwise, what good is a spec? I like to think of REST verbs mapped to CRUD operations and GET maps to Read - where no transaction will be created. Indexing, on the other hand, can resort in a transaction. If one doesn't already exist, one will be created, etc. PUT, POST, and DELETE all require transactions. (This is all coming from years of Microsoft DCOM and J2EE EJB distributed transactions.)

Regardless of the implementation, clients wanting the ID of a given path need to first attempt GET - which will succeed 99% of the time in both implementations (but may not in others that might choose an on-demand approach to indexing), then fallback to PUT when GET returns 404.

davidbrochart · 2022-10-17T19:34:03Z

Thanks everyone, merging!

davidbrochart · 2022-10-17T19:36:59Z

@dlqqq Would you mind adding davidbrochart as a maintainer on https://pypi.org/project/jupyter-server-fileid ?

dlqqq · 2022-10-17T20:03:08Z

Added both @davidbrochart and @Zsailer as maintainers.

davidbrochart · 2022-10-17T20:06:10Z

Thanks!

davidbrochart mentioned this pull request Oct 13, 2022

Handle file IDs jupyterlab/jupyterlab#13233

Closed

kevin-bates reviewed Oct 13, 2022

View reviewed changes

davidbrochart and others added 2 commits October 17, 2022 08:45

Add GET file ID handler

9f65ff3

[pre-commit.ci] auto fixes from pre-commit.com hooks

e69ed2f

for more information, see https://pre-commit.ci

davidbrochart force-pushed the handler branch 2 times, most recently from 89165cc to 16cd786 Compare October 17, 2022 07:32

Review

bdb0593

davidbrochart force-pushed the handler branch from 4e47dcd to bdb0593 Compare October 17, 2022 07:44

[pre-commit.ci] auto fixes from pre-commit.com hooks

40418c8

for more information, see https://pre-commit.ci

davidbrochart mentioned this pull request Oct 17, 2022

Use file ID jupyterlab/jupyterlab#13246

Merged

kevin-bates approved these changes Oct 17, 2022

View reviewed changes

dlqqq approved these changes Oct 17, 2022

View reviewed changes

davidbrochart changed the title ~~Add GET file ID handler~~ Add GET/PUT file ID/path handlers Oct 17, 2022

davidbrochart merged commit 7ec1f95 into jupyter-server:main Oct 17, 2022

davidbrochart deleted the handler branch October 17, 2022 19:35

kevin-bates mentioned this pull request Oct 19, 2022

add boolean sync argument to get_path() #20

Merged

Zsailer mentioned this pull request Oct 20, 2022

Meeting Notes 2022 jupyter-server/team-compass#15

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add GET/PUT file ID/path handlers #12

Add GET/PUT file ID/path handlers #12

davidbrochart commented Oct 13, 2022

codecov-commenter commented Oct 13, 2022 •

edited

Loading

kevin-bates Oct 13, 2022

davidbrochart Oct 13, 2022

davidbrochart Oct 13, 2022

kevin-bates Oct 13, 2022

davidbrochart Oct 13, 2022

ellisonbg Oct 13, 2022

dlqqq Oct 13, 2022

dlqqq Oct 13, 2022

davidbrochart Oct 14, 2022

kevin-bates commented Oct 14, 2022

davidbrochart commented Oct 14, 2022

ellisonbg commented Oct 14, 2022 via email

kevin-bates commented Oct 14, 2022

ellisonbg commented Oct 14, 2022 via email

davidbrochart commented Oct 14, 2022

kevin-bates left a comment

ellisonbg commented Oct 17, 2022 via email

dlqqq left a comment

dlqqq commented Oct 17, 2022

ellisonbg commented Oct 17, 2022 via email

davidbrochart commented Oct 17, 2022

kevin-bates commented Oct 17, 2022

davidbrochart commented Oct 17, 2022

davidbrochart commented Oct 17, 2022

dlqqq commented Oct 17, 2022

davidbrochart commented Oct 17, 2022

	idx = self.settings["file_id_manager"].index(path)
	idx = self.settings["file_id_manager"].get_id(path)

Add GET/PUT file ID/path handlers #12

Add GET/PUT file ID/path handlers #12

Conversation

davidbrochart commented Oct 13, 2022

codecov-commenter commented Oct 13, 2022 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevin-bates commented Oct 14, 2022

davidbrochart commented Oct 14, 2022

ellisonbg commented Oct 14, 2022 via email

kevin-bates commented Oct 14, 2022

ellisonbg commented Oct 14, 2022 via email

davidbrochart commented Oct 14, 2022

kevin-bates left a comment

Choose a reason for hiding this comment

ellisonbg commented Oct 17, 2022 via email

dlqqq left a comment

Choose a reason for hiding this comment

dlqqq commented Oct 17, 2022

ellisonbg commented Oct 17, 2022 via email

davidbrochart commented Oct 17, 2022

kevin-bates commented Oct 17, 2022

davidbrochart commented Oct 17, 2022

davidbrochart commented Oct 17, 2022

dlqqq commented Oct 17, 2022

davidbrochart commented Oct 17, 2022

codecov-commenter commented Oct 13, 2022 •

edited

Loading