Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend extractors with an optional seed per endpoint. #126

Closed
wants to merge 5 commits into from

Conversation

HeroicKatora
Copy link
Contributor

@HeroicKatora HeroicKatora commented Jan 20, 2019

Description

The Extract-trait was complemented by another trait to allow one type to provide an seed for extracting multiple others. All existing self-extracting types (Body, Json, Path, …) are provided with an impl Extract<T, Data> for (). A second impl for Endpoint has been created: Seeded. It is essentially a tuple of a standard endpoint function and a Sync + 'static seed tuple which provides the seeds for extracting the function arguments.

The new trait looks as follows (trait bounds may be stricter than necessary), fairly closely modeled after serde::de::DeserializeSeed:

/// An extractor for an app with `Data`
pub trait ExtractSeed<Param, Data>: Send + Sync + Sized + 'static 
    where Param: Send + Sized + 'static 
{
    /// The async result of `extract`.
    ///
    /// The `Err` case represents that the endpoint should not be invoked, but
    /// rather the given response should be returned immediately.
    type Fut: Future<Output = Result<Param, Response>> + Send + 'static;

    /// Attempt to extract a value from the given request.
    fn extract(&self,
        data: &mut Data,
        req: &mut Request,
        params: &Option<RouteMatch<'_>>,
        store: &Store,
    ) -> Self::Fut;
}

An example impl for this trait (except () for all Extract), uses a runtime provided header name to extract the corresponding header:

/// Wrapper for conenience of disambiguiation, can be discussed
pub struct NamedHeader(pub http::header::HeaderName);

impl<T: 'static, S: 'static> ExtractSeed<Header<T>, S> for NamedHeader 
    where T: From<http::header::HeaderValue> + Send
{
    type Fut = future::Ready<Result<Header<T>, Response>>;
    fn extract(&self,
        data: &mut S,
        req: &mut Request,
        params: &Option<RouteMatch<'_>>,
        store: &Store,
    ) -> Self::Fut {
        let header = req.headers().get(&self.0);
        match header {
            Some(value) => future::ok(Header(value.clone().into())),
            None => future::err(http::status::StatusCode::BAD_REQUEST.into_response()),
        }
    }
}

Similar to the other Endpoint impl for closures, for Seeded we also use a closure to provide implementations up to 9 arguments. It follows the other implementation closely, only extending the required trait parameters by new ones and replacing the extraction logic with the following:

// within the endpoint macro, from
let ($($Y),*) = &self.1;
$(let $X = <$Y as ExtractSeed<$X, Data>>::extract($Y, &mut data, &mut req, &params, store);)*

Motivation and Context

This allows an extracted value to depend on information only available at runtime. Such behaviour was previously only (nicely, without static Mutex<T>) possible by creating a new generic type wrapping the actually intended parameter type. This generic would then take a type argument implementing a trait. However, this trait implementation must typically be deferred to an enduser who has to customize it according to their own app state type. Futhermore, this needs one type and impl per endpoint with different configuration.

One problem where this would be applicable is authentication/authorization. Different endpoints may require different security realms or permissions, different user names etc. Encoding this all into types with static functions to derive different runtime values from app state quickly becomes a hassle of writing the same implementation with just slightly different constants. However, the outcome parameter into endpoint functions should mostly be very similar. A single value containing the name of the authorized entity. By implementing this via type-level generics though, the type of this value quickly becomes much much more complex. Moving the necessary information into runtime instead allows one to use a single type to guarantee the necessary invariant.

Programmers may still use newtype structs to type such values more securely but these can be expressed more succinctly without generics.

This could be a solution for #108 as NamedComponent may gain a seed which gets supplied with the name at runtime. This has been exemplified in the example.

This solution is not intended to encourage arbitrary reconfiguration of extractor seeds at runtime although it could allow it. The seeds are hidden behind a BoxedEndpoint as soon as they are provided to the router and may only inspect themselves immutably while requiring Send + Sync.

How Has This Been Tested?

There is an example that extracts the user-agent header based on a runtime constructed HeaderName. Another route extracts the path component num where num was supplied via a static string instead of an associated item on a type parameter.

Other tests in cfg(test) will follow as implementation goes beyond Poc. No other tests had to be modified for these changes to work as the App interface remained unchanged.

Other TODOs

  • Provide missing wrappers like Resource::get_with
  • Consistently create dynamic seeds for extracted types where applicable (in the style of SegmentName, NamedHeader)
  • Use the same naming scheme for all seeds and types

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly. (Not yet)
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes. (Not yet)
  • All new and existing tests passed. (All existing, new tests to come)

edit 2019-01-21 (yw): updated link to example, and added syntax highlighting.

@yoshuawuyts yoshuawuyts added feature A feature that's ready for implementation design Open design question labels Jan 21, 2019
@yoshuawuyts yoshuawuyts requested review from aturon and tirr-c and removed request for aturon January 21, 2019 09:53
@yoshuawuyts
Copy link
Member

ll existing self-extracting types (Body, Json, Path, …) provide impl Extract<T, Data> for () and the Endpoint implementation of Fn(…) was changed to use () to extract these.

While reading the source I was a bit confused by how this works. I think explaining this in the source itself would be helpful!


Pinging @aturon @tirr-c -- this PR seems pretty involved with the design of Tide, I think it'd be good if you could take a look.

@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Jan 21, 2019

@yoshuawuyts I expanded the code to that section underneath, condensed to the really relevant lines, I hope that is what you meant.

@HeroicKatora HeroicKatora changed the title Extend extractors with an optional seed per endpoint. [WIP] Extend extractors with an optional seed per endpoint. Jan 21, 2019
Copy link
Collaborator

@tirr-c tirr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I briefly looked through the code, and this looks quite a large change!


fn main() {
let mut app = tide::App::new(());
app.at("/").get_with(display_header, NamedHeader(HeaderName::from_static("user-agent")));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This get_with call would require ExtractSeed<fn(Header<HeaderValue>) -> String, NamedHeader>: Endpoint<_, _>, but NamedHeader is not a tuple. I think this should be (NamedHeader(...),), but maybe I've got something wrong.

Copy link
Contributor Author

@HeroicKatora HeroicKatora Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, rather I missed something because in the macro expansion for one argument, this let binding gets created: let ($($X),*) = …. Which of course then expands to let (T0) = and not a tuple as in let (T0,) = …. And the same for the right side of that assignment.

I must however say it felt more ergonomic to write, it is much less confusing with one less pair of brackets and comma.

src/body.rs Outdated Show resolved Hide resolved
src/extract.rs Outdated
@@ -3,15 +3,15 @@ use futures::prelude::*;
use crate::{configuration::Store, Request, Response, RouteMatch};

/// An extractor for an app with `Data`
pub trait Extract<Data>: Send + Sized + 'static {
pub trait Extract<Param, Data>: Send + Sync + Sized + 'static where Param: Send + Sized + 'static {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the trait signature should be Extract<Seed, Data>. This looks quite reversed; for instance, Json is a extractor, not ().

Copy link
Contributor Author

@HeroicKatora HeroicKatora Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

() is intended as an extractor for Json. That sounds more plausible even, () extracts Param from Data. If we go the way of adding a trait rather than replacing one, this could of course be solved with another name. I'm open to bikeshedding.

However, Extract<(), Data> for Json would mean that the natural &self argument becomes &Seed. Then the trait can no longer be boxed to create a Box<Extract<Json, Data, Fut=Box<Future<…>>>> that can extract Json.

src/head.rs Outdated

pub struct Header<T>(pub T);

impl<T: From<http::header::HeaderValue> + Send + 'static, S: 'static> Extract<Header<T>, S> for NamedHeader {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A seed doesn't have to be a newtype; impl Extract<Header<T>, S> for http::header::HeaderName would work fine here! (or reversed, as of the review above.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gravitated to that choice as well, but consider String may get confusing if used as the extractor for too many argument types. Then, consistency of providing a newtype may be more important than not nesting in a newtype. It has its pros and cons, feel free to disagree.

@aturon
Copy link
Collaborator

aturon commented Jan 23, 2019

Thanks for the PR, @HeroicKatora! There are definitely some interesting ideas here. I want to mull on it a bit, but here are some immediate thoughts:

  • It seems like there are two problems you're taking on here. One is extraction that depends on runtime values. Another is avoiding having to encode too much into the type system. (I do think these may be related issues, but still worth separating out). Does that seem right?

  • AIUI, seeding for endpoints is "all or nothing": either you don't pass in any seeds (and everything has to work with ()), or else you have to pass a tuple with all of the seeds explicit (including passing dummy () values). While it's possible to do type-level computation to give more flexibility, that route entails a lot of complexity (and usually baffling error messages).

  • Seeding spreads out the logic of extraction a bit: rather than it being wholly contained in the endpoint definition, now some important parts of provided in router definition. That's a bit unfortunate.

  • For comparison, I had in mind a couple of other ways of addressing the core problems, IIUC:

    • Extractors don't have to be transparent newtypes. So, for example, we could provide a Headers extractor that then provides a method for extracting particular headers. That method would be invoked within the endpoint, thereby centralizing extraction logic there. Being a method, it can take a runtime parameter for which header is desired.
    • Extractors should be able to use Tide's configuration system. This could possibly be used to provide parameters to a given extractor instance, though we haven't worked out a full, ergonomic way for doing that.

In short, I definitely think we should have a solid story for the concerns you're raising in this PR, but I'm wondering if we can find alternative approaches that stick with Tide's original strict separation between routing concerns and endpoint/extraction concerns.

@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Jan 23, 2019

One is extraction that depends on runtime values. Another is avoiding having to encode too much into the type system. (I do think these may be related issues, but still worth separating out). Does that seem right?

I suppose these are related but intended was mostly a solution to the second (with a specific issue faced by third-party library authors. I'll expand on it below). The design specifically asserts all extractor seeds to be sync, and only gives immutable access. It does not assert Clone as the configuration was intended to be endpoint specific. Once there are runtime values in place, I don't see how one could prevent them from being used as dynamic configuration, how could one stop injecting Arc<Mutex<Anything>> or a Once<_> of this when a static reference is required? So, instead I cut that effort and focused on making it clear that this was not intended while making it useful.

AIUI, seeding for endpoints is "all or nothing":

You are correct, and this was intentional. In the tuple model, every parameter maps very clearly to its seed. By using the standard unit type and making this the standard seed, I hoped to make get_with seem like a natural extension of an endpoint whose seed is only (). Type level computation would not only make the implementation hard but also the usage.

Seeding spreads out the logic of extraction a bit: rather than it being wholly contained in the endpoint definition, now some important parts of provided in router definition. That's a bit unfortunate.

This could definitely be improved. I was just not sure where to integrate seeds idiomatically. So, I chose to put them/the one next to the convenience methods in Resource so that there is no need to name the otherwise clunky ExtractSeed-endpoint. I'm open to other suggestions of where and how to construct this Endpoint implementation wrapper. The current place just felt fairly ergonomic when writing the example. For a first time user, that is.

  • Extractors should be able to use Tide's configuration system. This could possibly be used to provide parameters to a given extractor instance, though we haven't worked out a full, ergonomic way for doing that.

I'm not sure in which way this might improve the situation. Extractors seem to then need to rely on a specific type to extract from a loose typemap but that type is no longer type checked and needs to be unique to this extractor at the same time.

  • Extractors don't have to be transparent newtypes. So, for example, we could provide a Headers extractor that then provides a method for extracting particular headers.

This seems to be at conflict with the idea of using extraced values as an easily pluggable guard to BadRequest and the sort. Another usage are authorization tokens or in the simplest case passwords. This is just one of the cases where a third-party library would be very welcome. But as a library writer, this approach is flawed when constrained to the type system alone.

  • Requiring the all values to be derived from the user's AppState works in the end but makes it impossible to provide drop-in guards. Every part needs to be amamalgamated into the AppState (and that also requires Sync etc.).
  • Deriving the values from a type system has similar problems. Coupling values loosely through dynamic typemaps runs into configuration problems as soon as there should be more than one value or that value should be routing of endpoint dependent, which is the common case for authorization. Coupling them via a associated items on traits results in the stacking of type-parameters in most places and in most libraries only couples dynamic user values by having AppState derive this trait. This also provides no integration into routing or endpoint.

Tl;dr: I agree that we should try to separate endpoint and routing information more cleanly in the interface, and not just its implementation. But I think that dynamic values provide the most idiomatic and most usable coupling, especially when considering how typical configurable components might be implemented in libraries.

Lastly, let me note that this could solve the problem of #63 . If you want a proof of concept for this, I will gladly try to provide one.

@aturon
Copy link
Collaborator

aturon commented Jan 23, 2019

@HeroicKatora thanks for the detailed response! I'll try to respond fully later, but a quick question:

This seems to be at conflict with the idea of using extraced values as an easily pluggable guard to BadRequest and the sort.

Can you elaborate on this a bit? One core design aspect of Tide is that routing happens fully prior to extraction, and hence extraction cannot influence which endpoint is invoked. Bailing out for an endpoint has to happen within the endpoint body (and we eventually intend to provide redirects or other similar functionality).

So to clarify what I meant a bit, we could have a Segments extractor that did not need to have a name statically set up, but where you do the extraction in the body:

fn my_endpoint(segments: Segments) {
    let id: u64 = segments.get("id");
    let name: String = segments.get("name");
    // ...
}

@HeroicKatora
Copy link
Contributor Author

Can you elaborate on this a bit? One core design aspect of Tide is that routing happens fully prior to extraction, and hence extraction cannot influence which endpoint is invoked.

Sure. In case of a denied authorization for example you might want to respond with a fully static 'Denied' error with correct Authorization header. That error response is possible for an extractor because it does not need to influence routing. And for a library, the header should most likely be set by the extractor since it contains information about the authorization method that should be used for example. That is not to say that it should not be possible to do this in the endpoint body but that for the most common cases the extractor would be much more ergonomic.

@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Jan 24, 2019

Bailing out for an endpoint has to happen within the endpoint body (and we eventually intend to provide redirects or other similar functionality).

Can I get some clarification on this? If I understood the blog post correctly and recall it, then this was sometime intended to occur not via the path, i.e. by not going through the router. Is there a reason why such a redirection would be possible for endpoints but not for extractors (especially if seeded such that the intended target can be configured at construction)?

@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Mar 1, 2019

To address the biggest concern of @tirr-c I think I'll do a rebase of this where I leave out the changes to Extract itself and instead introduce a new trait ExtractSeed with a blanket impl for () (or Phantom<T> paralleling closely serde::de::DeserializeSeed).

@HeroicKatora
Copy link
Contributor Author

Is it okay to rebase this in progress? Or should we close and reopen, or something else entirely?

The result looks more clunky for binding it to a route but at least the
concerns are separate more strictly. This also removes the need for
duplicating the router utilities for methods.
@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Mar 1, 2019

In order to makes this much more convincing, I've written a sample add-on to provide sessions:

https://github.com/HeroicKatora/tide/tree/session
or direct link to example app

This is exactly the use case, where the third-party library wants to provide additional extractors without the configuration living inside generic type arguments (and it should not leak into AppData).

@HeroicKatora
Copy link
Contributor Author

And another one, providing Basic authentication. For this use case, the desire of endpoint specific configuration is probably greater. Don't use this in production but it's more or less okayish and feature complete.

https://github.com/HeroicKatora/tide/tree/authorize
https://github.com/HeroicKatora/tide/blob/authorize/examples/simple.rs

@HeroicKatora HeroicKatora changed the title [WIP] Extend extractors with an optional seed per endpoint. Extend extractors with an optional seed per endpoint. Mar 4, 2019
@HeroicKatora
Copy link
Contributor Author

Found no further issues after the rebase, so I've updated the documentation in the pull request. @aturon @tirr-c This completes the rework to address the main weaknesses, so I've also removed the [WIP] tag in the title.

@bIgBV bIgBV mentioned this pull request Mar 6, 2019
@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Apr 10, 2019

Made irrelevant by #156 which removes the concept to be extended while simultaneously providing a much cleaner approach with core Rust semantics. Third-party crates can add parameter to the extension traits created by them to replace the concept of seeds or pass the Context<_> to the seed.

You could still find the example code for the authentication crates useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Open design question feature A feature that's ready for implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants