Should we use OpenAPI to describe resource's schema? #78

Closed
Bpoe opened this issue May 16, 2023 · 11 comments
Comments

@Bpoe
Collaborator

Bpoe commented May 16, 2023

Summary of the new feature / enhancement

Should we use OpenAPI to describe a resource's schema? The advantages are:

  • Pervasive and widely understood
  • Lots of tooling available
  • Allows for different schema for different operations (Get, Set, Test can each have their own schema)
  • Allows for more than one resource type to be described by a single spec
  • Allows for documenting response codes

Cons:

  • Geared toward HTTP, so it requires things like paths that are not currently planned as part of DSC

Proposed technical implementation details (optional)

No response

@Bpoe
Collaborator Author

Bpoe commented May 17, 2023

Here is a sample OpenAPI spec for a resource for environment variables:

openapi: 3.0.0
info:
  title: An example environment variable DSCv3 resource defined in OpenAPI format
  version: 0.0.1
paths:
  /:
    get:
      summary: List all environment variables
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariableList"

  /{name}:
    get:
      summary: Get an environment variable
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariable"

    put:
      summary: Create or update an environment variable
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/environmentVariableCreateRequest"
      responses:
        "200":
          description: Variable already exists and already has the desired value
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariable"

        "201":
          description: Variable was either created or updated with the desired value
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariable"

  /{name}/test:
    post:
      summary: Test if an environment variable exists as expected
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/environmentVariableCreateRequest"
      responses:
        "200":
          description: Test action was executed successfully
          content:
            application/json:
              schema:
                type: boolean

components:
  schemas:
    environmentVariable:
      type: object
      properties:
        name: 
          type: string
        value:
          type: string

    environmentVariableList:
      type: array
      items:
        $ref: "#/components/schemas/environmentVariable"

    environmentVariableCreateRequest:
      type: object
      properties:
        value:
          type: string

@michaeltlombardi
Collaborator

I've been thinking about this, and my current perspective is that DSC as architected and implemented so far for v3 is too far from a RESTful API for this to be more helpful than harmful. OpenAPI is specifically and explicitly a standard for HTTP APIs, which DSC is not.

Below are some extended thoughts about OpenAPI, JSON Schema, and DSCv3.

Context

For the dsc executable, I know of the following operations related to the resource's manifest:

  • Get the current state for a given configuration document on a target
  • Test a given configuration on a target
  • Apply a given configuration on a target (handles test if needed)
  • Get the current state for a given resource on a target
  • Test a given resource's state against a desired state on a target
  • Apply settings for a given resource on a target (handles test if needed)

These already have a defined interface surface in the form of the get, test, and set methods. Nominally, get can be called without any further information for some resources, returning all instances of that resource on the target. All other calls, including get for a specific resource, require at least one extra parameter, like the key property for a resource.

Resources may have multiple key properties that uniquely identify them. They may also be single-instance resources that don't have a unique identifier per se; they're always unique per target, like the configured timezone.

Right now, the resource definition uses a JSON Schema definition. OpenAPI is a specific dialect of JSON Schema that extends the basic semantics of JSON Schema to better represent full APIs. The resource definition schemas are implementations of a specific API contract that describe the properties of an instance of the resource.

If we consider an example schema, like Microsoft/OSInfo:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "OsInfo",
  "type": "object",
  "required": [],
  "properties": {
    "$id": {
      "type": "string"
    },
    "architecture": {
      "type": [
        "string",
        "null"
      ]
    },
    "bitness": {
      "$ref": "#/definitions/Bitness"
    },
    "codename": {
      "type": [
        "string",
        "null"
      ]
    },
    "edition": {
      "type": [
        "string",
        "null"
      ]
    },
    "family": {
      "$ref": "#/definitions/Family"
    },
    "version": {
      "type": "string"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Bitness": {
      "type": "string",
      "enum": [
        "32",
        "64",
        "unknown"
      ]
    },
    "Family": {
      "type": "string",
      "enum": [
        "Linux",
        "MacOS",
        "Windows"
      ]
    }
  }
}
We can see that we're able to describe a resource with a normative JSON Schema. These schemas can be auto-generated from the code that defines a struct in numerous languages. This is nearly the same as an entry under components.schemas in an OpenAPI document (some implementation details for OpenAPI conflict with normative JSON Schemas at edge cases). Whether the eventual manifest is a json/yaml blob that is a normative JSON Schema or an OpenAPI schema document, the resource author will still need to generate or author the resource's configurable surface schema.
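
To make the "schema as contract" idea concrete, here is a minimal Python sketch of what a caller could do with the OsInfo schema. It is a toy under stated assumptions: it only understands the keywords that appear in that schema (type, enum, properties, required, additionalProperties, and local $ref), and the names OSINFO_SCHEMA, resolve, and validate are invented for this example. A real caller would use a full JSON Schema library instead.

```python
# Toy validator: supports only the keywords used by the OsInfo schema.
OSINFO_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "OsInfo",
    "type": "object",
    "required": [],
    "properties": {
        "$id": {"type": "string"},
        "architecture": {"type": ["string", "null"]},
        "bitness": {"$ref": "#/definitions/Bitness"},
        "codename": {"type": ["string", "null"]},
        "edition": {"type": ["string", "null"]},
        "family": {"$ref": "#/definitions/Family"},
        "version": {"type": "string"},
    },
    "additionalProperties": False,
    "definitions": {
        "Bitness": {"type": "string", "enum": ["32", "64", "unknown"]},
        "Family": {"type": "string", "enum": ["Linux", "MacOS", "Windows"]},
    },
}

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "null": lambda v: v is None,
    "object": lambda v: isinstance(v, dict),
}


def resolve(schema, root):
    """Follow a local '#/a/b' reference; other schemas are returned unchanged."""
    ref = schema.get("$ref")
    if ref is None:
        return schema
    node = root
    for part in ref[len("#/"):].split("/"):
        node = node[part]
    return node


def validate(instance, schema, root=None):
    """Return a list of error strings; an empty list means the instance is valid."""
    root = schema if root is None else root
    schema = resolve(schema, root)
    allowed = schema.get("type")
    if allowed is not None:
        names = [allowed] if isinstance(allowed, str) else allowed
        if not any(TYPE_CHECKS[name](instance) for name in names):
            return [f"expected type {names}, got {type(instance).__name__}"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{instance!r} is not one of {schema['enum']}")
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property {key!r}")
        for key, value in instance.items():
            if key in props:
                errors += [f"{key}: {e}" for e in validate(value, props[key], root)]
            elif schema.get("additionalProperties") is False:
                errors.append(f"unexpected property {key!r}")
    return errors
```

Calling validate({"family": "Windows", "version": "10.0"}, OSINFO_SCHEMA) returns an empty list, while an instance with an unknown family or an extra property returns one error per problem.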

The rest of the manifest for a DSC Resource as currently required by the implementation is information that the resource needs to advertise to the dsc executable (and any other tool that wants to use the resource without DSC) about how to call the resource. Those keys are defined by dsc itself:

pub struct ResourceManifest {
    /// The version of the resource manifest schema.
    #[serde(rename = "manifestVersion")]
    pub manifest_version: String,
    /// The namespaced name of the resource.
    #[serde(rename = "type")]
    pub resource_type: String,
    /// The version of the resource.
    pub version: String,
    /// The description of the resource.
    pub description: Option<String>,
    /// Details how to call the Get method of the resource.
    pub get: GetMethod,
    /// Details how to call the Set method of the resource.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub set: Option<SetMethod>,
    /// Details how to call the Test method of the resource.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub test: Option<TestMethod>,
    /// Details how to call the Validate method of the resource.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub validate: Option<ValidateMethod>,
    /// Indicates the resource is a provider of other resources.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub provider: Option<Provider>,
    /// Mapping of exit codes to descriptions. Zero is always success and non-zero is always failure.
    #[serde(rename = "exitCodes", skip_serializing_if = "Option::is_none")]
    pub exit_codes: Option<HashMap<i32, String>>,
    /// Details how to get the schema of the resource.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub schema: Option<SchemaKind>,
}

That struct can be automatically exported as a JSON Schema, which can be used in editors like VS Code and nearly any language for validating the values. In editors, you can get IntelliSense when hand-authoring your own schema. I don't see a way around a resource author having to generate or author this information for their resource, since dsc and any arbitrary caller will need it. I don't think this information maps clearly to an OpenAPI spec document, but we could probably agree on a mapping that makes it work. That will probably require clobbering or extending semantics for OpenAPI to fit DSC.
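
As a rough sketch of what "any arbitrary caller" might do with that information, the Python below checks a manifest for the keys the struct marks as non-optional. The key names come from the serde attributes in the struct; everything else is an assumption for illustration: the sample manifest values, the shape of the get entry (the excerpt doesn't show GetMethod), and the helper name check_manifest. The excerpt also doesn't say how unknown keys are treated, so the sketch only reports them.

```python
import json

# Key names as serialized, taken from the struct's serde attributes.
REQUIRED_KEYS = ("manifestVersion", "type", "version", "get")
OPTIONAL_KEYS = ("description", "set", "test", "validate", "provider",
                 "exitCodes", "schema")

# Hypothetical manifest: every value here is a placeholder, and the shape
# of "get" is a guess because GetMethod isn't shown in the excerpt.
SAMPLE = json.dumps({
    "manifestVersion": "1.0",
    "type": "Example/EnvironmentVariable",
    "version": "0.0.1",
    "get": {"executable": "envvar", "args": ["get"]},
    "exitCodes": {"0": "Success", "1": "Invalid parameter"},
})


def check_manifest(text):
    """Return a list of problems found in a manifest given as JSON text."""
    manifest = json.loads(text)
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in manifest]
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    problems += [f"unknown key: {key}" for key in manifest if key not in known]
    return problems


print(check_manifest(SAMPLE))
```

In practice the exported JSON Schema for the manifest would do this job, with editor integration on top; the point of the sketch is only that the required surface is small and specific to DSC.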

Generating

To automatically generate the OpenAPI spec from code, the code has to implement HTTP routing. Any code generated from an OpenAPI spec is generated to receive and process HTTP requests.

For both JSON Schema and OpenAPI component schemas, authors can generate the schema from a struct or a struct from a schema.

Unless the entire model is inverted from CLI tools to REST APIs (with handlers/endpoints), OpenAPI requires at least some hand-authoring for the spec.
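
To illustrate the struct-to-schema direction, here is a small Python sketch that derives a JSON Schema object from a dataclass. It is a simplified stand-in for what schema-generation libraries do, not how dsc does it. The EnvironmentVariable class, the schema_for helper, and the rule that non-nullable fields are "required" are all assumptions made for this example, and only a few primitive types are handled.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Only these primitive types are handled in this sketch.
JSON_TYPES = {str: "string", int: "integer", bool: "boolean", float: "number"}


@dataclass
class EnvironmentVariable:
    """Hypothetical resource instance, mirroring the name/value example."""
    name: str
    value: Optional[str] = None


def schema_for(cls):
    """Build a JSON Schema object for a dataclass with primitive fields."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for field in fields(cls):
        hint = hints[field.name]
        if get_origin(hint) is Union:
            # Optional[X] is Union[X, None]; map None to the "null" type.
            members = get_args(hint)
            properties[field.name] = {
                "type": ["null" if m is type(None) else JSON_TYPES[m]
                         for m in members]
            }
        else:
            properties[field.name] = {"type": JSON_TYPES[hint]}
            required.append(field.name)  # assumption: non-nullable means required
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


print(schema_for(EnvironmentVariable))
```

Note that nothing here involves paths, verbs, or HTTP routing, which is the part of an OpenAPI document that would still need hand-authoring for a CLI-based resource.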

Reviewing Manifests

In the current implementation, I don't need to orient myself around API endpoints and how those map to CLI calls. If the resource manifest were defined as an OpenAPI spec, I would need to understand that mapping, or the implementation would need to be reworked to actually treat the resources as endpoints. Otherwise, I risk confusion about how I should implement or call the resource.

With the schema for resource manifests published, I can author them in VS Code with validation, IntelliSense, and hover-help. That can help me contextualize things better while I'm authoring and editing.

If the resource manifests are authored as OpenAPI spec documents, the validation, IntelliSense, and hover-help will be for general OpenAPI spec documents - generally helpful for making sure my document is valid to that spec, but not linking it back to the specific requirements and help for authoring a DSC Resource.

Advantages Comparison

Returning to the initial list of advantages for OpenAPI:

  1. Pervasive and widely understood
  2. Lots of tooling available
  3. Allows for different schema for different operations (Get, Set, Test can each have their own schema)
  4. Allows for more than one resource type to be described by a single spec
  5. Allows for documenting response codes

Regarding points 1, 2, and 4, these advantages are also covered by the existing implementation. JSON Schema is pervasive and widely understood (OpenAPI is built on it, and it's also used extensively for non-API validation of data), has tools in nearly every language, and can be authored to describe many objects in the same document (see the Crescendo schema for a multi-object example).

For point 5, the current schema already includes an entry for exit codes as a map of exit codes to their human-readable descriptor. The registry resource uses this schema entry to indicate how the caller should understand an exit code:

"exitCodes": {
    "0": "Success",
    "1": "Invalid parameter",
    "2": "Invalid input",
    "3": "Registry error",
    "4": "JSON serialization failed"
},

Combined with defined and documented semantics, if this key isn't defined in the resource manifest the caller has to assume 0 indicates no errors and 1 indicates an unknown level of failure.
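
A caller's handling of that map could look something like the Python sketch below. The fallback follows the comment in the manifest struct (zero is always success, non-zero is always failure); the function name describe_exit, the fallback description strings, and the shape of the returned dict are assumptions for illustration only.

```python
# The registry resource's map, as quoted above. JSON object keys are strings.
REGISTRY_EXIT_CODES = {
    "0": "Success",
    "1": "Invalid parameter",
    "2": "Invalid input",
    "3": "Registry error",
    "4": "JSON serialization failed",
}


def describe_exit(code, exit_codes=None):
    """Interpret a resource's exit code, with or without an exitCodes map."""
    table = exit_codes or {}
    description = table.get(str(code))
    if description is None:
        # No map, or the code isn't listed: fall back to the base semantics.
        description = "Success" if code == 0 else "Unknown failure"
    return {"ok": code == 0, "code": code, "description": description}


print(describe_exit(3, REGISTRY_EXIT_CODES))
print(describe_exit(1))
```

This covers the same ground as documenting response codes in OpenAPI, without needing HTTP status semantics.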

For point 3, I think I understand that the advantage of OpenAPI is that a resource author could return different data for the methods. In the current implementation, the resource advertises how the methods are called, not what is returned. The return information is expected to be typed in a particular way by DSC (and thus, any caller reusing the resource). In cases where a single resource is being operated on, it's always expected to return either the current state of the resource (which must validate against the resource's own object schema) or the current state of the resource and a list of the properties that were out of state (for test) or changed (for set). In the not-yet-implemented case where get is called for any number of resources, the caller should reasonably expect that the return is an array of current states (and that the return may be an empty array).
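
The "current state plus a list of properties that were out of state" shape can be sketched in a few lines of Python. This is only an illustration of the idea: the function name compare_state and the result keys (actualState, inDesiredState, differingProperties) are invented here and are not taken from dsc's actual output format.

```python
def compare_state(desired, actual):
    """Compare only the properties the caller asked about.

    Returns the full current state plus the names of the properties whose
    current value differs from the desired value.
    """
    differing = [key for key in desired if actual.get(key) != desired[key]]
    return {
        "actualState": actual,
        "inDesiredState": not differing,
        "differingProperties": differing,
    }


result = compare_state(
    desired={"name": "foo", "value": "bar"},
    actual={"name": "foo", "value": "baz"},
)
print(result)
```

Whatever the method, the state part of the result is the same object and validates against the same resource schema; only the extra bookkeeping differs.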

I would strongly discourage resource implementations from returning different data for a resource's state depending on the method called, as that breaks nearly every existing model for retrieving and understanding the state of a configurable resource across tools. The surface of a resource should be defined by its schema, regardless of method or higher-order caller, because it represents a trustable contract about the resource's known surface area.

Conclusion

Based on the current implementation, the semantics of DSC and OpenAPI, and the authoring impact, I would recommend against authoring resource manifests as OpenAPI spec documents instead of JSON Schema documents.

If DSCv3 were rearchitected so that DSC itself and/or the resources were required to be implemented as microservices for REST APIs, I think OpenAPI would be the best possible solution for describing DSC and compatible resources. It would allow auto-generation of basic implementations for resource authors in any number of languages.

However, I think reorienting DSC and compatible resources to a RESTful model will increase the complexity of DSC itself, of compatible resources, of integrating with resources directly for other callers, of deploying DSC and compatible resources, and of reasoning about DSC generally. It will be an even larger break of continuity than DSCv3 already represents and require altering the semantics of the platform and tooling.

I do believe that if someone were to build a RESTful service on top of DSC, one that could process incoming REST calls and invoke dsc as needed, handling async operations, returns, reporting, etc., that would be an incredibly strong value add to the ecosystem. I could see such a service being very useful whether it's centralized (I make a request to dsc.contoso.com or whatever and it handles running dsc on the target nodes) or per-node. I would definitely be interested in a service like that.

@Bpoe
Collaborator Author

Bpoe commented May 17, 2023

That service is Azure Machine Configuration and that's the perspective that I'm looking at DSC from.

@michaeltlombardi
Collaborator

That service is Azure Machine Configuration and that's the perspective that I'm looking at DSC from.

Definitely! But (as I understand it) Azure Policy's machine configuration feature is a higher-order tool that builds on and calls DSC.

I can absolutely see a coherent future where machine configuration exposes a REST API, described by an OpenAPI spec, with endpoints for managing one or more nodes. Machine configuration would be able to consume the schemas for resources included in a package or referenced in a configuration document to generate an API spec that users could browse and use directly.

I'm enthusiastically interested in machine configuration going in that direction.

@Bpoe
Collaborator Author

Bpoe commented May 17, 2023

I disagree with the assessment that DSC is too far from a RESTful interface. In fact, I'm suggesting that we actually do make it compatible as a RESTful interface. This will enable some pretty cool scenarios for Azure that I can elaborate on in person.

Consider this "get" example:

http://localhost:8888/providers/Microsoft.Sample/coolResourceType/foo
  |
  V
sample-dsc-resource.exe get /providers/Microsoft.Sample/coolResourceType/foo
  ^
  |
dsc.exe resource get --resource Microsoft.Sample/coolResourceType --name foo

Here you can see how a get scenario would work for a local HTTP endpoint or using the dsc.exe command, and how that call would map to the actual resource's command line.
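
A minimal Python sketch of that mapping, under the assumption that both front ends simply build the same argument list for the resource executable. The helper names argv_from_url and argv_from_dsc are hypothetical, and the /providers/{type}/{name} layout is taken from the example above rather than from any DSC specification.

```python
from urllib.parse import urlsplit


def argv_from_url(url, executable, operation="get"):
    """Front end 1: a local HTTP endpoint passes the URL path through as-is."""
    return [executable, operation, urlsplit(url).path]


def argv_from_dsc(resource_type, name, executable, operation="get"):
    """Front end 2: dsc.exe rebuilds the same path from --resource and --name."""
    return [executable, operation, f"/providers/{resource_type}/{name}"]


via_http = argv_from_url(
    "http://localhost:8888/providers/Microsoft.Sample/coolResourceType/foo",
    "sample-dsc-resource.exe",
)
via_dsc = argv_from_dsc(
    "Microsoft.Sample/coolResourceType", "foo", "sample-dsc-resource.exe"
)
print(via_http == via_dsc)
```

Both calls produce the same argument list, which is the property the diagram is meant to show.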

@Bpoe
Collaborator Author

Bpoe commented May 17, 2023

Also, here is an updated OpenAPI spec for a resource provider:

openapi: 3.1.0
info:
  title: An example environment variable DSCv3 resource defined in OpenAPI format
  version: 0.0.1
  x-ms-dsc-namespace: Microsoft.Configuration

servers:
- url: file://{resourcePath}/environmentvariables.exe
  description: The path to the executable
  variables:
    resourcePath:
      default: c:/Program Files/msft

paths:
  /providers/Microsoft.Configuration/environmentVariables/:
    get:
      summary: List all environment variables
      x-ms-dsc-resource: environmentVariables
      x-ms-dsc-operation: list
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariableList"

  /providers/Microsoft.Configuration/environmentVariables/{name}:
    get:
      summary: Get an environment variable
      x-ms-dsc-resource: environmentVariables
      x-ms-dsc-operation: get
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariable"

    put:
      summary: Create or update an environment variable
      x-ms-dsc-resource: environmentVariables
      x-ms-dsc-operation: set
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/environmentVariableCreateRequest"
      responses:
        "200":
          description: Variable already exists and already has the desired value
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariable"

        "201":
          description: Variable was either created or updated with the desired value
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/environmentVariable"

    delete:
      summary: Delete an environment variable
      x-ms-dsc-resource: environmentVariables
      x-ms-dsc-operation: delete
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      responses:
        "200":
          description: Variable has been deleted
        "204":
          description: Variable does not exist

  /providers/Microsoft.Configuration/environmentVariables/{name}/test:
    post:
      summary: Test if an environment variable exists as expected
      x-ms-dsc-resource: environmentVariables
      x-ms-dsc-operation: test
      parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/environmentVariableCreateRequest"
      responses:
        "200":
          description: Test action was executed successfully
          content:
            application/json:
              schema:
                type: boolean

components:
  schemas:
    environmentVariable:
      type: object
      properties:
        name: 
          type: string
        value:
          type: string

    environmentVariableList:
      type: array
      items:
        $ref: "#/components/schemas/environmentVariable"

    environmentVariableCreateRequest:
      type: object
      properties:
        value:
          type: string
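
One way a tool could consume the x-ms-dsc-operation extension is to walk the paths and build a lookup from DSC operation to HTTP method and path. The Python sketch below assumes the spec has already been parsed into a dict (parsing YAML is left out to keep it standard-library only); SPEC is a trimmed copy of the paths above, and operation_table is a hypothetical helper name.

```python
HTTP_METHODS = ("get", "put", "post", "delete", "patch")

BASE = "/providers/Microsoft.Configuration/environmentVariables"

# Trimmed copy of the spec above: only the paths, verbs, and the extension key.
SPEC = {
    "paths": {
        f"{BASE}/": {
            "get": {"x-ms-dsc-operation": "list"},
        },
        f"{BASE}/{{name}}": {
            "get": {"x-ms-dsc-operation": "get"},
            "put": {"x-ms-dsc-operation": "set"},
            "delete": {"x-ms-dsc-operation": "delete"},
        },
        f"{BASE}/{{name}}/test": {
            "post": {"x-ms-dsc-operation": "test"},
        },
    }
}


def operation_table(spec):
    """Map each x-ms-dsc-operation value to its (HTTP method, path) pair."""
    table = {}
    for path, item in spec.get("paths", {}).items():
        for method in HTTP_METHODS:
            operation = item.get(method, {}).get("x-ms-dsc-operation")
            if operation:
                table[operation] = (method.upper(), path)
    return table


for operation, (method, path) in operation_table(SPEC).items():
    print(f"{operation:7} {method:7} {path}")
```
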

@michaeltlombardi
Collaborator

Can you help me understand this example a little more? I'm not sure I follow it.

http://localhost:8888/providers/Microsoft.Sample/coolResourceType/foo
  |
  V
sample-dsc-resource.exe get /providers/Microsoft.Sample/coolResourceType/foo
  ^
  |
dsc.exe resource get --resource Microsoft.Sample/coolResourceType --name foo

I see that there's a REST call to the local machine and a CLI call to dsc.exe. It looks to me like both are pointing in towards sample-dsc-resource.exe, as if those are two separate and equivalent calls.

In this example, is sample-dsc-resource.exe one of the following, or something else?

  • A DSC Resource implemented as a REST API microservice
  • A shim agent that sits over DSC Resources and interprets REST calls to invoke the resources
  • A DSC Resource implemented with a get command that uses the representation of a REST call as its input

To clarify the model I was thinking of in my head, the flows are something like:

---
title: Direct resource call
---
sequenceDiagram
    participant u as User
    participant r as resource.exe
    u->>r: resource.exe get --name foo
    r-->>u: return resource JSON for foo, exit code 0
---
title: Call through dsc CLI
---
sequenceDiagram
    participant u as User
    participant dsc
    participant r as resource.exe
    u->>dsc: dsc resource get \ <br/>--module foo \<br/>--name bar \<br/> --properties name=baz
    dsc->>r: '{ "name": "foo"}' | resource.exe get
    r-->>dsc: return resource JSON for foo, exit code 0
    dsc-->>u: return dsc result JSON
---
title: Call through API service
---
sequenceDiagram
    box Client Machine
        participant u as User
    end
    box Cloud
        participant apis as REST API Service
    end
    box Target Machine
        participant apia as REST API Agent
        participant dsc
        participant r as resource.exe
    end

    u-)apis: get /providers/foo/bar/baz
    apis-)apia: Forward request to<br/>agent on correct machine
    apia->>dsc: dsc resource get \ <br/>--module foo \<br/>--name bar \<br/> --properties name=baz
    dsc->>r: '{ "name": "foo"}' | resource.exe get
    r-->>dsc: return resource JSON for foo, exit code 0
    dsc-->>apia: return dsc result JSON
    apia--)apis: POST query result from dsc
    note over u,apis: User can retrieve result info now

My thought for these flows is that all three are valid use cases, but the last one depends on a service/agent between the user making the REST call and dsc invoking the resource.
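
The middle hop in those flows (JSON piped to the resource on stdin, JSON and an exit code coming back) can be sketched in Python as follows. To keep the example runnable without a real resource, it uses a tiny Python one-off script as a stand-in for resource.exe; FAKE_RESOURCE, invoke, and the result keys are all assumptions for illustration, not dsc's actual behavior.

```python
import json
import subprocess
import sys

# Stand-in for resource.exe: reads a JSON request on stdin and prints a
# JSON state object on stdout. A real resource would query the system.
FAKE_RESOURCE_SCRIPT = (
    "import json, sys\n"
    "request = json.load(sys.stdin)\n"
    "print(json.dumps({'name': request['name'], 'value': 'bar'}))\n"
)
FAKE_RESOURCE = [sys.executable, "-c", FAKE_RESOURCE_SCRIPT]


def invoke(command, operation, payload):
    """Pipe a JSON payload to the resource and collect its JSON output."""
    done = subprocess.run(
        [*command, operation],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=False,
    )
    state = json.loads(done.stdout) if done.returncode == 0 else None
    return {"exitCode": done.returncode, "actualState": state}


print(invoke(FAKE_RESOURCE, "get", {"name": "foo"}))
```

Whether the caller is a user, dsc, or an agent in front of dsc, this hop looks the same; the service/agent in the third flow only adds transport around it.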

@Bpoe
Collaborator Author

Bpoe commented May 18, 2023

It's the last one: a DSC Resource implemented with a get command that uses the representation of a REST call as its input.

The resource is modeled to expect its "command" input in URI format. The payload is passed via stdin (in the case of set/test). This allows it to be called equally via dsc.exe and the local web service. Modeling a resource in this way allows for both scenarios.

Think of it as modeling the resource so that it can be consumed by 2 different front ends.
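
From the resource's side, that model might look roughly like the Python sketch below: the verb and URI arrive as arguments, the payload (if any) arrives on stdin, and the result is a status plus a JSON body. The 200 and 201 meanings follow the put responses in the OpenAPI spec earlier in this thread. The handle function, the in-memory store, and the 404 for a missing variable are assumptions for illustration; the spec doesn't define a not-found response.

```python
import io
import json


def handle(argv, stdin, store):
    """Dispatch one call given as [operation, uri]; payload is read from stdin."""
    operation, uri = argv
    name = uri.rstrip("/").rsplit("/", 1)[-1]  # last path segment is the key
    if operation == "get":
        if name not in store:
            return 404, None  # assumption: not covered by the spec in this thread
        return 200, {"name": name, "value": store[name]}
    if operation == "put":
        desired = json.load(stdin)["value"]
        if store.get(name) == desired:
            return 200, {"name": name, "value": desired}  # already as desired
        store[name] = desired
        return 201, {"name": name, "value": desired}  # created or updated
    raise ValueError(f"unsupported operation: {operation}")


# In-memory stand-in for the real environment, so the sketch has no side effects.
variables = {"foo": "bar"}
print(handle(["get", "/providers/foo/bar/foo"], io.StringIO(""), variables))
print(handle(["put", "/providers/foo/bar/baz"], io.StringIO('{"value": "qux"}'), variables))
```

Either front end (dsc.exe or a local web service) would only need to produce the same argument list and stdin payload.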

@Bpoe
Collaborator Author

Bpoe commented May 19, 2023

The sequence diagrams finally loaded for me :)
I'm thinking of the last one, but the REST API "Agent" would not need to use dsc.exe. It would call the resource directly.

---
title: Call flow from ARM to resource.exe
---
sequenceDiagram
    box Client Machine
        participant u as User
    end
    box Cloud
        participant arm as Azure Resource Manager
    end
    box Target Machine
        participant api as REST API
        participant r as resource.exe
    end

    u->>arm: put /providers/foo/bar/baz<br>'{ "name": "foo", "value": "bar" }'
    arm->>api: Forward request to<br/>agent on correct machine
    api->>r: '{ "name": "foo", "value": "bar" }' | resource.exe put /providers/foo/bar/baz
    r-->>api: 'status: 201 Created'<br>'{ "name": "foo", "value": "bar" }'
    api-->>arm: 'HTTP/1.1 201 Created'<br>'{ "name": "foo", "value": "bar" }'
    arm-->>u: 'HTTP/1.1 201 Created'<br>'{ "name": "foo", "value": "bar" }'
---
title: Call flow from Shell to resource.exe
---
sequenceDiagram
    box Target Machine
        participant shell
        participant dsc
        participant r as resource.exe
    end

    shell->>dsc: dsc.exe resource set --resource foo/bar --name baz
    dsc->>r: '{ "name": "foo", "value": "bar" }' | resource.exe put /providers/foo/bar/baz
    r-->>dsc: 'status: 201 Created'<br>'{ "name": "foo", "value": "bar" }'
    dsc-->>shell: return dsc result JSON

@mgreenegit
Member

I think we should isolate what we will need in machine config to support this vs in the dsc platform and define any specific requirements as a new issue. Ideally, a REST endpoint will "just" call the native commands.

@SteveL-MSFT
Member

As discussed in the last meeting, this is in scope for Machine Config, but not directly part of DSC v3.
