Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(jobs): ValidateAction #1421

Merged
merged 19 commits into from
Oct 24, 2024
Merged

Conversation

sbliven
Copy link
Contributor

@sbliven sbliven commented Sep 10, 2024

Description

Add ValidateAction to check Job DTO formatting

Motivation

When creating a job clients can provide an untyped jobParams object in the DTO, or a jobResultObject when updating. These were previously not validated.

Fixes:

This PR adds a new validate job action which can test for the presence of particular attributes in the DTO. Currently the attributes are specified using a JSONPath and are only verified to be non-null. If needed, future implementations could add JSONSchema support to do deeper validation.

For example, the following jobConfig.json section would configure the public job to require at least one datasetId:

{
      "jobType": "public",
      "create": {
        "auth": "#all",
        "actions": [
          {
            "actionType": "validate",
            "required": ["jobParams.datasetIds[0]"]
          }
        ]
      },
      "statusUpdate": {
        "auth": "#all"
      }
    }

Changes:

  • Implement validate action
  • New dependency: jsonpath-plus
  • Add tests
  • Add example configuration

Tests included

  • Included for each change/fix?
  • Passing? (Merge will not be approved unless this is checked)

Documentation

  • swagger documentation updated [required]
  • official documentation updated [nice-to-have]

This partially reverts commit b218952.

- Tightened bounds on DtoType for JobAction
- Validate DTOs
DTO validation is in the controller, so the interceptor isn't needed.
Also tests and example
@sbliven
Copy link
Contributor Author

sbliven commented Sep 18, 2024

What does everyone think of the use of JSONPath?

Pro: Simple syntax for simple reference, while keeping enough power for more complex structures like arrays
Pro: Well-known standard (eg used by jq command)
Con: Adds a dependency
Con: Only checks element presence, not type

A more powerful option would be to validate against a JSON Schema. This would allow type checking. However I think it would be much more verbose.

@sofyalaski
Copy link
Contributor

I think the JSONpath solution is appropriate for now.
However, there is a minor issue in the jobs.controller: performJobCreateAction now accesses only the jobInstance of JobClass as an argument (this also needed to call the other function that performs the actions). performJobCreateAction should take two arguments, the second being of JobDto type so that we can call the validate function. Now in the controller validate is called on JobClass instance, which is not how validate is defined.

@sbliven
Copy link
Contributor Author

sbliven commented Oct 2, 2024

Good catch, @sofyalaski. Why does this even compile?

@sbliven
Copy link
Contributor Author

sbliven commented Oct 2, 2024

Ah, it compiles because JobClass is structurally compatible with CreateJobDto. Working on a fix now.

- Revert the previous commit, which was incorrect and incomplete
- Make valiateDTO and performActions generic so they can be shared
  between endpoints
- Separate out checkConfigVersion, which is specific to statusUpdate
- Move `getJobTypeConfiguration` calls up so this is only called once
@sbliven
Copy link
Contributor Author

sbliven commented Oct 7, 2024

This implementation has a couple problems.

First, the JSONPath solution doesn't perform any type checking. It would be nice to do something like

{
  "actionType": "validate",
  "required": {
    "jobParams.datasetIds": "object",
    "jobParams.datasetIds[*]": "string"
  }
}

This looks fine for simple types (string, number, list of literal values) but could quickly grow in complexity until it reaches a full type system. One solution would be to supply a JSON Schema to validate the DTO (which provides full type support). However including JSON schemas in the config file seems overly complex for operators. Is there a happy medium?

Second, the validate method currently acts against the DTO body. Half of the checks in v3 are actually against database entities referenced in the DTO or Job itself. For instance, "archive" jobs should check that all datasets associated with the job have "datasetLifecycle.archivable". Or "publish" jobs with file listings should check that all files exist, requiring checking associated DataBlocks from the dataset. Should these be added to a generic ValidateAction or be a more specific check? We are leaning towards handling those as special cases in the jobs controller.

- Add new test for validate jobs to check for required parameters
  in both POST and PATCH operations
- Remove some unused TestData which didn't match the current Job DTOs
- Add 'validate' job to the test jobconfig.json
- rename top-level array to 'request'. This leaves open the possibility
  to validate other things beside the DTO
- now configuration takes the format of `"JSONPath": JSON-Schema`. The
  schema part is validated with Avj. Note that JSON Schemas can be very
  concise for simple datatypes, so this doesn't require deep knowledge
  of how to write JSONSchema.
- Update unit & mocha tests
- remove 'any' via typeguard
- `npm run lint:fix`
- remove duplicate (and outdated) docstring
(previous merge commit lost a parent)
@sbliven
Copy link
Contributor Author

sbliven commented Oct 12, 2024

I changed how this works somewhat. Now the ValidateAction config looks like:

{
  "actionType": "validate",
  "request": {
    "JSONPath": JSONSchema,
    ...
  }
}

The schema gets validated by avj, so you theoretically could use the full power of a schema. However combining it with the path allows most use cases to be covered by short, easily readable one-liners. Examples:

{
  "actionType": "validate",
  "request": {
    "jobParams.name": { "type": "string"}, // match simple types
    "jobParams.answers[*]": { "enum": ["yes","no"]}, // literal values (here applied to an array)
    "jobResultObject": {"$ref": "https://json.schemastore.org/schema-org-thing.json"}, // JSON Schema
  }
}

I spelled the top-level request to because it gets applied to the DTO (either the create or update DTO). If it needed to get applied somewhere else (eg to associated datasets) we can add that in the future.

test/Jobs.js Show resolved Hide resolved
test/Jobs.js Show resolved Hide resolved
@@ -694,62 +694,6 @@ const TestData = {
],
},

ArchiveJob: {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These weren't used by any tests. OK to delete them?

"jobParams.datasetIds[*]": {
"type": "object",
"required": ["pid","files"]
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the best example, as it's a bit complicated.

Copy link

@despadam despadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to try and also cover #1120 in this PR?

src/jobs/config/jobConfig.example.json Outdated Show resolved Hide resolved
@despadam
Copy link

despadam commented Oct 23, 2024

Agreed to merge #1440, then remove checkDatasetState from the controller and incorporate its logic into the validate action, which will allow for configuring jobType - datasetLifecycle combinations and deleting the hardcoded job-types.enum. These changes will come in a new PR which will also incorporate this: #1120

@sbliven sbliven changed the title Validateaction feat: ValidateAction Oct 24, 2024
@sbliven sbliven changed the title feat: ValidateAction feat(jobs): ValidateAction Oct 24, 2024
@sbliven sbliven self-assigned this Oct 24, 2024
@sbliven sbliven added this to the Release Jobs milestone Oct 24, 2024
@sbliven sbliven merged commit 1b44fe0 into SciCatProject:release-jobs Oct 24, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants