proposed schema changes #204

Closed
donaldcampbelljr opened this issue Aug 30, 2024 · 2 comments
@donaldcampbelljr
Contributor

Related to: pepkit/eido#71

Make samples an array whose items are of type object, to match the input schemas used by eido.
We also discussed adding required_files as a field. See this bedmaker schema as an example: https://schema.databio.org/?namespace=pipelines&schema=bedmaker

A pipestat schema could then look like this:

title: An example Pipestat output schema
description: A pipeline that uses pipestat to report sample and project level results.
type: object
properties:
  pipeline_name: "default_pipeline_name"
  project:
    type: object
    properties:
      number_of_things:
        type: integer
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec rutrum rhoncus tellus, ac euismod nisl mattis sit amet. Aenean scelerisque"
      percentage_of_things:
        type: number
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam ultricies nunc orci, sed aliquam est."
      name_of_something:
        type: string
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ipsum erat, porta in condimentum viverra, pellentesque in nisl. Nulla rhoncus nibh est, quis malesuada diam suscipit at. In ut diam."
      switch_value:
        type: boolean
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras pharetra."
  samples:
    type: array
    items:
      type: object
      properties:
        smooth_bw:
          path: "aligned_{genome}/{sample_name}_smooth.bw"
          type: string
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce nec cursus nulla."
        aligned_bam:
          path: "aligned_{genome}/{sample_name}_sort.bam"
          type: string
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ipsum erat, porta in condimentum viverra, pellentesque in nisl. Nulla rhoncus nibh est, quis malesuada diam suscipit at. In ut diam."
        peaks_bed:
          path: "peak_calling_{genome}/{sample_name}_peaks.bed"
          type: string
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce nec cursus nulla."
        output_file:
          $ref: "#/$defs/file"
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce nec cursus nulla."
        output_image:
          $ref: "#/$defs/image"
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras pharetra."
      required_files:
        - output_file
$defs:
  image:
    type: object
    object_type: image
    properties:
      path:
        type: string
      thumbnail_path:
        type: string
      title:
        type: string
    required:
      - path
      - thumbnail_path
      - title
  file:
    type: object
    object_type: file
    properties:
      path:
        type: string
      title:
        type: string
    required:
      - path
      - title
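To make the proposed layout concrete, here is a minimal Python sketch of how a consumer might walk `properties.samples.items.properties` and resolve local `$ref` pointers into `$defs`. It is not pipestat's or eido's implementation: the `schema` dict is a hand-copied fragment of the YAML above, and `resolve_ref` and `sample_results` are hypothetical helper names.

```python
# Hypothetical sketch: the dict below is a trimmed copy of the proposed schema.
schema = {
    "type": "object",
    "properties": {
        "samples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "smooth_bw": {
                        "path": "aligned_{genome}/{sample_name}_smooth.bw",
                        "type": "string",
                    },
                    "output_file": {"$ref": "#/$defs/file"},
                    "output_image": {"$ref": "#/$defs/image"},
                },
                "required_files": ["output_file"],
            },
        },
    },
    "$defs": {
        "image": {
            "type": "object",
            "object_type": "image",
            "required": ["path", "thumbnail_path", "title"],
        },
        "file": {
            "type": "object",
            "object_type": "file",
            "required": ["path", "title"],
        },
    },
}


def resolve_ref(schema, ref):
    """Follow a local pointer such as '#/$defs/file' through the schema dict."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled here: {ref}")
    node = schema
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def sample_results(schema):
    """Return the sample-level result definitions with $ref entries expanded."""
    props = schema["properties"]["samples"]["items"]["properties"]
    resolved = {}
    for name, spec in props.items():
        if "$ref" in spec:
            target = resolve_ref(schema, spec["$ref"])
            # Keep sibling keys (e.g. description) alongside the referenced definition.
            siblings = {k: v for k, v in spec.items() if k != "$ref"}
            spec = {**target, **siblings}
        resolved[name] = spec
    return resolved


results = sample_results(schema)
for name, spec in results.items():
    print(name, spec["type"], spec.get("object_type", "-"))
```

Because the sample-level results now sit under `items`, the same lookup path works for any schema that follows eido's array-of-objects convention.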

@donaldcampbelljr
Contributor Author

Regarding the required_files item, how exactly would this work when pipestat reports a result?

Currently, for file objects, our output schemas require file-related information such as path and title. When the user reports a specific file result, that result must include those fields. However, if we add a required_files field, would that mean that every time any result is reported, one of the results must be a required file? And how is this different from a required field?

Should we just allow the pipestat schema to have a required_files section but ignore it within pipestat (since I believe it is only useful when the pipestat results are used as an input PEP to something else)?
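One way to picture the distinction being asked about is the sketch below. It assumes the "ignore it within pipestat" option: the `required` list of a file object is checked on the single result being reported, while `required_files` is never consulted at report time. `ITEMS` and `report_errors` are hypothetical names, not pipestat API.

```python
# Hypothetical sketch of report-time validation that ignores required_files.
FILE_DEF = {
    "type": "object",
    "object_type": "file",
    "properties": {"path": {"type": "string"}, "title": {"type": "string"}},
    "required": ["path", "title"],
}

# Sample-level part of the schema, with the $ref already expanded.
ITEMS = {
    "properties": {
        "smooth_bw": {"type": "string"},
        "output_file": FILE_DEF,
    },
    "required_files": ["output_file"],
}


def report_errors(name, value, items=ITEMS):
    """Validate one reported result against its own definition only."""
    spec = items["properties"].get(name)
    if spec is None:
        return [f"unknown result: {name}"]
    if spec["type"] == "object":
        if not isinstance(value, dict):
            return [f"{name} must be an object"]
        # 'required' constrains the fields of this one result.
        return [
            f"{name} is missing '{field}'"
            for field in spec.get("required", [])
            if field not in value
        ]
    # items["required_files"] is deliberately not consulted here.
    return []


# A non-file result can be reported on its own, even though output_file
# is listed under required_files and has not been reported yet.
print(report_errors("smooth_bw", "aligned_hg38/sample1_smooth.bw"))
# A file result must still carry its required fields.
print(report_errors("output_file", {"path": "out/sample1.txt"}))
```

Under this reading, `required` answers "is this result well formed?", whereas `required_files` would answer "are the expected files present?" only when a downstream tool consumes the results.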

@donaldcampbelljr
Contributor Author

donaldcampbelljr commented Sep 6, 2024

The schema changes above have been added.
Regarding the required_files key, we renamed it to tangible. Users who wish to can add this key to a pipestat output schema and then use that schema as an input schema.
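As a rough illustration of the input-schema use case, the sketch below checks that every attribute listed under the key exists on disk for a given sample. It assumes the key is literally named `tangible` and sits next to `properties` under `items`; `missing_tangible` is a hypothetical helper, and accepting either a plain path string or a file object with a `path` field is a choice made here for illustration, not documented eido or pipestat behavior.

```python
import os
import tempfile

# Hypothetical sample-level schema fragment using the assumed 'tangible' key.
items_schema = {
    "properties": {"output_file": {"type": "object", "object_type": "file"}},
    "tangible": ["output_file"],
}


def missing_tangible(sample, items_schema, key="tangible"):
    """Return the listed attributes whose file is absent for this sample."""
    missing = []
    for attr in items_schema.get(key, []):
        value = sample.get(attr)
        # Accept a bare path string or a file object carrying a 'path' field.
        path = value.get("path") if isinstance(value, dict) else value
        if not path or not os.path.isfile(path):
            missing.append(attr)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "sample1_output.txt")
    open(existing, "w").close()  # create an empty file to stand in for a result

    present = missing_tangible(
        {"output_file": {"path": existing, "title": "Output"}}, items_schema
    )
    absent = missing_tangible(
        {"output_file": {"path": os.path.join(tmp, "nope.txt"), "title": "Output"}},
        items_schema,
    )

print(present, absent)
```

A schema without the key simply yields an empty list, which matches the idea that the key is optional in a pipestat output schema.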

donaldcampbelljr added a commit that referenced this issue Sep 9, 2024
make samples type array and nest under items #204
@donaldcampbelljr donaldcampbelljr added this to the v0.11.0 milestone Sep 9, 2024