proposed schema changes #204

donaldcampbelljr · 2024-08-30T18:07:12Z

Make samples an array with items of type object to match the input schemas used by eido.
We also discussed adding required_files as a field. See this bedmaker PEP as an example: https://schema.databio.org/?namespace=pipelines&schema=bedmaker

With both changes, a pipestat schema could look like this:

title: An example Pipestat output schema
description: A pipeline that uses pipestat to report sample and project level results.
type: object
properties:
  pipeline_name: "default_pipeline_name"
  project:
    type: object
    properties:
      number_of_things:
        type: integer
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec rutrum rhoncus tellus, ac euismod nisl mattis sit amet. Aenean scelerisque"
      percentage_of_things:
        type: number
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam ultricies nunc orci, sed aliquam est."
      name_of_something:
        type: string
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ipsum erat, porta in condimentum viverra, pellentesque in nisl. Nulla rhoncus nibh est, quis malesuada diam suscipit at. In ut diam."
      switch_value:
        type: boolean
        description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras pharetra."
  samples:
    type: array
    items:
      type: object
      properties:
        smooth_bw:
          path: "aligned_{genome}/{sample_name}_smooth.bw"
          type: string
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce nec cursus nulla."
        aligned_bam:
          path: "aligned_{genome}/{sample_name}_sort.bam"
          type: string
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ipsum erat, porta in condimentum viverra, pellentesque in nisl. Nulla rhoncus nibh est, quis malesuada diam suscipit at. In ut diam."
        peaks_bed:
          path: "peak_calling_{genome}/{sample_name}_peaks.bed"
          type: string
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce nec cursus nulla."
        output_file:
          $ref: "#/$defs/file"
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce nec cursus nulla."
        output_image:
          $ref: "#/$defs/image"
          description: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras pharetra."
      required_files:
        - output_file
$defs:
  image:
    type: object
    object_type: image
    properties:
      path:
        type: string
      thumbnail_path:
        type: string
      title:
        type: string
    required:
      - path
      - thumbnail_path
      - title
  file:
    type: object
    object_type: file
    properties:
      path:
        type: string
      title:
        type: string
    required:
      - path
      - title
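
To illustrate what the new layout means for code that reads the schema, here is a minimal sketch (not pipestat's actual implementation) of looking up sample-level result definitions under samples.items.properties and resolving local $ref entries against $defs. The SCHEMA dict is a trimmed copy of the example above, and the helper names resolve_ref and sample_result_definitions are hypothetical.

```python
# Trimmed copy of the proposed schema as a plain dict (illustrative only).
SCHEMA = {
    "type": "object",
    "properties": {
        "samples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "aligned_bam": {"type": "string"},
                    "output_file": {"$ref": "#/$defs/file"},
                },
            },
        },
    },
    "$defs": {
        "file": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "title": {"type": "string"},
            },
            "required": ["path", "title"],
        },
    },
}


def resolve_ref(schema, ref):
    """Follow a local reference such as '#/$defs/file' (hypothetical helper)."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = schema
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def sample_result_definitions(schema):
    """Return sample-level result definitions with local $ref entries resolved."""
    # With samples as an array, the definitions sit one level deeper, under items.
    props = schema["properties"]["samples"]["items"]["properties"]
    resolved = {}
    for name, definition in props.items():
        if "$ref" in definition:
            target = resolve_ref(schema, definition["$ref"])
            # Keep sibling keys (e.g. description) alongside the referenced definition.
            siblings = {k: v for k, v in definition.items() if k != "$ref"}
            definition = {**target, **siblings}
        resolved[name] = definition
    return resolved
```

Calling sample_result_definitions(SCHEMA) gives a dict keyed by result name, in which output_file carries the required list from the file definition.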

donaldcampbelljr · 2024-08-30T19:46:31Z

Regarding the required_files item: how exactly would this work when pipestat reports a result?

Currently, for file objects, our output schemas require file-related information such as path and title. When a user reports a file result, that result must include those fields. If we add a required_files field, would that mean that every time any result is reported, one of the results must be a required file? And how is this different from a required field?
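
For reference, the existing per-result requirement can be sketched like this. This is an illustration only, not pipestat's validation code: FILE_DEF mirrors the file definition under $defs in the schema above, and missing_required_fields is a hypothetical helper.

```python
# Mirrors the "file" definition under $defs in the proposed schema.
FILE_DEF = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "title": {"type": "string"},
    },
    "required": ["path", "title"],
}


def missing_required_fields(result, definition):
    """List the required fields that a reported result value lacks."""
    return [field for field in definition.get("required", []) if field not in result]
```

Here missing_required_fields({"path": "out.txt"}, FILE_DEF) returns ["title"]. The check concerns the shape of one reported value and says nothing about whether the file itself exists.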

Should we just allow the pipestat schema to have a required_files section but ignore it within pipestat? I believe its utility arises only when the pipestat results are used as an input PEP to something else.
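
One way a downstream consumer might use required_files is sketched below. It assumes the key means "these attributes must point to files that exist on disk", which is my reading of the eido input-schema convention and not something pipestat does. The function name and the sample layout are hypothetical.

```python
from pathlib import Path


def missing_required_files(sample, required_files):
    """Return the required_files attributes whose file is absent for one sample.

    Assumption: each listed attribute holds either a plain path string or an
    object with a "path" key, as in the file definition of the schema above.
    """
    missing = []
    for attr in required_files:
        value = sample.get(attr)
        path = value.get("path") if isinstance(value, dict) else value
        if not path or not Path(path).is_file():
            missing.append(attr)
    return missing
```

Under this reading, the check runs when the results are consumed as input, not each time a result is reported, which is one way it would differ from a required field.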

donaldcampbelljr · 2024-09-06T11:06:38Z

The above changes to the schema have been added.
Regarding the required_files key: we renamed it to tangible. Users who want it can add this key to a pipestat output schema and then use that schema as an input schema.

donaldcampbelljr added a commit that referenced this issue Aug 30, 2024

make samples type array and nest under items #204

b1b451d

donaldcampbelljr mentioned this issue Aug 30, 2024

make samples type array and nest under items #204 #205

Merged

donaldcampbelljr added the likely-solved label Sep 6, 2024

donaldcampbelljr added a commit that referenced this issue Sep 9, 2024

Merge pull request #205 from pepkit/dev_pipestat_schema_revisions

9bf46cc

make samples type array and nest under items #204

donaldcampbelljr added this to the v0.11.0 milestone Sep 9, 2024

donaldcampbelljr mentioned this issue Oct 2, 2024

v0.11.0 release #208

Merged

donaldcampbelljr closed this as completed Oct 2, 2024
