
FEAT: Object-augmented schemas -- Object Type #1259

@dimitri-yatsenko

Feature Request

Problem

Modern scientific pipelines must manage large, complex data objects (e.g., images, time-series, n-dimensional arrays) that are impractical to store directly in a relational database. The current approach of storing file paths as strings is brittle and error-prone; DataJoint has no awareness of the external file, cannot manage its lifecycle, and cannot guarantee its integrity. This disconnect breaks the seamless nature of the pipeline and places a significant manual burden on the user to maintain data consistency between the database and the external storage.

Requirements

Introduce the object attribute type, which natively supports a hybrid storage model where metadata resides in the database and the data object resides in an external store. This implementation must adhere to the DataJoint 2.0 Specification.

Core requirements:

  1. object Attribute Type:
  • Introduce a new core attribute type named object.
  • When an attribute is declared as type object, the database table will store a reference key (e.g., path, UUID) and associated metadata, not the data object itself.
  2. dj.Object Interface:
  • Interaction with object stores is implemented through the dj.Object base class.
  • Users can subclass dj.Object to define custom handlers for their external data objects.
  • Project configuration files select and configure the object store.
  • Any class inheriting from dj.Object MUST implement the following standard interface:
    • put(self, store, key: str) -> dict: Writes the object's data to the specified storage backend under a given key and returns a dictionary of metadata to be stored in the database.
    • get(cls, store, key: str) -> "dj.Object": A class method that reads data from the store using its key and reconstructs the Python object.
    • get_meta(self) -> dict: Returns a dictionary of metadata about the object instance.
    • verify(self, store, key: str) -> bool: Verifies the existence and integrity (e.g., via checksum) of the object in the external store.
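
To make the proposed interface concrete, here is a minimal sketch of how the four methods could fit together. Nothing here is an existing DataJoint API: `Object` stands in for the proposed `dj.Object`, and `DictStore` and `TextObject` are hypothetical names invented for illustration. The store's `write`/`read`/`exists` methods are also assumptions, since the issue does not define the store interface.

```python
import hashlib
from abc import ABC, abstractmethod


class DictStore:
    """Hypothetical in-memory stand-in for a configured object store."""

    def __init__(self):
        self._blobs = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]

    def exists(self, key: str) -> bool:
        return key in self._blobs


class Object(ABC):
    """Sketch of the proposed dj.Object base class (assumed, not yet in DataJoint)."""

    @abstractmethod
    def put(self, store, key: str) -> dict:
        """Write the data under `key`; return metadata for the database row."""

    @classmethod
    @abstractmethod
    def get(cls, store, key: str) -> "Object":
        """Read the data stored under `key` and rebuild the Python object."""

    @abstractmethod
    def get_meta(self) -> dict:
        """Return metadata describing this instance."""

    @abstractmethod
    def verify(self, store, key: str) -> bool:
        """Check that the stored object exists and matches this instance."""


class TextObject(Object):
    """Hypothetical handler that stores a string as UTF-8 bytes."""

    def __init__(self, text: str):
        self.text = text

    def _payload(self) -> bytes:
        return self.text.encode("utf-8")

    def put(self, store, key: str) -> dict:
        store.write(key, self._payload())
        return {"key": key, **self.get_meta()}

    @classmethod
    def get(cls, store, key: str) -> "TextObject":
        return cls(store.read(key).decode("utf-8"))

    def get_meta(self) -> dict:
        data = self._payload()
        return {
            "format": "txt",
            "size": len(data),
            "checksum": hashlib.sha256(data).hexdigest(),
        }

    def verify(self, store, key: str) -> bool:
        if not store.exists(key):
            return False
        stored = hashlib.sha256(store.read(key)).hexdigest()
        return stored == self.get_meta()["checksum"]
```

In this sketch, `put` returns the dictionary that would be written to the relational table, and `verify` recomputes the checksum from the store, so a missing or modified object is reported as invalid.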

Metadata Management:

  • For every attribute of type object, the system must automatically store essential metadata in the relational table alongside the object reference.
  • This metadata MUST include fields for object key/path, file format, size, and a checksum (e.g., MD5, SHA256) to ensure data integrity.
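
As an illustration of the required metadata fields, the following sketch builds a record holding the key, format, size, and checksum for a byte payload, and checks a payload against that record. The names `ObjectMeta`, `build_meta`, and `matches` are hypothetical, and the extra `checksum_algorithm` field is an assumption added so that either MD5 or SHA256 can be recorded; the issue does not prescribe a schema.

```python
import hashlib
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ObjectMeta:
    """Hypothetical metadata record stored alongside the object reference."""

    key: str                 # object key/path in the external store
    format: str              # file format, e.g. "npy" or "tiff"
    size: int                # payload size in bytes
    checksum: str            # hex digest of the payload
    checksum_algorithm: str  # assumed field: which hash produced the digest


def build_meta(key: str, data: bytes, fmt: str, algorithm: str = "sha256") -> ObjectMeta:
    """Compute size and checksum for `data` and bundle them with the key and format."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return ObjectMeta(key, fmt, len(data), digest, algorithm)


def matches(meta: ObjectMeta, data: bytes) -> bool:
    """Return True if `data` has the size and checksum recorded in `meta`."""
    if len(data) != meta.size:
        return False
    return hashlib.new(meta.checksum_algorithm, data).hexdigest() == meta.checksum
```

Calling `asdict(build_meta(...))` yields a plain dictionary of the kind that `put` is described as returning, which could then be stored in the relational table.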


Labels: feature, stale
