Feature Request
Problem
Modern scientific pipelines must manage large, complex data objects (e.g., images, time-series, n-dimensional arrays) that are impractical to store directly in a relational database. The current approach of storing file paths as strings is brittle and error-prone; DataJoint has no awareness of the external file, cannot manage its lifecycle, and cannot guarantee its integrity. This disconnect breaks the seamless nature of the pipeline and places a significant manual burden on the user to maintain data consistency between the database and the external storage.
Requirements
Introduce the `object` attribute type, which natively supports a hybrid storage model where metadata resides in the database and the data object resides in an external store. This implementation must adhere to the DataJoint 2.0 Specification.
Core requirements:
`object` Attribute Type:
- Introduce a new core attribute type named `object`.
- When an attribute is declared as type `object`, the database table will store a reference key (e.g., path, UUID) and associated metadata, not the data object itself.
`dj.Object` Interface:
- Interfacing with object stores is implemented using the `dj.Object` base class.
- `dj.Object` is a base class that users can subclass to define custom handlers for their external data objects.
- Project configuration files select and configure the object store.
- Any class inheriting from `dj.Object` MUST implement the following standard interface:
  - `put(self, store, key: str) -> dict`: Writes the object's data to the specified storage backend under a given key and returns a dictionary of metadata to be stored in the database.
  - `get(cls, store, key: str) -> "dj.Object"`: A class method to read data from the store using its key and reconstruct the Python object.
  - `get_meta(self) -> dict`: Returns a dictionary of metadata about the object instance.
  - `verify(self, store, key: str) -> bool`: Verifies the existence and integrity (e.g., via checksum) of the object in the external store.
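To make the proposed interface concrete, here is a minimal sketch of what a handler could look like. Since `dj.Object` does not exist yet, the sketch defines a local stand-in base class named `Object`; the `DictStore` class and its `write`/`read`/`exists` methods, the `TextObject` handler, and the metadata field names are all hypothetical and chosen only for illustration.

```python
import hashlib
from abc import ABC, abstractmethod


class Object(ABC):
    """Local stand-in for the proposed dj.Object base class (hypothetical)."""

    @abstractmethod
    def put(self, store, key: str) -> dict: ...

    @classmethod
    @abstractmethod
    def get(cls, store, key: str) -> "Object": ...

    @abstractmethod
    def get_meta(self) -> dict: ...

    @abstractmethod
    def verify(self, store, key: str) -> bool: ...


class DictStore:
    """Hypothetical in-memory store; a real backend would wrap a filesystem or bucket."""

    def __init__(self):
        self._blobs = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]

    def exists(self, key: str) -> bool:
        return key in self._blobs


class TextObject(Object):
    """Example handler that stores a UTF-8 string as the external object."""

    def __init__(self, text: str):
        self.text = text

    def _payload(self) -> bytes:
        return self.text.encode("utf-8")

    def put(self, store, key: str) -> dict:
        # Write the data, then return the metadata destined for the database row.
        store.write(key, self._payload())
        return {"key": key, **self.get_meta()}

    @classmethod
    def get(cls, store, key: str) -> "TextObject":
        # Rebuild the Python object from the stored bytes.
        return cls(store.read(key).decode("utf-8"))

    def get_meta(self) -> dict:
        data = self._payload()
        return {
            "format": "text/plain",
            "size": len(data),
            "checksum": hashlib.sha256(data).hexdigest(),
        }

    def verify(self, store, key: str) -> bool:
        # Check existence first, then compare checksums.
        if not store.exists(key):
            return False
        stored = hashlib.sha256(store.read(key)).hexdigest()
        return stored == self.get_meta()["checksum"]
```

In this sketch, `put` returns exactly the dictionary that would be written to the relational table, and `verify` recomputes the checksum from the stored bytes instead of trusting the store. How the real `store` argument is typed and configured is left open by the requirements above.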
Metadata Management:
- For every attribute of type object, the system must automatically store essential metadata in the relational table alongside the object reference.
- This metadata MUST include fields for object key/path, file format, size, and a checksum (e.g., MD5, SHA256) to ensure data integrity.
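As an illustration of the required metadata fields, the following sketch builds one metadata record from an object key and its bytes. The `ObjectMeta` dataclass, the `describe` helper, the choice to infer the format from the key's file extension, and the `algorithm:digest` checksum encoding are assumptions made for this example, not part of the specification.

```python
import hashlib
from dataclasses import dataclass, asdict
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ObjectMeta:
    """Hypothetical metadata record stored alongside the object reference."""
    key: str        # object key/path in the external store
    format: str     # file format, inferred here from the key's extension
    size: int       # size in bytes
    checksum: str   # "<algorithm>:<hex digest>"


def describe(key: str, data: bytes, algorithm: str = "sha256") -> ObjectMeta:
    # hashlib.new accepts any algorithm name the local build supports, e.g. "md5".
    digest = hashlib.new(algorithm, data).hexdigest()
    fmt = PurePosixPath(key).suffix.lstrip(".") or "unknown"
    return ObjectMeta(key=key, format=fmt, size=len(data), checksum=f"{algorithm}:{digest}")


meta = describe("session1/scan.tif", b"abc")
row_fields = asdict(meta)  # plain dict, ready to be written as table columns
```

Recording the algorithm name together with the digest lets a later integrity check recompute the same hash without guessing which one was used.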