Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-30 / Data Model #30

Closed
dnsl48 opened this issue Dec 7, 2020 · 10 comments
Closed

RFC-30 / Data Model #30

dnsl48 opened this issue Dec 7, 2020 · 10 comments

Comments

@dnsl48
Copy link

dnsl48 commented Dec 7, 2020

RFC-30 / Data Model

About the RFC

The RFC describes the data model of the module, the decisions made and captures the reasoning behind those.

The data model defines

  • how the data is persisted in the database
  • how the framework components handle data
  • how the module may be extended and its components reused
  • what user stories the module covers and the problems it solves

Resources

Forum discussion (the module idea) - New links module?

Github discussion (this RFC) - issue-30

User Stories

Seeking some feedback from the community and internal teams we identified two main user stories that people would like to cover

  1. Link is a part of the parent (belongs to its parent) and not editable separately.
  2. Link is a shared object that multiple DataObjects may refer to (via has_one).

Technical concerns

For a potential implementation the following technical concerns have been stated:

  • If a Link DataObject needs subclasses for each link type, the number of SQL queries may become
    too high and the number of tables will rise really quick in projects with versioned and fluent
    modules. The overhead load to the Database is the concern in that scenario.
    Also, the number of database tables will slow down dev/build and history viewing.

  • Extra dependencies that the module may require. E.g. gorriecoe/silverstripe-linkfield requires
    silvershop/silverstripe-hasonefield, giggsey/libphonenumber-for-php and unclecheese/display-logic,
    which are not security audited

  • Persisting links in DB as data blobs (e.g. JSON or Serialized) is very opaque and hard to work
    with later on. Querying owners containing particular links may be impossible to achieve.

Existing solutions

sheadawson/silverstripe-linkable

The most popular community module at the time of writing, with just over 220k downloads.

No longer maintained. Recommends switching to gorriecoe/silverstripe-linkfield.

Implementation:
A link is a DataObject (class Link, table LinkableLink). DataObjects use has_one to implement links.
Custom link types implemented via SilverStripe\ORM\DataExtension for the Link DataObject, which
adds new custom database fields and extends the updateCMSFields method.
UI implementation:
Backend: LinkField class extending SilverStripe\Forms\TextField
Frontend: Entwine extension bundled with the module

gorriecoe/silverstripe-linkfield

The second most popular community module at the time of writing, with just over 37k downloads.

Is maintained at the moment of writing.
Built on top of gorriecoe/silverstripe-link. Has a dozen of satellite modules extending the functionality in a decoupled way.

Implementation:
A link is a DataObject (class Link, table Link, module gorriecoe/silverstripe-link). DataObjects use has_one or many_many.
Custom link types implemented via SilverStripe\ORM\DataExtension for the Link DataObject, which
add new custom database fields and extend the updateCMSFields method.
UI implementation:
Backend: LinkField extends a SilverStripe\Forms\FormField (module gorriecoe/silverstripe-linkfield)
Frontend: only CSS. Relies on gridfields and OOTB framework components.

Considered implementation options

Option A

Link is a DataObject.

Implemented by sheadawson and gorriecoe. Easiest to extend via DataExtension. Can be shaped to has_one, has_many, many_many. More complex relationship versioning and handling (although we have that solved through cascading publish etc.). More performance impact on query time. Can use lots of built-in editing UIs directly (e.g. GridField). Works with other core features like diff views, reports, etc. But also adds more database tables and schema complexity. Most consistent, SilverStripe devs know how to deal with this.

Pros:

  • getCMSFileds method is flexible and easy to override

Cons

  • too many extra DB entities (aka tables) for every link type (multiplied by versioned and fluent modules)

Notes:

  • may require versioned-snapshots module to indicate changed parents

Option B

Link is a DBField with serialised data (e.g. JSON).

Implemented by bannerblock. Harder to extend. Serialised JSON makes it hard to query individual fields (without using something like GitHub - phptek/silverstripe-jsontext: JSON storage and querying 1 and JSON database support). Harder to enforce consistent schema. Creates new dependency if done properly (e.g. through phptek’s module). Less native form handling (e.g. validation errors?)

Cons:

  • experimental option
  • not needed for MVP

Option C

Link is a CompositeField with individual database columns.

Harder to extend. Easy to query. Strong schema. Can be wrapped in arbitrary DataObject structures or relationships. Can “inline” columns into a DataObject to avoid relationship in cases where only one link is required. Can use lots of built-in editing UIs indirectly (e.g. GridField through DataObject). Less native form handling (e.g. validation errors?)

Cons:

  • no validation (that's a wider problem and we can live with it)

Potential data structure (columns): see PoC below

Extending: see PoC below

Frontend:

  • FormSchema (entwine/react)

Notes:

  • might also need a custom diff version viewer

Comparing the options

Option C is the most flexible way to store the data.
Option B could exist on top of Option C, if necessary, as a separate link type implementation.
Option A could exist on top of Option C, if necessary, as a separate data object containing the DBLink filed (Option C)

For users there's no much difference between Option A and Option B/C. The main difference is for devs implementing new link types.

The conclusion

Option C is preferable

Proof of Concept for the Option C (PoC)

Implemented in dev/rfc30/poc branch.

Base DbField implementation

ORM\DBField\Link is the base class extending DBComposite and providing the basic API for the children (particular link implementations).
The class is abstract, which guarantees all owners know exact link types they contain.
The abstract Link overrides DBComposite::compositeDatabaseFields() method returning a single own DB field - DBLink_Type plus the fields provided by child implementations via getCompositeDbFields.
For every link of the DataObject the field DBLink_Type stores the particular link implementation (class name).

The extension point for children is abstract public static function getCompositeDbFields(): array;. Every child (link type implementation) defines its own list of fields that would be added to the DataObject utilising the LinkField.

DataObject Extension

Extensions\DataObjectExtension provides extra capability on top of DataObject::populateDefaults that ensures all DBLink_Type fields are correctly initialized and contain correct values when creating empty DataObjects.

Link Type implementations

The PoC implements 3 examples:

Front-end rendering options

CMS rendering

The PoC is missing the CMS implementation for the fields, but it should be possible to provide the usual implementations for both Entwine and React front end methods.

Any particular FrontEnd implementation details are out of scope for this RFC (irrelevant for Data Model).

Validation

  • Backend Validation - may be implemented by FormFields. Extra validation may be implemented for every ORM\DBField\Link implementation (via overriding setValue method if necessary)
  • Frontend Validation - can be implemented via overriding FormField component implementations
  • Database Validation - data integrity enforced via DBLink_Type field and the strong schema structure for every link type. That won't allow persisting incorrect link types, even if a logical error happens in the application.

Data Structure

The provided PoC was tested with the following Page extends SiteTree DataObject model:

...

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'HyperLink' => LinkField\ORM\DBField\HyperLink::class,
        'SiteLink' => LinkField\ORM\DBField\SiteTreeLink::class,
    ];
}

That Page model produced the following database structure:

CREATE TABLE `Page` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `HyperLinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\HyperLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\HyperLink',
  `HyperLinkTitle` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `HyperLinkURL` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `SiteLinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink',
  `SiteLinkSiteTreeId` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `HyperLinkDBLink_Type` (`HyperLinkDBLink_Type`),
  KEY `SiteLinkDBLink_Type` (`SiteLinkDBLink_Type`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Both fields HyperLink and SiteLink only add their own fields to the final DataObject structure.

The only overhead in the generated schema is an extra *DBLink_Type field for every Link.
This means we only pay for what we use.

AnyLink

The suggested data model design (the Option C) is flexible enough to achieve AnyLink as a link type implementation on top of DBField\Link. See an example prototype implementation in the PoC - AnyLink.

The following Page implementation:

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'Link' => LinkField\ORM\DBField\AnyLink::class,
    ];
}

Produces the following data model:

Create Table: CREATE TABLE `Page` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `LinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\AnyLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\AnyLink',
  `LinkAnyLink_f2f0ebb1_LinkType` enum('SilverStripe\\LinkField\\ORM\\DBField\\Link','SilverStripe\\LinkField\\ORM\\DBField\\AnyLink','SilverStripe\\LinkField\\ORM\\DBField\\HyperLink','SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\Link',
  `LinkHyperLink_dba830f3_Title` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkHyperLink_dba830f3_URL` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkSiteTreeLink_09b5f94f_SiteTreeId` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `LinkDBLink_Type` (`LinkDBLink_Type`),
  KEY `LinkAnyLink_f2f0ebb1_LinkType` (`LinkAnyLink_f2f0ebb1_LinkType`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Thus, AnyLink is just another link type implementation that provides a composition of fields of all the other link type implementation and then can behave as a particular link type chosen by end users.

@chillu chillu mentioned this issue Dec 11, 2020
@chillu
Copy link
Member

chillu commented Dec 11, 2020

Had a catchup with Serge on this yesterday, and I agree with Option C. One tradeoff is that we can't leverage goerricoe/silverstripe-link as a data model baseline, and there's a less straightforward migration path from the ecosystem around this module. But they're all pretty straightforward implementations, so I'm not too worried.

Overall, the focus of this should be on creating a more user friendly UI (GridFields add too much friction). And to choose a data model which can support that.

@chillu
Copy link
Member

chillu commented Jan 11, 2021

Thanks for the updated Option C PoC Serge! Looking at the code, the DBCompositeField based structure seems to assume that the developer picks a certain link type upfront when defining their model. While that would keep down the amount of database columns, I reckon it's very much an edge case. In most cases, developers should be encouraged to give users access to all available link types in any context. For example, if a dev creates or installs a "linkable banner" block, the user by default should be able to insert links to files, pages, emails and external URLs. If a dev then creates a "DMS integration" with the ability to link to documents in that DMS, they shouldn't need to opt in to that everywhere a Link field is referenced in a DataObject, right?

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'Link' => LinkField\ORM\DBField\Link::class,
    ];
}

This would create the following database columns (roughly):

Link_Type (Enum of Hyperlink, SiteTree, File, Email)
Link_Title
Link_URL
Link_FileID
Link_SiteTreeID
Link_Email

Without a "container" class like this, I don't have faith in devs explicitly opting in to all potential link types. They'll just assume that a banner would only need links to pages, and once CMS authors identify a need to link to a PDF file it'll become an expensive code change.

@dnsl48
Copy link
Author

dnsl48 commented Jan 11, 2021

Thanks for the feedback, Ingo!

That's a valid concern. I have also considered the use case of choosing any link type, although I didn't add it to the initial PoC.
The suggested data model design (the Option C) is flexible enough to achieve that via implementing another link type on top of DBField\Link.

I have just added a prototype of such implementation to the PoC. See the AnyLink dbfield.

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'Link' => LinkField\ORM\DBField\AnyLink::class,
    ];
}

Produces the following data model:

  `LinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\AnyLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\AnyLink',
  `LinkAnyLink_f2f0ebb1_LinkType` enum('SilverStripe\\LinkField\\ORM\\DBField\\Link','SilverStripe\\LinkField\\ORM\\DBField\\AnyLink','SilverStripe\\LinkField\\ORM\\DBField\\HyperLink','SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\Link',
  `LinkHyperLink_dba830f3_Title` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkHyperLink_dba830f3_URL` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkSiteTreeLink_09b5f94f_SiteTreeId` int(11) NOT NULL DEFAULT '0',

@chillu
Copy link
Member

chillu commented Jan 12, 2021

Yep that looks better. Can you explain why we need a specific LinkType (LinkAnyLink_f2f0ebb1_LinkType) in addition to the generic LinkDBLink_Type field? In my view, those aliases (AnyLink, HyperLink, SiteTreeLink) are enough to create the right PHP behaviour from the data without hardcoding the FQCN in the database.

So in your example above, the Page already knows that the Link composite field is of type LinkField\ORM\DBField\AnyLink and can create the right PHP instances with that data. The AnyLink instance can then infer the class implementations based on the aliases via PHP or YAML config: HyperLink uses an instance of SilverStripe\\LinkField\\ORM\\DBField\\HyperLink. If another module wants to subclass HyperLink to add more behaviour and use the existing alias, that would be kept in PHP or YAML config. I'm just very allergic to overly verbose database columns, we already have an insanely large database schema in most production sites.

@dnsl48
Copy link
Author

dnsl48 commented Jan 12, 2021

Can you explain why we need a specific LinkType (LinkAnyLink_f2f0ebb1_LinkType) in addition to the generic LinkDBLink_Type field?

Maybe we don't need it. That's just a prototype implementation to prove the concept. The final implementation may be better optimised. The current implementation with a separate field for a link type of AnyLink, however, ensures the data integrity on the DB level. It may prevent some logical errors in the application (e.g. you won't be able to assign a HyperLink instance to the Page.Link field and will have to always provide it with an instance of AnyLink, because it's of type enum('SilverStripe\\LinkField\\ORM\\DBField\\AnyLink')).

I believe it is possible to have either implementation later and we don't have to "incorporate" that decision in the Data Model RFC.

@chillu
Copy link
Member

chillu commented Jan 12, 2021

I believe it is possible to have either implementation later and we don't have to "incorporate" that decision in the Data Model RFC.

In my view, the purpose of a "data model" centric RFC is to work through the data model issues, right? And the verbose "just in case" type fields in the database are a big part of that. You are correct that on an ORM level, accessing Page->Link() in your example will always need to return you an AnyLink instance. But when you call Page->Link()->forTemplate() or Page->Link()->URL(), it would inspect the Page.LinkType column and return you a template for the right type (e.g. HyperLink).

This brings up a somewhat related templating issue: How do you expect that we handle potential additional HTML attributes required by this new object? The most natural template implementation is <a href=$Link.URL>custom content</a>, but that makes it hard to add make target or rel configurable by CMS authors. Maybe that's rare enough that we can ignore it?

@dnsl48
Copy link
Author

dnsl48 commented Jan 12, 2021

And the verbose "just in case" type fields in the database are a big part of that.

I feel that decision is not critical for this RFC, because both AnyType implementations may co-exist with the same data-model described in this RFC. Eventually, the users may decide which one to use or implement their own implementations addressing even more edgy cases in their projects. I agree that it is important to decide the default implementation that we ship with the module, but I'm wary that may increase the amounts of discussions and prevent this RFC from becoming accepted for a while. On the other hand, we could accept this RFC and keep these questions for later (discuss separately in other GitHub issues). Such approach would allow us to keep working on other stuff in parallel, while we deciding the ideal implementation of the default AnyLink implementation.

How do you expect that we handle potential additional HTML attributes required by this new object? The most natural template implementation is <a href=$Link.URL>custom content</a>, but that makes it hard to add make target or rel configurable by CMS authors.

The AnyLink prototype in the PoC addresses that via casting AnyLink to HyperLink first and then rendering HyperLink in the place where AnyLink is put within the template. That means developers would have control over HyperLink template rendering via its own templates and class implementation.

The section Front-end rendering options of the RFC provides 2 ways to render links:

  • via templates
  • via class method XML

That may depend on the particular link type implementation (e.g. devs can extend HyperLink with their own implementation and provide different templates or XML method implementation)

that makes it hard to add make target or rel configurable by CMS authors

I would suggest the CMS authors should be able to configure these via FormField in CMS when they choose the type of AnyLink to be HyperLink.

@wilr
Copy link
Member

wilr commented Jan 12, 2021

How do you expect that we handle potential additional HTML attributes required by this new object? The most natural template implementation is custom content, but that makes it hard to add make target or rel configurable by CMS authors.

I imagine the base link model could follow the FormField API, getAttribute, setAttribute and in the templates provide a $AttributesHTML in the default template. That way people could inject / extend whatever attributes they wanted (tabindex, aria etc)

@maxime-rainville
Copy link

Had a glance at the RFC PR. I don't have major beef with it.

Developer experience

My main concern is making sure it's easy and straight forward for dev to define links on their data objects. I'm thinking we could create a Link alias for AnyLink like we do with other DBFiled.

So people could just define their links with:

private static $db = [
    'MyLink' => 'Link'
];

Or we could rename Link to something like AbstractLink' and rename AnyLinkto justLink`.

Basically, the average dev shouldn't have to invest a lot of time to understand the gut of the module to use it.

Type concerns

My gut feeling would be to have to keep a FQCN just to avoid any potential collision later on.

Other concerns

All the other concerns raised about templating and additional link attributes are secondary and can be handled later on. We're pretty far from shipping a stable product. If we make a bad decision we can come back and fix it later.

@dnsl48
Copy link
Author

dnsl48 commented Jan 18, 2021

We identified the following technical concerns regarding the PoC AnyLink implementation:

It aggregates database fields from every link type in the application. If we have 10 link types registered in the application, every one of those would have 3 fields (in average), the AnyLink would create 30 database fields (even though only 3 of those would be used for every particular link type). Multiplying that by several links for a data object makes that very inefficient.
E.g. Page has 5 links, 30 db fields by each would produce 150 database fields for that page.

We are going to work on an optimised AnyLink implementation in the following issue: #31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants