Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/auditable behavior #12

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

mringler
Copy link
Owner

@mringler mringler commented May 22, 2024

Creates a table storing audit information for the table holding the behavior.

CRUD operations on the audited table will automatically create entries in an automatically generated audit table, describing the change.

Parameters

The behavior accepts most of the parameters from synced table behavior (see #10). Additionally, auditable behavior uses these parameters:

Parameter Value Description Default
id_field_name literal Name of the id column in the audit table audit_id
audited_at_field_name literal Name of the column storing the audit timestamp audited_at
audit_event_field_name literal Name of column storing the audit event (insert, update, delete) audit_event
changed_values_field_name literal Name of the column storing the changed values changed_values
changed_values_field_type literal Propel text column type of the changed values column JSON
changed_values_field_size number Size of the changed values column null
ignore_fields CSV Names of columns that are not part of the audit null
omit_value_fields CSV Names of columns where audit should not include column value (like passwords) null
omit_value_types CSV Propel column types where audit should not include column value BLOB, CLOB
omit_value string The default placeholder used for values of omitted fields and types in the audit changed
audited_columns_on_insert CSV By default, the insert audit is empty, this parameter allows to specify columns that should be added. null
cascade_delete true|false Remove audit when source row is deleted. false
add_aggregation_to_source literal|true|false Adds a column storing the number of audits for each row to the source table. Value is either name of column or "true" for "number_of_audits". false

This implementation is somewhat specific, in that it tries to avoid duplicating the whole row data on input. Subsequent updates store the overridden values, which allows to rebuilt the row history backwards from the current row data. This approach has benefits, but it comes at a cost, particularly its fragility to manual row updates without audits.

The problem with manual updates that don't add audits are not just unaccounted, but they lead to wrong audits. Currently, the stored data of an audit would look something like this:

TYPE USER VALUES
insert A
update B fruit=apple

The recorded values are the ones being overridden by the update, which is why no values need to be recorded on insert. If current value of the fruit column is "orange", you would get this audit:

  • user A inserted fruit=apple
  • user B updated fruit=orange

However, the actual events with manual update might have been:

  • user A inserted fruit=null
  • manual update fruit=apple
  • user B updated fruit=pear (storing override of value "apple")
  • manual update fruit=orange

So instead of showing unaccounted changes, the audit just claims that the users inserted data which they did not insert. This is very dicey, if acceptable at all.

Having to rebuild the audit after reading it from DB adds another layer of possible mistakes and errors.


The clean approach would be to store the new values, instead of the overridden ones:

TYPE USER VALUES
insert A fruit=null
update B fruit=orange

possibly even the old values if you want to have at least some idea about manual updates:

TYPE USER OLD NEW
insert A fruit=null
update B fruit=apple fruit=pear

With this, the reported user input is always correct, even if the values do not match the row values due to manual updates.

However, this requires to duplicate all values on input, even when you know that most values will never change.

So if you know that there are no manual updates (or those updates also add audits), the first approach is more efficient, as it allows for faster inserts and requires less space.


Looking back, going the easy route would have spared me time and nerves, but I really didn't want to make full duplicates (I am dealing with rather large text columns that typically won't change after insert).
But with all the caveats and sensibilities, I am not sure if this works in a general case.

@mringler mringler force-pushed the feature/auditable_behavior branch 6 times, most recently from 7517893 to 0d11e67 Compare May 24, 2024 09:03
@mringler mringler force-pushed the feature/auditable_behavior branch 2 times, most recently from e9689a5 to 5dfd469 Compare May 28, 2024 12:58
Copy link

codecov bot commented May 28, 2024

Codecov Report

Attention: Patch coverage is 60.84538% with 352 lines in your changes missing coverage. Please review.

Project coverage is 69.58%. Comparing base (ef6e175) to head (5dfd469).
Report is 23 commits behind head on main.

Current head 5dfd469 differs from pull request most recent head a086023

Please upload reports for the commit a086023 to get more accurate results.

Files Patch % Lines
...pel/Generator/Behavior/SyncedTable/TableSyncer.php 58.37% 87 Missing ⚠️
...rator/Behavior/Versionable/VersionableBehavior.php 0.00% 64 Missing ⚠️
...nerator/Behavior/Archivable/ArchivableBehavior.php 0.00% 31 Missing ⚠️
...rator/Behavior/ConfigStore/ConfigStoreBehavior.php 0.00% 30 Missing ⚠️
...avior/SyncedTable/EmptyColumnAccessorsBehavior.php 0.00% 29 Missing ⚠️
...rator/Behavior/SyncedTable/SyncedTableBehavior.php 63.75% 29 Missing ⚠️
...nerator/Behavior/ConfigLoad/ConfigLoadBehavior.php 0.00% 25 Missing ⚠️
...ior/SyncedTable/SyncedTableBehaviorDeclaration.php 71.42% 18 Missing ⚠️
...erator/Behavior/ConfigStore/ConfigurationStore.php 0.00% 14 Missing ⚠️
...nerator/Behavior/ConfigStore/ConfigurationItem.php 0.00% 7 Missing ⚠️
... and 7 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main      #12      +/-   ##
============================================
+ Coverage     68.56%   69.58%   +1.01%     
- Complexity     8065     8305     +240     
============================================
  Files           232      259      +27     
  Lines         24550    25467     +917     
============================================
+ Hits          16832    17720     +888     
- Misses         7718     7747      +29     
Flag Coverage Δ
5-max 69.58% <60.84%> (+1.01%) ⬆️
7.4 69.58% <60.84%> (+1.01%) ⬆️
agnostic ?
pgsql 69.58% <60.84%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mringler mringler force-pushed the feature/auditable_behavior branch 7 times, most recently from 5dca523 to 540d035 Compare May 30, 2024 23:56
@kasparsatke
Copy link

kasparsatke commented Jun 29, 2024

Hi Moritz,

following up on your last comment in our discussion (propelorm#1983 (comment)) on the main repo - one way to check back for manual DB edits could be accomplished with triggers.

The triggers on the audit table (BEFORE and maybe also AFTER) instantiate a communication with Propel. Propel will then know if there have been any manual CRUD on the audio table.

As Propel should basically only ever do INSERT operations on the audit table (new row) if I understand you right - Propel should hand over a UUID or hash (each maybe salted with environment specific data) that is valid only for the time of the specific operation Propel does on the audit table. This value should be saved in another row field of the audit table (so you have two fields on every insert row in the audit table) and the UUID should maybe also be updated on each operation on the original table.

The UUID can basically function as a lock value. On any conducted operation on the DB, the triggers will give back the value to Propel. An additional value to check would be the connection id which should also be unique.

Communication with Propel can be done for example via a User Defined Function to call Propel. Although UDFs are a bit hacky and seem to be not evenly supported across different DB systems. Another option to call Propel might be a service broker.

If it’s basically not feasible at all to let the DB call Propel by itself there would still be the option to check the UUID of the row(s) of the audit table everytime Propel does a DB operation. Then the last UUIDs (and connection ids?) should probably be stored somewhere (encrypted?) by Propel to have a reference. One could also compare with the log files of the DB (if activated).

In this way you could in theory alert Propel for external changes in the DB. Additionally, one could try to never release the write lock on the audit table. Yet this is prone to fail with connection timeouts as the lock will be released after a certain timeout I believe.

All in all, each of the ideas provided above seem to be not 100% reliable/feasible yet a combination of them could accomplish a quite good safety measure for dealing with external DB edits.

As mentioned in propelorm#1983 (comment), there would also still be the possibility to move the audit table completely to a dedicated DB which uses the archive engine to prevent UPDATE and DELETE operations.

I hope, these ideas can be a way to make your auditable behavior more reliable.

All the best,
Kaspar.
P.S.: Ich merke gerade - jetzt hab ich die ganze Zeit in Englisch geschrieben, obwohl ich gar nicht weiß, ob hier noch jemand anderes die Diskussion verfolgt. 😁

@kasparsatke
Copy link

Addendum to my last option above:

While doing a somewhat unrelated Google query for „propel orm leadership“, I stumbled upon an elegant solution Propel supports to work with several DBs at once. I wonder how Google sometimes creates its results. Maybe because I did wildly search for different Propel related stuff today. 😄

Anyways: Propel allows to directly access different DBs within one schema.xml even with FKs across the different DBs which are then directly callable as related objects. One has has just to declare a schema parameter for the core DB within the schema.xml and than a foreignSchema parameter for every foreign key that lies within in a different DB, e.g. a DB with the archive engine and audit table. It is also possible to define the foreign DB directly within the same schema.xml.

Example (in German) given here: https://ansas-meyer.de/programmierung/mysql/propel-fremdschluessel-mehrere-datenbanken/

I hope that adds another puzzle piece towards sketching a solution.

Best,
Kaspar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants