Sensible defaults for data retention and cleanup for SAP Commerce, based on my CX Works article Data Maintenance and Cleanup
- Download the latest release
- Unpack to `hybris/bin/custom`
- If possible, disable saved values / change history (ref. help.sap.com, further recommendations in my article)
- Add the extension to your `localextensions.xml`: `<extension name="sanecleanup" />`
- 🔴 Adapt the retention rules to your project requirements 🔴
  (check the available properties in `project.properties`)
- Build and deploy.
  (The rules will be automatically imported during system update)

If you get a build error regarding missing types like the example below:
- Open `sanecleanup-items.xml`
- Search for the type
- Comment out the whole `<itemtype>` tag

Example error:

    invalid index sanecleanup for type _TYPE_ on [...] declared at ((sanecleanup))::YIndex[sanecleanup-items.xml:...] due to missing enclosing type '_TYPE_'
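For illustration, a commented-out `<itemtype>` definition could look like this (`_TYPE_` is the same placeholder used in the error message above; the actual indexes and attributes depend on the type you disable):

```xml
<!-- disabled: enclosing type _TYPE_ does not exist in this project
<itemtype code="_TYPE_" autocreate="false" generate="false">
    ...
</itemtype>
-->
```

Keep the comment in place so the next person updating the extension knows why the type was disabled.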
WARNING
The very first execution of the retention cron jobs will take a while, depending on how long your project has been live and whether you have cleaned up anything in the past.
Consider performing a one-time cleanup before adding the extension / enabling the retention rules.
Especially the first run of `cronJobLogCleanupCronJob` will take a very long time if you have never removed any cron job log files (type `LogFile`).
Please consider importing and executing the script job defined in `bulkdelete-cronjoblogs.impex` before you set up the automated cleanup!
The job will remove all log files except the five most recent logs per CronJob.
(Disclaimer: the script was tested on MS SQL / Azure SQL and SAP HANA. It is not guaranteed to work for other databases.)
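To get a rough idea of what such a one-time cleanup does, here is a simplified Groovy sketch for the administration console scripting tab. It is not the bundled impex script: it loads and removes models one by one (much slower than the bulk SQL approach, so only suitable for smaller datasets) and assumes the default `flexibleSearchService` / `modelService` script bindings:

```groovy
// Simplified sketch only - bulkdelete-cronjoblogs.impex uses bulk SQL
// instead and is far faster on large datasets.
import de.hybris.platform.servicelayer.search.FlexibleSearchQuery

// newest first, so drop(5) keeps the five most recent logs per cron job
def query = new FlexibleSearchQuery(
    "SELECT {pk} FROM {LogFile} ORDER BY {creationtime} DESC")
def logs = flexibleSearchService.search(query).result

def toDelete = logs
    .groupBy { it.owner }                          // one bucket per cron job
    .collectMany { owner, files -> files.drop(5) } // spare the newest five

println "would remove ${toDelete.size()} of ${logs.size()} log files"
// modelService.removeAll(toDelete)  // uncomment to actually delete
```

The `removeAll` call is commented out so a first run only reports what would be deleted.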
If you have never even thought about this topic - yes!
You can run the following scripts in the administration console to get a quick overview:
- `excessive-platform-types.groovy` - Generates a report about "known troublemakers"
- `check-audit.groovy` - Check if you have too many audit logs
Here are some additional queries and "rules of thumb" that help you investigate further:
Type(s) | Query | Notes |
---|---|---|
AbstractRule |
SELECT
COUNT({ar:pk}),
MIN({ar:modifiedtime}) AS "oldest",
MAX({ar:modifiedtime}) AS "newest"
FROM
{AbstractRule AS ar},
{RuleStatus AS rs}
WHERE
{ar:status} = {rs:pk}
AND {rs:code} = 'PUBLISHED'
AND {ar:enddate} IS NOT NULL
AND {ar:enddate} < getutcdate() |
Are there any outdated rules, i.e. rules that aren't valid anymore because their end date is in the past? (`getutcdate()` is MS SQL; adjust the date function for other databases.) Warning: change |
BusinessProcess |
SELECT
{p:processDefinitionName},
{s:code} AS "status",
COUNT({p:pk}) AS "total",
MIN({p:modifiedTime}) AS "oldest",
MAX({p:modifiedTime}) AS "newest"
FROM
{BusinessProcess AS p
LEFT JOIN
ProcessState AS s
ON {p:state} = {s:pk} }
GROUP BY
{p:processDefinitionName},
{s:code}
ORDER BY
"total" DESC |
Are there too many (let's say > 1000) or very old BusinessProcess items in your system? Also, if a lot of processes are stuck in "RUNNING" / "WAITING", you have to investigate what's wrong. (What is causing your processes to get stuck?) |
Cart |
SELECT
{b:uid} AS "BaseSite",
{u:uid} AS "USER",
CASE
WHEN
{c:saveTime} IS NULL
THEN
'regular'
ELSE
'saved'
END
AS "cart type",
COUNT({c:pk}) AS "total",
MIN({c:modifiedtime}) AS "oldest",
MAX({c:modifiedtime}) AS "newest"
FROM
{ Cart AS c
LEFT JOIN
USER AS u
ON {c:user} = {u:pk}
LEFT JOIN
BaseSite AS b
ON {c:site} = {b:pk} }
GROUP BY
{b:uid}, {u:uid},
CASE
WHEN
{c:saveTime} IS NULL
THEN
'regular'
ELSE
'saved'
END
ORDER BY
"total" DESC |
|
CronJob (auto-generated) |
SELECT
{t:code} AS "CronJob Type",
COUNT({c:pk}) AS "total",
MIN({c:modifiedtime}) AS "oldest",
MAX({c:modifiedtime}) AS "newest"
FROM
{CronJob AS c
JOIN
ComposedType AS t
ON {c:itemtype} = {t:pk}
LEFT JOIN
TRIGGER AS trg
ON {trg:cronjob} = {c:pk} }
WHERE
{trg:pk} IS NULL
AND {c:code} LIKE '00%'
AND {t:code} IN
(
'ImpExImportCronJob',
'CatalogVersionSyncCronJob',
'SolrIndexerCronJob'
)
GROUP BY
{t:code}
ORDER BY
"total" DESC |
Are there too many (>10) outdated, auto-generated jobs in your system? |
CronJobHistory |
SELECT
{cj:code},
COUNT({h:pk}) AS "total",
MIN({h:modifiedtime}) AS "oldest",
MAX({h:modifiedtime}) AS "newest"
FROM
{cronjobhistory AS h
JOIN
cronjob AS cj
ON {h:cronjob} = {cj:pk} }
GROUP BY
{cj:code}
ORDER BY
"total" DESC |
Is there any job with > 50 histories and/or histories older than an hour? This cleanup is enabled by default in recent SAP Commerce patch releases, so this query shouldn't find anything. |
EmailMessage |
SELECT
{bp:processDefinitionName} AS "source",
{m:sent},
COUNT({m:pk}) AS "total",
MIN({m:modifiedtime}) AS "oldest",
MAX({m:modifiedtime}) AS "newest"
FROM
{EmailMessage AS m
LEFT JOIN
BusinessProcess AS bp
ON {m:process} = {bp:pk} }
GROUP BY
{bp:processDefinitionName},
{m:sent}
ORDER BY
"total" DESC |
|
ImpExImportCronJob (distributed impex) |
SELECT
{s:code} AS "status",
COUNT({i:pk}) AS "total",
MIN({i:modifiedtime}) AS "oldest",
MAX({i:modifiedtime}) AS "newest"
FROM
{ImpExImportCronJob AS i
LEFT JOIN
CronJobStatus AS s
ON {i:status} = {s:pk} }
WHERE
{i:code} LIKE 'distributed-impex-%'
GROUP BY
{s:code} |
|
ImpexMedia |
SELECT
COUNT(*)
FROM
{ImpexMedia AS i}
WHERE
(
{i:code} LIKE '0_______'
OR {i:code} LIKE
'generated impex media - %'
) |
Are there more than a handful (>100) of generated impex medias? |
ImportBatchContent |
SELECT
COUNT({c:pk}) AS "total",
MIN({c:modifiedTime}) AS "oldest",
MAX({c:modifiedTime}) AS "newest"
FROM
{ImportBatchContent AS c
LEFT JOIN
ImportBatch AS b
ON {b:importContentCode} = {c:code} }
WHERE
{b:pk} IS NULL |
Are there any left-over distributed import batches? |
LogFile |
SELECT
COALESCE({cj:code}, '<null>'),
COUNT({l:pk}) AS "total",
MIN({l:modifiedtime}) AS "oldest",
MAX({l:modifiedtime}) AS "newest"
FROM
{LogFile AS l
LEFT JOIN
CronJob AS cj
ON {l:owner} = {cj:pk} }
GROUP BY
{cj:code}
ORDER BY
"total" DESC |
Are there any cron jobs with more than ~10 logs and/or logs older than 14 days? (Those are the default values for log file retention.) |
ProcessTaskLog |
-- Query tested with MS SQL
-- Adjust the date calculation for
-- other databases
SELECT
COUNT({l:pk}) AS "total",
MIN({l:modifiedtime}) AS "oldest",
MAX({l:modifiedtime}) AS "newest"
FROM
{ProcessTaskLog AS l}
WHERE
{l:creationTime} < DATEADD(
MONTH,
-2,
GETUTCDATE()
) |
We recommend the BusinessProcess cleanup, which will eventually take care of the ProcessTaskLog cleanup as well. Only in a few scenarios is a dedicated ProcessTaskLog cleanup necessary. |
SavedValues,SavedValueEntry |
-- total SavedValue / SavedValueEntry
SELECT
*
FROM
(
{{
SELECT
'SavedValues' AS "type",
COUNT({s:pk}) AS "total"
FROM
{savedvalues AS s} }}
UNION ALL
{{
SELECT
'SavedValueEntry' AS "type",
COUNT({e:pk}) AS "total"
FROM
{savedvalueentry AS e} }}
)
summary
-- SavedValues per item
SELECT
{s:modifiedItem} AS "item",
COUNT({s:pk}) AS "total",
MIN({s:modifiedtime}) AS "oldest",
MAX({s:modifiedtime}) AS "newest"
FROM
{SavedValues AS s }
GROUP BY
{s:modifiedItem}
ORDER BY
"total" DESC
-- orphaned SavedValueEntry
-- (there shouldn't be any)
SELECT
COUNT({e:pk}) AS "total",
MIN({e:modifiedtime}) AS "oldest",
MAX({e:modifiedtime}) AS "newest"
FROM
{SavedValueEntry AS e
LEFT JOIN
SavedValues AS s
ON {e:parent} = {s:pk} }
WHERE
{s:pk} IS NULL |
A lot of these items accumulate over the project lifetime. If possible, disable storing saved values (see the note on disabling saved values / change history above). |
SolrIndexOperation |
SELECT
{i:qualifier},
COUNT({o:pk}) AS "total",
MIN({o:modifiedTime}) AS "oldest",
MAX({o:modifiedTime}) AS "newest"
FROM
{SolrIndexOperation AS o
LEFT JOIN
SolrIndex AS i
ON {o:index} = {i:pk} }
GROUP BY
{i:qualifier}
ORDER BY
"total" DESC |
Too many Solr index operations (more than ~100 per index)? |
StoredHttpSession |
SELECT
COUNT({s:pk}) AS "total",
MIN({s:modifiedtime}) AS "oldest",
MAX({s:modifiedtime}) AS "newest"
FROM
{StoredHttpSession AS s} |
Excessive amount of sessions? This is hard to generalize, as it highly depends on your site's traffic, but if you are near or over 5 digits, it's probably too much. Similarly, stale sessions (e.g. older than a day) don't need to be retained. |
TaskCondition |
SELECT
COUNT({tc:pk}),
MIN({tc:modifiedtime}) AS "oldest",
MAX({tc:modifiedtime}) AS "newest"
FROM
{TaskCondition AS tc }
WHERE
{tc:task} IS NULL |
Is there an excessive amount of "premature events"? Or very old events (older than a few weeks)? |
Please open an issue describing your problem or your feature request.
Any and all pull requests are welcome.
Please describe your change and the motivation behind it.