-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Archive ActionLogRecord entries #5916
Comments
|
@landreev in Slack, 11:58 AM Tue Jan 26, 2021
💩 |
Thought it might be helpful to describe how I've used the info in this actionlogrecord database table: To help quantify curation activities on the Harvard Dataverse repository for an infographic that used to be posted to the IQSS website's metrics page. The metrics for that page are now being pulled from RT tickets, specifically how the tickets are tagged. I think the curation team is meeting this week to talk about this (e.g. how well is our use of the RT ticket tags, both the tagging and the counting of tags, representing curation activity?), so I'm not sure if what's in the actionlogrecord could continue to be used for this. Also I used the actionlogrecord table to get a sense of and report the number of spam Dataverse collections in Harvard Dataverse that were deleted during the great spamming of 2020. And the table is used in a few helpful queries in that Dataverse Database Queries Google Doc, like for counting the number of times that the Private URL and Submit for Review features have been used, which I think went into decisions about prioritizing GitHub issues related to those features. In all of these cases, I don't think more than one or two year's worth of information has been needed. If the actionlogrecord table is trimmed, could the last two years be retained? I was also told a few years ago, although I forgot who said it, that the actionlogrecord table wasn't really designed and isn't maintained for uses like these. That makes perfect sense, and I've always wondered how these uses of the actionlogrecord table intersect with other efforts, like talks about an integration with a curation tool developed at Yale and work on sending to and using event data from Google Analytics. This is all to say that maybe using the actionlogrecord in the ways I described should be considered temporary while more maintainable ways of getting this information are explored. |
|
I'd like to address this quickly, before the upcoming Postgres upgrade and transfer. So, no automated mechanisms in the application - just dumping the table and saving the backup on S3, in case anyone needs to look up past entries. And adding 2 words in the admin guide: "btw, this table is going to be constantly growing; so it'll end up being huge on a busy installation. This is how we trim it once in a while: ..." |
|
As was mentioned in the opening comment, the size of the exported tables in SQL format is not an accurate representation of the database space use, as stored. Still, somewhat notable that, as of now, the actionlogrecord table accounts for almost 3/4 of the total SQL dump of the prod. db (32GB out of 44GB). Up from 1/2 back when the issue was opened. |
OK, a serious bloodletting has been administered to the database - purged everything up to 2021. Will post info on where the older records have been archived. |
Thanks @landreev. I'll move this over to in progress on the board. It sounds like what's remaining is:
|
All good; except for the maintenance page reference, above/in the post-planning update 5 days above - not sure what that was about. It may have been something from another conversation that ended up in this issue by mistake. (?) |
@landreev I understood that we wanted other installations to be able to go through the same process that we just did when their DBs start to grow because of the actionlogrecord entries, so we wanted to provide something on that page. If this is routine for a sysadmin and/or will be a relatively unique process for each installation depending on their setup, I'd be OK not adding the docs. What do you think? |
Oh, ok - yes, I was going to document it. I'll find a spot for a section on this in the admin guide somewhere. |
@landreev - oh yes, you're correct, that's not the right spot. Thanks for documenting and putting it somewhere more appropriate! |
As I mentioned last week, I'm now wondering if we should add a simple auto-cleanup option to the application, as part of this issue, while we are at it. The way it would work, an installation could specify a cutoff time period (as a number of days, probably?) - and then all the actionlogrecords older than that would be erased automatically every night, by a timer job. We would emphasize in the documentation/release note that it would be 100% the responsibility of the administrator to archive the records outside of the database before they are erased, in case they see any value in preserving them. I can see how having this option could be handy for an installation owner who doesn't see much of a need for keeping these around, and would prefer not having to worry/think about the table growing out of control. For an installation that does not enable the option, nothing changes, the records live on forever, until/unless manually removed. I'm curious what others think. If this feels too out of scope, I'm happy to document the problem and the manual solution in the guide and close the issue, as was the plan we agreed on earlier. |
My two cents are that I'm ok with this issue being about documenting the problem and the fix. A future issue could be about automating the fix. |
doc entries on trimming the actionlogrecord table. (#5916)
In the current implementation, action log entries keep accumulating unimpeded.
A quick (and somewhat anti-scientific) check suggests that our own prod. database is more than 50% action log entries by size, by now.
(the test performed was comparing the total output of
pg_dump
of the full database, vs.pg_dump --table actionlogrecord
. It's "anti-scientific" because the output is generated as sql text that does not correspond 1:1 to the size of the data structures stored. But the ratio should be about right as a ballpark).The text was updated successfully, but these errors were encountered: