-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically freeze modules after 3 years without updates #96
Conversation
What if it creates a new branch for each mod to freeze? This would solve the issue of the rebase. However this could clutter the repository with unused branches over time :/ |
I don't relish the notion of spamming the repo with hundreds of pull requests on the first pass. |
I'll review this properly when I wake up properly, but you do know my opinion already on making sacrifices to operational code relating to historical data 😉 We probably should list them, do 90% in single hand crafted PR and let the new task munch on the rest. |
cat netkan.json | jq 'to_entries | .[] | select(.value.last_indexed == null) | .key' | tr -d '"' > never.list
cat netkan.json | jq 'to_entries | .[] | select(.value.last_indexed and (.value.last_indexed | startswith("2015"))) | .key ' | tr -d '"' > 2015.list
cat netkan.json | jq 'to_entries | .[] | select(.value.last_indexed and (.value.last_indexed | startswith("2016"))) | .key ' | tr -d '"' > 2016.list
cat netkan.json | jq 'to_entries | .[] | select(.value.last_indexed and (.value.last_indexed | startswith("2017"))) | .key ' | tr -d '"' > 2017.list
cat netkan.json | jq 'to_entries | .[] | select(.value.last_indexed and (.value.last_indexed | startswith("2018"))) | .key ' | tr -d '"' > 2018.list
cat netkan.json | jq 'to_entries | .[] | select(.value.last_indexed and (.value.last_indexed | startswith("2019"))) | .key ' | tr -d '"' > 2019.list
wc -l *
> 315 never.list
> 10 2015.list
> 243 2016.list
> 96 2017.list
> 132 2018.list
> 1000 2019.list (Thanks @HebaruSan for letting me know of that tool, it's great!) There are indeed quite a few mods to start with. Also could we use the auto-freeer to prune all the frozen netkans (including manually frozen ones) from the status db? |
Nice! I forgot we could get the
Also it would be nice to recover the |
EDIT: Moved to #97 instead, since it's independent and should run first. |
5cba2c1
to
5df86da
Compare
5df86da
to
4c37691
Compare
|
4c37691
to
ac2f6f4
Compare
In hindsight, a single branch is probably ok. As long as our workflow is to delete the branch, the next time the Auto Freezer runs it'll pull a new copy of the repo and make a fresh branch from the latest version of master. I'll give it a proper review after work. |
Do we really need to worry about this? I say it's not worth adding logic for a rather edge case problem. If someone revives a mod, unfreezing it will inflate it straight away and update last indexed. |
Maybe, maybe not. I had a nagging sense at the back of my mind that I'd missed something, and that was an attempt to bring it to the foreground. If we make a hard assumption that we will always freeze these mods, then submitting a pull request is presenting a choice (merge or not) that isn't a choice at all, and we should just push to master instead. |
Note to self, how to list the stale mods from the command line:
|
I saw it as more of a sanity check rather than a choice. Ie something went wrong and it tried to freeze all the Netkans and we didn't notice for a few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see anything that should be a problem and the consequences of it not working are either reversible (revert a commit) or will result in an exception that will get logged. So I'll merge it in and give it a run! We might be able to index even more frequently if we're only tracking active mods. Three years is quite generous as well, but we can always build a formalised policy around this.
Eg: No mod releases for x releases of KSP and older than X days will result in a netkan being frozen (possibly adding a x_netkan_never_freeze property or something if we really must)
|
||
def freeze_idle_mods(self, days_limit): | ||
update_cutoff = datetime.datetime.now() - datetime.timedelta(days=days_limit) | ||
self.netkan_repo.remotes.origin.pull('master', strategy_option='ours') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably unnecessary, but won't hurt anything. The container will clone the entire repo every run.
logging.info('Marking frozen: %s', mod.ModIdentifier) | ||
mod.frozen = True | ||
batch.save(mod) | ||
logging.info('Done!') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a bit of double handling, but rate limited DynamoDB writes are part of the free tier. We could potentially do this with a webhook, but I think doing this once a week is ok for now and will catch when we mark things as frozen by hand in the interim.
To note batch_write
requires some special handling, but we're getting rate limiting on our writes as a consequence of our ModStatus scan.
logging.info('Unable to fetch %s', name) | ||
|
||
(getattr(self.netkan_repo.heads, name, None) | ||
or self.netkan_repo.create_head(name)).checkout() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably build a function for branch checkouts, one with tests, that we can use as they can be right finicky. Checking out the branch is one of the first things we do, so if it blows up it'll get logged and the container will bail out.
Motivation
Sorting http://status.ksp-ckan.space/ by Last Indexed, it's apparent that hundreds of indexed mods haven't been updated in 3 years or longer (maybe 40% of the total going by the scrollbar). These probably will never be updated (either because they don't need updates or because they're abandoned), yet we crawl them every day, some several times per day. This incurs extra CPU, network, and possibly RAM consumption.
Changes
Now once a week we run a new
auto-freezer
command that checks whether any of the currently indexed mods have aModStatus.last_indexed
value 1000 days ago or earlier (which is pretty close to 3 years). (This would be true, except that the legacy Perl webhooks never updated the status, so aNone
is treated as infinitely old, since it means the mod has not been indexed since we started tracking its status.None
value is unreliable. See #97 for help with that.) All such mods are renamed toidentifier.frozen
on afreeze/auto
branch, one commit per mod. This branch is pushed to GitHub and a pull request is created to allow team members to manually review the results and merge to master if appropriate. This will allow us to reduce the workload of the bot over time.To support pushing to KSP-CKAN/NetKAN, its remote string is updated to SSH format.
In addition, a new
ModStatus.frozen
property exists. After its normal pass, the auto freezer command will set this property toTrue
for mods that don't have an active .netkan file. This will allow us to create a filter on the status page to hide such mods.Help wanted
Things I couldn't figure out:
freeze/auto
branch needs to have new mods added to it as they're added to master, ideally without losing previous work, something likegit rebase master
. But I couldn't figure out how to do that in GitPython.Any help would be very much appreciated.