Establish retention policy and delete old data #3532
Comments
I just want to ack this request, and say that I don't know the answer to this yet, will have to do some internal research to figure out what applies here.
@dstufft could you please reply to this thread with your data? cc @ewdurbin with his PSF hat on. Per our meeting today @nlhkabu is going to do some research on this toward #5863, understanding best practices & prior art in other similar sites. Simply Secure may have some good resources on this.
I couldn't find anything on Simply Secure, but I did manage to find a couple of other sources:
GDPR
National Cyber Security Centre (UK Gov): This is a very useful guide: https://www.ncsc.gov.uk/guidance/introduction-logging-security-purposes
Prior art
Please note that the retention policy must not only include server log files, but also the action log for each of the packages.
FWIW, exact time and IP address do serve a forensic purpose: they make it easier to triage and establish provenance when doing a postmortem. As an example: Project Foo has had 50 releases, 45 of which came from an IP range publicly associated with a hosting provider (probably CI) and were published within 5 minutes of midnight in timezone X (probably a cronjob). The last 5 releases came from varying IPs, some of which show up in blacklists, and their upload times indicate timezone Y. In terms of policy, it might make sense to research (if any research exists?) the average time between package breach and discovery/triage and use that (with a sufficient window) as the baseline for removing IPs and exact timestamps.
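The policy suggested above (keep full detail for a baseline window, then remove IPs and exact timestamps) could be sketched roughly as follows. This is only a minimal illustration: `JournalEntry`, `redact`, and the 365-day `RETENTION_WINDOW` are hypothetical names and values, not PyPI's actual schema or an agreed policy, and the IP address comes from a range reserved for documentation.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical window; the thread does not settle on a value.
RETENTION_WINDOW = timedelta(days=365)


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical shape of one action-log row (not PyPI's real schema)."""
    action: str
    user: str
    when: datetime
    ip: Optional[str]


def redact(entry: JournalEntry, now: datetime,
           window: timedelta = RETENTION_WINDOW) -> JournalEntry:
    """Return recent entries unchanged; for older ones drop the IP
    and keep only the day of the action."""
    if now - entry.when <= window:
        return entry
    day_only = entry.when.replace(hour=0, minute=0, second=0, microsecond=0)
    return replace(entry, when=day_only, ip=None)


# Usage with made-up data: one entry far outside the window, one inside it.
now = datetime(2018, 4, 1, tzinfo=timezone.utc)
old = JournalEntry("create", "alice",
                   datetime(2008, 11, 24, 11, 33, 29, tzinfo=timezone.utc),
                   "203.0.113.7")
recent = JournalEntry("new release", "alice",
                      datetime(2018, 3, 30, 8, 0, tzinfo=timezone.utc),
                      "203.0.113.7")
old_redacted = redact(old, now)
recent_redacted = redact(recent, now)
```

The user and the day survive redaction, so the action log stays readable, while the two fields the thread identifies as most sensitive are removed once the window has passed.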
Keep in mind: privacy is a human right, but there is no right to forensics.
Obviously there has been no such need in the last 10 years, so there is no need to keep this data. Keeping data just for the vague case that someone might, at some point, be interested in it is not a reason; it is data retention without a legal basis. Under the EU GDPR, neither forensics nor research is a reason to give data retention precedence over a person's legal rights.
I found that PyPI stores IP addresses and exact action dates for several years, e.g.
create | Nov 24, 2008, 11:33:29 AM | htgoebel from 79.207.178.171
According to the privacy policy there are four reasons to store the data. From my point of view, none of these reasons requires storing this exact data for almost 10 years. The day and user might be of interest, but the exact time and IP address certainly are not.
Please establish a retention policy for deleting old data, and then delete this old data. Thanks!
Background: As you might know, the European General Data Protection Regulation (GDPR) requires all services offered to the European market to have a retention policy. Also, a European court has decided that IP (v4) addresses are personal data too.