Heavy screenshot usage in repository (size and number of files) #2699

Closed
MikeMcC399 opened this issue Apr 6, 2022 · 11 comments
Labels: enhancement (Improvement of an existing feature), mirrored-to-jira, screenshots (New Screenshots provided), technical-issue

Comments

@MikeMcC399
Contributor

MikeMcC399 commented Apr 6, 2022

Where to find the issue

https://github.com/corona-warn-app/cwa-website/tree/master/src/assets/screenshots

Problem description

The screenshots directory in the cwa-website repository uses a large amount of space and includes a high number of files.

Each app version release causes a complete set of images to be committed to the repository for the combinations iOS/English, Android/English, iOS/German and Android/German. Many of these files have not changed between versions, meaning that there is a high level of duplication.

| Metric | Screenshots  | Total web      |
| Size   | 966 MB / 70% | 1.38 GB / 100% |
| Files  | 11,296 / 88% | 12,881 / 100%  |

Summary

Screenshots cause ...

  • High use of space in total
  • High number of files in total
  • High number of files in each release PR
    • difficult to review
    • GitHub "Files changed" function in PRs does not work well with so many image files

Suggested change

De-duplicate the screenshot images and use a method where only changed screenshots need to be published.
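
As a rough illustration of what de-duplication could start from, the sketch below groups files by a hash of their content, so that byte-identical screenshots committed under different version folders show up together. It is a minimal sketch, not the project's tooling: the function name `find_duplicates` is hypothetical, and it only detects exact copies, not images that merely look the same.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files below `root` by the SHA-256 digest of their content.

    Returns a dict mapping digest -> list of paths, containing only
    digests that are shared by more than one file.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # A digest with a single path is a unique file and is dropped.
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates("src/assets/screenshots")` from the root of a clone would list every group of identical files; each group could then be reduced to one stored copy plus references.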

Advantages

  • Reduce the size and complexity of the repository
  • If only changes need to be published with each release then reviewing the changes will be simpler. Currently more than 600 files are part of each related PR.
  • GitHub "Files changed" for reviewing PRs will work better

Internal Tracking ID: EXPOSUREAPP-14534

@MikeMcC399 added the enhancement label on Apr 6, 2022
@dsarkar added the screenshots and technical-issue labels on Apr 6, 2022
@MikeMcC399 closed this as not planned on Jan 2, 2023
@MikeMcC399
Contributor Author

MikeMcC399 commented Jan 4, 2023

cwa-website now takes up a massive 5.3 GB of disk space.

Git is not really suited to managing large numbers of image files, as it keeps every committed version of each of them in its history. The hidden .git folder takes up more than 3 GB. For comparison, the same folder for the Android app is 590 MB.
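
To reproduce this kind of comparison on a local clone, the following minimal sketch adds up file sizes for the .git folder and for the whole clone. The function names `dir_size` and `git_share` are hypothetical, and the figures it reports depend on the state of the local clone (for example, whether it has been repacked).

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files below `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symbolic links so that linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def git_share(repo):
    """Return (size of the .git folder, size of the whole clone) in bytes."""
    return dir_size(os.path.join(repo, ".git")), dir_size(repo)
```

Dividing the first value by the second gives the share of disk space taken by the history rather than by the checked-out files.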

I cannot copy the website over the network any more. I get an error:

error: RPC failed; curl 56 Recv failure: Connection was reset
send-pack: unexpected disconnect while reading sideband packet

I created a copy in order to test workflows separately, so I may not be able to contribute in this area any more. 🙁

It looks like some heavy git maintenance may be necessary!

Edit: Today (Jan 13, 2023) I was able to copy the website. It took quite a long time and I received a warning about large files.

@MikeMcC399 reopened this on Jan 4, 2023
@dsarkar
Member

dsarkar commented Jan 5, 2023

@MikeMcC399 OK, we will check how to remove redundant image files.

@dsarkar
Member

dsarkar commented Jan 5, 2023

Internal Tracking ID: EXPOSUREAPP-14534

@MikeMcC399
Contributor Author

@dsarkar

OK, we will check how to remove redundant image files.

Thank you! Unfortunately, I don't think it is quite that simple, because even if files are removed they are still present in the history. Solving the issue probably means removing files and rewriting the Git history. I don't know if that is feasible without breaking things.

@Ein-Tim
Contributor

Ein-Tim commented Jan 7, 2023

I'd also very much appreciate it if the size of this repository were reduced, as it takes up 6 GB on my disk!

@MikeMcC399
Contributor Author

MikeMcC399 commented Jan 7, 2023

Some things which could be considered:

  • remove old versions of screenshots (currently version 2.2 to 2.28 i.e. 27 sets of English/German Android/iOS screenshots)

See git-filter-repo, which has functionality for removing objects from the git history. It also includes the command

git filter-repo --analyze

to investigate the .git space usage. The report covers both files that still exist and files that have been deleted, and it already shows high space usage by deleted screenshot files. (Analysis report: filter-repo.zip.)

See GitHub help About large files on GitHub including:

and references to removing files.


I suspect, however, that this is not solvable with a reasonable amount of effort within the lifetime of this project.


https://github.com/git-guides#what-is-git

"Git stores changes in SHA hashes, which work by compressing text files. That makes Git a very good version control system (VCS) for software programming, but not so good for binary files like images or videos."

@MikeMcC399
Contributor Author

  • I noticed PR Experimental: Test cwa-website symlink handling #3327 related to this issue
  • I ask the question: Is it necessary to store a complete history of all screenshot versions? I assume that most users would only be interested in the current version and the previous version of the app.

@larswmh
Member

larswmh commented Jan 18, 2023

@MikeMcC399

I ask the question: Is it necessary to store a complete history of all screenshot versions? I assume that most users would only be interested in the current version and the previous version of the app.

Keeping screenshots for all versions of the app was specifically requested by the stakeholders

@MikeMcC399
Contributor Author

@larswmh

Keeping screenshots for all versions of the app was specifically requested by the stakeholders

Perhaps the stakeholders could reconsider whether they want to continue holding on to this requirement, given the major impact on the GitHub repository? If the stakeholders want the screenshots to be available for themselves, rather than for app users, then it might be possible to set up an alternate archive site for stakeholders' use. Then the older versions could be removed from the main site.

@larswmh
Member

larswmh commented Feb 20, 2023

@Ein-Tim @MikeMcC399

It was internally decided that we are not going to put further effort into this issue, considering this project's remaining timespan and the unforeseeable impact this could have in the long run. Git LFS would not reduce the repository size, but would store large files in a more favorable environment for Git. Symlinks are a nice workaround but are tedious to work with because of their complexity.

However, as seen for the upcoming version 3.1, we started linking to screenshots of previous versions. In the future, the screenshot size will only grow marginally.
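
The thread does not describe how this linking is implemented. Purely as an illustration of the idea, the sketch below assumes that each release stores only the screenshots that changed, and that a lookup falls back to the newest earlier release that stores the requested file. The names `parse_version` and `resolve`, the `stored` mapping, and the file names are all hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '2.28' into a comparable tuple (2, 28)."""
    return tuple(int(part) for part in text.split("."))


def resolve(name, version, stored):
    """Return the newest release not later than `version` that stores `name`.

    `stored` maps a release string to the set of screenshot file names
    actually committed for that release. Returns None if no such release
    exists.
    """
    target = parse_version(version)
    candidates = [
        release
        for release, names in stored.items()
        if name in names and parse_version(release) <= target
    ]
    return max(candidates, key=parse_version, default=None)


# Hypothetical example: only home.png changed between 3.0 and 3.1.
stored = {"3.0": {"home.png", "settings.png"}, "3.1": {"home.png"}}
print(resolve("settings.png", "3.1", stored))  # falls back to the 3.0 copy
```

Comparing versions as integer tuples rather than as strings matters here, because a plain string comparison would sort "2.28" before "2.3".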

We also decided to keep the git history as it is for transparency reasons.

Thanks for your understanding.

@larswmh closed this as not planned on Feb 20, 2023
@MikeMcC399
Contributor Author

@larswmh

Many thanks for reviewing the issue and coming to a decision. After researching the issue myself and the impact of various strategies I fully understand the decision not to change very much considering the remaining project time.

It's good that this issue now has visibility and that the screenshot linkage will help prevent the problem from getting worse.


4 participants