OpenXLSX does not do garbage collection in sharedStrings.xml #193

Closed
afalkenhahn opened this issue Oct 5, 2022 · 3 comments
Labels: enhancement, resolved

Comments

@afalkenhahn

When a string in a cell is overwritten with a new, unique string, the new string is simply appended to the end of the table in sharedStrings.xml. The old string is not removed from sharedStrings.xml, even when no cell uses it any longer.

@aral-matrix self-assigned this on Aug 19, 2024
@aral-matrix added the enhancement, wontfix, and ready to close labels on Aug 19, 2024
@aral-matrix
Collaborator

This is an unfortunate consequence of the complex indexing that Excel does across worksheets: shared strings have no explicit index, and cells refer to them only by their position inside the shared strings XML array. This means that every time a shared string is deleted, the whole workbook requires re-indexing.
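
To make the problem concrete, here is a toy model of position-based referencing. It is only a sketch: `Workbook`, `cellText`, `naiveErase`, and `refsNeedingUpdate` are hypothetical names invented for illustration and are not part of OpenXLSX, and real cells live in per-worksheet XML rather than in a flat vector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory model, NOT OpenXLSX's actual data structures.
struct Workbook {
    std::vector<std::string> sharedStrings;  // a string's position here is its only "index"
    std::vector<std::size_t> cellRefs;       // each string cell stores a position into sharedStrings
};

// Resolve the text of a cell through its positional reference.
// Throws std::out_of_range if the reference points past the end of the table.
const std::string& cellText(const Workbook& wb, std::size_t cell) {
    return wb.sharedStrings.at(wb.cellRefs.at(cell));
}

// Delete one table entry without touching any cell.
// Every reference greater than pos now silently points at the wrong string.
void naiveErase(Workbook& wb, std::size_t pos) {
    wb.sharedStrings.erase(wb.sharedStrings.begin() + static_cast<std::ptrdiff_t>(pos));
}

// Count the cell references that would have to be rewritten after erasing pos.
std::size_t refsNeedingUpdate(const Workbook& wb, std::size_t pos) {
    std::size_t count = 0;
    for (std::size_t ref : wb.cellRefs) {
        if (ref > pos) ++count;
    }
    return count;
}
```

In this model, erasing the first table entry invalidates every reference to a later entry, which is why a deletion cannot be handled locally in the cell that was overwritten.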

This could possibly be addressed in a future patch by a function "cleanupSharedStrings" or something like that, which does the reindexing once, at the user's request, letting the user control when the computation overhead happens.
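
A one-shot reindexing pass of that kind could look roughly like the sketch below. It is an assumption about the general approach (mark used entries, compact the table, remap the references), not the OpenXLSX implementation: `cleanupSharedStringsSketch` is a hypothetical free function, and the flat `cellRefs` vector stands in for the string cells of all worksheets.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT OpenXLSX code: compact the shared strings table
// and rewrite all cell references in a single pass.
// Returns the number of unused strings that were removed.
// Throws std::out_of_range if a cell references a position outside the table.
std::size_t cleanupSharedStringsSketch(std::vector<std::string>& sharedStrings,
                                       std::vector<std::size_t>& cellRefs)
{
    // 1. Mark every table position that at least one cell still references.
    std::vector<bool> used(sharedStrings.size(), false);
    for (std::size_t ref : cellRefs) {
        used.at(ref) = true;
    }

    // 2. Copy the used strings into a new table, recording old -> new positions.
    std::vector<std::size_t> remap(sharedStrings.size(), 0);
    std::vector<std::string> compacted;
    for (std::size_t i = 0; i < sharedStrings.size(); ++i) {
        if (used[i]) {
            remap[i] = compacted.size();
            compacted.push_back(std::move(sharedStrings[i]));
        }
    }

    // 3. Rewrite every cell reference to the new position of its string.
    for (std::size_t& ref : cellRefs) {
        ref = remap[ref];
    }

    const std::size_t removed = sharedStrings.size() - compacted.size();
    sharedStrings = std::move(compacted);
    return removed;
}
```

The cost is one pass over all string cells plus one pass over the table, paid only when the caller asks for it rather than on every overwrite.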

I'll keep this open for now but can't promise a quick implementation :)

@aral-matrix removed the wontfix and ready to close labels on Aug 19, 2024
aral-matrix added a commit that referenced this issue Feb 2, 2025
…Iterator no longer creates missing rows unless iterator is dereferenced
@aral-matrix
Collaborator

Guess what :) 4589a6c
XLDocument now has XLDocument::cleanupSharedStrings() (in the development-aral branch) - and it's not even half bad in terms of performance (tested with a huge workbook and ca. 500KB of shared strings XML).

@aral-matrix added the testing, ready to close, and resolved labels and removed the testing and ready to close labels on Feb 2, 2025
@aral-matrix
Collaborator

Functionality is now merged into master.
