added a release note explaining the nature of the improvement. (#8097)
landreev committed Nov 23, 2021
1 parent 089efbe commit b508edf
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions doc/release-notes/8097-indexall-performance.md
@@ -0,0 +1,6 @@
### Indexing performance on datasets with large numbers of files

We discovered that whenever a full reindexing needs to be performed, datasets with large numbers of files take an exceptionally long time to index (for example, in the IQSS repository it takes several hours for a dataset that has 25,000 files). In situations where the Solr index needs to be erased and rebuilt from scratch (such as a Solr version upgrade or a corrupt index), this can significantly delay the repopulation of the search catalog.

We are still investigating the reasons behind this performance issue. For now, even though some improvements have been made, a dataset with thousands of files is still going to take a long time to index. However, we've made a simple change to the reindexing process: such datasets are now indexed at the very end of the batch, after all the datasets with fewer files have been reindexed. This does not improve the total reindexing time, but it repopulates the bulk of the search catalog much faster for the users of the installation.
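
The effect of this ordering can be illustrated with a small sketch. It is not the actual reindexing code: the `Dataset` type, the function names, and the assumption that indexing time grows linearly with file count are all hypothetical and used only to show why the order matters.

```python
import math
from typing import List, NamedTuple


class Dataset(NamedTuple):
    # Hypothetical record; not a class from the application itself.
    dataset_id: int
    file_count: int


def reindex_order(datasets: List[Dataset]) -> List[Dataset]:
    """Order datasets so that those with the fewest files are indexed first."""
    return sorted(datasets, key=lambda d: d.file_count)


def time_until_fraction_indexed(
    ordered: List[Dataset], fraction: float, seconds_per_file: float = 1.0
) -> float:
    """Elapsed time until at least `fraction` of the datasets are indexed.

    Assumes (for illustration only) that the cost of indexing a dataset
    is proportional to its file count.
    """
    target = math.ceil(fraction * len(ordered))
    elapsed = 0.0
    for done, dataset in enumerate(ordered, start=1):
        elapsed += dataset.file_count * seconds_per_file
        if done >= target:
            break
    return elapsed
```

Under these assumptions, the time to index every dataset is identical for both orderings, but the time until most datasets are searchable again is far shorter when the largest dataset is processed last.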
