- Hoover: Fixed issue with indexing Nextcloud28-linked folders that contain special characters.
- Hoover: Fixed issue with displaying "Filter Chips" in search interface.
- Hoover: version downgrade in specific Minio containers requires removing the directories at `collections/.minio.sys` and `volumes/snoop/blobs/.minio.sys`. Since these directories will contain a very high number of files, it's recommended to first rename them to `.minio.sys.old` (or, if possible, move them to a parent directory) and then delete the renamed directories while the system is online.
- Follow "clean reset" procedure with cluster version 0.18.2
Recommended upgrade procedure (manual intervention):
./liquid halt
sudo mv /opt/node/collections/.minio.sys /opt/node/collections/.minio.sys.old
sudo mv /opt/node/volumes/snoop/blobs/.minio.sys /opt/node/volumes/snoop/blobs/.minio.sys.old
time sudo rm -rfv /opt/node/collections/.minio.sys.old /opt/node/volumes/snoop/blobs/.minio.sys.old
[... in another console, continue with normal deployment ...]
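The rename step above can be made safe to re-run with a small helper. This is only a sketch: the function name `rename_minio_sys` is an illustrative assumption and not part of the `./liquid` tooling, and it covers only the rename, not the later deletion.

```shell
# rename_minio_sys ROOT
# Renames ROOT/collections/.minio.sys and ROOT/volumes/snoop/blobs/.minio.sys
# to .minio.sys.old. Missing directories are skipped; the function stops with
# an error if a .old target already exists, so nothing is overwritten.
rename_minio_sys() {
    local root="$1" dir
    for dir in "$root/collections/.minio.sys" "$root/volumes/snoop/blobs/.minio.sys"; do
        if [ ! -d "$dir" ]; then
            echo "skip (not found): $dir"
        elif [ -e "$dir.old" ]; then
            echo "error (target exists): $dir.old" >&2
            return 1
        else
            mv -- "$dir" "$dir.old"
            echo "renamed: $dir -> $dir.old"
        fi
    done
}

# Example (as root, after ./liquid halt):
#   rename_minio_sys /opt/node
```

Renaming is a quick metadata operation on the same filesystem, which is why the slow `rm -rfv` can be left running while deployment continues in another console.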
- Nextcloud: Fixed issue where users would lose access to shared folders after logging out and back in again.
- Hoover: Fixed issue where processing data from Minio S3 would return the error "File Not Found" after iterating through a couple of directory levels.
- Nextcloud: added integration with the "group folders" extension; this is now the only folder source for syncing collections with Hoover.
- Grist: added configuration switch for setting sandbox on or off.
- Follow "clean reset" procedure with cluster version 0.18.2
- Removed the "rocketchat" integration. Desired chat logs should be exported manually from the application prior to upgrading.
- Deprecating the old "nextcloud" integration; it is superseded by "nextcloud28". Details below. Please export any Nextcloud-related data; this integration will be removed next release.
- Hoover UI: Removed the "print view" button on documents, as it malfunctioned; the feature is to be reworked in the future.
- Nextcloud v28:
  - Integrated latest version of Nextcloud software.
  - Access Control: All Nextcloud v28 user & access control settings are available.
  - Integrated Collabora real-time office solution with Nextcloud.
  - Hoover Web Integration: Individual Nextcloud folders can now be imported into Hoover from the web page admin. Functionality can be enabled by generating a Nextcloud "app password" / "api key" and pasting it in the Hoover admin for your own user.
- Hoover UI:
  - Fixed bug where some PDF documents would fail to load.
  - Fixed display errors in the "table view" of search results.
- Grist integration: Fixed issue where Grist sandbox would not work on some systems.
- Prophecies integration: Fixed issue where Prophecies permission checkbox would not appear in home page admin site.
Added optional integrations for new applications that can be hosted:
- grist - collaborative spreadsheets
- ICIJ/Prophecies - collaborative factchecking and data validations
Only users with the correct permissions can access these new apps. Initial installation requires some manual intervention the first time each app is activated; see the docs/ folder.
- Hoover: Fixed issue where the search page URL would be corrupted after selecting a result and refreshing.
- Hoover: Fixed issue on servers with more than 50 collections where the search page would take 15s to load.
- Hoover: Fixed UI bug where filenames and path would appear truncated in the result list, for collections indexed before June 2023.
- Hoover: sped up `checkdata` command and added optional parameters for recovering from S3 blob data loss.
- Hoover: UI now keeps all the document sub-tabs loaded in the browser, resulting in quicker switching between Text, Meta and other tabs.
- Hoover: Fixed UI bug where a red "request cancelled" error would appear immediately after searching.
- Follow "clean reset" procedure with cluster version 0.18.0
- Matrix:
  - Added new system for end-to-end encrypted text messaging.
  - Supports Web Client and Android/iOS applications (Element IM for Matrix).
  - Requires new subdomains: `matrix.` and `matrix-ui.`
  - Enabled by default; disable with `[apps] matrix=false`.
  - Does not yet support Federation or Audio/Video conference.
- Hoover:
  - Enabled browser-side and server-side caching of collection data and metadata.
    - This will drastically improve the loading times of all pages, for all users, for a whole week after the initial page load.
    - Collection data documents will now be cached in the browser. This includes all documents for pages you visit. To stop the collection data documents from reaching your disks, either use the Tor Browser, or reconfigure your own browser to not store or cache any data.
  - Added automatic splitting for large PDFs. You can now browse and search inside all pages of PDF files that have more than 10,000 pages and are larger than 1 GB.
  - Added search shortcut buttons for the terms in the search query. Clicking one of these buttons searches for that term in the current document.
- Admin Guide: Added guide for using WireGuard and Tails
- Hoover:
  - Simplified internal HTTP server structure. Page load times significantly improved.
  - Disabled unpacking the internals of DOCX and PDF files by default. Added config option to enable this feature per-collection.
- Hoover UI:
  - fixed bug where collection name would not be visible in preview list of results
  - fixed bug where creating tags sometimes didn't trigger refreshing aggregations and search results
  - fixed aggregation bucket count after clicking "More" button
  - added a way to stop searching inside a document (clear search box button)
  - fixed "search for emails replying to this one" button
  - fixed automatic retry for all search/api requests
- Hoover UI: fixed bug in PDF viewer where the page number, control and navigation buttons would malfunction.
This version brings version updates for cluster infrastructure components (Nomad, Consul, Vault).
- Follow "clean reset" procedure with cluster version 0.16.4
- Hoover UI: Added functionality for searching inside PDF documents, text files, and the Meta table.
- Hoover Processing: Added new integration and dashboards for monitoring internal stats of the processing SQL database.
- Hoover: Added integration for the Sentry crash reporting and performance monitoring toolkit. Instructions for installing and setting up this system are available under .
- Deployment: The `liquid deploy` command now better shows all service error outputs in one place.
- Hoover UI: Modified search request scheme to return results incrementally as they are found in the various collections, instead of waiting for all results to be found before showing list.
- Hoover UI: collections list now displays the given title, not the collection ID.
- Hoover processing: fixed problem where files would not be searchable by their duplicate filename or duplicate paths. Previously, only the first instance of a duplicate file would be searchable by path, path fragment and filename. Now, all the possible filenames and paths are indexed and shown in the Meta table.
- Hoover processing: fixed issue where processing would slow down ~24h after deployment.
- Hoover UI: Fixed functional and performance issues with the Finder component.
- Rocketchat: Fixed deployment issues with Rocketchat on new instances.
- Deployment: The ini file is now checked for outdated / wrong values. Run `./liquid resources` on the new version to check your configuration before deploying.
- Hoover: Fixed issue where some email formats would not be correctly identified and processed. Also fixed issue where HTML tags would appear in processed text.
- New App: Integrated new Wiki app, Wiki.js. This system has modern features, such as visual editing, role-based access control, and comments.
- Hoover: Fixed bug where, on some collections, document pages would intermittently raise errors when fetching the location list.
- Hoover: Fixed bug where Translate would fail on documents where OCR was enabled but not executed.
- Nextcloud: Added second Nextcloud instance, with different permission flags, that admins may enable. See configuration section
- Deployment: Fixed bug where the `./liquid deploy` command would download more images than necessary.
- Dokuwiki: Fixed issue where Sitemap would not expand entries for users with restricted access.
- Hoover: Fixed "File Finder" issue where the root folder would be displayed as a different file.
- Hoover: Fixed bug causing wrong collection display in the Batch Search view.
- Dokuwiki: Limited expansion of new Sitemap entries to one level.
- Dokuwiki: Added ability to create custom Sitemaps for user-made namespaces, and place them in any page.
- Dokuwiki: Fixed bug where Sitemap would not display private wiki content allowed from ACL / Virtual Manager.
- Hoover: fixed bug causing delay while indexing new data in the uploads (NextCloud) collection.
- Sysadmin: added new tracing system at port 9975, with Hoover-specific performance metrics and charts.
- Hoover: better processing performance on large collections.
- Hoover: fixed bug causing excessive disk space usage from system logs.
- Hoover: the collection configuration for very expensive operations (entity extraction, translation, image classification, image detection) must now be explicitly enabled for every collection.
- Hoover: Fixed worker slowdown issue for old containers by adding restart timeout of 3-5 days for all hoover data workers.
- Dokuwiki: Added Virtual Group Plugin, which allows Access Control for all Wiki Instances. Both Group Management and Access Control are managed from the Dokuwiki Admin Page.
- Hoover: Added new configuration flag `snoop_unarchive_threads` for parallel unpacking of BZ2 type archives. BZ2 archives will now be unpacked with greater speed. Any other archive types (zip, rar) are not affected.
- Hoover: Fixed issue where Tika Temporary Files folder would grow unbounded in size, using up all available disk space on the `/` partition. The workaround without this fix is to simply "stop" the Tika containers from Nomad UI every time this happens. The data has been moved to the Nomad data folder, by default `/opt/cluster/var/nomad/alloc/...`. Additionally, the Temporary folder has been set up to auto-delete files older than a few days.
- Hoover: Batch Search now has an internal queue to support very long batch search queries, and a large number of users searching in parallel.
- Hoover: Fixed Batch Search function issue where the search would error out with "Bad Request" on large lists.
- Removed 7z-fuse archive mounting feature. This migration will remove the feature from all collections and reset the unarchive tasks needed to re-create the archived files.
- Admin: Home page optionally proxies dashboards for Grafana, Nomad, Snoop, as well as search and processing queues. Only available to Admins with SuperUser access permission. To enable, set `[liquid] enable_superuser_dashboards = true` in `liquid.ini`.
- Fixed bug with archive mounting, where large amounts of storage would be used by logs. Disabled the archive mounting feature by default. Added warnings to switch off configuration related to archive mounting.
- Fixed bug where long searches would sometimes fail to show aggregations.
- Initiated reprocessing of email-related tasks, to resolve the "Invalid DateTime" bug. To upgrade, set `process = True` on all collections.
- Fixed bug where processing queue memory would become full, and processing would completely halt. A larger number of collections can now be processed at the same time with `process = True`.
- Fixed Hoover bug where OCR processing would sometimes fail.
- Fixed Hoover bug where some files would produce errors if they had an unusual Russian encoding.
- Hoover script for batch importing of tags from a CSV file.
- Hoover script for checking for data loss and deleting orphaned objects.
- Admin: feature 'delete users' also deletes them in all apps.
- Fixed UI bug that would display an error when searches take more than one minute.
- Fixed ephemeral bug that would leak storage space when PDF previews are used with 2 or more OCR languages.
- Fixed issue related to recursive archive mounts.
- Fixed homepage service deployment problem, saved 30 seconds.
- Hoover: Backup procedure now includes arguments to optionally backup and restore original collection data. Also, original collection data backup has been enabled for "uploads" in the `bin/periodic-backup.sh` script.
- Hoover: Fixed processing of some variants of `application/mbox` MBox Email Archives which would previously fail to unpack.
- Hoover: Removed mismatching OCR tabs from documents where a language was detected and OCR is available for it.
- Hoover: Removed Translations made from one target language into another one.
- Authentication: Fixed bug where user sessions would be lost after server redeployment.
- Fixed Hoover bug that would stop new Tags from being indexed.
- Fixed issue where synced collections (such as "uploads") would not update the index.
- Fixed performance problem caused by recursive archive mounts.
This version fixes bugs in Rocketchat and Hoover configuration.
- Fixed Rocketchat issue where new servers would fail to start.
- Fixed Hoover processing stability issue caused by 7z mount process leak.
- Fixed bug with Hoover `retrytasks` command and UI button.
- Fixed issue where Hoover indexing would hang on very large collections.
- Fixed S3 mount process leak, which could crash systems under a few days of load.
- Hoover: Improve processing performance by re-using network mounts.
- Hoover: Collection data archive mounting can be disabled, and normal unpacking will be used instead. Config flag: `disable_archive_mounting`.
This release brings performance improvements for the Hoover processing pipeline.
- Follow "clean reset" procedure with cluster version 0.15.3
- Hoover: Added configuration for OCR parallelism: configuration.
- Hoover: Added configuration for describing files to be skipped from processing: configuration.
- Fixed problem where Hoover processing pipeline would cause server to run out of memory.
- Fixed performance issue where processing would run much slower than normal.
This is a bug-fixing release targeted at Hoover internals and Monitoring.
- Follow "clean reset" procedure with cluster version 0.15.0
- Hoover: Fixed issue where mail fields wouldn't appear (From, To, text) for some mail formats.
- Monitoring: Fixed issue with monitoring apps not working (Grafana, Prometheus).
- Scheduling: Fixed bug where dead and de-activated nodes would be counted as valid in the resource checker.
This bugfixing version brings stability improvements for multi-host deployments.
- Follow "clean reset" procedure with cluster version 0.14.2
- Hoover: Skip Windows and Linux installation files and extensions by default. Added new configuration flags to control what file types and extensions are skipped.
- Hoover: Fixed issue when optional processes (OCR, NLP and Image Recognition, etc) would be turned on and then off again on an active collection.
Hypothesis is removed from the project starting with this version.
- Follow "clean reset" procedure with cluster version 0.13.7
- Hoover now recognizes tables (CSV, Excel, ODT) and splits them into smaller parts that can be viewed in the UI.
- Hoover: Run OCR analyzer on Office type documents (doc, docx, odt). Previously, OCR would run on PDF files only.
- Hoover: Improved performance of whole-document OCR with existing text.
- Hoover: Collection Access Management now implemented for Admins too. Admins can't give access to collections they're not a part of.
- Hoover: Fixed ETA display for document processing.
- Fixed problem where some user sessions would still be active after user logout.
- Hoover: Fix performance issue related to document processing.
- Hoover: Fixed bug where collections couldn't be deleted if they had a certain name.
- Hoover: Fixed bug where Entity Extraction wouldn't work on some languages (Japanese, Russian, Arabic).
- For very large installations expect a few hours of downtime during release.
- Follow "clean reset" procedure with cluster version 0.13.6
- New restriction for collection names: at least 3 characters in length. Before upgrading, please backup the offending collections and restore them with names longer than 2 characters.
- Make sure the `/` filesystem has at least `120 GB` for new Docker images, or bind mount `/var/lib/docker` to a place with more space.
- When updating, the service `hoover-snoop` will run migrations that may take a few hours. Because of that, do not restart the `./liquid deploy` command before checking that migrations are finished in the Nomad UI, at `Jobs > hoover > snoop-web > snoop`.
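The disk space requirement can be checked before deploying with a short helper. This is a sketch under two assumptions: that the 120 GB figure refers to free space, and that counting in units of 1024^3 bytes is close enough. The function name `has_free_gb` is illustrative and not part of the `./liquid` tooling.

```shell
# has_free_gb PATH MIN_GB
# Succeeds when the filesystem holding PATH has at least MIN_GB available
# (counted in units of 1024^3 bytes).
has_free_gb() {
    local path="$1" min_gb="$2" avail_kb
    # POSIX "df -P -k" prints one data line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -P -k "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example: warn before deploying if / has less than 120 GB available.
#   has_free_gb / 120 || echo "Low space on /; consider bind mounting /var/lib/docker" >&2
```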
- Hoover: Image AI: Image Classification and Object Recognition. Filter images by the objects detected inside by AI models we download and run. Configuration for Image Classification and Object Recognition
- Hoover: Named Entity Extraction -- automatically extract entities (persons, locations, organizations). Filter documents by the entities that appear in text. Configuration for Named Entity Extraction and Language Detection
- Hoover: Machine Translation -- automatically translate first paragraph of document text between languages using LibreTranslate. Translation user interface is also available in Hoover, to manually translate text. Configuration for Machine Translation
- Hoover: Generate and display thumbnails for small documents, pictures and Office files. The thumbnails are shown in the document result list, and in the document header. Configuration for Thumbnail Generator
- Hoover: Convert Office files to PDF for easier viewing in the browser. Configuration for PDF Preview
- App Permissions: User access to specific applications is now configurable by system admins, at the User or Group level.
- RocketChat platform now available in RocketChat Mobile App for Android and iOS. Push Notifications optional. Steps to Enable RocketChat Push; Configuration Flag for RocketChat Push
- RocketChat auto-logout interval is now configurable separately. Configuration for Rocketchat Auto-Logout
- User Management: new users can now be onboarded into Hoover collections without needing to wait for them to log in and open Hoover for the first time.
- Hoover Collections can now be configured individually for all the optional features. New Per-Collection Configuration Flags, Example usage for all Collection Flags.
- Hoover: Fixed a bug limiting PDF viewer performance for large files.
- Hoover UI: added component to filter bucket values in aggregation results.
- Hoover UI: improved PDF viewer with new toolbar: added viewer for table of contents, in-document bookmarks, page thumbnails, and file attachments.
- Hoover UI: Help texts have been moved under "?" icons to save screen space.
- Fixed a broken Hoover link pointing to the Hoover documentation page.
This version brings Hoover UI improvements, as well as a new TOTP device change form, and updated password change forms.
- Hoover Insights View: page with aggregate data (file/data counts, common terms) for each collection, as well as advanced information on the processing status and ETA breakdown.
- Home Page: Users can now change their TOTP device without admin intervention. Users can also add multiple TOTP devices to the same account, and also remove old devices from their device list.
- Hoover: Added more icons (especially for tags) and updated some existing ones.
- Hoover: Changed File Browser (Finder) implementation into a custom one, to allow for future improvements.
This version brings some Hoover UI improvements, and a script to benchmark Hoover search times.
- Hoover: Added command to benchmark Hoover search durations for a range of concurrent users and output a scatter plot with search time vs. searched collection size.
- Hoover UI: aggregation N/A bucket counts are now loaded when element becomes visible, instead of being loaded at search time. This should help reduce the search aggregation response times.
- Hoover UI: Added a configurable delay before retrying a failed request, default is 3s.
- Hoover UI: Fixed bug where the "Email To" field would collapse multiple email addresses into a single string, obstructing the use of the "Open a new search for this term" button on that field.
In the `node` repository, run `pipenv install` to install the new plotting libraries.
Then, you can simply run `./liquid deploy`.
This is a Hoover hotfix release that removes a problem with search queries that take more than 60s.
- Fixed an issue where requests (or other search queries) would return an error if the time exceeded 60s.
This version is a Hoover hotfix release that adjusts various parameters for shorter search times. This should help lower search times and have more expansive searches fit the timeout.
- Added buttons to collapse categories and filter panes.
- Increased Hoover search timeout from 50s to 100s.
- Reduced Hoover search result bucket count from 100 to 44. More results can be still pulled when clicking on the "More" button.
- Reduced number of matched highlights per result from 3 to 2.
- Added management command `./liquid remove-last-es-data-node` to migrate data off the last Elasticsearch data server. This command automates some manual steps required for this operation.
This is a Hoover UI hotfix release that brings more stability when searching in a large number of collections. This is done by splitting aggregation search requests into smaller ones, and by retrying timed out and failed requests.
- Added configuration option called `hoover_ui_agg_split` for splitting aggregations into consecutive requests.
- Added configuration option called `hoover_ui_search_retry` for maximum number of retries allowed for failed search requests. See the example config file for more details.
This is a Hoover UI hotfix release.
- Bring search times back closer to previously known times by removing the implicit `NOT Public Tag: trash` filter from Search (added in `v0.14.0`). This tag will behave like any other public tag (same as before `v0.14.0`).
This release brings Hoover UI and backend improvements, as well as re-written User Guides. These User Guide pages include a new, more complete Hoover User Guide, as well as updated User Guide pages for all other apps: Rocket.Chat, DokuWiki, CodiMD, Nextcloud, and Hypothesis.
- Added more structure to the aggregations by grouping them into categories. Only one list is shown at a time; the others only show aggregated hit counts.
- Added aggregations for document size and text word count.
- Fixed UI bug where download links inside document children lists would be wrong.
- Fixed performance problem when unpacking very large `.tar` archives.
- Fixed bug where the `trash` tag couldn't be ignored when searching.
- Fixed bug where modified search query would be lost when changing Sort or Filters.
- Fixed bug where some documents would be opened in a new tab instead of downloaded.
This release brings Hoover UI improvements and some new Hoover developer documentation.
- Tags Autocomplete: When creating Tags, Hoover now displays the most commonly used Tags in the collection. New tags are added by clicking on them. Tags can be filtered by typing their partial name.
- Added new aggregation for Content Type (Mime Type).
- Fixed bug where TIF images wouldn't render: added browser renderer for `.TIF`/`.TIFF` images.
This is a bug fixing release targeting small Hoover UI issues.
- Added buckets for filtering search results by content type.
- Added missing redirect rules for annotations made on Hoover documents before November 2020.
- Fixed a bug where Document pages would stay blank or loading in case of document fetching error. The pages will now display a proper error message.
All new features in this release are related to Hoover.
- Added buckets for missing values on all fields in left panel. Buckets are first in field, labeled in italic as N/A. Missing values for fields can be combined with all the other operators on that field: including, excluding.
- Added PDF viewer directly in document page. All old annotation URLs and old document viewer URLs now redirect to this page. Annotations now work in the PDF preview, even for scanned and OCR'd documents.
- Added Contextual Menu on fields under the Meta document tab. Options now include: adding field value to current search, opening a new search from field value. Timestamp fields now have options for filtering by that year, month, or week.
- Added thunderbird-like histogram displays for dates. Multiple buckets can be selected by clicking and dragging a line over them with the mouse. Once selected, the intervals can be used as a filter, or selected as individual buckets.
All improvements in this release are related to Hoover.
- Where possible, the fields from the Meta tab will now append their search to the filters buckets instead of the query string.
- Improve scrolling behavior for buckets. All buckets are now of a fixed height and contain more elements by default.
- When clicking on a PDF document, the UI jumps by default to its OCR PDF preview tab, so you can annotate the scans. The texts for the document are available below on the same tab.
- Mention collection in search result card.
- After selecting a collection, Hoover will now pull all the aggregation buckets, even if you didn't fill in a query yet.
- Added more progress spinner UI components for better interaction on slow servers.
- Fixed bug where Hoover PDF OCR preview would display an error for longer PDFs.
- Fixed bug where Hoover annotation tooltip info would display a negative number "Indexed -5 seconds ago" after clicking on lock.
- Removed Hoover Admin buttons for unsupported actions: adding/removing users and collections. User management is done in the home page, and collection management is done through the `liquid.ini` configuration file.
- Fixed buttons for deleting a selected filter from the filter preview bubble.
- Removed links from Hypothesis sidebar to facebook, twitter, google plus, `mailto:` to prevent accidental leaking. The icons currently still exist but error out with 404 on our page.
- Replaced the `go.rocket.chat` channel invite URLs with our own rocketchat page to prevent accidental leaking. The links will now error out with 404 on our page.
We have upgraded Hoover's database to the latest version, and that means a dump/restore is needed as part of this deployment. The dump/restore won't be bigger than 500 MB, and won't take more than 5min.
- create full app backup before upgrading: `./liquid backup TMP_BACKUP --no-collections`
- follow "clean reset" procedure with cluster version 0.13.1
- restore app backup after deploy is done: `./liquid restore-apps TMP_BACKUP`
- verify that Hoover groups and permissions are still correct by visiting Hoover Admin
- delete the backup: `rm -rf TMP_BACKUP`
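The backup, restore and cleanup commands from the steps above can be wrapped as small functions so each one is run deliberately at the right point. This is a sketch only: the function names and the `LIQUID` and `BACKUP_DIR` variables are illustrative assumptions. The "clean reset" procedure and the manual check in Hoover Admin are not automated here.

```shell
# LIQUID is the command used to invoke the node CLI; override it for a dry run.
LIQUID="${LIQUID:-./liquid}"
# BACKUP_DIR is where the temporary app backup is written.
BACKUP_DIR="${BACKUP_DIR:-TMP_BACKUP}"

# Step 1: run before upgrading.
backup_apps() {
    $LIQUID backup "$BACKUP_DIR" --no-collections
}

# Step 3: run after the clean reset and the deploy have finished.
restore_apps() {
    $LIQUID restore-apps "$BACKUP_DIR"
}

# Step 5: run only after verifying groups and permissions in Hoover Admin.
delete_backup() {
    rm -rf -- "$BACKUP_DIR"
}

# Dry run example, printing the commands instead of executing them:
#   LIQUID="echo ./liquid"; backup_apps; restore_apps
```

Keeping the deletion in its own function makes it harder to remove the backup before the permissions check has been done.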
Read the output of `git show v0.X.X` or the tag descriptions at https://github.com/liquidinvestigations/node/releases.