dataspects for MediaWiki is based on Meilisearch and instant-meilisearch.
flowchart BT
subgraph Extension:Dataspects
mediawikiAPI("<b>MediaWiki API</b>
- LocalSettings.php
- <a href='https://mwstakeorg.dataspects.com/w/api.php?action=help&modules=dataspectsapi'>dataspectsapi</a>")
sQLite("<b>SQLite</b><br/>for managing search facet configs")
Cypress("<b><a href='https://www.cypress.io/'>Cypress</a></b><ul><li><a href='https://github.com/dataspects/mediawiki-extensions-Dataspects/tree/main/cypress/e2e'>end-to-end and component tests</a></li><li><a href='https://htmlpreview.github.io/?https://github.com/dataspects/mediawiki-extensions-Dataspects/blob/master/doc/search-facets.cy.js.html'>automatic documentation</a></li></ul>")
AnalysisPipelines("<a href='https://github.com/dataspects/mediawiki-extensions-Dataspects/tree/main/src/jobs'>Analysis Pipelines</a><ol><li>read a facet from storage</li><li>use <span style='color:orange;'>modules/services</span> to conclude annotations</li><li>write altered documents back to storage</li></ol>")
end
storage("<b>Storage</b><ul><li>Meilisearch</li><li>Neo4j</li></ul>")
analyzers("<b style='color:orange;'>Services</b><ul><li>Tika</li><li>spaCy</li></ul>")
subgraph Internet
userAgent("<b>Special:Dataspects</b> (Algolia <a href='https://www.algolia.com/doc/guides/building-search-ui/what-is-instantsearch/js/'>InstantSearch</a>)<ul><li><a href='https://github.com/dataspects/mediawiki-extensions-Dataspects/blob/main/resources/ext.dataspectsSearch/profiles.json'>hit profile <-> searchResultClass matching</a></li></ul><b>Special:DataspectsBackstage</b>")
internetSources("<b>- mediawiki.org</b><br/><b>- semantic-mediawiki.org</b><br/><b>- riot.im</b><br/>...")
end
subgraph Workstation
DataspectsCLI("<b><a href='https://github.com/dataspects/dataspects'>dataspects (Go CLI)</a></b>
- export DS_MEILI_MASTERKEY=
- export INDEX=")
end
DataspectsCLI-.-|configure/manage|storage
userAgent-->|<b>Search</b><br/>wgDataspectsSearchKey|storage
userAgent-->mediawikiAPI
mediawikiAPI-->|<b>CRUD</b><br/>wgDataspectsWriteKey|storage
AnalysisPipelines-.-analyzers
AnalysisPipelines-.-storage
DataspectsCLI-->|<b>Read</b>|internetSources
mediawikiAPI-->|<b>CRUD</b>|sQLite
classDef default text-align:left;
linkStyle 0,3,6 stroke:#ff0000
linkStyle 1,4,5 stroke:#00ff00
- EPPO | Namespaces
- DataspectsAPI | dataspectsapi
- Explain a facet: Module:ExplainFacet | Aspect "The MWStake story"
- Curate HTML before indexing, see wgHTMLElementsToBeRemovedBeforeIndexingContent
- AnalyzeAndAnnotateMeiliDocs
- implements Algolia's InstantSearch
- provides meta data on search results: currently 'last indexed' and 'searchResultClass', see LEX230108155400
- formats the controlled use of cognitive keywords (CoKe), see LEX230108160200
- save search facets, see LEX230108163200 (initialize the SQLite store with maintenance/manageSQlite3.php --initialize)
- show original page contents under search results, see LEX230108165801
- compact search results, see LEX230108165800
- enable multiple sources
- search includes non-article pages (i.e. templates, forms, etc.), see onPageSaveComplete
- reveal nested template calls graphically, see LEX230108161800
- extracts and indexes metadata and text from uploaded files, see DataspectsTikaJob
- initialize new EPPO topic types, see LEX230108161000
- see statistics on indexing activity, see LEX230108165200
- see statistics on data sources, see LEX230108165201
- check current dataspects configuration, see LEX230108165600
- Delete docs from indexes
- COMMAND:
sudo docker exec canasta-dockercompose_web_1 bash -c 'php extensions/Dataspects/maintenance/feedAll.php'
- COMMAND:
dataspects__feed-mediawiki-category-to-index.sh
- MONITOR:
mwstakeorg__localhost__debug-log.sh
Example: configure dataspects for Canasta
- Add to Canasta MediaWiki container:
composer require --with-all-dependencies meilisearch/meilisearch-php:0.25.0 symfony/http-client laudis/neo4j-php-client
- RESET: data storage backends (see below CONFIGURE: the data storage backends)
- LOAD:
php tests/phpunit/phpunit.php --filter testResetTestData extensions/Dataspects/tests/phpunit/unit/DataspectsTest.php
- RUN:
- Cypress
- E2E tests
- Component tests
- Services tests (TIKA)
- PHP unit tests
sudo docker exec -it canasta-dockercompose_web_1 /bin/bash
root@95e3ef5ecc17:/var/www/mediawiki/w# php tests/phpunit/phpunit.php \
extensions/Dataspects/tests/phpunit/unit/DataspectsTest.php
Debug API: https://localhost/w/api.php
- ACTION:
mwstakeorg__localhost__make-test-documentation-TDM.sh
- Special:DataspectsBackstage
mwstakeorg__status.sh
- CHECK: Base images Meilisearch, Neo4j, Tika
- CHECK: Derived images canasta-dataspects, spacy-dataspects
- FIXME: jq to Dockerfile
- CHECK: Environment variables in .env, which set $wgDataspects* variables in LocalSettings.php
- OPTION: temporarily change $wgDataspects* variables in LocalSettings.php:
- ADVANTAGES:
- no need to restart the Docker compose stack
- preserve proper development .env
- STORED-PROCEDURE:
mwstakeorg__localhost__TEST-load-test-data_and_Cypress.php.sh
- OPTION: temporarily change envs in docker-compose.override.yml
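The environment check above could be automated with a small shell helper. This is a minimal sketch, assuming the variables are already present in the current shell (for example after sourcing .env). `require_env` is a hypothetical name, not part of dataspects.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 when at least one is missing, 2 on an invalid name.
require_env() {
  # Locals are prefixed so they are unlikely to shadow a variable being checked.
  local re_name re_value re_missing=0
  for re_name in "$@"; do
    # Accept plain identifiers only, because the name is passed to eval below.
    case "$re_name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid name: $re_name" >&2
        return 2
        ;;
    esac
    # Portable indirect lookup: read the value of the variable called $re_name.
    eval "re_value=\${$re_name:-}"
    if [ -z "$re_value" ]; then
      echo "missing: $re_name" >&2
      re_missing=1
    fi
  done
  return "$re_missing"
}
```

A possible call, using the two variables exported for the dataspects CLI: `require_env DS_MEILI_MASTERKEY INDEX || exit 1`.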
- PREPARE: source *.config files (e.g. localhost.config and production.config) exporting the environment variables
- RESET Meilisearch: meilisearch__index_reset.sh, which applies src/indexsettings.json
- RESET SQLite:
- delete sqlite/dataspects.sqlite
- run php extensions/Dataspects/maintenance/manageSQlite3.php --initialize
- RESET Neo4j:
MATCH(n) DETACH DELETE n
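The SQLite reset above (delete the file, then re-initialize) can be wrapped so that both steps always run together. This is a sketch under assumptions: `reset_sqlite` is a hypothetical name, and the relative paths in the usage line assume the MediaWiki root is the working directory.

```shell
# Hypothetical helper: delete a SQLite file, then run the given initialization command.
# Usage: reset_sqlite <db-file> <init-command> [args...]
reset_sqlite() {
  if [ "$#" -lt 1 ]; then
    echo "usage: reset_sqlite <db-file> [init-command...]" >&2
    return 2
  fi
  local db_file="$1"
  shift
  # -f keeps the reset idempotent: a missing file is not an error.
  rm -f -- "$db_file" || return 1
  # Run whatever initialization command the caller passed, unchanged.
  "$@"
}
```

A possible call: `reset_sqlite sqlite/dataspects.sqlite php extensions/Dataspects/maintenance/manageSQlite3.php --initialize`.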
#!/bin/bash
# https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
curl \
-T /home/lex/python-regular-expressions-cheat-sheet.pdf \
http://localhost:9998/rmeta
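The request above can be parameterized so that any file can be sent to the Tika server. This is a minimal sketch, assuming Tika listens on localhost:9998 as in the script. `tika_rmeta` is a hypothetical name.

```shell
# Hypothetical wrapper around the /rmeta request shown above.
# $1: file to analyze; $2: Tika base URL (defaults to the one used in this document).
tika_rmeta() {
  local file="$1"
  local server="${2:-http://localhost:9998}"
  if [ ! -r "$file" ]; then
    echo "not readable: $file" >&2
    return 1
  fi
  # -T uploads the file with PUT; --fail turns HTTP errors into a non-zero exit status.
  curl --silent --show-error --fail -T "$file" "$server/rmeta"
}
```

A possible call: `tika_rmeta /home/lex/python-regular-expressions-cheat-sheet.pdf`.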
sudo docker exec -it canasta-dockercompose_web_1 /bin/bash
tail -f apache2/error_log.current
yarn add/update the libraries and then copy the corresponding files into place.
Install nvm/node:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.2/install.sh | bash
nvm ls-remote --lts
nvm install v16.18.0
npm install -g yarn
yarn add the libs:
lex@lexThinkPad:~/Downloads/dataspects-search-js-libraries$ yarn add @meilisearch/instant-meilisearch instantsearch.js vis-network
Copy into place, e.g.:
lex@lexThinkPad:~/Downloads/dataspects-search-js-libraries$ cp node_modules/vis-network/dist/vis-network.min.js ~/mwstakeorgdevclone/extensions/Dataspects/resources/ext.dataspectsSearch/
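The copy step can be made safer by checking that each built file exists before copying it. This is a sketch, not part of the extension. `copy_into_place` is a hypothetical name. Only the vis-network path is taken from this document, and the dist paths of the other libraries are not assumed here.

```shell
# Hypothetical helper: copy built library files into an existing target directory.
# Usage: copy_into_place <dest-dir> <file>...
copy_into_place() {
  local dest="$1" src
  shift
  if [ ! -d "$dest" ]; then
    echo "no such directory: $dest" >&2
    return 1
  fi
  for src in "$@"; do
    # Stop at the first missing file instead of copying a partial set silently.
    if [ ! -f "$src" ]; then
      echo "missing build artifact: $src" >&2
      return 1
    fi
    cp -- "$src" "$dest/" || return 1
  done
}
```

A possible call: `copy_into_place ~/mwstakeorgdevclone/extensions/Dataspects/resources/ext.dataspectsSearch node_modules/vis-network/dist/vis-network.min.js`.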