Performance Test - scheduled for 2024-06-07 #162
The specific backend branch being tested is 161-paginate-via-search-facet-requests. I believe it to be a branch of 160-paginate-non-semantic-facet-requests, which is a branch of release1.18. In case this test is a raging success and we have time for a second test, we could re-run #132: include alternative names when resolving name search criteria. I created the 161-pagination-PLUS-100-alternative-names branch, which is the above plus #100's edits. cc: @gigamorph, @kamerynB, @prowns, @jffcamp, @clarkepeterf
Middle tier stats:
There were fewer backend requests than expected -- yet there are no significant differences in the total number of executed Virtual User (VU) flows/transactions, overall error rate, or average response times between this test, its baseline (22 Jun 23), and a more recent test (14 Feb 24). Test duration doesn't explain it either. Comparison:
If the system isn't being pushed as hard as in previous tests, the results of this test could be skewed. Regardless, today's test was so much better than those on 9 May 24 (#132) and 22 May 24 (#151), as there were zero v8 engine crashes! Possible explanations thought of to date (not mutually exclusive):
* Logs were trimmed to 1:55. The QA report states the test ran for 2:15. Found a note that read "Test stopped after full ramp up but did not run for the full 15 minutes afterwards." Regardless, for the purposes of this comparison, 22 Jun 23's test running shorter would skew the results to the benefit of 7 Jun 24's test.
@gigamorph noticed that there were thousands of 504s (gateway timeouts) from the ML load balancer early on. As shown above, that persisted throughout the test, but 504s at the web cache layer never exceeded 100 per minute. The likely explanation is that the data service proxies automatically retried the 504s. Looking at this and previous tests' app server metrics, a contributing factor could be a full request queue for the
ML Monitoring History

Time period: 1700 - 1920 UTC

[Charts omitted: CPU; File IO Detail; Memory; Intra-cluster activity (1 of 2); Intra-cluster activity (2 of 2); both HTTP app servers; each HTTP app server individually, with captions noting a difference between them; data node characteristics.]
ML App Server Metrics

The following reflects that group one's queue was full for the majority of the test. This could explain many of the 504s the MarkLogic load balancer was returning. Please see the comment above for more details and suggested follow-up.

Script: extractAppServerQueueMetrics.js*
Input: 20240607-1700-1920-app-server-stats.json
Output:
* The input includes one instance of a queue size of 37, which exceeds the maximum. I changed the script to use 36 for group one, then added one to that total to account for the single report of 37.
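For context, the script itself isn't included in this ticket. Below is a minimal sketch of the kind of tally such a script might perform; the input shape ({ group, queueSize } samples) is an assumption for illustration, not the actual structure of the exported stats file.

```js
// Hypothetical sketch only; assumes the stats file flattens to an array of
// periodic samples like { group: 'group-one', queueSize: 12 }.
const fs = require('fs');

const MAX_QUEUE_SIZE = 36; // clamp the single reported 37, per the footnote

function tallyQueueSizes(samples) {
  const byGroup = new Map();
  for (const { group, queueSize } of samples) {
    // Treating the one sample of 37 as 36 counts it as a full queue, which
    // matches adding one to the group-one total as described above.
    const size = Math.min(queueSize, MAX_QUEUE_SIZE);
    if (!byGroup.has(group)) byGroup.set(group, { full: 0, total: 0 });
    const tally = byGroup.get(group);
    tally.total += 1;
    if (size === MAX_QUEUE_SIZE) tally.full += 1;
  }
  return byGroup;
}

const samples = JSON.parse(
  fs.readFileSync('20240607-1700-1920-app-server-stats.json', 'utf8')
);
for (const [group, { full, total }] of tallyQueueSizes(samples)) {
  console.log(`${group}: queue full in ${full} of ${total} samples`);
}
```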
OS-level metrics (via sar): These were downloaded from Green's nodes, but the filenames do not reflect nodes 104, 73, and 20.

sar_ip-10-5-156-171.its.yale.edu_2024-06-07T165241.out.gz
Looks like two requests failed due to an attempt to send too much data between nodes:
These line up with these entries:
Encountered five
Nodes and times associated with all instances:
cc: @clarkepeterf
While there is a … the response status code for all was 500, meaning the data service proxies would not have retried them.
From 22 Jun 23:
cc: @clarkepeterf
Out of the remaining non-trace-event entries found in the 8003 and 8004 application error logs, there are three
Request response times:
Mined log output:
Executive Summary
* #162: perf test procedure and issue template updates.
* Removed a runtime-related list check for target scopes other than item and work. The set scope is now a valid target scope, and this trace event has never otherwise fired.
From the middle tier application log:

There are some peculiar errors that may be worth noting. The root cause must be some error from the frontend, but @brent-hartwig, would you say this error "Document is not JSON" was returned by MarkLogic?
There were 1,366 of them. Also interesting is that there were only(?) five JS-BAD errors, and all of them involved the same data:
And there were 22 errors about retries for 504, e.g.,
As previously discussed, this is behavior of the marklogic client library that we cannot change, but 31 attempts over 121 seconds seems like too much; it doesn't quite make sense to me.
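To make the concern concrete, here's roughly the kind of bounded retry budget that would make more sense than 31 attempts over 121 seconds. This is a hypothetical wrapper, not the marklogic library's API; callBackend, err.statusCode, and the limits are all assumptions for illustration.

```js
// Illustrative wrapper only; not part of the marklogic client library.
// Retries 503/504 (transient, e.g., a full app server request queue) with
// exponential backoff, bounded by both attempt count and elapsed time.
async function withRetryBudget(callBackend, { maxAttempts = 5, maxElapsedMs = 15000 } = {}) {
  const start = Date.now();
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await callBackend();
    } catch (err) {
      lastError = err;
      // 500s are not retried, consistent with the proxy behavior noted above.
      if (err.statusCode !== 503 && err.statusCode !== 504) throw err;
      if (Date.now() - start >= maxElapsedMs) break;
      // Backoff: 250ms, 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** (attempt - 1)));
    }
  }
  throw lastError; // retry budget exhausted
}
```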
@gigamorph, thanks for pointing out the 'document is not JSON' errors. I encountered them and thought I had chased down the issue, but must have forgotten about it. Here's a complete set of entries per instance:
Based on the grep results/count of the middle line, all instances occurred within searchEstimate.mjs, leading me to believe something in the performance test is compelling the middle tier to send invalid search criteria. Undoubtedly there are other possibilities, but I don't know that the backend logs will help with that determination.
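For anyone who wants to reproduce the count, here is the same idea in code. It is hypothetical, since the actual grep pipeline and the middle tier's log layout aren't captured above; it assumes the module name (e.g., searchEstimate.mjs) appears on the line after the error message.

```js
// Hypothetical log-mining sketch; log path and line layout are assumptions.
const fs = require('fs');

function countDocumentNotJsonByModule(logPath) {
  const lines = fs.readFileSync(logPath, 'utf8').split('\n');
  const byModule = new Map();
  lines.forEach((line, i) => {
    if (!line.includes('Document is not JSON')) return;
    // Check the following ("middle") line for a *.mjs module reference.
    const match = (lines[i + 1] || '').match(/([\w-]+\.mjs)/);
    const module = match ? match[1] : 'unknown';
    byModule.set(module, (byModule.get(module) || 0) + 1);
  });
  return byModule;
}

console.log(countDocumentNotJsonByModule('middle-tier-application.log'));
```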
@gigamorph, the five |
Agreed. Since the backend logs document … That aside, we have also discussed submitting a support ticket / RFE requesting more control over data service retry behavior. I'm not opposed to doing so but wouldn't use this example. Hopefully the middle tier receives 504s (or 503s) when an app server request queue is full. Such requests should be retried. One of 119 successful requests for the above document:
Related to there being fewer than expected backend requests...

Frontend to Backend Request Ratio

During a performance test, how many frontend requests should there be for every backend request? We have never attempted to establish this metric. We would either have to account for requests the frontend makes to other sources (such as for images and analytics) or include them. When included, this test has 105% to 202% more frontend requests compared to the 14 Feb 24 and 22 Jun 23 tests, respectively. Since SI was introduced in Mar 2024, we can't really compare to tests older than that. Perhaps we can exclude those other requests. We asked QA if their test logs can tell us how many middle tier requests there were, ideally by status code. (Middle tier requests have paths beginning with "/api/" or "/data/".)

Middle Tier to Backend Request Ratio

@gigamorph was able to determine there were 363,788 middle tier requests during the test via:
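(The command itself isn't shown here. Purely as an illustration of one way to produce such a count, assuming combined-format access logs; the file name and line format below are assumptions.)

```js
// Hypothetical sketch: count middle tier requests (paths starting with
// "/api/" or "/data/") by response status code from an access log whose
// lines resemble:
//   10.5.156.171 - - [07/Jun/2024:17:00:01 +0000] "GET /api/search?q=x HTTP/1.1" 200 1234
const fs = require('fs');
const readline = require('readline');

async function countByStatus(logPath) {
  const counts = new Map();
  const rl = readline.createInterface({ input: fs.createReadStream(logPath) });
  for await (const line of rl) {
    const match = line.match(/"[A-Z]+ (\/(?:api|data)\/\S*) HTTP\/[^"]*" (\d{3})/);
    if (!match) continue;
    const status = match[2];
    counts.set(status, (counts.get(status) || 0) + 1);
  }
  return counts;
}

countByStatus('access.log').then((counts) => {
  let total = 0;
  for (const [status, n] of [...counts.entries()].sort()) {
    console.log(`${status}: ${n}`);
    total += n;
  }
  console.log(`total: ${total}`);
});
```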
Given that most if not all middle tier requests result in multiple backend requests, neither of us would expect more middle tier requests than backend requests, yet that's what the counts show: there were only 307,511 backend requests. We do not yet know why. Middle tier request counts by status code:
Questions:
Sampling of requests with 503 response status codes:
@brent-hartwig SI was implemented in March. For almost every clickable action, there is an SI event being pushed. I'm not sure if that helps though.
Thanks, @kamerynB. It does. I updated my comment accordingly.
Seong researched and provided the following explanation. I agree we do not need to pursue this. In this test, they only accounted for 3% of the middle tier responses.
Decisions on executive summary items from conversation with @jffcamp and @prowns today:
Closing this ticket as all findings have been "acknowledged, understood, issue submitted, or resolved". In most cases, links were added to #181 comments that continue the associated investigation, or links to new tickets were added.
Primary objective:
Test the system stability and performance impact of #160. Given previous analysis, we believe #160 will:
If available, #161 can be part of this performance test as well. Given that the aforementioned analysis was of a search for works, semantic facets were not involved; as such, #161 is welcome but not required by this test.
Code and Configuration Changes:
Required: implementation of #160 and associated frontend and middle tier implementations.
Optional: implementation of #161; if included, its associated frontend and middle tier implementations are required.
Environment and versions (update as needed):
Scenario AK of the Perf Test Line Up: "See if paginating facet requests (#160 and #161) restores system stability by reducing --if not eliminating-- v8 engine crashes."
Backend application server configuration:
For more information please see the documentation: LUX Performance Testing Procedure
Tasks:
v8 delay timeout
Collect Data (Details from procedure):
Restore and Verify Environment:
Analyze:
Notes: