Server-side aggregation of matches for many pieces of content when testing rules #344
Replies: 3 comments
-
This seems like the right approach to me (handling server-side rather than client-side).
-
Is it possible this will create enough work for CAPI to affect its performance? Should we check in with that team? I ask because we'll probably want to check against a very large number of articles, probably more than most queries that go to CAPI. On the other hand, we don't create new rules that often, so we might not need to run large numbers of corpus checks.
-
We can, and probably should do that! My assumption here is that, as Michael B. once said to me, 'CAPI is hardcore'. Taking a look at the status page, it's currently serving ~283 reqs/second across private and public accounts. It's safe to say that we'll want to cache our results for some duration no matter what happens. I suspect we could get away with a simple time-expired cache that keeps pages for some reasonable duration, TBC.

One place that will be harder: the checker. We should consider the load there, as lots of checking may affect PROD Typerighter users. Having said that, a cache might be useful here too, as I suspect users will go backwards and forwards between the same pattern often, especially if they're working to understand the difference between pattern A and pattern B. Even if there's no real impact on load, the speed benefit will help our users. There are standard, powerful cache implementations available as part of Play, so this shouldn't be too much work.

We might also want to look at prioritising traffic within the rule management service, but I think we should look at the real impact of checking rules on the PROD service before we take this step. The service routinely checks 5,000-word pieces with ~13,000 rules, with a p95 check duration of 500ms at most and 1–200ms on average, so 5,000,000 words with 1 rule feels like it'll be within an order of magnitude (roughly 5,000 × 13,000 ≈ 65M word-rule comparisons for a routine check, versus 5,000,000 × 1 = 5M for a corpus check). We'll find out.

I think there are some product questions to answer here. On my mind: do we have different sorts of checks? For example, a deterministic, 'standard' check with a large but predefined search, plus a CAPI search check to cover particular pieces? The 'standard' check is a good noise check, but may be inadequate when checking matches against neologisms etc.
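As a rough sketch of the Play option (a time-expired cache wrapped around the corpus check, using Play's `AsyncCacheApi`): the `CorpusChecker` trait, `CorpusCheckResult` type and the 15-minute TTL below are placeholders, not decisions.

```scala
import javax.inject.Inject
import scala.concurrent.Future
import scala.concurrent.duration._
import play.api.cache.AsyncCacheApi

// Placeholder types for illustration; the real checker interface will differ.
case class CorpusCheckResult(ruleId: String, matchCount: Int)

trait CorpusChecker {
  def run(ruleId: String, pattern: String): Future[CorpusCheckResult]
}

class CachedCorpusChecker @Inject() (
    cache: AsyncCacheApi,
    checker: CorpusChecker
) {
  // Time-expired cache: long enough to help a user flicking back and forth
  // between patterns, short enough that results don't go too stale. TTL TBC.
  private val ttl = 15.minutes

  def check(ruleId: String, pattern: String): Future[CorpusCheckResult] =
    cache.getOrElseUpdate(s"corpus-check:$ruleId:${pattern.hashCode}", ttl) {
      checker.run(ruleId, pattern)
    }
}
```

The same sort of key scheme would let us cache CAPI page fetches separately from checker results if we end up wanting different TTLs for each.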
-
We would like to be able to test a new rule against existing content in CAPI. The diagram below shows one way we might do this.
We must communicate with CAPI and the checker service to do this. I think we should prefer orchestrating on the server:
a single request from the client can then cover n articles in CAPI, with the server fetching the content, calling the checker, and aggregating the matches.
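A minimal sketch of that server-side orchestration, assuming hypothetical `CapiClient` and `CheckerClient` interfaces (the real CAPI client and checker APIs will look different):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical models and clients, for illustration only.
case class Article(id: String, bodyText: String)
case class RuleMatch(articleId: String, matchedText: String)

trait CapiClient {
  def searchArticles(query: String, page: Int, pageSize: Int): Future[Seq[Article]]
}

trait CheckerClient {
  def check(ruleId: String, articles: Seq[Article]): Future[Seq[RuleMatch]]
}

class CorpusTestService(capi: CapiClient, checker: CheckerClient)(implicit ec: ExecutionContext) {

  // Page through CAPI sequentially, send each page of articles to the
  // checker, and aggregate the matches into a single response for the client.
  def testRuleAgainstCorpus(
      ruleId: String,
      query: String,
      pages: Int,
      pageSize: Int = 50
  ): Future[Seq[RuleMatch]] =
    (1 to pages).foldLeft(Future.successful(Seq.empty[RuleMatch])) { (acc, page) =>
      for {
        matchesSoFar <- acc
        articles     <- capi.searchArticles(query, page, pageSize)
        matches      <- checker.check(ruleId, articles)
      } yield matchesSoFar ++ matches
    }
}
```

Paging sequentially rather than in parallel is deliberate in this sketch: it trades a slower corpus check for a bounded request rate against CAPI and the checker, which fits the performance concerns raised above.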