Investigate performance as database grows #1425
A few months back I ran a quick test to measure the throughput of the IBM FHIR server over time. Big disclaimer: these results should be taken with a grain of salt, since they were mostly used as a quick benchmark, but here's the throughput in FHIR resources/s for a total of 6 million resources.

The resources were sent to the server as update-as-create Bundles, each containing between 10 and 100 resources (each bundle contained one Encounter and some associated Conditions, Procedures, and Observations). The left side shows the throughput using Postgres 12.3 with default options, the right side with the attached, optimized configuration. The sender/client is a Spring Boot Batch app reading data from a DB, mapping it to FHIR Bundles, and sending them via REST using 8 threads in parallel. When sending the resources to S3 (minio) instead, the throughput was around 10,000 resources/s.

I ran the same tests against a HAPI FHIR server (v4, R4) and averaged 800 resources/s, but it remained more consistent over time.

I'd definitely be interested in some sort of "FHIRBench" which can benchmark servers in various scenarios (each one multi-threaded, at various fill-stages of the DB, etc.).
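A harness along those lines could start from something as small as the sketch below. Everything here is a placeholder illustration, not part of any existing project: `send_bundle` stands in for an HTTP POST of a transaction/batch Bundle to the server, and the bundle shape is reduced to the minimum needed to count resources.

```python
# Hypothetical "FHIRBench"-style throughput probe (illustrative only).
import time
from concurrent.futures import ThreadPoolExecutor


def send_bundle(bundle):
    # Placeholder: a real harness would POST this Bundle to the FHIR
    # server's base URL and verify the response status before counting.
    return len(bundle["entry"])


def measure_throughput(bundles, threads=8):
    """Send all bundles using `threads` workers; return resources/second."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        sent = sum(pool.map(send_bundle, bundles))
    elapsed = time.monotonic() - start
    return sent / elapsed


# 40 bundles of 50 resources each, mirroring the 10-100 range above.
bundles = [{"resourceType": "Bundle", "entry": [{}] * 50} for _ in range(40)]
rate = measure_throughput(bundles, threads=8)
```

Running the same loop at different DB fill levels (empty, 1M, 6M resources, etc.) would give the throughput-over-time curves discussed above.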
@chgl thank you for this - really useful. There's a new project called fhir-bucket (not yet merged to master) which periodically scans S3 buckets looking for new files (NDJSON and JSON) to load, and is designed to do so with a lot of parallelism (loading activity is coordinated across several instances). This covers bulk-load type scenarios. The FHIRBench scenarios you described are also being discussed, and contributions are welcome if you are able to make them.
@chgl also, we recently introduced a change to our logical id generation to force right-hand index inserts. This change is included in our 4.3.3 release. This particularly benefits PostgreSQL data sources because it reduces the number of blocks being modified and therefore the number of full page writes being logged, which otherwise gets worse as the database grows.
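For illustration, the idea behind right-hand index inserts is to make new ids monotonically increasing, so B-tree inserts keep landing on the same right-most leaf pages instead of dirtying blocks all over the index. A minimal sketch of such an id scheme might look like the following; the actual generation scheme used by the IBM FHIR Server may differ.

```python
# Illustrative time-prefixed id generator (an assumption, not the
# IBM FHIR Server's actual implementation).
import time
import uuid


def time_ordered_id():
    # Fixed-width hex millisecond prefix keeps ids monotonically
    # increasing, so lexicographic (index) order matches insert order.
    millis = int(time.time() * 1000)
    return f"{millis:013x}-{uuid.uuid4().hex[:12]}"
```

Because the prefix has a fixed width, later ids always sort after earlier ones, which is what pushes every insert to the right-hand edge of the index.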
It's probably also worth noting that we've made a lot of performance updates since 4.3.0 as we started testing in earnest. I'd be really curious to see how 4.3.3 compares to what you saw on 4.3.0. A couple of those changes were Db2-specific, but most will benefit PostgreSQL as well.
Oops, I should have refreshed before submitting that :-)
Thanks for the information! I've seen the performance improvements in the release notes and am curious as well :) - I will hopefully find some time in the next two weeks. |
Some updates from running on v4.3.3: I've played around with the memory limits a bit more, because I believe the drops in throughput are caused by swapping, GC, or both - note that these runs are executed on just 8GB of RAM. It's still not optimized enough yet, but I've made another interesting discovery: the job I use supports sending either transactions with conditional updates or update-as-create bundles. The left shows the throughput when using transactions, the right when using update-as-create. I assumed update-as-create would generally achieve higher throughput as it requires less processing. Do you have any insights/recommendations here? (quick edit: one factor might actually be #1362, which doesn't apply when setting custom resource ids)

CPU and RAM stats:
@chgl I'm actively looking into performance right now, so hopefully I'll get a chance to take a look at what you're showing here. Are you running with OpenJDK? JDK Mission Control is pretty useful for spotting GC pauses. Add something like this to the jvm.options file in your
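(The exact options were attached in the original comment. A minimal equivalent for OpenJDK 11+, using the run1.jfr name referenced in the rest of this comment, might look like this; the duration is a placeholder.)

```
-XX:StartFlightRecording=duration=600s,settings=profile,filename=run1.jfr
```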
Then you can load the run1.jfr file with JDK Mission Control. My personal suspicion is that the dips could be related to database checkpoints. If you have the database files and WAL on different mount points, then it's easy to tell by looking at iostat whether you get a surge in writes to the database files with a corresponding drop in writes to the WAL. I'll cook up some queries to try and glean more info from the db stats tables.
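As a starting point for those db stats queries, a sketch against PostgreSQL's cumulative pg_stat_bgwriter view (present in PostgreSQL 12) could look like the following; snapshot it before and after a load window and diff the counters to see how much checkpoint activity fell inside it.

```sql
-- Snapshot checkpoint and buffer-write activity (PostgreSQL 12).
SELECT checkpoints_timed,       -- scheduled checkpoints
       checkpoints_req,         -- checkpoints forced by WAL volume
       checkpoint_write_time,   -- ms spent writing checkpoint data
       buffers_checkpoint,      -- buffers written during checkpoints
       buffers_backend          -- buffers written directly by backends
FROM pg_stat_bgwriter;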
With over 70 million resources resident in the target database (PostgreSQL 12), a bulk load of resources using the fhir-bucket project was able to ingest over 700 resources/second using a concurrency of 100 threads processing bundles containing up to 100 resources each. We are currently planning a refactor of the part of the schema used to store references, and this should improve scaling by reducing the size of our indexes significantly.
Is your feature request related to a problem? Please describe.
Load large volumes of data and track how ingestion performance varies over time.
Assess how access and search performance vary.
Understand scaling, such as memory and disk requirements for a large volume of data.
Use the new fhir-bucket project to load resources stored in COS.