Investigate performance as database grows #1425
A few months back I ran a quick test to measure the throughput of the IBM FHIR server over time. Big disclaimer: these results should be taken with a grain of salt, since they were mostly used as a quick benchmark, but here's the throughput in FHIR resources/s for a total of 6 million resources.

The resources were sent to the server as update-as-create Bundles, each containing between 10 and 100 resources (each bundle contained one Encounter and some associated Conditions, Procedures, and Observations). The left side shows the throughput using Postgres 12.3 with default options, the right side with the attached, optimized configuration. The sender/client is a Spring Boot Batch app reading data from a DB, mapping it to FHIR Bundles, and sending them via REST using 8 threads in parallel. When sending the resources to S3 (minio) instead, the throughput was around 10,000 resources/s.

I ran the same tests against a HAPI FHIR server (v4, R4) and averaged 800 resources/s, but it remained more consistent over time.

I'd definitely be interested in some sort of "FHIRBench" which can benchmark servers in various scenarios (each one multi-threaded, at various fill-stages of the DB, etc.).
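A harness along those lines could start from something as small as the sketch below. Everything here is a placeholder illustration, not part of any existing project: `send_bundle` stands in for an HTTP POST of a transaction/batch Bundle to the server, and the bundle shape is reduced to the minimum needed to count resources.

```python
# Hypothetical "FHIRBench"-style throughput probe (illustrative only).
import time
from concurrent.futures import ThreadPoolExecutor


def send_bundle(bundle):
    # Placeholder: a real harness would POST this Bundle to the FHIR
    # server's base URL and verify the response status before counting.
    return len(bundle["entry"])


def measure_throughput(bundles, threads=8):
    """Send all bundles using `threads` workers; return resources/second."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        sent = sum(pool.map(send_bundle, bundles))
    elapsed = time.monotonic() - start
    return sent / elapsed


# 40 bundles of 50 resources each, mirroring the 10-100 range above.
bundles = [{"resourceType": "Bundle", "entry": [{}] * 50} for _ in range(40)]
rate = measure_throughput(bundles, threads=8)
```

Running the same loop at different DB fill levels (empty, 1M, 6M resources, etc.) would give the throughput-over-time curves discussed above.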
@chgl thank you for this - really useful. There's a new project called fhir-bucket (not yet merged to master) which periodically scans S3 buckets looking for new files (NDJSON and JSON) to load, and is designed to do so with a lot of parallelism (loading activity is coordinated across several instances). This covers bulk-load type scenarios. The FHIRBench scenarios you described are also being discussed, and contributions are welcome if you are able to make them.
@chgl also, we recently introduced a change to our logical id generation to force right-hand index inserts. This change is included in our 4.3.3 release. This particularly benefits PostgreSQL data sources because it reduces the number of blocks being modified and therefore the number of full page writes being logged, which otherwise gets worse as the database grows.
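For illustration, the idea behind right-hand index inserts is to make new ids monotonically increasing, so B-tree inserts keep landing on the same right-most leaf pages instead of dirtying blocks all over the index. A minimal sketch of such an id scheme might look like the following; the actual generation scheme used by the IBM FHIR Server may differ.

```python
# Illustrative time-prefixed id generator (an assumption, not the
# IBM FHIR Server's actual implementation).
import time
import uuid


def time_ordered_id():
    # Fixed-width hex millisecond prefix keeps ids monotonically
    # increasing, so lexicographic (index) order matches insert order.
    millis = int(time.time() * 1000)
    return f"{millis:013x}-{uuid.uuid4().hex[:12]}"
```

Because the prefix has a fixed width, later ids always sort after earlier ones, which is what pushes every insert to the right-hand edge of the index.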
It's probably also worth noting that we've made a lot of performance updates since 4.3.0 as we started testing in earnest. I'd be really curious to see how 4.3.3 compares to what you saw on 4.3.0. A couple of those changes were Db2-specific, but most will benefit PostgreSQL as well.
Oops, I should have refreshed before submitting that :-)
Thanks for the information! I've seen the performance improvements in the release notes and am curious as well :) - I will hopefully find some time in the next two weeks. |
Some updates from running on v4.3.3: I've played around with the memory limits a bit more, because I believe the drops in throughput are caused by swapping, GC, or both - note that these runs are executed on just 8GB of RAM. It's still not optimized enough yet, but I've made another interesting discovery: the job I use supports sending either transactions with conditional updates or update-as-create bundles. The left shows the throughput when using transactions, the right when using update-as-create. I assumed update-as-create would generally achieve higher throughput as it requires less processing. Do you have any insights/recommendations here? (quick edit: one factor might actually be #1362, which doesn't apply when setting custom resource ids)

CPU and RAM stats:
@chgl I'm actively looking into performance right now, so hopefully I'll get a chance to take a look at what you're showing here. Are you running with OpenJDK? JDK Mission Control is pretty useful for spotting GC pauses. Add something like this to the jvm.options file in your
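(The exact options were attached in the original comment. A minimal equivalent for OpenJDK 11+, using the run1.jfr name referenced in the rest of this comment, might look like this; the duration is a placeholder.)

```
-XX:StartFlightRecording=duration=600s,settings=profile,filename=run1.jfr
```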
Then you can load the run1.jfr file with JDK Mission Control. My personal suspicion is that the dips could be related to database checkpoints. If you have the database files and WAL on different mount points, then it's easy to tell by looking at iostat whether you get a surge in writes to the database files with a corresponding drop in writes to the WAL. I'll cook up some queries to try and glean more info from the db stats tables.
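As a starting point for those db stats queries, a sketch against PostgreSQL's cumulative pg_stat_bgwriter view (present in PostgreSQL 12) could look like the following; snapshot it before and after a load window and diff the counters to see how much checkpoint activity fell inside it.

```sql
-- Snapshot checkpoint and buffer-write activity (PostgreSQL 12).
SELECT checkpoints_timed,       -- scheduled checkpoints
       checkpoints_req,         -- checkpoints forced by WAL volume
       checkpoint_write_time,   -- ms spent writing checkpoint data
       buffers_checkpoint,      -- buffers written during checkpoints
       buffers_backend          -- buffers written directly by backends
FROM pg_stat_bgwriter;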
With over 70 million resources resident in the target database (PostgreSQL 12), a bulk load of resources using the fhir-bucket project was able to ingest over 700 resources/second using a concurrency of 100 threads processing bundles containing up to 100 resources each. We are currently planning a refactor of the part of the schema used to store references, and this should improve scaling by reducing the size of our indexes significantly.
Is your feature request related to a problem? Please describe.
Load large volumes of data and track how ingestion performance varies over time.
Assess how access and search performance vary.
Understand scaling, such as memory and disk requirements for a large volume of data.
Use the new fhir-bucket project to load resources stored in COS.