Load testing for catalog-next #449

adborden · 2020-09-24T16:47:15Z

User Story

As a data.gov operator, I want to load test catalog-next so that I have confidence that there are no critical performance issues that should be resolved prior to catalog-next's launch.

Acceptance Criteria

GIVEN I can represent the existing load on catalog-classic
WHEN I apply this load to catalog-next
AND I analyze the performance logs post-test
THEN I don't see any critical performance issues in CPU, Memory, Disk, Network, Database throughput

Details / tasks

Define how much load the system receives in the current production environment
Define a set of tools to test the load (maybe ab)
If possible, get a list of indexed URLs in Google, Bing (and other if needed)
- Test this URLs in the new environment and propose fixes or create a list of 301 redirections

Notes

https://locust.io/ is the recommended load testing tool at the gitter CKAN channel

adborden · 2020-09-24T16:48:22Z

@avdata99 can you elaborate on why you want to look at the indexed URLs in the search engines? I wasn't sure how that's related.

avdata99 · 2020-09-24T19:31:49Z

I guess there are thousands of links that point to the catalog. When we move to the new one, we will send 404s for many of them. I'm not sure if the list of indexed URLs is the best starting point, maybe the 1,000 or 10,000 most visited URLs in Analytics.

I think it is a good idea to try to minimize this impact. If we get this list and test these URLs in the staging environment, maybe we can discover some clever redirects that might help here. Or at least know how big this impact will be

kimwdavidson · 2020-09-28T17:48:52Z

Use access logs to determine current load on production site

CodeShtuff · 2020-10-20T21:02:44Z

Began looking at locust and familiarizing myself with the tool.
Have a quick/dirty local POC for running locust in containers to generate load against a given site. Question that arises from this initial review is where this load should be generated from (tester's desktop, server, or?) and if that matters.
Once we have the desired list of URLs, and the desired number of users and spawn rate for the attempts, should be able to create the relevant script and configurations for locust to perform the load test.

Note:
Locust will only generate the load and provide the response time metrics as seen from the outside. To be able to verify the items described in the acceptance criteria (CPU, Memory, Disk, Network, Database throughput) another tool would be required (tool still to be determined) or at the very least have a visual check of values from within the system(s).
May also want to add acceptance criteria that the site appears responsive during the load test as experienced by a normal user browsing to provide a subjective experience result.

adborden · 2020-10-21T23:43:11Z

@danmayol can you share any code you have, like docker-compose.yml or a locust config?

I think we'd want to run this from the jumpbox, however we don't want to install docker or any containers there. That's not a deal breaker, we could run this from a laptop if we thought locust was the best tool.

Honestly, I'm looking for something simple that would take an apache access log or similar format and just randomize or replay requests based on the distribution of requests in there. If we can write some simple python in order to do that for locust, that sounds good. We're only testing read requests here, there's no write traffic on catalog, so this should be pretty simple.

I haven't used it before, but sar is fairly straight forward and I think covers everything we want (CPU, Memory, Disk, Network), and logs data on a configurable interval. I'm not sure about how to measure the backing services like solr and RDS. If we have New Relic configured, we'll get some insight there.

CodeShtuff · 2020-10-22T13:25:01Z

@adborden Absolutely! Much the same thoughts I had (to not over-engineer this).

I've also been thinking that measuring CPU, Memory, etc. is good for benchmarking what resource usage is under load and looking for variances, but the real value in load testing would be to get raw numbers of the number of concurrent users/load before the site itself becomes slow/unresponsive to impact actual use. In other words, even if CPU usage and memory is minimal, if we find that we can only support 10 concurrent users before the site is slow, that is a more meaningful data point at which point we would then want to look at the resource use to see if that is the underlying cause for the performance breakdown. Having logging from something like sar would definitely be good, but I wonder if that is more a system monitoring goal and not a load testing goal? (just brainstorming)

This is what I have been playing with in regards to locust. We should of course not be locked into this, simply where I started based on the ticket notes.

docker-compose.yml:

version: "3"
services:
  master:
    image: locustio/locust
    volumes:
      - ./scripts:/scripts
    ports:
      - "8089:8089"
    
    # Example with current catalog as target host
    #command: -f /scripts/locustfile.py --master -H https://catalog.data.gov/dataset
    
    # Example command if we want this to run headless
    #command: -f /scripts/locustfile.py --master -H https://catalog-next.sandbox.datagov.us --headless -u 1000 -r 100 --run-time 1h30m

    # Target host of catalog-next sandbox
    command: -f /scripts/locustfile.py --master -H https://catalog-next.sandbox.datagov.us

  worker:
    image: locustio/locust
    command: -f /scripts/locustfile.py --worker --master-host=master
    volumes:
      - ./scripts:/scripts

locustfile.py (simple test to run through a list of url paths from file and get the url):

from locust import HttpUser, TaskSet, task

class BasicTaskSet(TaskSet):
    #def on_start(self):
    #    """ on_start is called when a Locust start before 
    #        any task is scheduled
    #    """
    # Can run a login for example if needed for that user
    #    self.login()

    # Sample login 
    #def login(self):
    #    self.client.post("/user/login",
    #                     {"username":"user",
    #                      "password":"password"})

    @task(1)
    def index(self):

        fname = "/scripts/locust-catalognext-urls"
        with open (fname, "r") as urlfile:
            data = urlfile.readlines()

        for url in data:
            url = url.rstrip("\n")
            self.client.get(url)
            
            # If basic auth used
            #self.client.request(method="GET", url="/", auth=("user", "pass"))

class BasicTasks(HttpUser):
    tasks = [BasicTaskSet]
    min_wait = 5000
    max_wait = 9000

locust-catalognext-urls (just some manually grabbed URLs, still need to get a better list or generate from a log as you mentioned):

/
/dataset
/organization
/user/login
/dataset?page=250
/dataset/veteraninformationservice
/dataset/enhancement-of-rna-from-ffpe-v27-2

If run manually through a gui (local desktop) then we define the number of users and spawn rate manually. Or we can run headless and have the desired values passed in (sample command line in script). As the container is being passed the command, just a matter of performing a docker-compose up and then letting it run (if headless) or going to the browser (http://localhost:8089) and starting the load test.

Let me know what you think and how we should proceed (and we can discuss in scrum, zoom, or slack if easier). Thanks!

CodeShtuff · 2020-10-28T14:47:43Z

Aaron and I discussed a bit on Monday. Some notes on what we discussed included:

Although this may be blocked from being able to run a true load test until we get the production environment ready (which will be sized properly for confirming load) it will be good to get this to a ready state so that all that is left is running of the load tests.
To be able to better simulate 'real' traffic patterns, logs will be exported from production to be used as the seed/example for running the load tests.
Given the desire to start small and improve upon testing as the need arises, locust is a good start point. The current plan is to have someone run the load test from their desktop.

Aside from the conversation:

In further analyzing the first version of the above locust test script, realized there was a flaw in the logic for the test in that one locust task would walk through querying all the urls in the presented file (as fast as it could) which would limit our ability to control the delay periods (min_wait, max_wait) between queries and make it harder to get a true representation of load effect.

As such, reworked it slightly to have each task call a single url which is randomly chosen from the urls in the file.
With one call per task (and the task performed by each user) we can then utilize user counts and hatch rates (as well as the wait values) to better model various load scenarios (slow ramp up, spikes, long waits between retries, etc). The random selection will need some tweaking if we decide to pull in the urls directly from the log files (as that would be a large number of entries, we will likely be working with subsets).

Last but not least, one item I neglected to mention originally is that if desired we can also scale the worker containers so that the load is not being generated by a single container (although still from a single host of course). This can be accomplished simply by adding the desired number of scaled worker to the docker-compose call, for example to run 5 worker containers:
docker-compose up --scale worker=5

Tar ball with all relevant files here:

locust_poc.tar.gz

adborden · 2020-10-31T00:25:59Z

I pulled a week of logs from catalog production. They are not sorted in chronological order, so beware.

https://drive.google.com/file/d/1cYvziM8IwIOeqj2cD6z6ychHIrvjqH6h/view?usp=sharing

adborden · 2020-11-16T19:46:38Z

@avdata99 here are requests to catalog.data.gov over a two week period. This can be used to estimate maximum concurrent users we're seeing today.

avdata99 · 2020-11-16T19:53:24Z

Next steps:

Validate the number of concurrent users (we now are using 200). Check if the configuration is correct.
Invite @jbrown-xentity next days and run these tests while someone monitors the machine status while we run these tests.

avdata99 · 2021-01-19T19:30:32Z

New test results

Basic tests

 Name                                                          # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET /                                                             40     0(0.00%)  |    2515    1899    5172    2400  |    0.67    0.00
 GET /group                                                        33     0(0.00%)  |     853     700    3171     760  |    0.55    0.00
 GET /harvest                                                      35     0(0.00%)  |     981     764    1694     870  |    0.59    0.00
 GET /organization                                                 37     0(0.00%)  |    1090     865    2031    1000  |    0.62    0.00
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                       145     0(0.00%)  |    1403     700    5172    1000  |    2.44    0.00

Response time percentiles (approximated)
 Type     Name                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      /                                                                2400   2500   2800   2900   3000   3600   5200   5200   5200   5200   5200     40
 GET      /group                                                            760    790    800    810    840   1100   3200   3200   3200   3200   3200     33
 GET      /harvest                                                          870    930   1100   1200   1300   1600   1700   1700   1700   1700   1700     35
 GET      /organization                                                    1000   1100   1100   1100   1300   1900   2000   2000   2000   2000   2000     37
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                       1000   1300   2100   2200   2500   2900   3200   3600   5200   5200   5200    145

Test from apache logs

 Name                                                          # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET LANG_dataset                                                  21   10(47.62%)  |    3130     728    8329    2500  |    0.46    0.22
 GET api                                                           45     1(2.22%)  |    2297     586    7269    1800  |    0.98    0.02
 GET dataset                                                     1139  968(84.99%)  |    2976     568   32221    2100  |   24.89   21.16
 GET group                                                         27   25(92.59%)  |    5009     642   32124    2000  |    0.59    0.55
 GET harvest                                                       70   33(47.14%)  |    5721     637   12487    6400  |    1.53    0.72
 GET org                                                           63   58(92.06%)  |    3056     607   16251    1900  |    1.38    1.27
 GET others                                                        19  19(100.00%)  |    3559     606   13651    2900  |    0.42    0.42
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                      1384 1114(80.49%)  |    3147     568   32221    2100  |   30.25   24.35

Response time percentiles (approximated)
 Type     Name                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      LANG_dataset                                                     2500   3800   4200   5100   6100   6700   8300   8300   8300   8300   8300     21
 GET      api                                                              1800   2200   2900   3400   5000   5100   7300   7300   7300   7300   7300     45
 GET      dataset                                                          2100   2900   3600   3800   5500   8100  16000  16000  32000  32000  32000   1139
 GET      group                                                            2000   2500   3000   5000  16000  32000  32000  32000  32000  32000  32000     27
 GET      harvest                                                          6500   8200   8500   8800  11000  11000  12000  12000  12000  12000  12000     70
 GET      org                                                              1900   2700   3700   3900   5200  13000  16000  16000  16000  16000  16000     63
 GET      others                                                           2900   3500   4500   5100   6000  14000  14000  14000  14000  14000  14000     19
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                       2100   3000   3700   4100   6400   8600  16000  16000  32000  32000  32000   1384

Error report
 # occurrences      Error                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------
 967                GET dataset: HTTPError('404 Client Error: Not Found for url: dataset',)                             
 10                 GET LANG_dataset: HTTPError('404 Client Error: Not Found for url: LANG_dataset',)                   
 19                 GET others: HTTPError('404 Client Error: Not Found for url: others',)                               
 33                 GET harvest: HTTPError('404 Client Error: Not Found for url: harvest',)                             
 58                 GET org: HTTPError('404 Client Error: Not Found for url: org',)                                     
 25                 GET group: HTTPError('404 Client Error: Not Found for url: group',)                                 
 1                  GET api: HTTPError('400 Client Error: BAD REQUEST for url: api',)                                   
 1                  GET dataset: HTTPError('500 Server Error: Internal Server Error for url: dataset',)                 
--------------------------------------------------------------------------------------------------------------------------------------------

avdata99 · 2021-01-28T16:39:45Z

We disabled ckanext.datagovcatalog.add_packages_tracking_info to skip the extra package_show call and /dataset/nnnnn endpoint improved.
Is not enough

 Name                                                          # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET api-group-list                                               251     1(0.40%)  |     547     206    5176     370  |    0.35    0.00
 GET api-organization-list                                        250     1(0.40%)  |     504     197    6491     360  |    0.35    0.00
 GET api-package-search                                          1067    10(0.94%)  |   10638     200   72114   10000  |    1.49    0.01
 GET api-package-search-harvest                                   251     0(0.00%)  |    3630     604   15024    3100  |    0.35    0.00
 GET api-package-show                                             686    14(2.04%)  |    1954     242   31594    1300  |    0.96    0.02
 GET dataset                                                    20497   743(3.62%)  |    3681     175   66511    2600  |   28.62    1.04
 GET datasets-home                                                181     3(1.66%)  |   27151     232   87092   24000  |    0.25    0.00
 GET group                                                        669    18(2.69%)  |   25121     242  141853   21000  |    0.93    0.03
 GET groups-home                                                    1     0(0.00%)  |   15602   15602   15602   15602  |    0.00    0.00
 GET harvest-source                                              1463    48(3.28%)  |   12642     180   46590   12000  |    2.04    0.07
 GET home                                                         774    47(6.07%)  |   27947     197  102849   25000  |    1.08    0.07
 GET organization                                                1104    39(3.53%)  |   19044     270  109841   12000  |    1.54    0.05
 GET organizations-home                                             8    1(12.50%)  |   31321   10918   55787   28000  |    0.01    0.00
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                     27202   925(3.40%)  |    6340     175  141853    3000  |   37.98    1.29

Response time percentiles (approximated)
 Type     Name                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      api-group-list                                                    370    480    600    640   1100   1300   2100   2300   5200   5200   5200    251
 GET      api-organization-list                                             360    420    500    560    860   1100   1700   1900   6500   6500   6500    250
 GET      api-package-search                                              10000  14000  15000  16000  20000  24000  28000  31000  68000  72000  72000   1067
 GET      api-package-search-harvest                                       3100   3600   4000   4600   6000   6800   8400  11000  15000  15000  15000    251
 GET      api-package-show                                                 1300   1700   2100   2500   5000   6600   7800  11000  32000  32000  32000    686
 GET      dataset                                                          2600   3700   4700   5700   7900   9600  13000  15000  32000  37000  67000  20497
 GET      datasets-home                                                   24000  31000  37000  41000  52000  61000  68000  70000  87000  87000  87000    181
 GET      group                                                           21000  30000  35000  39000  51000  60000  70000  79000 142000 142000 142000    669
 GET      groups-home                                                     16000  16000  16000  16000  16000  16000  16000  16000  16000  16000  16000      1
 GET      harvest-source                                                  12000  15000  17000  19000  23000  28000  35000  40000  46000  47000  47000   1463
 GET      home                                                            26000  35000  39000  42000  50000  61000  73000  85000 103000 103000 103000    774
 GET      organization                                                    12000  23000  31000  35000  46000  56000  68000  79000  99000 110000 110000   1104
 GET      organizations-home                                              29000  36000  54000  54000  56000  56000  56000  56000  56000  56000  56000      8
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                       3000   4900   7000   8200  15000  25000  39000  49000  80000 110000 142000  27202

Error report
 # occurrences      Error                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------
 269                GET dataset: HTTPError('500 Server Error: Internal Server Error for url: dataset',)                 
 470                GET dataset: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET api-organization-list: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET api-group-list: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 44                 GET home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 9                  GET api-package-search: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 37                 GET organization: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 4                  GET dataset: HTTPError('502 Server Error: Proxy Error for url: dataset',)                           
 13                 GET api-package-show: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 46                 GET harvest-source: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 18                 GET group: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 2                  GET organization: HTTPError('502 Server Error: Proxy Error for url: organization',)                 
 1                  GET api-package-show: HTTPError('502 Server Error: Proxy Error for url: api-package-show',)         
 3                  GET home: HTTPError('502 Server Error: Proxy Error for url: home',)                                 
 3                  GET datasets-home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET organizations-home: HTTPError('502 Server Error: Proxy Error for url: organizations-home',)     
 1                  GET api-package-search: HTTPError('502 Server Error: Proxy Error for url: api-package-search',)     
 2                  GET harvest-source: HTTPError('502 Server Error: Proxy Error for url: harvest-source',)             
--------------------------------------------------------------------------------------------------------------------------------------------

thejuliekramer · 2021-01-28T16:41:59Z

After disabling ckanext.datagovcatalog.add_packages_tracking_info to skip the extra package_show call

thejuliekramer · 2021-01-29T17:49:46Z

Update: Currently investigating why the organization index is so slow

avdata99 · 2021-01-29T17:56:32Z

The slowest URL is /organization
I assume we expect cached responses for a fixed URL
Is cloud-front activated for this environment?
@adborden @thejuliekramer @jbrown-xentity

adborden · 2021-01-29T18:28:06Z

Currently there's no CloudFront CDN enabled for catalog-next. There will be as part of the launch.
BUT I don't think that would help. I believe CKAN 2.8 does not set meaningful cache headers, so CloudFront is going to pass most requests through to CKAN anyway. Only static assets are really benefiting from the cache today.

There were recent changes to the cache logic in ckan/ckan#4781 but I think those only landed in CKAN 2.9.

avdata99 · 2021-02-01T18:02:59Z

Same test with 50 users (instead 250)

 Name                                                          # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET api-group-list                                                50     0(0.00%)  |     233     186     564     210  |    0.06    0.00
 GET api-organization-list                                         51     0(0.00%)  |     232     183     525     210  |    0.06    0.00
 GET api-package-search                                           720     3(0.42%)  |    9952     217   32075    9800  |    0.88    0.00
 GET api-package-search-harvest                                    52     0(0.00%)  |    2143    1113    3629    2100  |    0.06    0.00
 GET api-package-show                                             796    13(1.63%)  |     480     192    5809     260  |    0.98    0.02
 GET dataset                                                    22391  1124(5.02%)  |     836     201   29319     630  |   27.45    1.38
 GET datasets-home                                                201    11(5.47%)  |    2739     355   21911    2200  |    0.25    0.01
 GET group                                                        761    47(6.18%)  |    2527     232   17857    2100  |    0.93    0.06
 GET groups-home                                                    3     0(0.00%)  |    1214     378    2672     590  |    0.00    0.00
 GET harvest-source                                              1583  245(15.48%)  |    4424     202   29987    2200  |    1.94    0.30
 GET home                                                         886    64(7.22%)  |    2928     397   27913    2400  |    1.09    0.08
 GET organization                                                1082    46(4.25%)  |    1819     231   15839    1300  |    1.33    0.06
 GET organizations-home                                            13     1(7.69%)  |    2114     322   14445     870  |    0.02    0.00
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                     28589  1554(5.44%)  |    1416     183   32075     670  |   35.05    1.91

Response time percentiles (approximated)
 Type     Name                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      api-group-list                                                    210    220    230    240    340    410    560    560    560    560    560     50
 GET      api-organization-list                                             210    220    230    240    260    420    520    530    530    530    530     51
 GET      api-package-search                                               9800  12000  14000  15000  16000  18000  20000  23000  32000  32000  32000    720
 GET      api-package-search-harvest                                       2200   2300   2400   2400   2500   2500   2700   3600   3600   3600   3600     52
 GET      api-package-show                                                  270    360    430    510    830   1200   3500   4600   5800   5800   5800    796
 GET      dataset                                                           630    750    850    940   1200   1800   3900   4700   6400  12000  29000  22391
 GET      datasets-home                                                    2200   2500   2900   3300   4200   5100   9900  12000  22000  22000  22000    201
 GET      group                                                            2100   2500   2800   3000   3900   6000   9500  12000  18000  18000  18000    761
 GET      groups-home                                                       590    590   2700   2700   2700   2700   2700   2700   2700   2700   2700      3
 GET      harvest-source                                                   2200   7500   8400   8800   9600  10000  11000  13000  26000  30000  30000   1583
 GET      home                                                             2400   2800   3200   3500   4400   5900  11000  13000  28000  28000  28000    886
 GET      organization                                                     1300   1900   2400   2700   3700   5000   8000   9600  14000  16000  16000   1082
 GET      organizations-home                                                870   1000   1700   1800   2700  14000  14000  14000  14000  14000  14000     13
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                        670    860   1100   1300   2900   5700   9700  12000  19000  29000  32000  28589

Error report
 # occurrences      Error                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------
 1122               GET dataset: HTTPError('500 Server Error: Internal Server Error for url: dataset',)                 
 46                 GET organization: HTTPError('500 Server Error: Internal Server Error for url: organization',)       
 244                GET harvest-source: HTTPError('500 Server Error: Internal Server Error for url: harvest-source',)   
 47                 GET group: HTTPError('500 Server Error: Internal Server Error for url: group',)                     
 63                 GET home: HTTPError('500 Server Error: Internal Server Error for url: home',)                       
 11                 GET datasets-home: HTTPError('500 Server Error: Internal Server Error for url: datasets-home',)     
 13                 GET api-package-show: HTTPError('500 Server Error: INTERNAL SERVER ERROR for url: api-package-show',)
 1                  GET organizations-home: HTTPError('500 Server Error: Internal Server Error for url: organizations-home',)
 3                  GET api-package-search: HTTPError('500 Server Error: INTERNAL SERVER ERROR for url: api-package-search',)
 1                  GET home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 2                  GET dataset: HTTPError('502 Server Error: Proxy Error for url: dataset',)                           
 1                  GET harvest-source: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
--------------------------------------------------------------------------------------------------------------------------------------------

adborden · 2021-02-01T19:23:26Z

Here are some things to try:

Check the application logs for errors, why so many 500s?
Check solr hosts: resources, errors, bad queries?
Apply the load to catalog-classic to make sure we're comparing apples to apples.

thejuliekramer · 2021-02-01T23:01:38Z

Created Solr Performance Issues in Catalog-Next #560

avdata99 · 2021-02-02T16:01:04Z

Test with 250 users after solr optimization

 Name                                                          # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET api-group-list                                               250     1(0.40%)  |     748     196    8432     350  |    0.37    0.00
 GET api-organization-list                                        250     2(0.80%)  |    1013     203   11480     370  |    0.37    0.00
 GET api-package-search                                          1059    10(0.94%)  |   10224     186   68404   10000  |    1.58    0.01
 GET api-package-search-harvest                                   250     3(1.20%)  |    3292     259   12035    2900  |    0.37    0.00
 GET api-package-show                                             644    20(3.11%)  |    1857     213   11020    1200  |    0.96    0.03
 GET dataset                                                    20930   781(3.73%)  |    3396     180   66257    2400  |   31.20    1.16
 GET datasets-home                                                202     5(2.48%)  |   26714     229   83100   25000  |    0.30    0.01
 GET group                                                        669    22(3.29%)  |   22115     191   88815   18000  |    1.00    0.03
 GET groups-home                                                    5    1(20.00%)  |    7394     236   19698    5000  |    0.01    0.00
 GET harvest-source                                              1483    26(1.75%)  |   10994     185   41743   11000  |    2.21    0.04
 GET home                                                         835    46(5.51%)  |   25250     215   85680   22000  |    1.24    0.07
 GET organization                                                1104    29(2.63%)  |   17804     188   78928   10000  |    1.65    0.04
 GET organizations-home                                             5    1(20.00%)  |   25348     572   58790   20000  |    0.01    0.00
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                     27686   947(3.42%)  |    5843     180   88815    2800  |   41.27    1.41

Response time percentiles (approximated)
 Type     Name                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      api-group-list                                                    360    460    570    700   1200   2900   7100   8300   8400   8400   8400    250
 GET      api-organization-list                                             370    520    740    930   2800   5400   6800   7000  11000  11000  11000    250
 GET      api-package-search                                              10000  13000  15000  16000  18000  21000  24000  29000  68000  68000  68000   1059
 GET      api-package-search-harvest                                       2900   3300   3700   4000   6100   6900   9200   9900  12000  12000  12000    250
 GET      api-package-show                                                 1200   1800   2200   2700   5200   6100   7000   7700  11000  11000  11000    644
 GET      dataset                                                          2400   3500   4600   5400   7500   9100  11000  13000  18000  23000  66000  20930
 GET      datasets-home                                                   25000  33000  37000  41000  50000  60000  69000  76000  83000  83000  83000    202
 GET      group                                                           18000  26000  32000  36000  46000  54000  66000  69000  89000  89000  89000    669
 GET      groups-home                                                      5000  11000  11000  20000  20000  20000  20000  20000  20000  20000  20000      5
 GET      harvest-source                                                  11000  14000  15000  17000  20000  24000  28000  31000  41000  42000  42000   1483
 GET      home                                                            22000  32000  38000  41000  50000  60000  68000  72000  86000  86000  86000    835
 GET      organization                                                    10000  21000  29000  33000  47000  57000  65000  70000  78000  79000  79000   1104
 GET      organizations-home                                              20000  40000  40000  59000  59000  59000  59000  59000  59000  59000  59000      5
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                       2800   4600   6500   7700  13000  21000  38000  49000  71000  86000  89000  27686

Error report
 # occurrences      Error                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------
 296                GET dataset: HTTPError('500 Server Error: Internal Server Error for url: dataset',)                 
 3                  GET api-package-search-harvest: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 479                GET dataset: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 19                 GET api-package-show: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 2                  GET api-organization-list: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 26                 GET harvest-source: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 19                 GET group: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 39                 GET home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 28                 GET organization: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET organizations-home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET api-group-list: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 3                  GET home: HTTPError('502 Server Error: Proxy Error for url: home',)                                 
 10                 GET api-package-search: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 5                  GET dataset: HTTPError('502 Server Error: Proxy Error for url: dataset',)                           
 4                  GET home: HTTPError('500 Server Error: Internal Server Error for url: home',)                       
 2                  GET datasets-home: HTTPError('500 Server Error: Internal Server Error for url: datasets-home',)     
 3                  GET datasets-home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET dataset: HTTPError('404 Client Error: Not Found for url: dataset',)                             
 3                  GET group: HTTPError('500 Server Error: Internal Server Error for url: group',)                     
 1                  GET groups-home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET organization: HTTPError('502 Server Error: Proxy Error for url: organization',)                 
 1                  GET api-package-show: HTTPError('502 Server Error: Proxy Error for url: api-package-show',)         
--------------------------------------------------------------------------------------------------------------------------------------------

avdata99 · 2021-02-02T20:58:15Z

Test with 75 users and static assets

[2021-02-02 17:57:03,563] victor-pc/INFO/locust.runners: Stopping 75 users
[2021-02-02 17:57:03,697] victor-pc/INFO/locust.runners: 75 Users have been stopped, 0 still running
 Name                                                          # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET api-group-list                                                79     0(0.00%)  |     237     185     894     210  |    0.04    0.00
 GET api-organization-list                                         80     0(0.00%)  |     245     188     863     220  |    0.04    0.00
 GET api-package-search                                         12847     9(0.07%)  |    1017     188   64962     950  |    5.70    0.00
 GET api-package-search-harvest                                    77     0(0.00%)  |     365     200    1440     380  |    0.03    0.00
 GET api-package-show                                            7173     7(0.10%)  |     392     198    2984     290  |    3.18    0.00
 GET dataset                                                     2534    27(1.07%)  |     775     242    3219     660  |    1.12    0.01
 GET dataset_search                                              4567   142(3.11%)  |    3963     556   14044    3600  |    2.03    0.06
 GET datasets-home                                               2070    62(3.00%)  |    3645     183   13323    3200  |    0.92    0.03
 GET group                                                       7620   268(3.52%)  |    3123     246   15864    2700  |    3.38    0.12
 GET groups-home                                                   21     0(0.00%)  |    1064     373    3344    1100  |    0.01    0.00
 GET harvest-source                                             16325   272(1.67%)  |    1351     220    6417    1200  |    7.24    0.12
 GET harvest-sources-home                                           5     0(0.00%)  |     974     527    1501     960  |    0.00    0.00
 GET home                                                        9135   304(3.33%)  |    3974     202   13029    3600  |    4.05    0.13
 GET organization                                               11754   275(2.34%)  |    2364     213   69222    1900  |    5.21    0.12
 GET organizations-home                                            89     0(0.00%)  |    2209     628    7802    1800  |    0.04    0.00
 GET static_assets                                              30627     7(0.02%)  |     485     174    4014     240  |   13.58    0.00
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                    105003  1373(1.31%)  |    1605     174   69222     970  |   46.57    0.61

Response time percentiles (approximated)
 Type     Name                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      api-group-list                                                    210    230    240    250    260    330    540    890    890    890    890     79
 GET      api-organization-list                                             220    230    240    260    280    400    800    860    860    860    860     80
 GET      api-package-search                                                950   1000   1100   1100   1400   1700   2100   2300   3000   4100  65000  12847
 GET      api-package-search-harvest                                        380    390    410    420    530    620    950   1400   1400   1400   1400     77
 GET      api-package-show                                                  290    340    400    450    730    970   1400   1600   2400   3000   3000   7173
 GET      dataset                                                           660    780    890    960   1200   1600   1900   2200   2900   3200   3200   2534
 GET      dataset_search                                                   3600   4300   4900   5300   6300   7300   8400   9300  12000  14000  14000   4567
 GET      datasets-home                                                    3200   4000   4500   4900   6000   7100   8400   9000  12000  13000  13000   2070
 GET      group                                                            2700   3400   3900   4200   5300   6400   8000   8800  12000  16000  16000   7620
 GET      groups-home                                                      1100   1200   1300   1300   1400   1700   3300   3300   3300   3300   3300     21
 GET      harvest-source                                                   1200   1500   1700   1900   2300   2700   3200   3600   4700   6300   6400  16325
 GET      harvest-sources-home                                              960   1000   1000   1500   1500   1500   1500   1500   1500   1500   1500      5
 GET      home                                                             3600   4400   4900   5300   6500   7400   8700   9600  12000  13000  13000   9135
 GET      organization                                                     1900   2700   3400   3800   5000   6200   7500   8600  12000  15000  69000  11754
 GET      organizations-home                                               1800   2700   2900   3300   4000   5800   7200   7800   7800   7800   7800     89
 GET      static_assets                                                     240    370    550    640   1400   1600   1800   2000   3000   3500   4000  30627
--------|------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                        970   1600   2200   2600   4000   5200   6600   7600  11000  14000  69000 105003

Error report
 # occurrences      Error                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------
 1                  GET datasets-home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 61                 GET datasets-home: HTTPError('500 Server Error: Internal Server Error for url: datasets-home',)     
 3                  GET group: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 6                  GET static_assets: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 4                  GET api-package-search: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 3                  GET home: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 272                GET organization: HTTPError('500 Server Error: Internal Server Error for url: organization',)       
 142                GET dataset_search: HTTPError('500 Server Error: Internal Server Error for url: dataset_search',)   
 301                GET home: HTTPError('500 Server Error: Internal Server Error for url: home',)                       
 269                GET harvest-source: HTTPError('500 Server Error: Internal Server Error for url: harvest-source',)   
 27                 GET dataset: HTTPError('500 Server Error: Internal Server Error for url: dataset',)                 
 265                GET group: HTTPError('500 Server Error: Internal Server Error for url: group',)                     
 7                  GET api-package-show: HTTPError('500 Server Error: INTERNAL SERVER ERROR for url: api-package-show',)
 3                  GET organization: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 5                  GET api-package-search: HTTPError('500 Server Error: INTERNAL SERVER ERROR for url: api-package-search',)
 3                  GET harvest-source: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
 1                  GET static_assets: HTTPError('502 Server Error: Proxy Error for url: static_assets',)               
--------------------------------------------------------------------------------------------------------------------------------------------

adborden mentioned this issue Sep 24, 2020

catalog-next minimum viable product launch #400

Closed

15 tasks

ghost assigned woodt and CodeShtuff Oct 20, 2020

thejuliekramer mentioned this issue Oct 30, 2020

Test/validate catalog-next launch plan in sandbox #401

Closed

2 tasks

ghost unassigned CodeShtuff Oct 30, 2020

kimwdavidson unassigned woodt Nov 6, 2020

avdata99 mentioned this issue Nov 12, 2020

Load testing tool GSA/catalog.data.gov#167

Closed

avdata99 assigned avdata99 and thejuliekramer Nov 12, 2020

jbrown-xentity mentioned this issue Nov 18, 2020

SQLAlchemy connection pool overflow and TimeoutError on catalog-next #513

Open

This was referenced Jan 20, 2021

Advanced test GSA/datagov-load-testing#1

Merged

Probably a solr host definition error GSA/data.gov#2680

Closed

adborden mentioned this issue Jan 29, 2021

[timebox: 5d] Catalog load- and stress-testing in cloud.gov GSA/data.gov#2701

Closed

11 tasks

avdata99 mentioned this issue Feb 1, 2021

Add Cache-Control max-age to catalog-next GSA/data.gov#2707

Closed

thejuliekramer mentioned this issue Feb 1, 2021

Solr Performance Issues in Catalog-Next #560

Open

ghost unassigned avdata99 and thejuliekramer Feb 2, 2021

avdata99 assigned adborden Feb 2, 2021

mogul added this to the Sprint 20210218 milestone Feb 18, 2021

mogul closed this as completed Feb 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Load testing for catalog-next #449

Load testing for catalog-next #449

adborden commented Sep 24, 2020 •

edited by avdata99

Loading

adborden commented Sep 24, 2020

avdata99 commented Sep 24, 2020

kimwdavidson commented Sep 28, 2020

CodeShtuff commented Oct 20, 2020

adborden commented Oct 21, 2020

CodeShtuff commented Oct 22, 2020

CodeShtuff commented Oct 28, 2020

adborden commented Oct 31, 2020

adborden commented Nov 16, 2020

avdata99 commented Nov 16, 2020

avdata99 commented Jan 19, 2021 •

edited

Loading

avdata99 commented Jan 28, 2021 •

edited

Loading

thejuliekramer commented Jan 28, 2021 •

edited

Loading

thejuliekramer commented Jan 29, 2021

avdata99 commented Jan 29, 2021

adborden commented Jan 29, 2021 •

edited

Loading

avdata99 commented Feb 1, 2021

adborden commented Feb 1, 2021

thejuliekramer commented Feb 1, 2021

avdata99 commented Feb 2, 2021

avdata99 commented Feb 2, 2021

Load testing for catalog-next #449

Load testing for catalog-next #449

Comments

adborden commented Sep 24, 2020 • edited by avdata99 Loading

User Story

Acceptance Criteria

Details / tasks

Notes

adborden commented Sep 24, 2020

avdata99 commented Sep 24, 2020

kimwdavidson commented Sep 28, 2020

CodeShtuff commented Oct 20, 2020

adborden commented Oct 21, 2020

CodeShtuff commented Oct 22, 2020

CodeShtuff commented Oct 28, 2020

adborden commented Oct 31, 2020

adborden commented Nov 16, 2020

avdata99 commented Nov 16, 2020

avdata99 commented Jan 19, 2021 • edited Loading

avdata99 commented Jan 28, 2021 • edited Loading

thejuliekramer commented Jan 28, 2021 • edited Loading

thejuliekramer commented Jan 29, 2021

avdata99 commented Jan 29, 2021

adborden commented Jan 29, 2021 • edited Loading

avdata99 commented Feb 1, 2021

adborden commented Feb 1, 2021

thejuliekramer commented Feb 1, 2021

avdata99 commented Feb 2, 2021

avdata99 commented Feb 2, 2021

adborden commented Sep 24, 2020 •

edited by avdata99

Loading

avdata99 commented Jan 19, 2021 •

edited

Loading

avdata99 commented Jan 28, 2021 •

edited

Loading

thejuliekramer commented Jan 28, 2021 •

edited

Loading

adborden commented Jan 29, 2021 •

edited

Loading