Memory leak when running from python #560

MoshePerez · 2022-03-24T18:43:09Z

It looks like memory is not being cleared after each scan. here is a short snippet to reproduce:

import os
import gc
import psutil
from sslyze import Scanner, ServerScanRequest, ServerNetworkLocation


def print_memory_used(msg):
    object_count = len(gc.get_objects())
    process = psutil.Process(os.getpid())
    memory_used = process.memory_info().rss
    print(f"{msg}: object count: {object_count}, memory used: {memory_used / 1024**2} MB")


def sslyze_scan(hostname, port):
    request = ServerScanRequest(ServerNetworkLocation(hostname=hostname, port=port))
    scanner = Scanner()
    scanner.queue_scans([request])
    results = list(scanner.get_results())


for i in range(1,5):
    print_memory_used(f"before run {i}")
    sslyze_scan("mozilla.com", 443)
    print_memory_used(f"after run {i}")

Output:

before run 1: object count: 37141, memory used: 37.79296875 MB
after run 1: object count: 46649, memory used: 56.79296875 MB
before run 2: object count: 46649, memory used: 56.79296875 MB
after run 2: object count: 53465, memory used: 67.953125 MB
before run 3: object count: 53465, memory used: 67.953125 MB
after run 3: object count: 60282, memory used: 78.890625 MB
before run 4: object count: 60282, memory used: 78.890625 MB
after run 4: object count: 46497, memory used: 84.875 MB

I also tried adding gc.collect() after each call, it helped reduce the object count but memory used still increased.
Am I missing something and using the scanner wrong or is it a memory leak?

OS: Ubuntu 20.04
Python version: 3.10.4
sslyze version: 5.0.3
nassl version: 4.0.2

The text was updated successfully, but these errors were encountered:

nabla-c0d3 · 2022-04-03T19:57:52Z

I'm not sure... a memory leak is possible.

As a workaround I would suggest having a long-running process that spawns a Python subprocess to run batches of scans/domains (using a single Scanner()), and then shuts down the subprocess once the scans are complete.,

lcurtis-datto · 2022-04-04T15:26:54Z

Hello there - I ran into the same issue and thought it may have been our wrapper code used around SSLyze, but I set up a test script using the example code and see large amounts of memory being consumed. I have a list of 30 random websites I am using in 'myhosts.txt'. I tried using scanner.get_results() strictly as a generator function, but there does not seem to be a difference.
tester.py.txt

Output:

Line # Mem usage Increment Occurrences Line Contents

34     37.4 MiB     37.4 MiB           1   @profile
35                                         def main():   
36     37.4 MiB      0.0 MiB           1       scan_results = get_scans()
37     65.2 MiB     23.5 MiB          28       for server_scan_result in scan_results:
38     58.0 MiB      0.3 MiB          27           print(f"\n\n****Results for {server_scan_result.server_location.hostname}****")
39     60.5 MiB      1.8 MiB          27           certinfo_attempt = server_scan_result.scan_result.certificate_info
40     60.8 MiB      0.3 MiB          27           print(certinfo_attempt)

Ubuntu: 20.04
Python: 3.8.10
ssylze: 5.0.3
nassl: 4.0.2

It's happening somewhere between scanner.queue_scans() and scanner.get_results()

@nabla-c0d3 - Thanks so much for all of your work on this amazing library. It has inspired me to dive deeper into Python. I will try digging into the code, but it is certainly pretty complex Hopefully a solution can be found. .

sconway-datto · 2022-04-05T20:51:51Z

I think I've isolated the issue to mass_scanner's _generate_result_for_completed_server_scan function, with the help of memory-profiler.

Specifically, the lines where the finished scan_command's plugin is called to produce a result.
https://github.com/nabla-c0d3/sslyze/blob/release/sslyze/scanner/_mass_scanner.py#L265
https://github.com/nabla-c0d3/sslyze/blob/release/sslyze/scanner/_mass_scanner.py#L273

(and the attempts that follow in the below except blocks)

I'm going to keep poking at this and hopefully figure out what's occurring. For my testing, I've been repeating the CERTIFICATE_INFO scan command on the same host and comparing the memory footprint with psutil.

~~I still have yet to understand why the amount of memory leaked differs so wildly between the first few runs.~~

Interestingly enough, I noticed that if cert-info runs successfully on my host, I see increases of 9MB, 2MB, and then several below .05 MB. However, if I force an exception here, I only see an initial gain of 2MB, followed by gains below .05 MB. I thought this might be valuable information.

edit: I didn't consider imports for the first execution, but that still doesn't explain the following 2-3 executions taking a decent amount of memory.

sconway-datto · 2022-04-06T21:36:41Z

I'm getting closer to (one of) the issues. I've noticed that over the course of 50 executions with the CERTIFICATE_INFO scan command, having all trust stores enabled (as it is currently non-configurable) when evaluating trust stores leads to an average leak per execution (ignoring imports from the first run) of roughly 22 KB. When I manually intervened and disabled all trust stores, the average leak per execution is roughly 4.6 KB.

The next step will be further investigating _verify_certificate_chain, called here.

nabla-c0d3 · 2022-04-24T08:49:21Z

I used the test script from @MoshePerez and https://github.com/bloomberg/memray , and can confirm that the leak is in _verify_certificate_chain().
It's probably this line of code but I haven't had the time to fix it: https://github.com/nabla-c0d3/nassl/blob/release/nassl/_nassl/nassl_X509_STORE_CTX.c#L118

Instead, I pushed a workaround in #563 . Can you try it and confirm that it fixes the problem? Thanks!

sconway-datto · 2022-04-25T19:29:06Z

@nabla-c0d3 using a similar test script as @MoshePerez, your PR seems to eliminate the leak. I poked at nassl a bit more, and found that set0_trusted_stack and set0_untrusted appear to leak, even when given empty lists as the sole argument. Please let me know if I should open an issue in nassl for this.

Here's a simple test to demonstrate the leaks. I was having issues getting reliable data from psutil with this. With it running continuously with a sleep statement between tests, you'll be able to see the leak in any process monitor.

#!/usr/bin/env python3

import gc
import os

import psutil
from nassl._nassl import X509, X509_STORE_CTX
#from nassl.cert_chain_verifier import CertificateChainVerifier
from time import sleep


def get_mem_used():
    process = psutil.Process(os.getpid())
    memory_used = process.memory_info().rss
    return memory_used  # / (1024**2)


def print_memory_used(msg):
    object_count = len(gc.get_objects())
    process = psutil.Process(os.getpid())
    memory_used = process.memory_info().rss
    print(
        f"{msg}: object count: {object_count}, memory used: {memory_used / 1024**2} MB"
    )


def nassl_test(trusted_certs, chain_to_evaluate):
    store_ctx = X509_STORE_CTX()    # good
    store_ctx.set0_untrusted(chain_to_evaluate)  # leaky!
    store_ctx.set0_trusted_stack(trusted_certs)  # leaky!
    #store_ctx.set_cert(chain_to_evaluate[0])  # good
    #result = X509.verify_cert(store_ctx)  # good?


def main():
    trusted_certs = list()
    chain_to_evaluate = list()
    #start_mem = get_mem_used()
    #end_mem = get_mem_used()
    #init_mem = get_mem_used()

    while True:
        #start_mem = get_mem_used()
        nassl_test(trusted_certs, chain_to_evaluate)
        sleep(.00000001)
        #end_mem = get_mem_used()
        #print(f"memory diff: {end_mem - start_mem}")

    #print(end_mem - init_mem)

if __name__ == "__main__":
    main()

hmilkovi · 2022-04-28T09:14:48Z

I can confirm memory leak but not in sslyze it's inside https://github.com/nabla-c0d3/nassl

nabla-c0d3 · 2022-04-30T10:05:07Z

@sconway-datto Please go ahead and open an issue in nassl for this, which references this issue. Thanks!

[#560] Implement workaround for memory leak in CertificateChainVerifier

nabla-c0d3 · 2022-04-30T12:36:30Z

Fixed as part of v5.0.4.

nabla-c0d3 added a commit that referenced this issue Apr 24, 2022

[#560] Implement workaround for memory leak in CertificateChainVerifier

1f33eca

nabla-c0d3 added a commit that referenced this issue Apr 30, 2022

Merge pull request #563 from nabla-c0d3/#560-workaround-memory-leak

ad64c31

[#560] Implement workaround for memory leak in CertificateChainVerifier

nabla-c0d3 closed this as completed Apr 30, 2022

sconway-datto mentioned this issue May 1, 2022

Memory leak in set0_trusted_stack and set0_untrusted nabla-c0d3/nassl#90

Open

nabla-c0d3 mentioned this issue May 14, 2022

Switch to pyOpenSSL for validating certificate chain #566

Closed

lcurtis-datto mentioned this issue May 31, 2023

Memory Leak still seems to be present when running in Python #603

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory leak when running from python #560

Memory leak when running from python #560

MoshePerez commented Mar 24, 2022 •

edited

Loading

nabla-c0d3 commented Apr 3, 2022

lcurtis-datto commented Apr 4, 2022

sconway-datto commented Apr 5, 2022 •

edited

Loading

sconway-datto commented Apr 6, 2022

nabla-c0d3 commented Apr 24, 2022

sconway-datto commented Apr 25, 2022

hmilkovi commented Apr 28, 2022

nabla-c0d3 commented Apr 30, 2022

nabla-c0d3 commented Apr 30, 2022

Memory leak when running from python #560

Memory leak when running from python #560

Comments

MoshePerez commented Mar 24, 2022 • edited Loading

nabla-c0d3 commented Apr 3, 2022

lcurtis-datto commented Apr 4, 2022

sconway-datto commented Apr 5, 2022 • edited Loading

sconway-datto commented Apr 6, 2022

nabla-c0d3 commented Apr 24, 2022

sconway-datto commented Apr 25, 2022

hmilkovi commented Apr 28, 2022

nabla-c0d3 commented Apr 30, 2022

nabla-c0d3 commented Apr 30, 2022

MoshePerez commented Mar 24, 2022 •

edited

Loading

sconway-datto commented Apr 5, 2022 •

edited

Loading