Potential memory leak in Field #1190

Closed
pedromb opened this issue Mar 2, 2022 · 3 comments
Labels
security Pull requests that address a security vulnerability

Comments

@pedromb

pedromb commented Mar 2, 2022

Bug report

What's wrong

When creating a Field inside a loop the memory doesn't seem to be released after the loop completes. Not sure if this is expected behaviour, but I would assume not.

Here is a reproducible example that uses memory_profiler to measure memory usage and matplotlib to plot it.

from mimesis import Field
from memory_profiler import memory_usage
import matplotlib.pyplot as plt

def schema(f):
    return {
        "id": f("uuid"),
        "name": f("full_name"),
        "email": f("person.email"),
        "timestamp": f("timestamp", posix=False),
        "car_model": f("car"),
        "address": {"full_address": f("address"), "city": f("city"), "zip_code": f("zip_code")},
    }

def test_field_inside_loop():
    for _ in range(1000):
        f = Field("en")
        schema(f)
    
def test_field_outside_loop():
    f = Field("en")
    for _ in range(1000):
        schema(f)

def plot(function_to_plot):
    mem_usage = memory_usage((function_to_plot), interval=.01)
    plt.plot(mem_usage)
    plt.show()

plot(test_field_outside_loop)
plot(test_field_inside_loop)

Memory keeps piling up when Field is created inside the loop.

How it should be

Memory should be released after the field is used within a loop.

System information

macOS Monterey 12.2.1 M1, 2020
Python 3.8.8
mimesis 5.1.0

Pretty sure it doesn't depend on the system; I had the same issue on a Linux (Ubuntu 20.04) machine.

@lk-geimfari lk-geimfari added the security Pull requests that address a security vulnerability label Mar 28, 2022
@vbuaraujo

From looking at the code, it seems that the problem is the usage of functools.lru_cache in https://github.com/lk-geimfari/mimesis/blob/master/mimesis/providers/base.py#L126. If I understand correctly, this is meant to load the JSON data only once, but the problem is that the cache is keyed on all function arguments, including self, which will be different for every instance. So this cache doesn't really do anything useful (each instance will load the data again), and also keeps the data from every instance around indefinitely.
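The effect described above can be reproduced without mimesis. The following is a minimal sketch, not the library's code: `Provider` and `load` are hypothetical stand-ins, and an unbounded cache (`maxsize=None`) is assumed. It shows that a cache on an instance method never gets a hit across instances and keeps every instance alive.

```python
import functools
import gc
import weakref


class Provider:
    """Hypothetical stand-in for a data provider; not mimesis code."""

    def __init__(self, locale):
        self.locale = locale

    @functools.lru_cache(maxsize=None)  # assumed unbounded cache
    def load(self, name):
        # Pretend this is the parsed JSON data for the locale.
        return {"locale": self.locale, "name": name, "payload": [0] * 1000}


refs = []
for _ in range(5):
    p = Provider("en")
    p.load("person.json")        # cache key includes `self`
    refs.append(weakref.ref(p))  # track the instance without owning it

del p
gc.collect()

# The cache holds a strong reference to each `self`, so nothing was freed.
alive = sum(r() is not None for r in refs)
info = Provider.load.cache_info()
print(alive, info.hits, info.misses)

# Dropping the cache entries releases the instances.
Provider.load.cache_clear()
gc.collect()
alive_after = sum(r() is not None for r in refs)
print(alive_after)
```

Every call is a miss because each instance is a distinct key, and the instances only become collectable once the cache is cleared.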

A quick solution is to just remove lru_cache from this function. It won't cache anything, but the current cache is not useful anyway, and it will free memory after usage.

A possibly better solution is to move all parts of that function that don't depend on self to a staticmethod and cache the staticmethod instead.
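A rough sketch of the staticmethod idea, again with hypothetical names (`Provider`, `_read`, `load`) and an inline JSON string standing in for the real data file. The cache is keyed only on the locale and file name, so it is shared between instances and does not hold a reference to any of them.

```python
import functools
import gc
import json
import weakref


class Provider:
    """Hypothetical provider; names and data are illustrative only."""

    def __init__(self, locale):
        self.locale = locale

    @staticmethod
    @functools.lru_cache(maxsize=None)
    def _read(locale, name):
        # Stand-in for reading and parsing a locale JSON file from disk.
        return json.loads('{"locale": "%s", "file": "%s"}' % (locale, name))

    def load(self, name):
        # Only hashable values that don't depend on `self` reach the cache.
        return self._read(self.locale, name)


refs = []
for _ in range(5):
    p = Provider("en")
    p.load("person.json")
    refs.append(weakref.ref(p))

del p
gc.collect()

alive = sum(r() is not None for r in refs)
info = Provider._read.cache_info()
print(alive, info.hits, info.misses)
```

One caveat with this design: every instance receives the same cached object, so callers must treat the returned data as read-only or copy it before modifying it.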

@lk-geimfari
Owner

@vbuaraujo Correct, but instantiating Field inside the loop is absolutely unnecessary, so I'm not sure that we should do anything about this problem.

@lk-geimfari
Owner

Well, I've fixed this issue. I've removed lru_cache since it works even faster without it. I'll release 6.0.0 soon.


3 participants