Memory leak with bokeh serve and datashader #2111

Closed
jgkatz opened this issue Nov 10, 2017 · 8 comments

@jgkatz

jgkatz commented Nov 10, 2017

I believe I am encountering a memory leak when using datashader in combination with a 'bokeh serve' style application.

Here is a simplified case that will reproduce the issue:

import datashader as ds
import numpy as np
import holoviews as hv
from itertools import cycle
from holoviews.operation.datashader import datashade
from bokeh.plotting import curdoc
import logging

log = logging.getLogger(__name__)

def setup_doc():
    log.info('Loading module.')
    cat = cycle(range(22))

    # Generate a bunch of data
    module_data = [
        {
            next(cat): hv.Curve((np.linspace(0, 5000, 500000), np.random.normal(np.ones(500000))))
            for _ in range(514)
        }
    ]

    # Use datashader and create a layout
    hv_layout = []
    for item in module_data:
        hv_layout.append(
            datashade(hv.NdOverlay(item, kdims='k'), aggregator=ds.count_cat('k'))
            .opts(plot=dict(width=800))
        )
    final_layout = hv.Layout(hv_layout).cols(1)
    log.info('Done loading module.')

    # Create plot
    plot = hv.renderer('bokeh').instance(mode='server').get_plot(final_layout)
    doc = curdoc()
    doc.add_root(plot.state)

setup_doc()

Add this to a folder named test (as main.py, which bokeh serve expects for a directory app) and run:

mprof run bokeh serve test

Then open a tab or two, close the tabs, and wait a minute or so for the sessions to be destroyed. Finally, plot the recorded memory profile:

mprof plot

Afterward it appears that memory is not reclaimed.

A direct comparison with non-datashaded lines is hard, but if you run the same experiment with fewer data points and a normal overlay, the memory does appear to be reclaimed. I tried to dig into the objects that are still alive when the on_session_destroyed hook runs, and it seems that at least the datashader object is not garbage collected when the session is destroyed. I am not sure what else might be hanging around.
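For anyone who wants to repeat that check, here is a minimal sketch of counting live objects by class name with the standard gc module. The helper name count_live_objects and the class names passed to it are assumptions chosen for illustration, and the hook is shown only as one place it could be called from (a server_lifecycle.py next to main.py); this is not necessarily how the original inspection was done.

```python
import gc
from collections import Counter


def count_live_objects(type_names):
    """Count live, gc-tracked objects whose class name is in type_names."""
    gc.collect()  # first drop anything that is only waiting for collection
    wanted = set(type_names)
    counts = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        if name in wanted:
            counts[name] += 1
    return counts


# Assumed usage: a lifecycle hook in test/server_lifecycle.py.
# The class names below are guesses at what might be worth watching.
def on_session_destroyed(session_context):
    print(count_live_objects(["datashade", "DynamicMap", "RangeXY"]))
```

If the counts keep growing by the same amount after every closed session, those objects are being held by something that outlives the session.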

Am I missing something in the implementation?

@jlstevens jlstevens added this to the v1.10 milestone Nov 10, 2017
@philippjfr philippjfr self-assigned this Mar 5, 2018
@philippjfr
Member

Apologies, I never responded here. I haven't investigated yet, but I have a few thoughts on what might be hanging around, with stream callbacks being the most likely candidate. I think this is a priority to fix before 1.10.0.
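To illustrate why a callback can pin a whole session in memory, here is a small self-contained sketch. Plot and Registry are hypothetical stand-ins, not HoloViews classes, and this is not a diagnosis of the actual bug: it only shows that a bound method stored in a long-lived subscriber list keeps its object alive until the list is cleared, and that weakref.WeakMethod is one way to avoid that.

```python
import gc
import weakref


class Plot:
    """Hypothetical stand-in for a per-session plot object."""

    def refresh(self, **kwargs):
        pass


class Registry:
    """Hypothetical stand-in for a stream that outlives a single session."""

    def __init__(self):
        self.subscribers = []

    def add_subscriber(self, callback):
        self.subscribers.append(callback)


# Case 1: a bound method holds a strong reference to the plot.
strong = Registry()
plot = Plot()
probe = weakref.ref(plot)
strong.add_subscriber(plot.refresh)
del plot
gc.collect()
leaked = probe() is not None  # the plot is still alive

# Clearing the subscribers on session teardown releases it.
strong.subscribers.clear()
gc.collect()
freed_after_clear = probe() is None

# Case 2: a WeakMethod does not keep the plot alive at all.
weak = Registry()
plot2 = Plot()
probe2 = weakref.ref(plot2)
weak.add_subscriber(weakref.WeakMethod(plot2.refresh))
del plot2
gc.collect()
freed_with_weakmethod = probe2() is None
```

Either approach works in this toy case; explicit cleanup when the session is destroyed is the more predictable of the two, since a weakly referenced callback can disappear while it is still wanted.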

@jgkatz
Author

jgkatz commented Mar 6, 2018

Great to hear. Looking forward to a potential fix.

@philippjfr philippjfr added the type: bug Something isn't correct or isn't working label Mar 19, 2018
@philippjfr philippjfr modified the milestones: v1.10, v1.10.x Apr 17, 2018
@popher

popher commented Sep 22, 2018

Just following up on this: I am having the same problem.
I'm using bokeh server to serve an app that uses HoloViews, Pandas, Bokeh and Datashader. It loads a ~100MB file (x,y data) to plot. However, the memory cost (~100MB per session) does not appear to be reclaimed after the session ends, and the server very quickly runs out of memory.

I made an MRE that has the same bug (but with a smaller footprint, as I generate random data rather than reading in such a large amount of data).

import numpy as np
import pandas as pd

# Generate some data
mz = np.linspace(100,1000,int(1e6))
I = np.random.randint(0,int(1e6),int(1e6))
df = pd.DataFrame(columns=['mz','I'])
df['mz'] = mz
df['I'] = I
df.head()

# import plotting libraries
import matplotlib as mpl
mpl.use('Agg') # otherwise pyplot throws a 'no qt backend' error
import matplotlib.pyplot as plt

import holoviews as hv
import hvplot.pandas
import datashader as ds
from holoviews.operation.datashader import datashade, dynspread

# specify parameters
hv.extension('bokeh', 'matplotlib', width="100")
dynspread.max_px=1
dynspread.threshold=0.25

# generate holoviews plots
msfull = df.hvplot(x='mz',y='I')
msshow = dynspread(datashade(msfull,cmap=['blue']).opts(plot=dict(width=1024, height=800)))

# add to bokeh document for serving
doc = hv.renderer('bokeh').server_doc(msshow)
doc.title = 'HoloViews Bokeh App'

Served with bokeh serve --show myapp.py
Same issue running on Windows 10 or Ubuntu 18.04 LTS, Python 3.6.6 or Python 3.7. All other packages required are up-to-date.
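To narrow down which allocations survive, Python's built-in tracemalloc module can compare two snapshots line by line. The sketch below is only an illustration under stated assumptions: top_growth is a hypothetical helper name, and the retained list of byte arrays stands in for whatever a finished session leaves behind. In a real app the snapshots would be taken before a session opens and after it has been destroyed.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` source lines whose traced allocations grew."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for memory that is still referenced after a session ends.
retained = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
tracemalloc.stop()

for stat in growth:
    print(stat)
```

Each printed entry names the file and line responsible for the growth, which is more specific than the process-level curve that mprof gives.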

@philippjfr
Member

Thank you. I recently merged a change into bokeh that will let us clean up more easily after a session is killed. I'll use your example to check whether that is enough to address these issues.

@philippjfr
Member

Thank you both for your examples. After a marathon debugging session, I've finally addressed the issue.

@afzalmushtaque

I am facing the same memory leak problem with panel serve using holoviews with datashading. Should I open a new issue for it?

@ceball
Member

ceball commented Nov 26, 2019

Sure, but please include detailed information about your environment (package versions, how they were installed, platform, etc.) and how to reproduce the problem (e.g. with one of the scripts above).


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024

6 participants