
Instrument JupyterHub to record events with jupyter_telemetry [Part II] #2698

Merged: 16 commits, Sep 24, 2019

Conversation

Zsailer (Member) commented Aug 22, 2019

This adds documentation and tests to @yuvipanda's PR in #2542.

The original instruments JupyterHub to record events about user server starts & stops using the jupyter_telemetry library.

yuvipanda and others added 10 commits, August 22, 2019 11:05:

- Introduce the EventLog class from BinderHub for emitting structured event data
- Instrument server starts and stops to emit events
- Defaults to not saving any events anywhere
- They're pure python, and should be ok (jupyterhub still supports Python 3.5)
- Full circle, since the code in jupyter_telemetry came from here: jupyter/telemetry#6
- Move it to YAML, since jupyter_telemetry supports these natively.
consideRatio (Member) commented Aug 22, 2019

@Zsailer and @yuvipanda! I'm just dropping in to say that I think this is very exciting work! I'll be very happy to see this in place! ❤️ 🎉

Zsailer (Member, Author) commented Aug 23, 2019

I'm actually puzzled by these failing tests. They pass locally, so I'm struggling to identify why they fail on Travis.

I'm investigating further, but if anyone has immediate insights I'd really appreciate it! 😃

consideRatio (Member) commented Aug 23, 2019

@Zsailer, I'm fairly confident of the following:

  • the output variable is a blank string.
  • json.loads(output) raises an error when given a blank string.

    @pytest.mark.parametrize('schema, version, event', valid_events)
    def test_valid_events(get_hub_and_sink, schema, version, event):
        hub, sink = get_hub_and_sink(schema)
        # Record event.
        hub.eventlog.record_event(schema, version, event)
        # Inspect consumed event.
        output = sink.getvalue()
        data = json.loads(output)

The question then becomes, why is sink.getvalue() returning a blank string?
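For reference, this is exactly how json.loads behaves on an empty string, which is presumably the traceback showing up on Travis:

```python
import json

# json.loads fails immediately on an empty string -- the same error the
# test hits when sink.getvalue() comes back blank.
try:
    json.loads("")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```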

Verification of my hypothesis

The error seen on Travis matches exactly what I get if I invoke json.loads with a blank string.

Reproducing the error locally:

(screenshot)

The Travis logs:

(screenshot)

Zsailer (Member, Author) commented Aug 23, 2019

Thanks for investigating, @consideRatio 😃

Yes, I know that json.loads returns an error when the string is blank. The weird thing is that I don't get a blank string when I run the tests locally. I get the event data, just as you'd expect/hope. On Travis, I get a blank string. I need to track down why I'm not getting the event data string on Travis 😕

Zsailer (Member, Author) commented Sep 11, 2019

Alright, I'm really stumped here. This is really weird.

The tests I've written pass locally but fail on CI. This makes it extremely difficult to debug.

I think it's an issue with MockHub instances (good ol' traitlets troubles).

The test:

  • creates a StringIO instance for collecting telemetry data
  • creates a logging handler to route event data to the StringIO
  • creates a MockHub instance with the logging handler attached
  • records an event
  • asserts the event data was written to the StringIO stream

Locally, the StringIO buffer contains the event data. On CI, the StringIO buffer is empty and fails the test.
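The shape of that test, sketched with stand-in names (this is illustrative plumbing only, not the actual MockHub/eventlog fixtures from jupyterhub's test suite):

```python
import io
import json
import logging

# A StringIO sink with a logging handler routed into it, as in the test.
sink = io.StringIO()
handler = logging.StreamHandler(sink)

logger = logging.getLogger("sketch-eventlog")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Stand-in for record_event(): emit one JSON line through the logger.
logger.info(json.dumps({"action": "start", "username": "test-user"}))

# The assertion that fails on CI: the sink should contain the event data.
output = sink.getvalue()
data = json.loads(output)
assert data["action"] == "start"
```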

What's crazy is that I could remove other tests in other files and this test would pass.

So my hunch is that MockHub instances from other tests were not cleared and the logging handlers + StringIO were never properly configured in the MockHub. I started addressing each case where the MockHub instances were not cleared, adding other tests back in one at a time. It worked for a while but eventually failed again.

Does anyone have insight on this issue? Has anyone else run into issues where jupyterhub tests affect each other?

Zsailer requested a review from minrk, September 11, 2019 17:32
Commit: re-run init_eventlog to ensure event logging is hooked up
minrk (Member) commented Sep 19, 2019

Sorry for leaving you hanging. I'm digging into this one today. First, instead of using .instance(), you can leave that alone and use the existing app fixture to get a handle on the configured JupyterHub Application object. To modify the configuration after-the-fact, you can try taking the instantiated object and calling init_events again, e.g.

import io
import logging

from pytest import fixture

@fixture
def event_sink(app):
    sink = io.StringIO()
    handler = logging.StreamHandler(sink)
    cfg = app.config
    cfg.EventLog.handlers = [handler]
    cfg.EventLog.allowed_schemas = [schema]  # `schema` defined in the test module
    # re-initialize EventLog object to register events
    app.init_events()
    yield sink

The reason it's failing on CI and not for you is likely that CI is running all the tests and you are only testing the one file. I can reproduce the failure by running all the tests, which means it's presumably an un-cleared instance or some other state from one test to the next. I see you tried to make sure this didn't happen with .clear_instance(), but I'm guessing there was just one that was missed, or perhaps another bit of state leaking that's not the global instance.

Commit: causes some weird behavior, such as event log not working
minrk (Member) commented Sep 19, 2019

Found the state pollution, and it was weird! alembic, the database migration tool we use, configures logging when it runs (perhaps it shouldn't?). This has disable_existing_loggers=True by default, which disables all loggers when it runs. Since EventLogger is a logger, this gets disabled, but that's the wrong thing to do.
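The disabling behavior is easy to demonstrate with the stdlib alone; alembic's logging setup boils down to a call like this (shown here with dictConfig; alembic uses fileConfig, which behaves the same way):

```python
import logging
import logging.config

# A logger created *before* configuration runs -- analogous to the
# EventLogger that jupyter_telemetry builds on.
early_logger = logging.getLogger("demo-eventlog")
assert not early_logger.disabled

# dictConfig/fileConfig default to disable_existing_loggers=True: any
# pre-existing logger not named in the config gets switched off.
logging.config.dictConfig({"version": 1})
assert early_logger.disabled  # silently disabled -- the state pollution
```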

It was in test_app.py that the relevant alembic code was being triggered, btw, so the failure could be reproduced with pytest jupyterhub/tests/test_app.py jupyterhub/tests/test_eventlog.py. I found it by running pairs of test files one by one, e.g.

pytest -vx jupyterhub/tests/test_api.py jupyterhub/tests/test_eventlog.py  # success
pytest -vx jupyterhub/tests/test_app.py jupyterhub/tests/test_eventlog.py # failure, aha! One of these tests
pytest -vx jupyterhub/tests/test_app.py::test_resume_spawners jupyterhub/tests/test_eventlog.py # failure, this is the one!

I've pushed a couple commits simplifying the retrieval of the eventlog and reverting the .instance() bookkeeping that was unneeded.

yuvipanda (Contributor) commented:

AWESOME! <3

Zsailer (Member, Author) commented Sep 19, 2019

@minrk you are my hero.

But seriously, thank you so much!

It was the test_app.py and test_eventlog.py combination that broke tests for me. I just cleared all MockHub instances and eventually got all tests to pass locally, but they still failed on CI.

Just for laughs, check out the various attempts I took at solving this issue on my other branch 🤣

yuvipanda (Contributor) commented:

Does this mean this is ready to be merged?

Zsailer (Member, Author) commented Sep 19, 2019

I've got it in the JupyterHub/Binder team meeting agenda for today :)

To begin recording events, you'll need to set two configurations:

1. ``handlers``: tells the EventLog *where* to route your events. This trait is a list of Python logging handlers that route events to their destinations.
2. ``allowed_schemas``: tells the EventLog *which* events should be recorded. No events are emitted by default; all recorded events must be listed here.
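A minimal jupyterhub_config.py sketch of these two settings (the handler choice and schema name here are illustrative, not prescribed by this PR):

```python
# jupyterhub_config.py -- minimal sketch; any logging.Handler works here,
# and the schema name must match one registered with the EventLog.
import logging

c.EventLog.handlers = [
    logging.FileHandler('jupyterhub-events.log'),  # *where* events go
]
c.EventLog.allowed_schemas = [
    'hub.jupyter.org/server-action',               # *which* events to record
]
```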
Review comment (Member):

allowed_schemas

Is it possible to easily generate a list of all current schemas?

Reply (Zsailer, Member Author):

Hm... we don't have functionality for this in telemetry, but that's an interesting idea. We could add an option to get all registered schemas in jupyterhub.

Our general approach has been to force admins to whitelist all events they'd like to record, so that there's no data collected without operator knowledge.

@yuvipanda, thoughts?

On docs/source/events/index.rst:
@@ -0,0 +1 @@
.. jsonschema:: ../../../jupyterhub/event-schemas/server-actions/v1.yaml
Review comment (Member):

Can files like this be generated instead of manually maintained? Not needed immediately, but for future ideas.

Reply (Zsailer, Member Author):

Not right now. That was actually my intention in adding pydantic to telemetry. Check out my example PR on Yuvi's JupyterHub branch.

With pydantic, we get doc generation, validation, and testing for free. The events themselves are python objects.
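A rough stdlib illustration of the "events as Python objects" idea (pydantic would add validation, doc generation, and schema export on top of this; the field names are hypothetical, not the actual server-actions schema):

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical event shape -- the real schema lives in
# jupyterhub/event-schemas/server-actions/v1.yaml.
@dataclass
class ServerActionEvent:
    action: str
    username: str

event = ServerActionEvent(action="start", username="test-user")
print(json.dumps(asdict(event)))  # {"action": "start", "username": "test-user"}
```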

minrk (Member) commented Sep 19, 2019

Left a couple comments on typos in docs, but 👍 to merge after that.

Zsailer (Member, Author) commented Sep 19, 2019

I think the new test failures are for reasons outside this PR.

minrk merged commit ca00c0e into jupyterhub:master on Sep 24, 2019
yuvipanda added a commit to yuvipanda/datahub-old-fork that referenced this pull request Oct 28, 2019
We want to get access to the eventlogging feature
(jupyterhub/jupyterhub#2698) and the
user-redirect customization
(jupyterhub/jupyterhub#2790?)

This is a little scary, but there is enough testing that I feel
this should be fine.