-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instrument JupyterHub to record events with jupyter_telemetry [Part II] #2698
Conversation
- Introduce the EventLog class from BinderHub for emitting structured event data - Instrument server starts and stops to emit events - Defaults to not saving any events anywhere
They're pure python, and should be ok
jupyterhub still supports Python 3.5
Privacy by default!
Full circle, since the code in jupyter_telemetry came from here: jupyter/telemetry#6
Move it to YAML, since jupyter_telemetry supports these natively.
f959cbe
to
263c5e8
Compare
@Zsailer and @yuvipanda! I'm just dropping in to say that I think this is very exciting work! I'll be very happy to see this in place! ❤️ 🎉 |
I'm actually puzzled by these failing tests. They pass locally, so I'm struggling to identify why they fail on Travis. I'm investigating further, but if anyone has immediate insights I'd really appreciate it! 😃 |
@Zsailer, I'm fairly confident on the following:
@pytest.mark.parametrize('schema, version, event', valid_events)
def test_valid_events(get_hub_and_sink, schema, version, event):
hub, sink = get_hub_and_sink(schema)
# Record event.
hub.eventlog.record_event(schema, version, event)
# Inspect consumed event.
output = sink.getvalue()
data = json.loads(output) The question then becomes, why is Verification of my hypothesisThe error seen in travis, actually matches exactly what I get if I try to invoke Me reproducing the errorThe travis logs |
Thanks for investigating, @consideRatio 😃 Yes, I know that |
Alright, I'm really stumped here. This is really weird. The tests I've written pass locally but fail on CI. This makes it extremely difficult to debug. I think it's an issue with The test:
Locally, the StringIO buffer contains the event data. On CI, the StringIO buffer is empty and fails the test. What's crazy is that I could remove other tests in other files and this test would pass. So my hunch is that MockHub instances from other tests were not cleared and the logging handlers + stringio were never properly configured in the MockHub. I started addressing each case where the MockHub instances were not cleared, adding other tests back in one at a time. It worked for awhile but eventually failed again. Does anyone have insight on this issue? Has anyone else run into issues where jupyterhub tests affect each other? |
re-run init_eventlog to ensure event logging is hooked up
Sorry for leaving you hanging. I'm digging into this one today. First, instead of using @fixture
def event_sink(app):
sink = io.StringIO()
handler = logging.StreamHandler(sink)
cfg = app.config
cfg.EventLog.handlers = [handler]
cfg.EventLog.allowed_schemas = [schema]
# re-initialize EventLog object to register events
app.init_events()
yield sink The reason it's failing on CI and not for you is likely that CI is running all the tests and you are only testing the one file. I can reproduce the failure by running all the tests, which means it's presumably an un-cleared instance or some other state from one test to the next. I see you tried to make sure this didn't happen with |
causes some weird behavior, such as event log not working
7c17a1b
to
949d8d0
Compare
Found the state pollution, and it was weird! alembic, the database migration tool we use, configures logging when it runs (perhaps it shouldn't?). This has disable_existing_loggers=True by default, which disables all loggers when it runs. Since EventLogger is a logger, this gets disabled, but that's the wrong thing to do. It was in pytest -vx jupyterhub/tests/test_api.py jupyterhub/tests/test_eventlog.py # success
pytest -vx jupyterhub/tests/test_app.py jupyterhub/tests/test_eventlog.py # failure, aha! One of these tests
pytest -vx jupyterhub/tests/test_app.py::test_resume_spawners jupyterhub/tests/test_eventlog.py # failure, this is the one! I've pushed a couple commits simplifying the retrieval of the eventlog and reverting the |
AWESOME! <3 |
@minrk you are my hero. But seriously, thank you so much! It was the Just for laughs, check out the various attempts I took at solving this issue on my other branch 🤣 |
Does this mean this is ready to be merged? |
I've got it in the JupyterHub/Binder team meeting agenda for today :) |
To begin recording events, you'll need to set two configurations: | ||
|
||
1. ``handlers``: tells the EventLog *where* to route your events. This trait is a list of Python logging handlers that route events to | ||
2. ``allows_schemas``: tells the EventLog *which* events should be recorded. No events are emitted by default; all recorded events must be listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allowed_schemas
Is it possible to easily generate a list of all current schemas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... we don't have functionality for this in telemetry, but that's an interesting idea. We could add an option to get all registered schemas in jupyterhub.
Our general approach has been to force admins to whitelist all events they'd like to record, so that there's no data collected without operator knowledge.
@yuvipanda, thoughts?
@@ -0,0 +1 @@ | |||
.. jsonschema:: ../../../jupyterhub/event-schemas/server-actions/v1.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can files like this be generated instead of manually maintained? Not needed immediately, but for future ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not right now. That was actually my intention for adding pydantic to telemetry. Check out my example PR on Yuvi's JupyterHub branch..
With pydantic, we get doc generation, validation, and testing for free. The events themselves are python objects.
Left a couple comments on typos in docs, but 👍 to merge after that. |
I think the new test failures are for reasons outside this PR. |
We want to get access to the eventlogging feature (jupyterhub/jupyterhub#2698) and the user-redirect customization (jupyterhub/jupyterhub#2790?) This is a little scary, but there is enough testing that I feel this should be fine.
This adds documentation and tests to @yuvipanda's PR in #2542.
The original instruments JupyterHub to record events about user server starts & stops using the jupyter_telemetry library.