diff --git a/.gitignore b/.gitignore index 99d8bde..7b4a501 100644 --- a/.gitignore +++ b/.gitignore @@ -19,10 +19,13 @@ b000027.txt *.env # Log files +# the contents of the `logs` directory are ignored using a local `.gitignore`, +# which has the desired bonus effect of ensuring that the `logs` directory is +# always present npm-debug.log* yarn-debug.log* yarn-error.log* -api.log +cysignals_crash_logs/* # Editor directories and files .idea diff --git a/README.md b/README.md index 95cbc5c..5fdfb83 100644 --- a/README.md +++ b/README.md @@ -53,12 +53,14 @@ If you plan on developing backscope, you might find some of these useful: - [Understanding our requirements files](doc/requirements.md) - [API endpoints](doc/api_endpoints.md) - [Directory descriptions](doc/directory_descriptions.md) +- [Logging events](doc/logging.md) - [Writing and running tests](doc/tests.md) If you are a maintainer, you might find some of these useful: - [Updating frontscope on the CU server](doc/update-frontscope.md) - [Updating backscope on the CU server](doc/update-backscope.md) +- [Finding and reading logs](doc/logging.md) - [Administering our server](doc/server-administration.md) - [Administering the database](doc/database-administration.md) - [README file for the server](server/README.md) diff --git a/doc/logging.md b/doc/logging.md new file mode 100644 index 0000000..e1e85fa --- /dev/null +++ b/doc/logging.md @@ -0,0 +1,168 @@ +# Logging + +## Where to find logs + +### Basics + +Logs are kept in the `logs` directory, in the [rotating log files](https://docs.python.org/3/library/logging.handlers.html#logging.handlers.RotatingFileHandler) `api.log`, `api.log.1`, …, `api.log.9`. These files are created automatically as needed. The newest logs are at the end of `api.log`. + +When the `DEVELOPMENT` environment flag is set, logs are also printed to the console. The console logs include events that aren't important enough to store long-term in the log files. + +### File rotation + +When `api.log` fills up to its maximum size of 20 MB, it's renamed to `api.log.1`, while the former `api.log.1` becomes `api.log.2`, …, the former `api.log.8` becomes `api.log.9`, and the former `api.log.9` is deleted. + +If there's a big problem that creates enough log entries and fills up all the log files, we'll start losing the logs from the beginning of the problem, which might be the most valuable ones. The logs might fill up after as few as 20,000 entries, based on the estimate that a 404 error from the OEIS generates a 5–10 kB log entry. We could avoid this pitfall by switching from size-based to time-based rotation. That doesn't seem worth the trouble at our current small scale, though. + +## How to read logs + +### Format + +On the terminal, log entries are printed in a pretty, somewhat human-readable format. Unfortunately, some log entries contain very long values, like HTTP request dumps. This makes them a lot less readable. That might be a good thing to fix. + +In the log files, each entry is written on its own line as a JSON dictionary. The string stored in the "event" key tells you what kind of event each entry describes. Here are some known event types. The ones in bold are directly generated by Backscope. + +* "Exception on..." – Uncaught exception +* **"request issue"** – Problem with an HTTP request + +### Event types + +#### Uncaught exception + +When the event string starts with "Exception on...", the log entry describes an uncaught exception. + +#### Request issue + +A "request issue" event describes a problem with an HTTP request. It should always come with a dump of the associated request-response conversation, stored in the key "response" as a base64 string: +``` +"response": "PCBHRVQgc3R1ZmYNCjwgSFRUUC8xLjENCg==" +``` +Applying [`base64.b64decode`](https://docs.python.org/3/library/base64.html#base64.b64decode) to the conversation will turn it into byte array, like this: +```python +b'< GET stuff\\r\\n< HTTP/1.1\\r\\n< Host: oeis.org\\r\\n< ...' +``` +You can turn the byte array into a string, for printing or other uses, by calling its `decode('utf-8')` method. To understand how this string is formatted, let's look at an actual example (somewhat shortened). In the conversation below, we ask the OEIS for the B-file of a non-existent sequence, and we get a 404 error in response. To distinguish the two sides of the conversation, a `<` is added to the start of every request line, and a `>` is added to the start of every response line. (The last line of the response breaks over several lines when it's printed out here, because it contains newline characters.) +``` +< GET /A000000/b000000.txt HTTP/1.1 +< Host: oeis.org +< User-Agent: python-requests/2.31.0 +< Accept-Encoding: gzip, deflate +< Accept: */* +< Connection: keep-alive +< + +> HTTP/1.1 404 Not Found +> Cache-Control: private, no-store +> Content-Type: text/html; charset=utf-8 +> Date: Thu, 11 Apr 2024 04:05:59 GMT +> Vary: * +> Transfer-Encoding: chunked +> + + + + + + + + The On-Line Encyclopedia of Integer Sequences® (OEIS®) + + + + + + + +``` + +## Choosing the level of detail + +What gets logged depends on what environment mode Backscope is running in. Each log entry is assigned a [level of importance](https://docs.python.org/3/library/logging.html#logging-levels). From highest to lowest, the possible levels are: *critical*, *error*, *warning*, *info*, *debug*. + +The file logs include all entries at *warning* level or higher. If the `DEVELOPMENT` environment flag is set, the console log includes all entries at *info* level or higher, and it also includes entries at *debug* level if the `DEBUG` flag is set. Without the `DEVELOPMENT` flag, the console log is silent. + +## How to write logs + +### Package documentation + +We use [*structlog*](https://www.structlog.org/) package. + +### Basic usage + +#### Starting a draft log entry + +The function call +```python +current_app.structlogger.bind(tags=[]) +``` +returns a draft of a log entry, with no content except an empty list of tags. + +As this example shows, the app keeps its structured logger in the custom `structlogger` attribute. The default `logger` attribute still points to the basic logger that Flask is presumably designed to use internally. The basic logger feeds its output into the structured logger. Backscope should always write logs to `current_app.structlogger`, bypassing the basic logger. + +#### Updating a draft log entry + +If `log` is a log entry, the method call +```python +log.bind(attempts=8, mood='confused') +``` +returns an updated version of `log` with (possibly new) keys `'attempts'` and `'mood'` holding the values `8` and `'confused'`, respectively. Keep in mind that all values will ultimately be converted to strings and written in a JSON file or printed to the console. + +The entry that `log` points to is not modified. If you want to use the updated log entry, you have to store the reference to it that `log.bind` returns! In most cases, you'll want to throw away the old version of the log entry and keep the new version in its place. In that case, you might use the pattern +```python +log = log.bind(attempts=8, mood='confused') +``` + +#### Adding and removing tags + +If `log` is a log entry, the call +```python +structlog.get_context(log)['tags'] +``` +returns a reference to the entry's `tags` attribute. You can use that reference to add and remove tags. + +#### Writing a log entry + +Once you've finished drafting a log entry, you can write it to the log by calling +```python +log.error('connection failed') +``` +The string `'connection failed'` should describe the event being logged. It becomes the value of the `'event'` key. + +## Testing the logging system + +### Automated tests + +Automated tests that involve the logging system include: + - [test_logging.py](/flaskr/nscope/test/test_logging.py) + - `LoggingTest` + - [test_lookup_errors.py](/flaskr/nscope/test/test_lookup_errors.py) + - `TestNonexistentSequence` + +### Manual tests + +An easy way to manually generate a log entry is to request values from a non-existent series. For example, you could launch Backscope locally and visit the following URL in a web browser: +``` +http://localhost:5000/api/get_oeis_values/A000000/12 +``` +Backscope should return the following message: + +> Error: B-file for ID 'A000000' not found in OEIS. + +At the same time, it should add a line to the end of `api.log`. You can use `'timestamp'` values to distinguish new log entries from old ones. Right now, when Backscope fails to retrieve information about a sequence, it gets stuck thinking it's waiting for metadata, so future requests for the same sequence will get the response: + +> Error: Value fetching for {oeis_id} in progress. + +Right now, to request the same sequence again and see the same behavior, you first have to clear the database with `flask clear-database`. + +:warning: **Beware:** this will clear the database named by the `POSTGRES_DB` variable in the `.env` file. \ No newline at end of file diff --git a/doc/server-administration.md b/doc/server-administration.md index 1b11965..337de04 100644 --- a/doc/server-administration.md +++ b/doc/server-administration.md @@ -60,6 +60,13 @@ Stop `numberscope`: sudo systemctl stop numberscope ``` +### Accessing `numberscope` logs + +You can see the standard error output of the backscope process via +`sudo journalctl -u backscope`. However, most diagnostic output is directed to +backscope's own rotating log files, in the `~scope/repos/backscope/logs` +directory. + ## Updating the server to a new version of backscope There are two main cases. If the change in version of backscope does not @@ -75,8 +82,7 @@ should consist of three simple steps: You can then try `sudo systemctl status numberscope` to see if the service thinks it is running correctly, and you can point your browser for example at -`https://numberscope.colorado.edu/api/get_commit` (Warning: not working as of -commit 4ebfc9c) or +`https://numberscope.colorado.edu/api/get_commit` or `https://numberscope.colorado.edu/api/get_oeis_values/A000040/128` for a list of the first 2^7 primes. diff --git a/flaskr/__init__.py b/flaskr/__init__.py index 17d1fae..d851bdf 100644 --- a/flaskr/__init__.py +++ b/flaskr/__init__.py @@ -2,6 +2,7 @@ Init file (creates app and database) """ +# external imports import click from dotenv import load_dotenv from flask import Flask, current_app @@ -10,12 +11,16 @@ from flask_migrate import Migrate from flask_sqlalchemy import SQLAlchemy import logging +from logging import StreamHandler from logging.handlers import RotatingFileHandler import os -import subprocess # for calling git -import sys - +import subprocess # for calling gitcurrent_app +import structlog +from structlog.dev import ConsoleRenderer +from structlog.processors import JSONRenderer, TimeStamper +from structlog.stdlib import ProcessorFormatter +# internal imports from .config import config @@ -30,6 +35,64 @@ class EnvironmentException(Exception): # Create a new sql alchemy database object db = SQLAlchemy() +# --- Log setup utilities --- + +# We definitely want the log file to include bad request warnings, so the log +# level has to be at least that low +file_log_level = logging.WARNING + +# If we're in development, the console log level is either DEBUG or INFO, +# depending on whether debugging is turned on. If we're not in development, +# there's no console log, so the console log level is effectively infinite +def console_log_level(environment): + return ( + ( + logging.DEBUG if config[environment].DEBUG else + logging.INFO + ) if config[environment].DEVELOPMENT else + float('inf') + ) + +def create_file_handler(): + # Create a file handler, which writes to rotating files api.log, api.log.1, api.log.2, ..., api.log.9 + handler = RotatingFileHandler('logs/api.log', maxBytes=2*10**7, backupCount=9) + + # Add a JSON formatter + formatter = ProcessorFormatter( + processors = [ + ProcessorFormatter.remove_processors_meta, + TimeStamper(fmt='iso', utc=False), + JSONRenderer(sort_keys=True) + ] + ) + handler.setFormatter(formatter) + + # Set log level + handler.setLevel(file_log_level) + + return handler + +def create_console_handler(environment): + # Create a stream handler that writes to stderr + handler = StreamHandler() + + # Add a console formatter + formatter = ProcessorFormatter( + processors = [ + ProcessorFormatter.remove_processors_meta, + TimeStamper(fmt='iso', utc=False), + ConsoleRenderer() + ] + ) + handler.setFormatter(formatter) + + # Print everything, or almost everything, in the console + handler.setLevel(console_log_level(environment)) + + return handler + +# --- App creation --- + # To choose the environment, we look for settings in the following order: # (1) Function parameter # (2) .env @@ -39,11 +102,50 @@ def create_app(environment=None, oeis_scheme='https', oeis_hostport='oeis.org'): # Get app type from .env if provided. Otherwise, use 'development' environment = os.environ.get('APP_ENVIRONMENT', 'development') - # Initial app and configuration + # --- Set up logging --- + # + # To access `app.logger` before `app` is created, we take advantage of + # knowing that `app` will use the `__name__` logger. we set `app.logger` to + # the most verbose level so that each handler can do its own filtering + # + # "When you want to configure logging for your project, you should do it as + # soon as possible when the program starts. If app.logger is accessed before + # logging is configured, it will add a default handler. If possible, + # configure logging before creating the application object" + # + # https://flask.palletsprojects.com/en/2.3.x/logging/ + + # For explicit logging, we'll use a structured logger that feeds into the + # basic logger. The structured logger will be current_app.structlogger, and + # the basic logger will be current_app.logger + structlog.configure( + processors=[ + structlog.stdlib.filter_by_level, + structlog.stdlib.ProcessorFormatter.wrap_for_formatter, + ], + logger_factory=structlog.stdlib.LoggerFactory() + ) + + # Create file handler + file_handler = create_file_handler() + + # Set up the basic logger + min_level = min(file_log_level, console_log_level(environment)) + basic_logger = logging.getLogger(__name__) + basic_logger.setLevel(min_level) + basic_logger.addHandler(file_handler) + if config[environment].DEVELOPMENT: + # in development, also log to the console, with higher verbosity + basic_logger.addHandler(create_console_handler(environment)) + + # Create app app = Flask(__name__, instance_relative_config=True) + # Add structured logger + app.structlogger = structlog.get_logger(__name__) + # Upload config from config.py - if environment == 'development': CORS(app) + if config[environment].DEVELOPMENT: CORS(app) if config[environment].TESTING and config[environment].SQLALCHEMY_DATABASE_URI is None: ## this is a really convoluted way of throwing an exception when you try ## to run tests without specifying the test database, but allowing the @@ -69,14 +171,6 @@ def create_app(environment=None, oeis_scheme='https', oeis_hostport='oeis.org'): app.config['git_revision_hash'] = subprocess.check_output( ['git', 'rev-parse', '--short', 'HEAD'], encoding='utf8').strip() - # Logging - file_handler = RotatingFileHandler('api.log', maxBytes=10000, backupCount=1) - file_handler.setLevel(logging.INFO) - app.logger.addHandler(file_handler) - stdout = logging.StreamHandler(sys.stdout) - stdout.setLevel(logging.DEBUG) - app.logger.addHandler(stdout) - # Initialize the application db.init_app(app) Migrate(app, db) diff --git a/flaskr/config.py b/flaskr/config.py index c86bde9..e92510c 100644 --- a/flaskr/config.py +++ b/flaskr/config.py @@ -40,12 +40,14 @@ class Config: DOC_PASSWORD = 'password' DEBUG = False + DEVELOPMENT = False TESTING = False PRODUCTION = False class DevConfig(Config): DEBUG = True + DEVELOPMENT = True class TestConfig(Config): diff --git a/flaskr/nscope/test/abstract_endpoint_test.py b/flaskr/nscope/test/abstract_endpoint_test.py index 89b1338..8d84871 100644 --- a/flaskr/nscope/test/abstract_endpoint_test.py +++ b/flaskr/nscope/test/abstract_endpoint_test.py @@ -1,8 +1,33 @@ -import unittest +from structlog.testing import capture_logs import sys -from flaskr import create_app, db, clear_database +import unittest + import flaskr.nscope.views as views +from flaskr import create_app, db, clear_database + +# required attributes: +# +# --- endpoint --- +# the endpoint we're requesting +# +# --- expected_response --- +# the dictionary or string we expect to get back +# +# optional attributes: +# +# --- expected_log_output --- +# the list of log entries we expect the request to produce. each entry is a +# dictionary, which would be written in JSON format in the log +# +# optional methods: +# +# --- process_log_entry --- +# this method is called on each recorded log entry before comparison with the +# expected log output. it can be used to modify each log entry in place. this +# is useful for removing time-dependent parts of log entries, like the date +# line in an HTTP response +# # guidance on test database handling: # https://stackoverflow.com/a/17818795 @@ -14,12 +39,12 @@ def assert_endpoint_test_attr(self, name): f"Can't construct a '{type(self).__name__}' " f"endpoint test without '{name}' attribute" ) - + def __init__(self, *args, **kwargs): # make sure required attributes are present self.assert_endpoint_test_attr('endpoint') self.assert_endpoint_test_attr('expected_response') - + # check whether unittest is running in verbose mode # hat tip StackOverflow users Dimitris Fasarakis Hilliard and EquipDev... # https://stackoverflow.com/a/43002355 @@ -27,19 +52,19 @@ def __init__(self, *args, **kwargs): # ... who provided this code under the MIT license # https://meta.stackexchange.com/q/271080 self.verbose = ('-v' in sys.argv) or ('--verbose' in sys.argv) - + super().__init__(*args, *kwargs) - + def setUp(self): self.app = create_app('testing') self.ctx = self.app.app_context() with self.ctx: clear_database() - + # put mid-test messages on a new line if self.verbose: print() - + def tearDown(self): # wait for background work to finish if self.verbose: @@ -47,22 +72,22 @@ def tearDown(self): views.executor.shutdown() if self.verbose: print(" Background work done") - + # clear database db.session.remove() with self.ctx: db.drop_all() - + def test_endpoint(self): # using test client is recommended in Flask testing how-to # https://flask.palletsprojects.com/en/2.3.x/testing/ # "The test client makes requests to the application without running a live # server." the `with` block runs teardown # https://github.com/pallets/flask/issues/2949 - with self.app.test_client() as client: + with self.app.test_client() as client, capture_logs() as log_output: if self.verbose: print(" Testing response") - + # check status response = client.get(self.endpoint) self.assertEqual(response.status_code, 200) @@ -79,5 +104,17 @@ def test_endpoint(self): "must be a dictionary or a string, not an object of type " f"'{response_type.__name__}'" ) - + + # check logs + if hasattr(self, 'expected_log_output'): + # sort the keys in each log entry + sorted_log_output = [dict(sorted(entry.items())) for entry in log_output] + + # process log entries + if hasattr(self, 'process_log_entry') and callable(self.process_log_entry): + for entry in sorted_log_output: + self.process_log_entry(entry) + + self.assertEqual(sorted_log_output, self.expected_log_output) + # TO DO: test background work \ No newline at end of file diff --git a/flaskr/nscope/test/test_logging.py b/flaskr/nscope/test/test_logging.py new file mode 100644 index 0000000..2650010 --- /dev/null +++ b/flaskr/nscope/test/test_logging.py @@ -0,0 +1,29 @@ +from structlog.testing import capture_logs +import unittest + +from flaskr import create_app, db, clear_database + + +class LoggingTest(unittest.TestCase): + expected_log_output = [{'log_level': 'error', 'event': 'Error message'}] + + def setUp(self): + self.app = create_app('testing') + self.ctx = self.app.app_context() + with self.ctx: + clear_database() + + def tearDown(self): + # clear database + db.session.remove() + with self.ctx: + db.drop_all() + + def test_logging(self): + with capture_logs() as log_output: + self.app.structlogger.error('Error message') + self.assertEqual(log_output, self.expected_log_output) + + +if __name__ == "__main__": + unittest.main() diff --git a/flaskr/nscope/test/test_lookup_errors.py b/flaskr/nscope/test/test_lookup_errors.py index 4496a07..b32dca4 100644 --- a/flaskr/nscope/test/test_lookup_errors.py +++ b/flaskr/nscope/test/test_lookup_errors.py @@ -1,4 +1,5 @@ import unittest + import flaskr.nscope.test.abstract_endpoint_test as abstract_endpoint_test @@ -7,9 +8,32 @@ # # https://oeis.org/wiki/A-numbers # + +expected_oeis_response_b64 = 'PCBHRVQgL0EwMDAwMDAvYjAwMDAwMC50eHQgSFRUUC8xLjENCjwgSG9zdDogb2Vpcy5vcmcNCjwgVXNlci1BZ2VudDogcHl0aG9uLXJlcXVlc3RzLzIuMzEuMA0KPCBBY2NlcHQtRW5jb2Rpbmc6IGd6aXAsIGRlZmxhdGUNCjwgQWNjZXB0OiAqLyoNCjwgQ29ubmVjdGlvbjoga2VlcC1hbGl2ZQ0KPCANCg0KPiBIVFRQLzEuMSA0MDQgTm90IEZvdW5kDQo+IENhY2hlLUNvbnRyb2w6IHByaXZhdGUsIG5vLXN0b3JlDQo+IENvbnRlbnQtVHlwZTogdGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04DQo+IERhdGU6IFRodSwgMTEgQXByIDIwMjQgMjM6NTk6NDUgR01UDQo+IFZhcnk6ICoNCj4gVHJhbnNmZXItRW5jb2Rpbmc6IGNodW5rZWQNCj4gDQoKPCFET0NUWVBFIGh0bWwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDMuMiBGaW5hbC8vRU4iPgo8aHRtbD4KICAKICA8aGVhZD4KICA8c2NyaXB0IGRlZmVyIGRhdGEtZG9tYWluPSJvZWlzLm9yZyIgc3JjPSJodHRwczovL3BsYXVzaWJsZS5pby9qcy9zY3JpcHQuanMiPjwvc2NyaXB0PgogIDxzdHlsZT4KICB0dCB7IGZvbnQtZmFtaWx5OiBtb25vc3BhY2U7IGZvbnQtc2l6ZTogMTAwJTsgfQogIHAuZWRpdGluZyB7IGZvbnQtZmFtaWx5OiBtb25vc3BhY2U7IG1hcmdpbjogMTBweDsgdGV4dC1pbmRlbnQ6IC0xMHB4OyB3b3JkLXdyYXA6YnJlYWstd29yZDt9CiAgcCB7IHdvcmQtd3JhcDogYnJlYWstd29yZDsgfQogIGRpdi5tb3RkIHsgZm9udC13ZWlnaHQ6IGJvbGQ7IHdpZHRoOiA3MCU7IGJvcmRlcjogMXB4IHNvbGlkIGJsYWNrOyBiYWNrZ3JvdW5kLWNvbG9yOiAjZmZmZmNjOyBtYXJnaW46IDFlbTsgfQogIHRkLm1vdGQgeyAgfQogIHAuU2VxLCBkaXYuU2VxIHsgdGV4dC1pbmRlbnQ6IC0xZW07IG1hcmdpbi1sZWZ0OiAxZW07IG1hcmdpbi10b3A6IDA7IG1hcmdpbi1ib3R0b206IDA7IH0KICBwLlNlcSB0dCwgZGl2LlNlcSB0dCB7IHdoaXRlLXNwYWNlOiBwcmUtd3JhcDsgfQogIDwvc3R5bGU+CiAgPG1ldGEgaHR0cC1lcXVpdj0iY29udGVudC10eXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7IGNoYXJzZXQ9dXRmLTgiPgogIDxtZXRhIG5hbWU9ImtleXdvcmRzIiBjb250ZW50PSJPRUlTLGludGVnZXIgc2VxdWVuY2VzLFNsb2FuZSIgLz4KICAKICAKICA8dGl0bGU+VGhlIE9uLUxpbmUgRW5jeWNsb3BlZGlhIG9mIEludGVnZXIgU2VxdWVuY2VzJnJlZzsgKE9FSVMmcmVnOyk8L3RpdGxlPgogIDxsaW5rIHJlbD0ic2VhcmNoIiB0eXBlPSJhcHBsaWNhdGlvbi9vcGVuc2VhcmNoZGVzY3JpcHRpb24reG1sIiB0aXRsZT0iT0VJUyIgaHJlZj0iL29laXMueG1sIj4KICA8c2NyaXB0PgogIHZhciBteVVSTCA9ICJcL0EwMDAwMDBcL2IwMDAwMDAudHh0IgogIGZ1bmN0aW9uIHJlZGlyKCkgewogICAgICB2YXIgaG9zdCA9IGRvY3VtZW50LmxvY2F0aW9uLmhvc3RuYW1lOwogICAgICBpZihob3N0ICE9ICJvZWlzLm9yZyIgJiYgaG9zdCAhPSAiMTI3LjAuMC4xIiAmJiBob3N0ICE9ICJsb2NhbGhvc3QiICYmIGhvc3QgIT0gImxvY2FsaG9zdC5sb2NhbGRvbWFpbiIpIHsKICAgICAgICAgIGRvY3VtZW50LmxvY2F0aW9uID0gImh0dHBzIisiOiIrIi8vIisib2VpcyIrIi5vcmcvIiArIG15VVJMOwogICAgICB9CiAgfQogIGZ1bmN0aW9uIHNmKCkgewogICAgICBpZihkb2N1bWVudC5sb2NhdGlvbi5wYXRobmFtZSA9PSAiLyIgJiYgZG9jdW1lbnQuZikgZG9jdW1lbnQuZi5xLmZvY3VzKCk7CiAgfQogIDwvc2NyaXB0PgogIDwvaGVhZD4KICA8Ym9keSBiZ2NvbG9yPSNmZmZmZmYgb25sb2FkPSJyZWRpcigpO3NmKCkiPgogICAgPHRhYmxlIGJvcmRlcj0iMCIgY2VsbHBhZGRpbmc9IjAiIGNlbGxzcGFjaW5nPSIwIiB3aWR0aD0iMTAwJSI+CiAgICA8dHI+PHRkIHdpZHRoPSIxMDAlIiBhbGlnbj0icmlnaHQiPgogICAgICA8Zm9udCBzaXplPS0xPgogICAgICAKICAgICAgICA8YSBocmVmPSIvbG9naW4/cmVkaXJlY3Q9JTJmQTAwMDAwMCUyZmIwMDAwMDAudHh0Ij5sb2dpbjwvYT4KICAgICAgCiAgICAgIDwvZm9udD4KICAgIDx0ciBoZWlnaHQ9NT48dGQ+CiAgICA8L3RhYmxlPgoKICAgIDxjZW50ZXI+CjxzcGFuIHN0eWxlPSJmb250LWZhbWlseTogc2Fucy1zZXJpZjsgZm9udC1zaXplOiA4MyU7IGZvbnQtc3R5bGU6IGl0YWxpYyI+VGhlIE9FSVMgaXMgc3VwcG9ydGVkIGJ5IDxhIGhyZWY9Imh0dHA6Ly9vZWlzZi5vcmcvI0RPTkFURSI+dGhlIG1hbnkgZ2VuZXJvdXMgZG9ub3JzIHRvIHRoZSBPRUlTIEZvdW5kYXRpb248L2E+Ljwvc3Bhbj4KICAgIDxicj4KPHAgc3R5bGU9Im1hcmdpbi10b3A6LTI0cHgiPiZuYnNwOzwvcD4KPGEgaHJlZj0iLyI+PGltZyBib3JkZXI9IjAiIHdpZHRoPSI2MDAiIGhlaWdodD0iMTEwIiBzcmM9Ii9iYW5uZXIyMDIxLmpwZyIgYWx0PSJMb2dvIj48L2E+CiAgICA8YnI+CgoKCgoKCiAgICA8IS0tIG5vIHNwZWNpYWwgZm9udHMgLS0+CiAgICA8L2NlbnRlcj4KCiAgICA8Y2VudGVyPgogICAgPHRhYmxlIGJvcmRlcj0iMCIgY2VsbHNwYWNpbmc9IjAiIGNlbGxwYWRkaW5nPSIwIj4KICAgICAgPHRyPjx0ZD4KICAgICAgICAKICAgIAogICAgPGNlbnRlcj4KICAgICAgICA8Zm9ybSBuYW1lPWYgYWN0aW9uPSIvc2VhcmNoIiBtZXRob2Q9IkdFVCI+CiAgICAgICAgICAgIDx0YWJsZSBjZWxsc3BhY2luZz0wIGNlbGxwYWRkaW5nPTAgYm9yZGVyPTA+CiAgICAgICAgICAgIDx0cj48dGQ+CiAgICAgICAgICAgIDxpbnB1dCBtYXhMZW5ndGg9MTAyNCBzaXplPTU1IG5hbWU9cSB2YWx1ZT0iIiB0aXRsZT0iU2VhcmNoIFF1ZXJ5Ij4KICAgICAgICAgICAgCiAgICAgICAgICAgIAogICAgICAgICAgICAKICAgICAgICAgICAgCiAgICAgICAgICAgIDxpbnB1dCB0eXBlPXN1Ym1pdCB2YWx1ZT0iU2VhcmNoIiBuYW1lPWdvPgogICAgICAgICAgICA8dGQgd2lkdGg9MTA+PHRkPgogICAgICAgICAgICA8Zm9udCBzaXplPS0yPjxhIGhyZWY9Ii9oaW50cy5odG1sIj5IaW50czwvYT48L2ZvbnQ+CiAgICAgICAgICAgIDx0cj48dGQgY29sc3Bhbj0yPgogICAgICAgICAgICA8Zm9udCBzaXplPS0xPgogICAgICAgICAgICAgICAgKEdyZWV0aW5ncyBmcm9tIDxhIGhyZWY9Ii93ZWxjb21lIj5UaGUgT24tTGluZSBFbmN5Y2xvcGVkaWEgb2YgSW50ZWdlciBTZXF1ZW5jZXM8L2E+ISkKICAgICAgICAgICAgPC9mb250PgogICAgICAgICAgICA8L3RhYmxlPgogICAgICAgIDwvZm9ybT4KICAgIDwvY2VudGVyPgoKICAgIDxiPlNvcnJ5LCB0aGUgcGFnZSB5b3UgcmVxdWVzdGVkIHdhcyBub3QgZm91bmQuCiAgICBUcnkgdGhlIHNlYXJjaCBib3ggYXQgdGhlIHRvcCBvZiB0aGlzIHBhZ2UuPC9iPgoKICAgICAgPC90ZD48L3RyPgogICAgPC90YWJsZT4KICAgIDwvY2VudGVyPgoKICAgIDxwPgoKICAgIDxjZW50ZXI+CiAgICAgIDxhIGhyZWY9Ii8iPkxvb2t1cDwvYT4gfAogICAgICA8YSBocmVmPSIvd2lraS9XZWxjb21lIj48Zm9udCBjb2xvcj0icmVkIj5XZWxjb21lPC9mb250PjwvYT4gfAogICAgICA8YSBocmVmPSIvd2lraS9NYWluX1BhZ2UiPjxmb250IGNvbG9yPSJyZWQiPldpa2k8L2ZvbnQ+PC9hPiB8CiAgICAgIDxhIGhyZWY9Ii93aWtpL1NwZWNpYWw6UmVxdWVzdEFjY291bnQiPlJlZ2lzdGVyPC9hPiB8CiAgICAgIAogICAgICA8YSBocmVmPSIvcGxheS5odG1sIj5NdXNpYzwvYT4gfAogICAgICA8YSBocmVmPSIvcGxvdDIuaHRtbCI+UGxvdCAyPC9hPiB8CiAgICAgIDxhIGhyZWY9Ii9kZW1vMS5odG1sIj5EZW1vczwvYT4gfAogICAgICA8YSBocmVmPSIvd2lraS9JbmRleF90b19PRUlTIj5JbmRleDwvYT4gfAogICAgICA8YSBocmVmPSIvU2Jyb3dzZS5odG1sIj5Ccm93c2U8L2E+IHwKICAgICAgPGEgaHJlZj0iL21vcmUuaHRtbCI+TW9yZTwvYT4gfAogICAgICA8YSBocmVmPSIvd2ViY2FtIj5XZWJDYW08L2E+CgogICAgICA8YnI+CgogICAgICA8YSBocmVmPSIvU3VibWl0Lmh0bWwiPkNvbnRyaWJ1dGUgbmV3IHNlcS4gb3IgY29tbWVudDwvYT4gfAogICAgICA8YSBocmVmPSIvZWlzaGVscDIuaHRtbCI+Rm9ybWF0PC9hPiB8CiAgICAgIDxhIGhyZWY9Ii93aWtpL1N0eWxlX1NoZWV0Ij5TdHlsZSBTaGVldDwvYT4gfAogICAgICA8YSBocmVmPSIvdHJhbnNmb3Jtcy5odG1sIj5UcmFuc2Zvcm1zPC9hPiB8CiAgICAgIDxhIGhyZWY9Ii9vbC5odG1sIj5TdXBlcnNlZWtlcjwvYT4gfAogICAgICA8YSBocmVmPSIvcmVjZW50Ij5SZWNlbnRzPC9hPgoKICAgICAgPGJyPgoKICAgICAgPGEgaHJlZj0iL2NvbW11bml0eS5odG1sIj5UaGUgT0VJUyBDb21tdW5pdHk8L2E+IHwKICAgICAgTWFpbnRhaW5lZCBieSA8YSBocmVmPSJodHRwOi8vb2Vpc2Yub3JnIj5UaGUgT0VJUyBGb3VuZGF0aW9uIEluYy48L2E+CiAgICA8L2NlbnRlcj4KCiAgICA8cD4KICAgIDxjZW50ZXI+CiAgIDxzcGFuIHN0eWxlPSJmb250LWZhbWlseTogc2Fucy1zZXJpZjsgZm9udC1zaXplOiA4MyU7IGZvbnQtc3R5bGU6IGl0YWxpYyI+CiAgICA8YSBocmVmPSIvd2lraS9MZWdhbF9Eb2N1bWVudHMiPgogICAgTGljZW5zZSBBZ3JlZW1lbnRzLCBUZXJtcyBvZiBVc2UsIFByaXZhY3kgUG9saWN5LgogICAgPC9hPi4KICAgIDwvc3Bhbj4KICAgIDwvY2VudGVyPgoKICAgIDxwPgogICAgPGNlbnRlcj4KICAgIDxmb250IHNpemU9LTE+TGFzdCBtb2RpZmllZCBBcHJpbCAxMSAxOTozMiBFRFQgMjAyNC4gIENvbnRhaW5zIDM3MTYxOSBzZXF1ZW5jZXMuIChSdW5uaW5nIG9uIG9laXM0Lik8L2ZvbnQ+CiAgICA8L2NlbnRlcj4KICAgIDxwPgoKICA8L2JvZHk+CjwvaHRtbD4K' + +# see `process_log_entry` +def process_response(response_b64): + return response_b64[:380] + response_b64[424:5000] + class TestNonexistentSequence(abstract_endpoint_test.AbstractEndpointTest): - endpoint = "http://localhost:5000/api/get_oeis_values/A000000/12" + endpoint = 'http://localhost:5000/api/get_oeis_values/A000000/12' expected_response = "Error: B-file for ID 'A000000' not found in OEIS." + expected_log_output = [ + { + 'event': 'request issue', + 'log_level': 'error', + 'response': process_response(expected_oeis_response_b64), + 'tags': ['http error'] + } + ] + + def process_log_entry(self, entry): + # an HTTP response includes a time-dependent 'Date' parameter. in addition, + # the OEIS 404 page ends with a frequently updated paragraph that shows the + # time of the last update and the current number of sequences. to allow the + # test to use a fixed response value in the expected log entry, we cut out + # the frequently changing parts of the response + entry['response'] = process_response(entry['response']) if __name__ == "__main__": diff --git a/flaskr/nscope/views.py b/flaskr/nscope/views.py index 9e28228..282e3e6 100644 --- a/flaskr/nscope/views.py +++ b/flaskr/nscope/views.py @@ -2,18 +2,22 @@ Views for nscope model """ -from urllib.parse import urlunparse - -from flask import Blueprint, jsonify, current_app, render_template -from flask_executor import Executor -from flaskr import db -from flaskr.nscope.models import * - +# external imports +import base64 # for encoding response dumps import cypari2 from cypari2.convert import gen_to_python +from flask import Blueprint, jsonify, current_app, render_template +from flask_executor import Executor import re import requests +from requests_toolbelt.utils import dump +import structlog +import subprocess # for calling git +from urllib.parse import urlunparse +# internal imports +from flaskr import db +from flaskr.nscope.models import * executor = Executor() bp = Blueprint("nscope", __name__) @@ -24,16 +28,50 @@ def index(): return render_template("index.html") -def oeis_url(path='', query=''): +def write_request_log(log, response, error=False): + response_b64 = base64.b64encode(dump.dump_all(response)).decode('ascii') + log = log.bind(response=response_b64) + if error: + log.error('request issue') + else: + log.warning('request issue') + +def oeis_url(path=''): return urlunparse([ current_app.config['oeis_scheme'], current_app.config['oeis_hostport'], path, '', # path parameters - query, + '', # query '' # fragment ]) +def oeis_get(path='', params=None, timeout=4): + # start keeping track of what's going on + log = current_app.structlogger.bind(tags=[]) + tags = structlog.get_context(log)['tags'] + + # try request + try: + response = requests.get(oeis_url(path), params, timeout=timeout) + if response.history: + tags.append('history') + response.raise_for_status() # raise an exception on 4xx and 5xx status codes + except requests.HTTPError as ex: + tags.append('http error') + write_request_log(log, response, error=True) + return ex + except requests.RequestException as ex: + tags.append('exception') + write_request_log(log, response, error=True) + return ex + + #----------------------------------------------------------------------------- + # if we've gotten this far, it's going well enough to return the response + #----------------------------------------------------------------------------- + + return response + def fetch_metadata(oeis_id): """ When called with a *valid* oeis id, makes sure the metadata has been obtained, and returns the corresponding Sequence object with valid @@ -50,8 +88,8 @@ def fetch_metadata(oeis_id): seq.meta_requested = True db.session.commit() # Now grab the data - match_url = oeis_url('/search', f'q={seq.id}&fmt=json') - r = requests.get(match_url).json() + search_params = {'q': seq.id, 'fmt': 'json'} + r = oeis_get('/search', search_params) if r['results'] != None: # Found some metadata backrefs = [] target_number = int(seq.id[1:]) @@ -66,7 +104,8 @@ def fetch_metadata(oeis_id): backrefs.append('A' + str(result['number']).zfill(6)) saw += 1 if saw < matches: - r = requests.get(match_url + f"&start={saw}").json() + search_params['start'] = saw + r = oeis_get('\search', search_params).json() if r['results'] == None: break seq.backrefs = backrefs @@ -114,9 +153,11 @@ def fetch_values(oeis_id): seq.values_requested = True db.session.commit() # Now try to get it from the OEIS: - r = requests.get(oeis_url(f'/{oeis_id}/b{oeis_id[1:]}.txt'), timeout=4) - if r.status_code == 404: - return LookupError(f"B-file for ID '{oeis_id}' not found in OEIS.") + r = oeis_get(f'/{oeis_id}/b{oeis_id[1:]}.txt') + # Test for 404 error. Hat tip StackOverflow user Lukasa + # https://stackoverflow.com/a/19343099 + if isinstance(r, requests.HTTPError) and r.response.status_code == 404: + return LookupError(f"B-file for ID '{oeis_id}' not found in OEIS.") # Parse the b-file: first = float('inf') last = float('-inf') @@ -276,7 +317,7 @@ def get_oeis_name_and_values(oeis_id): # Now get the name seq = find_oeis_sequence(valid_oeis_id) if not seq.name or seq.name == placeholder_name(oeis_id): - r = requests.get(oeis_url('/search', f'q=id:{oeis_id}&fmt=json'), timeout=4).json() + r = oeis_get('/search', {'id': oeis_id, 'fmt': 'json'}).json() if r['results'] != None: seq.name = r['results'][0]['name'] db.session.commit() diff --git a/logs/.gitignore b/logs/.gitignore new file mode 100644 index 0000000..468164c --- /dev/null +++ b/logs/.gitignore @@ -0,0 +1,11 @@ +# almost everything in this directory should be a log file, and therefore +# shouldn't be tracked. the one exception is this `.gitignore` file, as +# explained below +* + +# tracking this file ensures that the `logs` directory will always be present, +# so the logging system can write to it. hat tip to Jamie Flournoy for this idea +# +# https://stackoverflow.com/a/932982 +# +!.gitignore diff --git a/mock_oeis/__init__.py b/mock_oeis/__init__.py index 711f9b6..d6621fb 100644 --- a/mock_oeis/__init__.py +++ b/mock_oeis/__init__.py @@ -51,6 +51,5 @@ def search(): def create_app(): app = Flask(__name__) - ##app.config.from_object() app.register_blueprint(api) - return app \ No newline at end of file + return app diff --git a/requirements-freeze.txt b/requirements-freeze.txt index bbe7645..dddf317 100644 --- a/requirements-freeze.txt +++ b/requirements-freeze.txt @@ -26,8 +26,10 @@ python-dateutil==2.8.2 python-dotenv==1.0.0 python-editor==1.0.4 requests==2.31.0 +requests-toolbelt==1.0.0 six==1.16.0 SQLAlchemy==2.0.22 +structlog==24.1.0 typing_extensions==4.8.0 urllib3==2.0.7 Werkzeug==3.0.1 diff --git a/requirements.txt b/requirements.txt index a907232..77ab69c 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,4 +5,6 @@ Flask-Migrate # library for migrating database models psycopg2-binary # library for interacting with Postgres python-dotenv # library for .env files requests # library for making HTTP requests +requests-toolbelt # we use this for logging HTTP requests +structlog # package for structured logging wheel # package for building packages, not installed by default on Ubuntu