
Support for saving requests to HAR file #146

Closed
asg017 opened this issue Mar 9, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

asg017 commented Mar 9, 2024

Playwright has support for HAR (HTTP Archive) files, which record all of the network requests made during a session in a JSON-based format.

The API is awkward, but we can use route_from_har. We'll need to pass update=True so that requests are saved to the file instead of being "served" from it.

There are a few more options:

  • update_content: "embed"|"attach". Maybe --har-content=embed?
  • update_mode: "full"|"minimal". Maybe --har-mode=full?
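A minimal sketch of the route_from_har approach, assuming Playwright's sync API and a release recent enough to accept update_content and update_mode. The har_route_options() and record_with_route_from_har() helpers are hypothetical, not shot-scraper code; their embed/full defaults simply follow the suggestions above.

```python
VALID_CONTENT = ("embed", "attach")
VALID_MODE = ("full", "minimal")


def har_route_options(content="embed", mode="full"):
    """Map the proposed (hypothetical) --har-content / --har-mode values
    to keyword arguments for Playwright's route_from_har()."""
    if content not in VALID_CONTENT:
        raise ValueError(f"--har-content must be one of {VALID_CONTENT}")
    if mode not in VALID_MODE:
        raise ValueError(f"--har-mode must be one of {VALID_MODE}")
    # update=True records traffic into the HAR instead of serving from it
    return {"update": True, "update_content": content, "update_mode": mode}


def record_with_route_from_har(url, har_path, content="embed", mode="full"):
    # Imported lazily so the helper above works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        page = context.new_page()
        page.route_from_har(har_path, **har_route_options(content, mode))
        page.goto(url)
        # The HAR is only written to disk when the context is closed
        context.close()
        browser.close()
```

Keeping the option mapping in its own function means the CLI validation can be checked without launching a browser.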
simonw added the enhancement (New feature or request) label on Feb 13, 2025
simonw pinned this issue on Feb 13, 2025
simonw (Owner) commented Feb 13, 2025

Playwright added this in version 1.23, released in June 2022: https://playwright.dev/python/docs/release-notes#version-123 (release: https://github.com/microsoft/playwright-python/releases/tag/v1.23.0)

context = browser.new_context(record_har_path="github.har.zip")
# ... do stuff ...
context.close()
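Filled out into something runnable, the snippet above might look like this. It assumes Playwright for Python is installed along with its Chromium build; the record_har() name, its parameters and the 30 second timeout are illustrative choices.

```python
def record_har(url, har_path="github.har.zip", timeout_ms=30_000):
    """Load url in a fresh browser context and save its traffic as a HAR."""
    # Imported lazily so this module can be imported without Playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # A path ending in .zip makes Playwright write a ZIP archive, with
        # response bodies attached as separate entries by default
        context = browser.new_context(record_har_path=har_path)
        page = context.new_page()
        page.goto(url, timeout=timeout_ms)
        # The HAR file is written when the context closes
        context.close()
        browser.close()
    return har_path
```

Usage: record_har("https://github.com/") writes github.har.zip to the current directory.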

simonw (Owner) commented Feb 13, 2025

simonw mentioned this issue on Feb 13, 2025
simonw changed the title from "Add a new --save-har=my.har option to save requests to HAR file" to "Support for saving requests to HAR file" on Feb 13, 2025
simonw (Owner) commented Feb 13, 2025

I built this as a separate command:

shot-scraper har https://datasette.io/

That will save it to datasette-io.har.zip (using a different filename if that file already exists, to avoid clobbering it). You can use the -o/--output option to save it somewhere else.
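The naming behaviour described above could be sketched like this. This is a hypothetical reimplementation for illustration, not shot-scraper's actual code: joining the hostname and path with hyphens and the .2, .3, ... suffix used to avoid clobbering an existing file are both assumptions.

```python
import os
import re
from urllib.parse import urlparse


def har_filename_for_url(url, ext="har.zip", exists=os.path.exists):
    """Derive a filename such as datasette-io.har.zip from a URL,
    adding a numeric suffix if that file already exists (assumed scheme)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Collapse every run of non-alphanumeric characters into one hyphen
    base = re.sub(r"[^a-zA-Z0-9]+", "-", raw).strip("-")
    candidate = f"{base}.{ext}"
    suffix = 2
    # Keep bumping the suffix until we find a name that is not taken
    while exists(candidate):
        candidate = f"{base}.{suffix}.{ext}"
        suffix += 1
    return candidate
```

Passing the exists check in as a parameter keeps the function easy to test without touching the filesystem.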

simonw (Owner) commented Feb 13, 2025

% shot-scraper har --help
Usage: shot-scraper har [OPTIONS] URL

  Record a HAR file for the specified page

  Usage:

      shot-scraper har https://datasette.io/

Options:
  -a, --auth FILENAME    Path to JSON authentication context file
  -o, --output FILE      HAR filename
  -j, --javascript TEXT  Execute this JS prior to taking the snapshot
  --timeout INTEGER      Wait this many milliseconds before failing
  --log-console          Write console.log() to stderr
  --fail                 Fail with an error code if a page returns an HTTP
                         error
  --skip                 Skip pages that return HTTP errors
  --bypass-csp           Bypass Content-Security-Policy
  --auth-password TEXT   Password for HTTP Basic authentication
  --auth-username TEXT   Username for HTTP Basic authentication
  --help                 Show this message and exit.

simonw (Owner) commented Feb 13, 2025

Is there an argument for adding this as an option to other commands, such as shot-scraper javascript, as well? Since this new command supports a -j/--javascript option for executing extra JavaScript, I'm not sure the other commands would benefit from a --record-har option, but maybe they would?

simonw (Owner) commented Feb 13, 2025

This option actually has no effect:

shot-scraper har https://datasette.io/ --javascript 'document.title="mess with the DOM first"'

That's because the HAR has already been recorded by the time the JavaScript executes.
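Since a HAR holds network traffic rather than DOM state, one way to see what a recording actually captured is to list its entries. A minimal sketch using only the standard library, assuming the HAR 1.2 layout (log.entries, each with a request and a response) and that a .har.zip written by Playwright contains a member whose name ends in .har; load_har() and list_har_requests() are illustrative names.

```python
import json
import zipfile


def load_har(path):
    """Load a HAR from a plain .har file or from a .har.zip archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # Pick the archive member holding the HAR JSON itself
            name = next((n for n in zf.namelist() if n.endswith(".har")), None)
            if name is None:
                raise ValueError(f"no .har member found in {path}")
            return json.loads(zf.read(name))
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def list_har_requests(path):
    """Return (method, status, url) for every recorded request."""
    har = load_har(path)
    return [
        (e["request"]["method"], e["response"]["status"], e["request"]["url"])
        for e in har["log"]["entries"]
    ]
```

Setting document.title does not trigger a network request, so it would never show up in this listing.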

simonw (Owner) commented Feb 13, 2025

Having shot-scraper multi be able to store HAR files would be neat.

simonw (Owner) commented Feb 13, 2025

Got Claude to write me a very neat pytest fixture for running a localhost web server: https://gist.github.com/simonw/360b520fdb82d48c669db575cf74b9f4
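The gist itself isn't reproduced here. As a rough stand-in, a stdlib-only way to serve a directory on a free localhost port for the duration of a test might look like this; serve_directory() is a hypothetical helper, not the fixture from the gist.

```python
import contextlib
import functools
import http.server
import threading


@contextlib.contextmanager
def serve_directory(directory):
    """Serve files from directory on an OS-assigned localhost port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the OS for any free port
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        yield f"http://{host}:{port}/"
    finally:
        # Stop the serve_forever loop, release the socket, wait for the thread
        server.shutdown()
        server.server_close()
        thread.join()
```

In pytest this could be wrapped in a fixture that writes test pages under tmp_path and yields the URL from inside a with serve_directory(tmp_path) block, so the server is torn down after each test.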

simonw (Owner) commented Feb 13, 2025

Blogged about this here: https://simonwillison.net/2025/Feb/13/shot-scraper/

Projects
None yet
Development

No branches or pull requests

2 participants