Update roadmap with a clearly articulated security model & strategy #5718

Open
glyph opened this issue Apr 18, 2019 · 5 comments
Labels
  • developer experience: Anything that improves the experience for Warehouse devs
  • documentation
  • needs discussion: a product management/policy issue maintainers and users should discuss

Comments

@glyph

glyph commented Apr 18, 2019

What's the problem this feature will solve?
Right now, PyPI has a way to report a security issue, but no clear description of what a "security issue" might be. Efforts like #5567 will improve the security of the site, but to what end?

Meanwhile, attacks against the open source supply chain are escalating, and more typo-squatting malware gets posted to PyPI every day.

Describe the solution you'd like

  • I'd like https://pypi.org/security/ to describe the threat model of PyPI and what properties it attempts to provide. In particular: what constitutes a security issue that should be reported
  • I'd like https://warehouse.readthedocs.io/security/ to describe what properties it would like to provide in the long term. Particularly, where do efforts like the TOTP work fit into a long-term vision for the security of the site and for its users?

brainwane added the "developer experience" and "needs discussion" labels on Jun 21, 2019

@brainwane
Contributor

This evening I gave a talk to some students in an application security class, and figured my notes could be used to start addressing this issue.

The section headings are borrowed from the textbook The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities by Mark Dowd et al. (Chapter 4, Application Review Process):

  • General application purpose—What is the application supposed to do?
  • Fundamental security expectations—What security expectations do legitimate users of this application have?
  • Assets and entry points—How does data get into the system, and what value does the system have that an attacker might be interested in?
  • Components and modules—What are the major divisions between the application’s components and modules?
  • Intermodule relationships—At a high level, how do different modules in the application (within Warehouse) communicate?
  • Major trust boundaries—What are the major boundaries that enforce security expectations?

General application purpose: What is PyPI/Warehouse?

Glossary.

  • language-specific platform for sharing packages -- both libraries and applications
  • part of a toolchain; https://packaging.python.org/ covers the official open source tools for uploading and downloading (most people use PyPI by downloading via pip)
  • Since reads are much more common than writes (much more goes out than goes in), we try to cache as much as possible.
  • sdists and wheels -- we are indeed hosting binaries that we haven't inspected -- more at https://packaging.python.org/
  • History

Fundamental security expectations: Users and what they can do
Reuse user classes from docs and owners vs maintainers.

How do you become one of these kinds of users? Roles are defined per project namespace. The initial project Owner is the first person to upload a project to PyPI under that project name.

What can these different owners do? See #5863.

But also! ALL users, including people who are not logged in, can read the records of package activity.
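
As a rough illustration of the expectations above, the following sketch models first-upload ownership and public read access. It is a toy in-memory model, not Warehouse's actual data model: the `Index` class, its method names, and the assumption that any user who already holds a role on a project may upload are all hypothetical. The real per-role permissions are the subject of #5863.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    OWNER = "Owner"
    MAINTAINER = "Maintainer"


@dataclass
class Index:
    """Hypothetical toy index; illustrates the rules in the notes only."""

    # project name -> {username: role}
    roles: Dict[str, Dict[str, Role]] = field(default_factory=dict)

    def upload(self, user: str, project: str) -> bool:
        """Return True if the upload would be accepted."""
        members = self.roles.get(project)
        if members is None:
            # First upload under this name: the uploader becomes the initial Owner.
            self.roles[project] = {user: Role.OWNER}
            return True
        # Assumption: only users who already hold a role on the project may upload.
        return user in members

    def can_read(self, user: Optional[str], project: str) -> bool:
        """Anyone, including a visitor who is not logged in (user=None), may read."""
        return project in self.roles
```

The point of the sketch is that the project name itself is the scarce resource: whoever claims it first controls it, which is why name-related attacks matter for the threat model.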

Assets and entry points
How does data get into the system, and what value does the system have that an attacker might be interested in?

  • API: Packages and projects get into the system via the API (users use Twine).
  • Web browser: Initial user creation, a lot of privilege creation/change/deletion, and the administrative interface

Components and modules

https://warehouse.readthedocs.io/application/ goes over this a bit.

  • Pyramid, our web application framework
  • Database access (we use SQLAlchemy and Postgres)
  • Auth
  • Token generation (Macaroons)
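
For readers unfamiliar with Macaroons, here is a minimal sketch of the core idea: a token's signature is an HMAC chain over its identifier and each caveat, so anyone holding a token can add a restriction, but nobody can remove one without the root key. This uses only the Python standard library and is not Warehouse's implementation or any Macaroon library's API; the function names (`mint`, `attenuate`, `verify`) and the caveat strings are made up for illustration. A real verifier would also check that every caveat is actually satisfied by the request, which this sketch does not do.

```python
import hashlib
import hmac
from typing import Sequence


def _chain(key: bytes, message: bytes) -> bytes:
    # One link of the chain: HMAC-SHA256 keyed by the previous signature.
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes, caveats: Sequence[bytes]) -> bytes:
    """Compute the signature for an identifier plus an ordered list of caveats."""
    signature = _chain(root_key, identifier)
    for caveat in caveats:
        signature = _chain(signature, caveat)
    return signature


def attenuate(signature: bytes, caveat: bytes) -> bytes:
    """Add one more caveat to an existing token without knowing the root key."""
    return _chain(signature, caveat)


def verify(root_key: bytes, identifier: bytes,
           caveats: Sequence[bytes], signature: bytes) -> bool:
    """Recompute the chain from the root key and compare in constant time."""
    return hmac.compare_digest(mint(root_key, identifier, caveats), signature)
```

A holder of a broad upload token could, under these assumptions, call `attenuate(token, b"project = demo")` and hand the narrower token to a CI job; dropping that caveat later makes `verify` fail.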

Major trust boundaries
What are the major boundaries that enforce security expectations?

  • Login: API and browser-based
  • User privileges as defined in the database
  • Token scope: an API token is only as trustworthy as the caveats attached to it

@brainwane
Contributor

There are a few items in #2794 (comment) that should also be in such a document, such as release immutability.

@brainwane
Contributor

In this discussion thread, @tiran says:

I would like to see a general and user-oriented PEP about PyPI security to answer these questions:

  • How is a package owner/maintainer able to verify that PyPI is serving correct and unmodified files?
  • As a user of PyPI how can I make sure that pip installs correct and unmodified packages?
  • As a user of PyPI how can I protect myself against typo-squatting attacks or compromised versions of a package?

and Donald Stufft notes,

this feels to me more like something that should be documented either on PyPI or as part of packaging.python.org.

I think documentation of the answers to those questions ought to be incorporated into the documentation push @glyph is suggesting.

@tiran

tiran commented Jan 17, 2020

Thanks @brainwane

My thought-provoking, inconvenient, and brutally honest opinion is: PyPI won't be able to deliver this in its current shape and design. Sooner or later we have to consider a different model that works more like current app stores or Linux distributions. I'm talking about curated content.

I have been thinking about the matter for a while. All I have so far is a half-baked, handwavy proposal of a three-layered index:

  1. Standard PyPI as it works today
  2. A filtered subset of PyPI that offers only projects that have gone through a review process.
  3. A subset of (2) that requires each upload, release, and uploader to go through a vetting and verification process.

Layer (2) should get rid of typo-squatting. Layer (3) requires considerable effort but might be a way to generate revenue to support maintenance of PyPI and its tooling.
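
To make the typo-squatting concern concrete, here is a small sketch of the kind of name-similarity check a review process (or a cautious user) might run before trusting a project name. Nothing in this thread says PyPI does this; the function name, the normalization, the 0.8 cutoff, and the list of known projects are all assumptions chosen for illustration. It relies only on `difflib.get_close_matches` from the standard library.

```python
import difflib
from typing import Iterable, List


def possible_typosquats(name: str, known_projects: Iterable[str],
                        cutoff: float = 0.8) -> List[str]:
    """Return well-known project names that `name` closely resembles.

    An exact match returns an empty list, since the name is then the
    known project itself rather than a look-alike.
    """
    # Simplified normalization (an assumption, not the official rule).
    normalized = name.lower().replace("_", "-")
    candidates = [p for p in known_projects if p != normalized]
    return difflib.get_close_matches(normalized, candidates, n=3, cutoff=cutoff)


if __name__ == "__main__":
    popular = ["requests", "numpy", "django"]  # hypothetical curated list
    print(possible_typosquats("reqeusts", popular))
    print(possible_typosquats("requests", popular))
```

A check like this only flags candidates for a human to look at; string similarity alone cannot tell a malicious look-alike from a legitimately similar name.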

@ncoghlan
Contributor

ncoghlan commented Feb 6, 2020

PyPI is a publishing platform, not a curation platform, and building a language-specific curation service doesn't make sense. It's unfortunate that Red Hat chose not to fund further work on https://fedoraproject.org/wiki/Env_and_Stacks/Projects/SoftwareComponentPipeline, but that's still well outside the scope of PyPI, and it's honestly well outside the scope of the PSF as well.

PyPI's job is to make sure that users can verify that what they installed is what the publisher uploaded.

Determining whether or not a particular publisher is trustworthy is a whole different story, and the onus for that will always remain primarily on consumers.
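
One narrow piece of "what they installed is what the publisher uploaded" can be sketched as a digest comparison: hash the downloaded file and compare it with a digest obtained through a channel the consumer trusts. The helper names below are hypothetical, and the sketch assumes the expected SHA-256 value is already known and trustworthy, for example because it was pinned in a requirements file for use with pip's hash-checking mode (`--require-hashes`). It says nothing about whether the publisher is trustworthy.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the expected one in constant time."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A mismatch means the file differs from the one the digest was computed for; a match only means the two are identical, not that the contents are safe.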


5 participants