[Feature] Number of downloads and other statistics #3781

Open
roysubs opened this issue Dec 4, 2019 · 11 comments

Comments

@roysubs

roysubs commented Dec 4, 2019

Do you record stats on the packages that are most often downloaded and are these things available?

Also, are there forums where people openly discuss their usage of scoop that you can point me at?

@Calinou
Contributor

Calinou commented Dec 4, 2019

We don't have analytics since packages are downloaded directly from developers' official download links. As far as I know, there are no plans to integrate an analytics system à la Homebrew where Scoop would "phone home" to a server.

> Also, are there forums where people openly discuss their usage of scoop that you can point me at?

There's a Discord server and this issue tracker, but that seems to be it so far.

@r15ch13
Member

r15ch13 commented Dec 4, 2019

As Calinou says, there are no analytics right now.

I've had this feature in mind for a while, but it would have to be implemented with privacy in mind.

  • It should only track packages from the official buckets, so it doesn't expose private packages (package names)
  • It could track the PowerShell version
  • The server shouldn't track any geolocation data or IPs
  • The server probably has to handle a good amount of traffic

I don't know which additional metrics would be useful to track to justify this feature.
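
As an illustration of the constraints listed above, here is a minimal Python sketch of how a client could decide what to report. It is not part of Scoop: `OFFICIAL_BUCKETS`, `InstallEvent`, and `build_event` are hypothetical names, and the bucket allowlist is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: an illustrative allowlist, not Scoop's actual list of official buckets.
OFFICIAL_BUCKETS = frozenset({"main", "extras", "versions"})


@dataclass(frozen=True)
class InstallEvent:
    """The only fields a client would send in this sketch.

    There is deliberately no IP address, geolocation, or user identifier.
    """
    app: str
    bucket: str
    powershell_version: str


def build_event(app: str, bucket: str, powershell_version: str) -> Optional[InstallEvent]:
    """Return an event for apps from official buckets, or None for anything else."""
    if bucket not in OFFICIAL_BUCKETS:
        # Private or third-party bucket: report nothing, so the package name never leaves the machine.
        return None
    return InstallEvent(app=app, bucket=bucket, powershell_version=powershell_version)
```

Filtering on the client rather than on the server means private package names are never transmitted at all. Note that the server would still see the connecting IP address at the network level, so "not tracking IPs" also depends on what the server chooses not to log.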

@grantcarthew

I suggested as much in #3658.

The idea was canned without a second thought.

It would be a really useful tool.

@rashil2000
Member

From @kiedtl in #3311:

Can we implement a statistics feature like Homebrew analytics?

There need not be any web interface, there could be a dedicated command for it (e.g. scoop stats <app>). It would serve to show package creators how their packages are doing, and if there ever came a time where we would have to curate packages, the statistics could show us the least downloaded packages.

Of course, this would require someone to set up a dedicated server somewhere.

rashil2000 changed the title from "[Discussion] Most downloaded packages?" to "[Feature] Number of downloads and other statistics" on Jan 6, 2022
@rashil2000
Member

From @redactedscribe in #4626:

It would be cool to have some insight into the number of downloads a given bucket has served, or the number of times an app has been downloaded overall and for a given version. Does Scoop collect any kind of interesting anonymous statistics that could be published online and updated automatically? Maybe these stats could be displayed under scoop bucket info (doesn't exist) and scoop info? I realise this would be a low priority, but I'd be very interested in seeing such numbers if anyone would be willing to bring this to life. I don't have much experience with Chocolatey but it tracks some basic download statistics.

Thanks!

@niheaven
Member

niheaven commented Jan 6, 2022

I don't like the idea. Scoop is an app installation helper, not an app store, so why should we collect data that may touch on user privacy, even if it is anonymous?

Scoop doesn't host any apps (except a few small ones in the binaries repo), so it is hard to track downloads. This is a great advantage of Scoop, right?

@rashil2000
Member

I think some data is actually useful, for example, how many systems are 32bit. This helps manifest authors decide if they want to keep an old version of an app (that still has a 32bit build) or upgrade to a newer, 64bit-only build.
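
To make the 32bit example concrete, here is a small Python sketch of the aggregation it implies. The function name `architecture_shares` and the sample data are hypothetical; the sketch only assumes that each reporting system contributes one architecture string.

```python
from collections import Counter
from typing import Dict, Iterable


def architecture_shares(architectures: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of reporting systems per architecture."""
    counts = Counter(architectures)
    total = sum(counts.values())
    if total == 0:
        # No reports yet, so there are no shares to compute.
        return {}
    return {arch: n / total for arch, n in counts.items()}


# Hypothetical sample: four systems, one of them 32bit.
shares = architecture_shares(["64bit", "64bit", "32bit", "64bit"])
```

With numbers like these, a manifest author could see that a quarter of the sample would lose the app if the manifest moved to a 64bit-only build.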

@niheaven
Member

niheaven commented Jan 6, 2022

> I think some data is actually useful, for example, how many systems are 32bit. This helps manifest authors decide if they want to keep an old version of an app (that still has a 32bit build) or upgrade to a newer, 64bit-only build.

Yes, it is useful, but these are stats about manifests, not stats about usage (e.g., download, install, etc.).

@redactedscribe

I wouldn't be in favour of collecting any data that is personally identifiable, but since there aren't any IDs assigned to Scoop users as far as I'm aware, this shouldn't be a problem anyway.

Data such as download count can serve as an indicator when choosing an app to download, as sometimes an app's popularity hints towards the app you should be downloading (especially useful if there are several variations of an app/manifest to choose from).

@rashil2000
Member

I have done some digging around other package managers out there, and here's what I found.


System package managers

APT (Debian)

Opt in. Sets up a weekly cron job to send the list of installed packages and the last access times of files installed by those packages.

Ref: https://popcon.debian.org

Snap (Ubuntu)

Mandatory. Not documented properly, but most probably (on every install command):

  • List of installed snaps
  • OS info (distro, kernel, architecture etc.)

Ref: https://snapcraft.io/docs/snap-store-metrics

Winget (Windows)

Opt out. The privacy policy is the same as the Microsoft Windows privacy policy. Telemetry events are not documented, but they can be seen in the CLI's source code at https://github.com/microsoft/winget-cli/blob/master/src/AppInstallerCommonCore/AppInstallerTelemetry.cpp. From what I can see, it logs all installation events, search results, manifest information, failure information, etc.

Ref: https://github.com/microsoft/winget-cli/blob/master/privacy.md

Third party package managers

Homebrew (macOS and Linux)

Opt out. Uses Google Analytics.

  • Homebrew User Agent (contains OS version and architecture)
  • Tracking ID
  • User ID
  • IP address
  • Homebrew version
  • Event type (install, install_on_request, cask_install, BuildError) with full CLI passed options, for all packages belonging to public taps

Ref: https://docs.brew.sh/Analytics

Chocolatey (Windows)

No telemetry.

Ref: https://docs.chocolatey.org/en-us/information/security

MacPorts (macOS)

Opt in. A daemon runs once a week to submit the following information:

  • A unique identifier for the MacPorts installation
  • Version numbers and variants of MacPorts, OS X, GCC, Xcode (and CLI Tools), stdlib, architecture etc.
  • List of installed ports, with their versions, variants and active status

Ref: https://ports.macports.org/statistics/faq

Language Package Managers

NPM

Mandatory. Every time the npm or npx command is used, the following may be collected:

  • A random, unique identifier, called npm-session, for each time you run commands like npm install
  • Names and versions of your project's dependencies, their dependencies, and so on
  • Versions of Node.js, the npm command, and the operating system you are using
  • An npm-in-ci header, showing whether the command was run on a continuous integration server
  • The scope of the package for which you ran npm install, as an npm-scope header
  • A referrer header that shows the command you ran, with any file or directory paths redacted
  • Data about the software you're using to access the registry, such as the User-Agent string
  • Network request data, such as the date and time, your IP address, and the URL

Ref: https://docs.npmjs.com/policies/privacy

Yarn

Opt out. Data are sent in batches, roughly every seven days. The following is collected:

  • Yarn version
  • Which command name is used (but not its arguments)
  • Active (non-private) plugin names
  • Number of installs run during the week
  • Number of different projects having been installed
  • Number of installs for the nm linker
  • Number of workspaces
  • Number of dependencies
  • The packageExtensions field (name of extended + name of the extra dependency)
  • IP address

Ref: https://yarnpkg.com/advanced/telemetry

@rashil2000
Member

rashil2000 commented Sep 4, 2022

From the datapoints above, I think the following are useful and relevant for Scoop:

  • Randomly generated (one-time) anonymous ID
  • Machine info
    • OS build number
    • OS Architecture
    • Scoop version
    • PowerShell Desktop version
    • PowerShell Core version
  • Apps installed from public buckets (need to filter out private apps)
    • Name
    • Version
    • Last updated
    • Source
    • Architecture
    • User or Global installation
    • Installation status
  • Public buckets (need to filter out private buckets)
    • Name
    • Source
    • Last updated
    • Manifest count
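
The list above could be modelled roughly as follows. This is a Python sketch of the data shape only, not an agreed format: all class, field, and function names are hypothetical, and it assumes that an app's "Source" is the name of the bucket it came from while a bucket's "Source" is its repository URL.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List, Set


@dataclass
class MachineInfo:
    os_build: str
    os_architecture: str
    scoop_version: str
    powershell_desktop_version: str
    powershell_core_version: str


@dataclass
class AppRecord:
    name: str
    version: str
    last_updated: str
    source: str          # assumed to be the bucket name
    architecture: str
    global_install: bool
    status: str


@dataclass
class BucketRecord:
    name: str
    source: str          # assumed to be the repository URL
    last_updated: str
    manifest_count: int


@dataclass
class Report:
    machine: MachineInfo
    apps: List[AppRecord]
    buckets: List[BucketRecord]
    # A fresh random ID per report, so two reports cannot be linked to each other.
    anonymous_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def build_report(machine: MachineInfo, apps: List[AppRecord],
                 buckets: List[BucketRecord], public_sources: Set[str]) -> Report:
    """Keep only public buckets, and only apps installed from those buckets."""
    public_buckets = [b for b in buckets if b.source in public_sources]
    public_names = {b.name for b in public_buckets}
    public_apps = [a for a in apps if a.source in public_names]
    return Report(machine=machine, apps=public_apps, buckets=public_buckets)


def to_json(report: Report) -> str:
    """Serialise the report to the JSON text a client would send."""
    return json.dumps(asdict(report), indent=2)
```

Generating the ID inside the report rather than storing it on disk is one reading of "one-time"; a per-installation ID would allow deduplication but is easier to tie back to a machine.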


7 participants