Choose a PyPI package name that will avoid name conflicts #65

Closed
elsiehupp opened this issue Jan 13, 2023 · 24 comments · Fixed by #181
Labels
enhancement New feature or request

Comments

@elsiehupp
Member

wikiteam3 originated as a Pull Request on the upstream WikiTeam repository, but it has taken its own direction, and it seems unlikely that it will merge back into the upstream project in the foreseeable future.

A while back I had an email conversation with Federico (@nemobis), where he said the following:

If you're mostly interested in providing dumpgenerator to the general public of "random users" wishing to download their preferred wiki for themselves, it's probably a waste of time to carry the excess baggage of the rest of WikiTeam, which is built for another primary purpose. It may be more effective to just run your fork separately, while dropping everything other than dumpgenerator, and maybe pick a more catchy name, like "MediaWiki exporter" or whatever conveys the purpose you have in mind. You might also be able to make some broader changes more directly, possibly by following a format similar to some other popular MediaWiki API clients like "mediawiki-utilities".

Because I think we should aim to publish this on PyPI, I agree that we should probably change the name of the project (i.e. the package name, the repo name, and all the references in the README and the code) in order to avoid confusion with the upstream WikiTeam project and to improve discoverability on GitHub and PyPI.

I should note that I do already have a dependency for wikiteam3 on PyPI, wikitools3. You can see the PyPI listing here. Any references to wikiteam3 in the README for wikitools3 should be updated to reflect any new name for this project, as well.

If it would help, I can add the collaborators from this repository as collaborators on the wikitools3 repository, as well, and if you think any functionality from wikiteam3 should be moved to wikitools3, that's something we could explore. (But I don't want to get ahead of myself here... I'm mainly just drawing attention to the existence of the dependency.)

What are all y'all's thoughts on the name for this project?

@robkam
Member

robkam commented Jan 13, 2023

Wikiteam as the name was always a little confusing, MediaWiki exporter would make more sense.

There should be a file for what's changed and what's new since the fork.

@yzqzss

yzqzss commented Jan 13, 2023

How about Wikisucker?

I came up with the name because Jason Scott used to make a program called Podsucker to archive podcasts.
https://www.youtube.com/watch?v=T1h1bJZdqkg

@robkam
Member

robkam commented Jan 13, 2023

I guess JS used Podsucker because it's amusing, in light of there being an insect with that name

@elsiehupp
Member Author

Wikiteam as the name was always a little confusing, MediaWiki exporter would make more sense.

There should be a file for what's changed and what's new since the fork.

One issue with the name "MediaWiki exporter" is that it brings to mind a tool running server-side within MediaWiki itself (or at least on the same server), whereas this tool runs client-side and can create exports from arbitrary MediaWiki instances (even without admin access).

So perhaps something like "MediaWiki client exporter" might make more sense?

With regard to PyPI naming, IIRC project names typically use hyphens, whereas package names typically use underscores, so you might have https://pypi.org/project/name-of-example-project/ but import name_of_example_project and, at the top of the README, # Name of Example Project, so any new name should have all three of these formats.
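The three formats described above can be sketched as a small helper. This is only an illustration of the convention, not part of the project; the function name `name_variants` and the dictionary keys are made up for this example.

```python
import re


def name_variants(title: str) -> dict[str, str]:
    """Derive the three naming formats from a human-readable title.

    Hypothetical helper: it only illustrates the hyphen/underscore
    convention discussed in this thread.
    """
    # Split the title into alphanumeric words, dropping spaces and punctuation.
    words = re.findall(r"[A-Za-z0-9]+", title)
    lowered = [word.lower() for word in words]
    return {
        "pypi_project": "-".join(lowered),        # used in the PyPI URL
        "import_package": "_".join(lowered),      # used in `import ...`
        "readme_heading": "# " + " ".join(words),  # top of the README
    }


variants = name_variants("Name of Example Project")
for kind, value in variants.items():
    print(f"{kind}: {value}")
```

Running it on a candidate title gives all three spellings at once, which makes it easy to check that each one reads well before parking a name.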

Here is a search for the term mediawiki on PyPI if you'd like to see what nomenclatures are popular.

I would also not be opposed to renaming wikitools3 in order to make the nomenclature consistent, but if we were to do so we should maintain a wrapper project at the wikitools3 namespace on PyPI so as not to abandon any downstream users.
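One way such a wrapper could behave is sketched below: the old import name stays importable, forwards attribute access to the new package, and warns on use. Everything here is an assumption for illustration. The module names `old_wiki_tools` and `new_wiki_tools` are stand-ins, and a real wrapper on PyPI would more likely be a tiny package whose `__init__.py` re-exports from the renamed one.

```python
import sys
import types
import warnings


def install_alias(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register a shim module under old_name that forwards to new_module."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Called only for attributes the shim itself does not define (PEP 562).
        warnings.warn(
            f"{old_name} is deprecated; use {new_module.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim
    return shim


# Stand-in for the renamed package (hypothetical name and contents).
new_pkg = types.ModuleType("new_wiki_tools")
new_pkg.VERSION = "1.0"

install_alias("old_wiki_tools", new_pkg)

import old_wiki_tools  # resolved from sys.modules; no file on disk is needed

print(old_wiki_tools.VERSION)  # forwarded to new_wiki_tools.VERSION
```

Downstream code that still imports the old name keeps working, and the deprecation warning points users at the new one.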

If we come up with a name, and we're comfortable proceeding, one of us could park the name(s) on PyPI and add the others as admins or collaborators.

@randomnetcat
Collaborator

I think the best verb for what it does is "scraping", which would give "mediawiki-scraper" or "mwscraper".

@elsiehupp
Member Author

I think the best verb for what it does is "scraping", which would give "mediawiki-scraper" or "mwscraper".

Yes, that makes sense. For comparison, if I search on PyPI for mediawiki scraper, the results include:

  • mediawiki
  • MediaWiki-Tools
  • mediawiki-utilities
  • mediawiki-dump
  • mediawiki-parser
  • mhdscraper (probably the most similar to this project)

Based on the most popular name patterns, I feel like a PyPI name of mediawiki-scraper, a package name of mediawiki_scraper and a README title of # MediaWiki Scraper could work well.

I just went ahead and created a PyPI placeholder for mediawiki-scraper, and I added an API key for the PyPI repository to a new GitHub "environment" on this repository called "PyPI Upload".

@yzqzss, you mentioned automating builds in CI? Do you know what would be involved in setting up builds and PyPI uploads for this repository?

I'm going to leave this Issue open until we fully rename the project in the source code and README... would someone be up to taking care of that?

Thanks, everybody!

@robkam
Member

robkam commented Jan 14, 2023

okay mediawiki-scraper

@robkam added the enhancement (New feature or request) label on Jan 14, 2023
@robkam
Member

robkam commented Jan 14, 2023

a PyPI name of mediawiki-scraper, a package name of mediawiki_scraper and a README title of # MediaWiki Scraper

Also, rename https://github.com/elsiehupp/wikiteam3/ to https://github.com/mediawiki-client-tools/mediawiki-scraper

@NyaMisty

I still can't understand why we need to change the name. Instead, I believe it's one of the best names.

WikiTeam3 won't cause confusion

WikiTeam is very famous, but it has been inactive for quite some time, so WikiTeam3 won't be confusing at all.
We still share most command-line parameters, don't we?

Should we really rebrand?

Quite a lot of WikiTeam3's code still comes from upstream WikiTeam. We actually only rearranged their code and made lots of improvements. Should we ignore this and throw away WikiTeam?

Are we really out of hope on merging?

WikiTeam is not alone and it's actually linked with Wikiapiary and InternetArchive. I believe merging or cooperating with wikiteam will make our code more valuable and useful.

@robkam
Member

robkam commented Jan 14, 2023

We're outsiders and not members of WikiTeam.

@robkam
Member

robkam commented Jan 14, 2023

Renamed in #86

@elsiehupp
Member Author

elsiehupp commented Jan 14, 2023

@NyaMisty—the name change was specifically recommended by @nemobis, the maintainer at the upstream WikiTeam repository.

Probably the biggest difference between this project and the upstream project is that the upstream project includes a bunch of automations, essentially, that are not part of the dumpgenerator application, and the upstream repository is a bit of a grab bag of only loosely related items.

Renaming this project to mediawiki-scraper reflects refocusing this particular repository on the dumpgenerator application, though having the mediawiki-client-tools GitHub organization means that you or any other "Member" can spin up additional repositories to work on the other bits and bobs separately.

Personally I do think it would be nice if this project could get adopted as an official WikiTeam or ArchiveTeam "affiliate" or something, and migrate to the WikiTeam or ArchiveTeam GitHub organization at some point down the line, but eventually transferring this repository would not be incompatible with renaming it.

(As an aside: the ArchiveTeam GitHub organization is much, much larger than the WikiTeam one—618 repositories versus only 3—so it definitely seems more like a "big top" tent that could welcome additional projects such as this.)

@elsiehupp
Member Author

I just checked, and, yes, the "organization" settings are currently configured to allow "members" to create additional public and private repositories in the organization.

@NyaMisty

NyaMisty commented Jan 14, 2023

Well, then I suggest we don't use - (dash or hyphen) in the name, because that makes renaming quite difficult (Python does not allow that character in identifiers).

@robkam
Member

robkam commented Jan 15, 2023

Although they're stuck on Python 2, WikiTeam does offer support to users:

  • Especially the wikiteam-discuss group on Google Groups
  • Further documentation, although perhaps this should be in Readme.md
  • Twitter
  • Entry on Archive Team Wiki
  • Possibly through Archiveteam on Reddit

@elsiehupp
Member Author

Well, then I suggest we don't use - (dash or hyphen) in the name, because that makes renaming quite difficult (Python does not allow that character in identifiers).

It's quite common in my own experience that PyPI project names will have hyphens and the corresponding Python package names will have underscores. So, in this instance, the package name would be mediawiki_scraper.

For comparison, this package has the PyPI name mediawiki-api-wrapper, the package name mediawiki_api_wrapper, and the README title # MediaWiki API Wrapper (though there's actually a typo).

Again, this practice seems commonplace enough that it wouldn't confuse users, though I'm not sure if there's, like, a PEP describing it, or anything; I'm just describing it anecdotally here. ¯\_(ツ)_/¯
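For what it's worth, PEP 503 does define how project names are normalized on package indexes (runs of `-`, `_`, and `.` collapse to a single `-`, lowercased), and PEP 8 recommends short, all-lowercase module names with underscores where they help readability. The mapping from project name to import name is still only a convention. A minimal sketch, where `to_import_name` is a hypothetical helper and not anything PyPI enforces:

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name using the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def to_import_name(project_name: str) -> str:
    """Conventional (not mandated) import name: hyphens become underscores."""
    return normalize_project_name(project_name).replace("-", "_")


project = "mediawiki-scraper"

# A hyphenated name cannot be used in an `import` statement.
print(project.isidentifier())                   # False
# The underscore form is a valid Python identifier.
print(to_import_name(project).isidentifier())   # True
# Different spellings normalize to the same project name on an index.
print(normalize_project_name("MediaWiki_Scraper"))  # mediawiki-scraper
```

So a hyphen in the PyPI name does not force a hyphen into any Python identifier; only the import name has to follow identifier rules.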

@nemobis

nemobis commented Jan 15, 2023

Should we ignore this and throw away WikiTeam?

I just want to confirm that I'm sure WikiTeam doesn't have any objection either way. We've not discussed this as a group, but personally I recommend doing whatever you think is most helpful for the success of this fork.

We're outsiders and not members of WikiTeam.

If you feel that way I respect it, but personally I don't see you as an "outsider", Rob. I think I've known you for your MediaWiki contributions for well over a decade. Also, didn't you contribute to some of the lists of wikis which WikiTeam used in the early days? Maybe I'm mixing things up. :)

@robkam
Member

robkam commented Jan 15, 2023

but personally I don't see you as an "outsider",

Hi Nemobis, thanks :-). Outsider because I use WikiTeam's dumpgenerator.py primarily for backing up a small selection of wikis of personal interest (although I do post those on Archive.org), and not for preserving whole wiki farms. Yes, I did suggest some sources for WikiTeam's list of wikis :-). On the other hand, the Save The Web project that fellow collaborator yzqzss is involved with appears to have similar aims to WikiTeam.

@robkam
Member

robkam commented Jan 15, 2023

From #86

I suggest we postpone the renaming? We can still discuss for a while as we keep refactoring and improving this project to get it ready for publishing.

If WikiTeam replaces dumpgenerator with Mediawiki Scraper / mwscraper the renaming will be less of an issue.

@NyaMisty

Let's unify the discussion location. Where should we post our comments? In this issue, in the PR, or in the discussion?

@robkam
Member

robkam commented Jan 15, 2023

PR and close the issue.

@elsiehupp changed the title from "New project name (to avoid confusion with upstream)" to "Choose a PyPI package name that will avoid name conflicts" on Jan 15, 2023
@elsiehupp
Member Author

I just renamed the issue to better reflect what I think the underlying issue is.

As a name idea that would (a) retain WikiTeam branding, and (b) avoid any potential PyPI name conflicts, what would all y'all think of wikiteam-scraper for the PyPI and GitHub names and wikiteam_scraper (or just scraper) for the module name?

(@nemobis—your opinion, as well, and thank you for responding!)

Regarding adoption of this tool for WikiTeam's purposes—publishing on PyPI should help facilitate this, no? And I'd be happy to transfer this repository as well as wikitools3 to the WikiTeam GitHub organization at some point if that would help, too.

(I mean, alternately, we could proceed with the Pull Request sooner rather than later, but it's my understanding that there's some benefit to having one PyPI package per repository, rather than glomming it all in one.)

@elsiehupp
Member Author

Regarding discussion, though... yes, it would probably be better to consolidate that on #86.

@robkam
Member

robkam commented Jan 16, 2023

I've closed #86 because it's getting too far behind all the changes.

elsiehupp added a commit that referenced this issue Aug 29, 2023

Fixes
#65.

Addresses @yzqzss'
[comment](https://github.com/orgs/mediawiki-client-tools/discussions/61#discussioncomment-6831973):

> * `scraper` is an evil name. (for webmasters)

Uses similar naming to
[`mediawiki-dump`](https://github.com/macbre/mediawiki-dump), from one
of the past contributors to `wikitools`. (I'm not 100% sure, but this
might be a more modern replacement for `wikitools`... either way,
potentially someone to be friendly with!)

I already created [a placeholder on
PyPI](https://pypi.org/project/mediawiki-dump-generator/), and it seems
like we're like 99% of the way there to being able to publish there.

I can change the name of this repository to match the new name right
when I merge this.

Signed-off-by: Elsie Hupp <github@elsiehupp.com>
6 participants