From 490af55dabe501422b534912750eee007654965d Mon Sep 17 00:00:00 2001
From: Elsie Hupp
Date: Tue, 29 Aug 2023 16:04:02 -0400
Subject: [PATCH] Change name from `mediawiki-scraper` to `mediawiki-dump-generator` (#181)

Fixes https://github.com/mediawiki-client-tools/mediawiki-scraper/issues/65.

Addresses @yzqzss' [comment](https://github.com/orgs/mediawiki-client-tools/discussions/61#discussioncomment-6831973):

> * `scraper` is an evil name. (for webmasters)

Uses similar naming to [`mediawiki-dump`](https://github.com/macbre/mediawiki-dump), from one of the past contributors to `wikitools`. (I'm not 100% sure, but this might be a more modern replacement for `wikitools`... either way, potentially someone to be friendly with!)

I already created [a placeholder on PyPI](https://pypi.org/project/mediawiki-dump-generator/), and it seems like we're like 99% of the way there to being able to publish there.

I can change the name of this repository to match the new name right when I merge this.

Signed-off-by: Elsie Hupp
---
 .github/ISSUE_TEMPLATE/bug_report.md        |  2 +-
 .github/ISSUE_TEMPLATE/config.yml           |  2 +-
 CONTRIBUTING.md                             |  6 +--
 README.md                                   | 46 ++++++++++-----------
 wikiteam3/dumpgenerator/cli/greeter.py      |  2 +-
 wikiteam3/dumpgenerator/dump/image/image.py |  2 +-
 wikiteam3/uploader.py                       |  6 +--
 7 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
index fdb97b62..3a989958 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -7,7 +7,7 @@ assignees: ''
 ---
-
+
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
index 6bdb958e..ef844dd9 100644
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -1,6 +1,6 @@
 blank_issues_enabled: false
 contact_links:
-  - name: Get help using MediaWiki Scraper
+  - name: Get help using MediaWiki Dump Generator
     url:
https://github.com/orgs/mediawiki-client-tools/discussions/categories/q-a
     about: If you need help (other than reporting a bug), you can reach out on our Discussions Q&A.
   - name: Anything else
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9a1ebafc..e8492dd1 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,7 +6,7 @@ This document is an ongoing process for establishing and refining a set of best
 
 ## Reporting Issues
 
-If you find anything amiss, you can report it using [GitHub Issues](https://github.com/mediawiki-client-tools/mediawiki-scraper/issues). The template is there to help you communicate clearly. It's okay if you change it to meet your needs, though, as it is merely a suggested baseline.
+If you find anything amiss, you can report it using [GitHub Issues](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues). The template is there to help you communicate clearly. It's okay if you change it to meet your needs, though, as it is merely a suggested baseline.
 
 For anything that doesn't fit, you can open a less formal conversation in [GitHub Discussions](https://github.com/orgs/mediawiki-client-tools/discussions) and feel free to tag any of the members of our GitHub organization.
 
@@ -32,13 +32,13 @@ In addition to the tools listed in the basic installation instructions in the ma
 
 ### 1. Fork the repository if you don't have write access
 
-You can do so [here](https://github.com/mediawiki-client-tools/mediawiki-scraper/fork).
+You can do so [here](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/fork).
 
 ### 2. Clone the repository (or your fork) if you'd like to work on it locally (such as in VS Code)
 
 This is particularly important if you are contributing executible code, so that you can use "code intelligence" and test your work. You can clone the repository using the big green **Code** button on the homepage of the repository (or your fork).
 
-Alternately, you can [create a codespace](https://github.com/mediawiki-client-tools/mediawiki-scraper/codespaces) (also from the big green **Code** button), though we have yet to set up a consistent development container.
+Alternately, you can [create a codespace](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/codespaces) (also from the big green **Code** button), though we have yet to set up a consistent development container.
 
 ### 3. Create a new branch for the changes you'd like to make
 
diff --git a/README.md b/README.md
index 0786bdbd..e0eda45b 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,14 @@
-# `MediaWiki Scraper`
+# `MediaWiki Dump Generator`
 
-**MediaWiki Scraper can archive wikis from the largest to the tiniest.**
+**MediaWiki Dump Generator can archive wikis from the largest to the tiniest.**
 
-`MediaWiki Scraper` is an ongoing project to port the legacy [`wikiteam`](https://github.com/WikiTeam/wikiteam) toolset to Python 3 and PyPI to make it more accessible for today's archivers.
+`MediaWiki Dump Generator` is an ongoing project to port the legacy [`wikiteam`](https://github.com/WikiTeam/wikiteam) toolset to Python 3 and PyPI to make it more accessible for today's archivers.
 
 Most of the focus has been on the core `dumpgenerator` tool, but Python 3 versions of the other `wikiteam` tools may be added over time.
 
-## MediaWiki Scraper Toolset
+## MediaWiki Dump Generator Toolset
 
-MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpose module of MediaWiki Scraper is dumpgenerator, which can download XML dumps of MediaWiki sites that can then be parsed or redeployed elsewhere.
+MediaWiki Dump Generator is a set of tools for archiving wikis. The main general-purpose module of MediaWiki Dump Generator is dumpgenerator, which can download XML dumps of MediaWiki sites that can then be parsed or redeployed elsewhere.
 
 ### Viewing MediaWiki XML Dumps
 
@@ -17,18 +17,18 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
 
 ## Python Environment
 
-`MediaWiki Scraper` requires [Python 3.8](https://www.python.org/downloads/release/python-380/) or later (less than 4.0), but you may be able to get it run with earlier versions of Python 3. On recent versions of Linux and macOS Python 3.8 should come preinstalled, but on Windows you will need to install it from [python.org](https://www.python.org/downloads/release/python-380/).
+`MediaWiki Dump Generator` requires [Python 3.8](https://www.python.org/downloads/release/python-380/) or later (less than 4.0), but you may be able to get it run with earlier versions of Python 3. On recent versions of Linux and macOS Python 3.8 should come preinstalled, but on Windows you will need to install it from [python.org](https://www.python.org/downloads/release/python-380/).
 
-`MediaWiki Scraper` has been tested on Linux, macOS, Windows and Android. If you are connecting to Linux or macOS via `ssh`, you can continue using the `bash` or `zsh` command prompt in the same terminal, but if you are starting in a desktop environment and don't already have a preferred Terminal environment you can try one of the following.
+`MediaWiki Dump Generator` has been tested on Linux, macOS, Windows and Android. If you are connecting to Linux or macOS via `ssh`, you can continue using the `bash` or `zsh` command prompt in the same terminal, but if you are starting in a desktop environment and don't already have a preferred Terminal environment you can try one of the following.
 
-> **NOTE:** You may need to update and pre-install dependencies in order for `MediaWiki Scraper` to work properly. Shell commands for these dependencies appear below each item in the list. (Also note that while installing and running `MediaWiki Scraper` itself should not require administrative priviliges, installing dependencies usually will.)
+> **NOTE:** You may need to update and pre-install dependencies in order for `MediaWiki Dump Generator` to work properly. Shell commands for these dependencies appear below each item in the list. (Also note that while installing and running `MediaWiki Dump Generator` itself should not require administrative priviliges, installing dependencies usually will.)
 
 * On desktop Linux you can use the default terminal application such as [Konsole](https://konsole.kde.org/) or [GNOME Terminal](https://help.gnome.org/users/gnome-terminal/stable/).
   Linux Dependencies
 
-  While most Linux distributions will have Python 3 preinstalled, if you are cloning `MediaWiki Scraper` rather than downloading it directly you may need to install `git`.
+  While most Linux distributions will have Python 3 preinstalled, if you are cloning `MediaWiki Dump Generator` rather than downloading it directly you may need to install `git`.
 
   On Debian, Ubuntu, and the like:
 
@@ -45,7 +45,7 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
   macOS Dependencies
 
-  While macOS will have Python 3 preinstalled, if you are cloning `MediaWiki Scraper` rather than downloading it directly and you are using an older versions of macOS, you may need to install `git`.
+  While macOS will have Python 3 preinstalled, if you are cloning `MediaWiki Dump Generator` rather than downloading it directly and you are using an older versions of macOS, you may need to install `git`.
 
   If `git` is not preinstalled, however, macOS will prompt you to install it the first time you run the command. Therefore, to check whether you have `git` installed or to install `git`, simply run `git` (with no arguments) in Terminal:
 
@@ -68,9 +68,9 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
   > When installing [Python 3.8](https://www.python.org/downloads/release/python-380/) (from python.org), be sure to check "Add Python to PATH" so that installed Python scripts are accessible from any location. If for some reason installed Python scripts, e.g. `pip`, are not available from any location, you can add Python to the `PATH` environment variable using the instructions [here](https://datatofish.com/add-python-to-windows-path/).
   >
-  > And while doing so should not be necessary if you follow the instructions further down and install `MediaWiki Scraper` using `pip`, if you'd prefer that Windows store installed Python scripts somewhere other than the default Python folder under `%appdata%`, you can also add your preferred alternative path such as `C:\Program Files\Python3\Scripts\` or a subfolder of `My Documents`. (You will need to restart any terminal sessions in order for this to take effect.)
+  > And while doing so should not be necessary if you follow the instructions further down and install `MediaWiki Dump Generator` using `pip`, if you'd prefer that Windows store installed Python scripts somewhere other than the default Python folder under `%appdata%`, you can also add your preferred alternative path such as `C:\Program Files\Python3\Scripts\` or a subfolder of `My Documents`. (You will need to restart any terminal sessions in order for this to take effect.)
 
-  Whenever you'd like to run a Bash session, you can open a Bash terminal prompt from any folder in Windows Explorer by right-clicking and choosing the option from the context menu. (For some purposes you may wish to run Bash as an administrator.) This way you can open a Bash prompt and clone the `MediaWiki Scraper` repository in one location, and subsequently or later open another Bash prompt and run `MediaWiki Scraper` to dump a wiki wherever else you'd like without having to browse to the directory manually using Bash.
+  Whenever you'd like to run a Bash session, you can open a Bash terminal prompt from any folder in Windows Explorer by right-clicking and choosing the option from the context menu. (For some purposes you may wish to run Bash as an administrator.) This way you can open a Bash prompt and clone the `MediaWiki Dump Generator` repository in one location, and subsequently or later open another Bash prompt and run `MediaWiki Dump Generator` to dump a wiki wherever else you'd like without having to browse to the directory manually using Bash.
 
@@ -102,22 +102,22 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
 
 The Python 3 port of the `dumpgenerator` module of `wikiteam3` is largely functional and can be installed from a downloaded or cloned copy of this repository.
 
-> If you run into a problem with the version that mostly works, you can [open an Issue](https://github.com/mediawiki-client-tools/mediawiki-scraper/issues/new/choose). Be sure to include the following:
+> If you run into a problem with the version that mostly works, you can [open an Issue](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues/new/choose). Be sure to include the following:
 >
 > 1. The operating system you're using
 > 2. What command you ran that didn't work
 > 3. What output was printed to your terminal
 
-### 1. Downloading and installing `MediaWiki Scraper`
+### 1. Downloading and installing `MediaWiki Dump Generator`
 
 In whatever folder you use for cloned repositories:
 
 ```bash
-git clone https://github.com/mediawiki-client-tools/mediawiki-scraper
+git clone https://github.com/mediawiki-client-tools/mediawiki-dump-generator
 ```
 
 ```bash
-cd mediawiki-scraper
+cd mediawiki-dump-generator
 ```
 
 ```bash
@@ -158,12 +158,12 @@ pip uninstall wikiteam3
 ```
 
 ```bash
-rm -fr [cloned_MediaWiki Scraper_folder]
+rm -fr [cloned_mediawiki_scraper_folder]
 ```
 
-### 4. Updating MediaWiki Scraper
+### 4. Updating MediaWiki Dump Generator
 
-> **Note:** Re-run the following steps each time to reinstall each time the MediaWiki Scraper branch is updated.
+> **Note:** Re-run the following steps each time to reinstall each time the MediaWiki Dump Generator branch is updated.
 
 ```bash
 git pull
@@ -194,9 +194,9 @@ pip install --force-reinstall (Get-ChildItem .\dist\*.whl).FullName
-### 5. Manually build and install `MediaWiki Scraper`
+### 5. Manually build and install `MediaWiki Dump Generator`
 
-If you'd like to manually build and install `MediaWiki Scraper` from a cloned or downloaded copy of this repository, run the following commands from the downloaded base directory:
+If you'd like to manually build and install `MediaWiki Dump Generator` from a cloned or downloaded copy of this repository, run the following commands from the downloaded base directory:
 
 ```bash
 curl -sSL https://install.python-poetry.org | python3 -
@@ -243,7 +243,7 @@ git checkout --track origin/python3
 
 ## Using `dumpgenerator` (once installed)
 
-After installing `MediaWiki Scraper` using `pip` you should be able to use the `dumpgenerator` command from any local directory.
+After installing `MediaWiki Dump Generator` using `pip` you should be able to use the `dumpgenerator` command from any local directory.
 
 For basic usage, you can run `dumpgenerator` in the directory where you'd like the download to be.
 
@@ -385,5 +385,5 @@ You can contact Elsie Hupp directly via email at [mediawiki-client-tools@elsiehu
 
 **WikiTeam** is the [Archive Team](http://www.archiveteam.org) [[GitHub](https://github.com/ArchiveTeam)] subcommittee on wikis. It was founded and originally developed by [Emilio J. Rodríguez-Posada](https://github.com/emijrp), a Wikipedia veteran editor and amateur archivist. Thanks to people who have helped, especially to: [Federico Leva](https://github.com/nemobis), [Alex Buie](https://github.com/ab2525), [Scott Boyd](http://www.sdboyd56.com), [Hydriz](https://github.com/Hydriz), Platonides, Ian McEwen, [Mike Dupont](https://github.com/h4ck3rm1k3), [balr0g](https://github.com/balr0g) and [PiRSquared17](https://github.com/PiRSquared17).
 
-**MediaWiki Scraper**
+**MediaWiki Dump Generator**
 The Python 3 initiative is currently being led by [Elsie Hupp](https://github.com/elsiehupp), with contributions from [Victor Gambier](https://github.com/vgambier), [Thomas Karcher](https://github.com/t-karcher), [Janet Cobb](https://github.com/randomnetcat), [yzqzss](https://github.com/yzqzss), [NyaMisty](https://github.com/NyaMisty) and [Rob Kam](https://github.com/robkam)
diff --git a/wikiteam3/dumpgenerator/cli/greeter.py b/wikiteam3/dumpgenerator/cli/greeter.py
index e081073c..c59af11b 100644
--- a/wikiteam3/dumpgenerator/cli/greeter.py
+++ b/wikiteam3/dumpgenerator/cli/greeter.py
@@ -53,7 +53,7 @@ def bye():
     print("---> Congratulations! Your dump is complete <---")
     print("")
     print("If you encountered a bug, you can report it on GitHub Issues:")
-    print(" https://github.com/mediawiki-client-tools/mediawiki-scraper/issues")
+    print(" https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues")
     print("")
     print("If you need any other help, you can reach out on GitHub Discussions:")
     print(" https://github.com/orgs/mediawiki-client-tools/discussions")
diff --git a/wikiteam3/dumpgenerator/dump/image/image.py b/wikiteam3/dumpgenerator/dump/image/image.py
index 4d58d881..b79e9ebb 100644
--- a/wikiteam3/dumpgenerator/dump/image/image.py
+++ b/wikiteam3/dumpgenerator/dump/image/image.py
@@ -425,7 +425,7 @@ def getImageNamesAPI(config: Config = None, session: requests.Session = None):
                 )
                 if "%u" in filename:
                     raise NotImplementedError(
-                        f"Filename {filename} contains unicode. Please file an issue with MediaWiki Scraper."
+                        f"Filename {filename} contains unicode. Please file an issue with MediaWiki Dump Generator."
                     )
                 uploader = re.sub("_", " ", image.get("user", "Unknown"))
                 size = image.get("size", "False")
diff --git a/wikiteam3/uploader.py b/wikiteam3/uploader.py
index 225b1dee..57ab0fed 100644
--- a/wikiteam3/uploader.py
+++ b/wikiteam3/uploader.py
@@ -266,15 +266,15 @@ def upload(wikis, logfile, config={}, uploadeddumps=[]):
             # retrieve some info from the wiki
             wikititle = "Wiki - %s" % (sitename)  # Wiki - ECGpedia
             wikidesc = (
-                '%s dumped with MediaWiki-Scraper (aka WikiTeam3) tools.'
+                '%s dumped with MediaWiki Dump Generator (aka WikiTeam3) tools.'
                 % (baseurl, sitename)
             )  # "ECGpedia,: a free electrocardiography (ECG) tutorial and textbook to which anyone can contribute, designed for medical professionals such as cardiac care nurses and physicians. Dumped with WikiTeam tools."
             wikikeys = [
                 "wiki",
                 "wikiteam",
                 "wikiteam3",
-                "mediawiki-scraper",
-                "mediawikiScraper",
+                "mediawiki-dump-generator",
+                "MediaWikiDumpGenerator",
                 "MediaWiki",
                 sitename,
                 wikiname,