diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..4ded17a
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Wang Hui
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..21f760b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,70 @@
+# Hot Hub Scraper
+
+Hot Hub Scraper is a Node.js application that scrapes hot topics from Weibo and stores them in a PostgreSQL database.
+
+## UI
+
+[hot-hub-web](https://github.com/w4n9hu1/hot-hub-web)
+
+## Installation
+
+1. Clone the repository:
+
+   ```sh
+   git clone https://github.com/yourusername/hot-hub-scraper.git
+   cd hot-hub-scraper
+   ```
+
+2. Install dependencies:
+
+   ```sh
+   npm ci
+   ```
+
+3. Install Playwright browsers:
+
+   ```sh
+   npx playwright install --with-deps
+   ```
+
+## Usage
+
+1. Create a `.env` file based on the `.env.sample` file:
+
+   ```sh
+   cp .env.sample .env
+   ```
+
+2. Update the `.env` file with your PostgreSQL database URL and Weibo URL.
+
+3. Run the scraper:
+
+   ```sh
+   npm start
+   ```
+
+4. Seed the database with historical data:
+
+   ```sh
+   npm run seed
+   ```
+
+## Database Schema
+
+The database schema is defined in the [`scripts/wb_hot.sql`](scripts/wb_hot.sql) file:
+
+```sql
+CREATE TABLE IF NOT EXISTS wb_hot (
+    id SERIAL PRIMARY KEY,
+    rank INT NOT NULL,
+    title VARCHAR(255) NOT NULL,
+    hot INT NOT NULL,
+    tag VARCHAR(10),
+    icon VARCHAR(10),
+    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+## Contributing
+
+Contributions are welcome! Please open an issue or submit a pull request.
\ No newline at end of file
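
The README describes storing scraped hot topics in the `wb_hot` table, but this diff does not include the scraper code itself. As a rough illustration of what writing rows against that schema might look like, here is a minimal sketch in plain Node.js. The helper name `buildInsert` and the sample topics are hypothetical, and the `$1`-style placeholders assume a PostgreSQL client that accepts them (node-postgres does); the repository may do this differently.

```javascript
// Hypothetical helper, not taken from the repository.
// Builds one parameterized multi-row INSERT for the wb_hot table.
// id and created_at are omitted because the schema fills them by default.
const COLUMNS = ["rank", "title", "hot", "tag", "icon"];

function buildInsert(topics) {
  if (topics.length === 0) {
    throw new Error("no topics to insert");
  }
  const values = [];
  const rows = topics.map((topic, rowIndex) => {
    const placeholders = COLUMNS.map((column, colIndex) => {
      // tag and icon are nullable in the schema; store NULL when absent.
      values.push(topic[column] ?? null);
      return `$${rowIndex * COLUMNS.length + colIndex + 1}`;
    });
    return `(${placeholders.join(", ")})`;
  });
  const text = `INSERT INTO wb_hot (${COLUMNS.join(", ")}) VALUES ${rows.join(", ")}`;
  return { text, values };
}

// Made-up sample data for illustration only.
const query = buildInsert([
  { rank: 1, title: "Example topic A", hot: 1200000, tag: "hot" },
  { rank: 2, title: "Example topic B", hot: 950000 },
]);
console.log(query.text);
console.log(query.values);
```

Keeping the values separate from the SQL text means scraped titles are never interpolated into the statement. With node-postgres, the returned object could be passed directly to `client.query(query)`, assuming the project uses that client.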