ceroberoz/id-jobs

id-jobs: Indonesian Job Market Data Aggregator 💼🇮🇩

Daily Job Data Update · License: GPL v3 · Python 3.12+ · Powered by Scrapy · Enhanced by Playwright

🆕 Latest Updates

  • Added TechInAsia spider to collect job data from Tech in Asia Jobs portal
  • Implemented Algolia API integration for efficient data retrieval from TechInAsia
  • Enhanced data sanitization to ensure CSV-friendly output
  • Improved error handling and logging for the new spider
  • Updated documentation to reflect the addition of TechInAsia as a data source
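As an illustration of what "CSV-friendly" output can mean, here is a minimal sketch using only the Python standard library. The helper names `sanitize_field` and `to_csv_row` are hypothetical and are not taken from the id-jobs code base; the actual sanitization rules in the project may differ.

```python
import csv
import io
import re


def sanitize_field(value):
    """Return a single-line, trimmed string suitable for one CSV cell."""
    if value is None:
        return ""
    # Collapse newlines, tabs, and repeated spaces into a single space.
    text = re.sub(r"\s+", " ", str(value))
    return text.strip()


def to_csv_row(fields):
    """Serialize sanitized fields as one CSV line.

    Quoting of commas and quotes is left to csv.writer rather than
    being handled by hand.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(sanitize_field(field) for field in fields)
    return buffer.getvalue()
```

For example, `sanitize_field("  Senior\nEngineer\t(Remote) ")` yields `"Senior Engineer (Remote)"`, so a multi-line title scraped from a page no longer breaks a row across lines.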

📊 Overview

id-jobs collects job listings from Indonesian job portals and company websites, respecting each site's terms of service.

View the Data on Google Sheets: https://s.id/id-jobs-v2

View the Dashboard on Looker Studio by Google: https://s.id/id-jobs-dashboard

🎨 Job Age Colors

Age     | Time       | Color
New     | ≤ 1 day    | #00CC00 (Bright Green)
Hot     | 1-7 days   | #FF6600 (Bright Orange)
Recent  | 8-15 days  | #FFFF00 (Bright Yellow)
Aging   | 16-21 days | #E6E6E6 (Light Gray)
Old     | 22-30 days | #CCCCCC (Medium Gray)
Expired | > 30 days  | #B3B3B3 (Dark Gray)
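
The age bands above can be expressed as a small lookup. This is a sketch only: the function name `classify_job_age` is hypothetical, age is assumed to be counted in whole days since posting, and a job that is exactly one day old is assumed to count as New (the table lists one day under both New and Hot).

```python
from datetime import date

# (upper bound in days, label, hex color), taken from the table above.
AGE_BANDS = [
    (1, "New", "#00CC00"),
    (7, "Hot", "#FF6600"),
    (15, "Recent", "#FFFF00"),
    (21, "Aging", "#E6E6E6"),
    (30, "Old", "#CCCCCC"),
]
EXPIRED = ("Expired", "#B3B3B3")


def classify_job_age(posted_on, today):
    """Return (label, hex color) for a job posted on `posted_on`."""
    # Assumption: a posting date in the future is treated as age 0.
    age_days = max((today - posted_on).days, 0)
    for upper_bound, label, color in AGE_BANDS:
        if age_days <= upper_bound:
            return label, color
    return EXPIRED
```

With this sketch, `classify_job_age(date(2024, 1, 1), date(2024, 1, 9))` falls in the 8-15 day band and returns the Recent label with its yellow color.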

🔧 How It Works

id-jobs automatically collects job data from various websites, cleans the information, and compiles it into a single spreadsheet. We use Scrapy for most sites and Playwright for sites with complex JavaScript rendering.
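
The "compile into a single spreadsheet" step could look roughly like the sketch below, which merges per-source records into one CSV with a shared header. The column names in `COLUMNS` and the function `compile_jobs` are assumptions made for illustration; the real project runs Scrapy and Playwright spiders and its schema may differ.

```python
import csv
import io

# Hypothetical unified column set; not the project's actual schema.
COLUMNS = ["source", "title", "company", "location"]


def compile_jobs(jobs_by_source):
    """Merge {source name: [job dict, ...]} into one CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        extrasaction="ignore",  # drop fields outside the shared schema
        lineterminator="\n",
    )
    writer.writeheader()
    for source, jobs in jobs_by_source.items():
        for job in jobs:
            # Tag each row with its source; missing columns stay empty.
            writer.writerow({**job, "source": source})
    return buffer.getvalue()
```

Fields that a given site does not provide are left blank, and fields outside the shared columns are ignored, so every source ends up with the same row shape.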

[Image: Scraping Process]

👀 Preview

[Image: id-jobs Preview]

🌟 Why Use id-jobs?

id-jobs simplifies job searching by gathering information from multiple sources into one place, providing insights on work arrangements, job levels, and application deadlines.

📚 Data Sources

We collect data from various job portals and company websites, including: Blibli, Dealls, Evermos, Flip, GoTo, Glints (Lite), Jobstreet, Kalibrr, Karir.com, Kredivo, Mekari, SoftwareOne, Tiket, Tech in Asia Jobs, and more.

🚀 Features

  • Daily updates
  • Work arrangement identification
  • Job level detection
  • Application deadline calculation
  • Improved data accuracy
  • User-friendly Google Sheets interface
  • Job age tracking
  • JavaScript-rendered content handling with Playwright
  • Efficient pagination across multiple pages
  • Integration with Algolia API for improved data retrieval
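
To make "work arrangement identification" concrete, one possible approach is keyword matching on the job title or description. The sketch below is an assumption about how such a feature might work, not the project's actual logic; the keyword lists, the labels, and the name `detect_work_arrangement` are all hypothetical.

```python
import re

# Hypothetical keyword lists, checked in order; the first match wins.
ARRANGEMENT_KEYWORDS = [
    ("Hybrid", ("hybrid",)),
    ("Remote", ("remote", "work from home", "wfh")),
    ("On-site", ("on-site", "onsite", "wfo")),
]


def detect_work_arrangement(text):
    """Guess the work arrangement from free text using whole-word matches."""
    lowered = text.lower()
    for label, keywords in ARRANGEMENT_KEYWORDS:
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, lowered):
                return label
    return "Unspecified"
```

Whole-word matching avoids false positives from substrings, and listings without any known keyword are reported as "Unspecified" rather than guessed.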

🏁 Getting Started

For a quick guide, see our Quickstart Guide.

❓ FAQ

Check our FAQ for common questions.

📄 License

id-jobs is open source under the GPL-3.0 license. You can use, modify, and share the code, as long as any version you distribute stays open source under the same license.

We respect website terms of service when collecting data.