Skip to content

Singer tap for extracting data from the AT Internet Reporting API v3

License

Notifications You must be signed in to change notification settings

GendarmerieNationale/tap-atinternet

Repository files navigation

tap-atinternet

tap-atinternet is a Singer tap for the AT Internet Reporting API v3.

Built with the Meltano Tap SDK for Singer Taps.

Installation

You can install the Singer tap with:

pipx install git+https://github.com/GendarmerieNationale/tap-atinternet.git

Configuration

Setting Required Default Description
api_key True None AT Internet API key
secret_key True None AT Internet secret key
site_id True None Site ID (can be queried at https://dataquery.atinternet-solutions.com/)
start_date True None Start syncing data from that date
max_results False 5000 Max number of results per page (up to 10000)
filter_str False "" If not empty, filter and extract only the pages with this string in the 'page_full_name'

A full list of supported settings and capabilities for this tap is available by running:

tap-atinternet --about

It can be used with both FULL_TABLE or INCREMENTAL replication methods, to be specified in the tap metadata (how to do it with meltano).

Note: ⚠️ in incremental mode, the tap may produce duplicate records. With incremental sync (and date as the replication_key), the tap will use the last state value instead of the start_date specified in the config, and will extract data for that date even though it has already been extracted last time.

Example: The last state will typically something like

{
  "bookmarks": {
    "visits": {
      "replication_key": "date",
      "replication_key_value": "2022-04-05"
    }
  }
}

if last time you extracted data up to 2022-04-05 (included). Next time you call the tap in incremental mode (and with the same job_id if using meltano), the extraction will start from 2022-04-05 (included), which will likely lead to duplicate records for this date.

This may be a problem with some targets (e.g. target-jsonl, in this case see issue), but if you are working with target-postgres it should work fine. Thanks to the primary_keys defined in streams.py, in case of duplicate records the target db will know when to insert a new record or simply update the previous one.

Usage

You can easily run tap-atinternet by itself or in a pipeline using Meltano.

Executing the Tap Directly

tap-atinternet --version
tap-atinternet --help
tap-atinternet --config CONFIG --discover > ./catalog.json

Developer Resources

Initialize your Development Environment

pipx install poetry
poetry install

Create and Run Tests

Create tests within the tap_atinternet/tests subfolder and then run:

poetry run pytest

You can also test the tap-atinternet CLI interface directly using poetry run:

poetry run tap-atinternet --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Your project comes with a custom meltano.yml project file already created. Open the meltano.yml and follow any "TODO" items listed in the file.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-atinternet
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-atinternet --version
# OR run a test `elt` pipeline:
meltano elt tap-atinternet target-jsonl

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.

Contact

Gendarmerie Nationale, cyberimpact@gendarmerie.interieur.gouv.fr

This was developed as part of the CyberImpact EIG project.

About

Singer tap for extracting data from the AT Internet Reporting API v3

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages