tap-atinternet
is a Singer tap for the AT Internet Reporting API v3.
Built with the Meltano Tap SDK for Singer Taps.
You can install the Singer tap with:
pipx install git+https://github.com/GendarmerieNationale/tap-atinternet.git
Setting | Required | Default | Description |
---|---|---|---|
api_key | True | None | AT Internet API key |
secret_key | True | None | AT Internet secret key |
site_id | True | None | Site ID (can be queried at https://dataquery.atinternet-solutions.com/) |
start_date | True | None | Start syncing data from that date |
max_results | False | 5000 | Max number of results per page (up to 10000) |
filter_str | False | "" | If not empty, filter and extract only the pages with this string in the 'page_full_name' |
A full list of supported settings and capabilities for this tap is available by running:
tap-atinternet --about
It can be used with both FULL_TABLE
or INCREMENTAL
replication methods, to be specified in the tap metadata
(how to do it with meltano).
Note: date
as the replication_key
), the tap will use the last state value instead of the
start_date
specified in the config, and will extract data for that date even though it has already been extracted last time.
Example: The last state will typically something like
{
"bookmarks": {
"visits": {
"replication_key": "date",
"replication_key_value": "2022-04-05"
}
}
}
if last time you extracted data up to 2022-04-05 (included). Next time you call the tap in incremental mode
(and with the same job_id
if using meltano), the extraction will start from 2022-04-05 (included), which will
likely lead to duplicate records for this date.
This may be a problem with some targets (e.g. target-jsonl
, in this case see issue),
but if you are working with target-postgres
it should work fine. Thanks to the primary_keys
defined in streams.py
,
in case of duplicate records the target db will know when to insert a new record or simply update the previous one.
You can easily run tap-atinternet
by itself or in a pipeline using Meltano.
tap-atinternet --version
tap-atinternet --help
tap-atinternet --config CONFIG --discover > ./catalog.json
pipx install poetry
poetry install
Create tests within the tap_atinternet/tests
subfolder and
then run:
poetry run pytest
You can also test the tap-atinternet
CLI interface directly using poetry run
:
poetry run tap-atinternet --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Your project comes with a custom meltano.yml
project file already created. Open the meltano.yml
and follow any "TODO" items listed in
the file.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-atinternet
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-atinternet --version
# OR run a test `elt` pipeline:
meltano elt tap-atinternet target-jsonl
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.
Gendarmerie Nationale, cyberimpact@gendarmerie.interieur.gouv.fr
This was developed as part of the CyberImpact EIG project.