Miscellaneous improvements for 2025-W02 #218

Open: wants to merge 13 commits into main
61 changes: 61 additions & 0 deletions doc/sources.rst
@@ -115,6 +115,36 @@ SDMX-JSON —
.. autoclass:: sdmx.source.abs_json.Source()
   :members:

.. _AR1:

``AR1``: National Institute of Statistics and Censuses (Argentina)
------------------------------------------------------------------

SDMX-ML — `Website <https://sdds.indec.gob.ar/nsdp.htm>`__

- Spanish name: Instituto Nacional de Estadística y Censos

This source does not provide an actual SDMX-REST web service.
Instead, a set of SDMX-ML 2.1 files containing only data messages (no structure or metadata messages) is available at URLs of the form ``https://sdds.indec.gob.ar/files/data/IND.XML``.
These can be used with :class:`Client` by:

- Using ``https://sdds.indec.gob.ar/files/`` as the base URL.
- Accessing only the :attr:`.Resource.data` endpoint, which gives the ``…/data/…`` URL component.
- Treating ``IND.XML`` (in reality, a file name with suffix) as the resource ID.
- Using no query key or parameters.

.. code-block:: python

   import sdmx

   c = sdmx.Client("AR1")
   # Retrieves https://sdds.indec.gob.ar/files/data/IND.XML
   dm = c.data("IND.XML")

This is the same as using a non-source-specific Client to query the URL directly:

.. code-block:: python

   import sdmx

   c = sdmx.Client()
   dm = c.get(url="https://sdds.indec.gob.ar/files/data/IND.XML")
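To make the mapping in the list above concrete, here is a minimal sketch in plain Python (no :mod:`sdmx` required) of how the base URL, the ``data`` path component, and the resource ID combine into the request URL. The helper ``compose_data_url`` is hypothetical and only illustrates the URL layout; it is not how :class:`Client` builds URLs internally.

```python
def compose_data_url(base_url: str, resource_id: str) -> str:
    """Hypothetical helper: join a base URL, the data endpoint, and a resource ID."""
    # Strip any trailing slash from the base URL so the parts join cleanly
    return "/".join([base_url.rstrip("/"), "data", resource_id])


# Base URL from the list above; "IND.XML" plays the role of the resource ID
url = compose_data_url("https://sdds.indec.gob.ar/files/", "IND.XML")
print(url)  # https://sdds.indec.gob.ar/files/data/IND.XML
```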

.. _BBK:

@@ -425,6 +455,26 @@ API documentation `(en) <https://www.stat.ee/sites/default/files/2020-09/API-ins
- As of 2020-12-13, this web service (like NBB) uses server software that serves SDMX-JSON or SDMX-ML 2.0.
  The latter is not supported by :mod:`sdmx` (see :ref:`sdmx-version-policy`).

.. _StatCan:

``StatCan``: Statistics Canada
------------------------------

SDMX-ML —
API documentation `(en) <https://www.statcan.gc.ca/en/developers/sdmx/user-guide>`__,
`(fr) <https://www.statcan.gc.ca/fr/developpeurs/sdmx/guide-sdmx>`__.

- The source provides an SDMX-REST API only for the ``/data/`` endpoint.
- Some structural artefacts are available, but not through an SDMX-REST API.
  Instead, a set of SDMX-ML 2.1 files containing structure messages is available at URLs of the form ``https://www150.statcan.gc.ca/t1/wds/sdmx/statcan/rest/structure/Data_Structure_17100005``.
  (Note that this lacks the URL path components for the agency ID and version, which would resemble ``…/structure/StatCan/Data_Structure_17100005/latest``.)

These can be queried directly using any Client:

.. code-block:: python

   import sdmx

   c = sdmx.Client("StatCan")  # or sdmx.Client()
   dm = c.get(url="https://www150.statcan.gc.ca/t1/wds/sdmx/statcan/rest/structure/Data_Structure_17100005")
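The note above about missing path components can be checked with a small standard-library sketch that compares the segments following ``structure`` in the StatCan URL with those in the standard form quoted above. The helper ``after_structure`` is hypothetical, and only the path suffix of the standard form is used, because its prefix is elided.

```python
from urllib.parse import urlsplit

# File URL from the StatCan notes above
statcan_url = (
    "https://www150.statcan.gc.ca/t1/wds/sdmx/statcan/rest/structure/"
    "Data_Structure_17100005"
)
# Path suffix of the standard SDMX-REST form quoted above (its prefix is omitted)
standard_suffix = "structure/StatCan/Data_Structure_17100005/latest"


def after_structure(path: str) -> list[str]:
    """Return the path segments that follow the 'structure' segment."""
    parts = path.strip("/").split("/")
    return parts[parts.index("structure") + 1 :]


actual = after_structure(urlsplit(statcan_url).path)
expected = after_structure(standard_suffix)
print(actual)  # ['Data_Structure_17100005']
print(expected)  # ['StatCan', 'Data_Structure_17100005', 'latest']
# Segments of the standard form that the StatCan URL lacks: agency ID and version
print([p for p in expected if p not in actual])  # ['StatCan', 'latest']
```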

.. _UNESCO:

@@ -503,6 +553,17 @@ SDMX-ML —

- Supports preview_data and series-key based key validation.

.. _UY110:

``UY110``: Labour Market Information System (Uruguay)
-----------------------------------------------------

SDMX-ML —
Website `(en) <https://de-mtss.simel.mtss.gub.uy/?lc=en>`__,
`(es) <https://de-mtss.simel.mtss.gub.uy>`__.

- Spanish name: Sistema de Información de Mercado Laboral
- Operated by the Ministry of Labour and Social Security (Ministerio de Trabajo y Seguridad Social, MTSS), the National Institute of Statistics (Instituto Nacional de Estadística, INE), and the Social Security Bank (Banco de Previsión Social, BPS) of Uruguay.

.. _WB:

5 changes: 4 additions & 1 deletion doc/whatsnew.rst
@@ -3,11 +3,14 @@
What's new?
***********

.. _2.20.1:
.. _2.21.0:

Next release
============

- Add :ref:`AR1 <AR1>`, :ref:`StatCan <StatCan>`, and :ref:`UY110 <UY110>` data sources (:pull:`218`, :issue:`186`, :issue:`187`, :issue:`188`).
- New function :func:`.get_source` for case-insensitive lookup of sources (:pull:`218`).
  :class:`.Client` handles, for instance, :py:`Client("wb")` the same as :py:`Client("WB")`, and logs a message noting the case-insensitive match.
- Simplify :class:`.Session` via direct inheritance from :class:`.requests_cache.session.CacheMixin`, where installed (:pull:`217`).
- Add an optional :py:`session=...` keyword argument to :class:`.Client` (:pull:`217`).
- Improve :ref:`network and offline tests <test-network>` via new and improved test utilities (:pull:`217`).
4 changes: 2 additions & 2 deletions sdmx/client.py
@@ -10,7 +10,7 @@
from sdmx.reader import get_reader
from sdmx.rest import Resource
from sdmx.session import ResponseIO, Session
from sdmx.source import NoSource, list_sources, sources
from sdmx.source import NoSource, get_source, list_sources

if TYPE_CHECKING:
    import io
@@ -79,7 +79,7 @@ def __init__(
        **session_opts,
    ):
        try:
            self.source = sources[source.upper()] if source else NoSource
            self.source = get_source(source) if source else NoSource
        except KeyError:
            raise ValueError(
                f"source must be None or one of: {' '.join(list_sources())}"
90 changes: 52 additions & 38 deletions sdmx/format/xml/common.py
@@ -6,7 +6,7 @@
from operator import itemgetter
from pathlib import Path
from shutil import copytree
from typing import IO, Iterable, Mapping, Optional, Union
from typing import IO, Iterable, Mapping, Optional, Union, cast

from lxml import etree
from lxml.etree import QName
@@ -99,61 +99,75 @@
    msg: Union[Path, IO],
    schema_dir: Optional[Path] = None,
    version: Union[str, Version] = Version["2.1"],
    max_errors: int = -1,
) -> bool:
    """Validate and SDMX message against the XML Schema (XSD) documents.
    """Validate SDMX-ML in `msg` against the XML Schema (XSD) documents.

    The XML Schemas must first be installed or validation will fail. See
    :func:`sdmx.install_schemas` to download the schema files.
    If validation fails, entries from the schema error log are logged, each at its own
    level (for instance :data:`logging.ERROR`). These indicate elements in `msg` that
    are not valid per the schemas; see `max_errors`.

    Parameters
    ----------
    msg
        A SDMX-ML Message formatted XML file.
        Path or io-like containing an SDMX-ML message.
    schema_dir
        The directory to XSD schemas used to validate the message.
        Directory with SDMX-ML XSD schemas used to validate the message.
    version
        The SDMX-ML schema version to validate against. One of ``2.1`` or ``3.0``.
    max_errors
        Maximum number of messages to log on validation failure. If negative (the
        default), all messages are logged.

    Returns
    -------
    bool
        True if validation passed. False otherwise.
        :any:`True` if validation passed, otherwise :any:`False`.

    Raises
    ------
    FileNotFoundError
        if `schema_dir` (or a subdirectory) does not contain :file:`SDMXMessage.xsd`.
        Use :func:`sdmx.install_schemas` to download the schema files.
    NotImplementedError
        if `msg` contains valid XML, but with a root element that is not part of the
        SDMX-ML standard.
    """
    schema_dir, version = _handle_validate_args(schema_dir, version)

    msg_doc = etree.parse(msg)
    # Find SDMXMessage.xsd in `schema_dir` or a subdirectory
    for candidate in schema_dir, schema_dir.joinpath(version.name):
        try:
            # Turn the XSD into a schema object
            xml_schema = etree.XMLSchema(file=candidate.joinpath("SDMXMessage.xsd"))
            break
        except Exception:
            xml_schema = None

    # Make sure the message is a supported type
    supported_elements = [
        "CodelistQuery",
        "DataStructureQuery",
        "GenericData",
        "GenericMetadata",
        "GenericTimeSeriesData",
        "MetadataStructureQuery",
        "Structure",
        "StructureSpecificData",
        "StructureSpecificMetadata",
        "StructureSpecificTimeSeriesData",
    ]
    root_elem_name = msg_doc.docinfo.root_name
    if root_elem_name not in supported_elements:
        raise NotImplementedError

    message_xsd = schema_dir.joinpath("SDMXMessage.xsd")
    if not message_xsd.exists():
        raise ValueError(f"Could not find XSD files in {schema_dir}")

    # Turn the XSD into a schema object
    xml_schema_doc = etree.parse(message_xsd)
    xml_schema = etree.XMLSchema(xml_schema_doc)
    if xml_schema is None:
        raise FileNotFoundError(f"Could not find XSD files in {schema_dir}")

    try:
        xml_schema.assertValid(msg_doc)
    except etree.DocumentInvalid as err:
        log.error(err)
    finally:
        return xml_schema.validate(msg_doc)
    # Parse the given document
    msg_doc = etree.parse(msg)

    if not xml_schema.validate(msg_doc):
        for i, entry in enumerate(
            cast(Iterable["etree._LogEntry"], xml_schema.error_log)
        ):
            if (
                i == 0
                and "No matching global declaration available for the validation root"
                in entry.message
            ):
                raise NotImplementedError(
                    f"Validate non-SDMX root element <{msg_doc.getroot().tag}>"
                ) from None
            elif i == max_errors:
                break
            log.log(getattr(logging, entry.level_name), entry.message)

        return False
    else:
        return True
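The error-reporting loop above can be isolated in a small sketch that needs no lxml. It assumes plain `(level_name, message)` tuples in place of lxml `_LogEntry` objects; the names `report_errors` and `NON_SDMX_ROOT`, and the integer return value, are hypothetical. It shows how a negative `max_errors` leaves the number of logged entries unbounded, and how only the first entry is checked for an undeclared (non-SDMX) root element.

```python
import logging

log = logging.getLogger("validate_sketch")

# Hypothetical stand-in for lxml error-log entries: (level name, message) pairs
Entry = tuple[str, str]

NON_SDMX_ROOT = "No matching global declaration available for the validation root"


def report_errors(entries: list[Entry], root_tag: str, max_errors: int = -1) -> int:
    """Log up to `max_errors` entries, or all of them if `max_errors` is negative.

    Return the number of entries logged. Raise NotImplementedError if the first
    entry indicates that the schema does not declare the document's root element.
    """
    logged = 0
    for i, (level_name, message) in enumerate(entries):
        if i == 0 and NON_SDMX_ROOT in message:
            raise NotImplementedError(f"Validate non-SDMX root element <{root_tag}>")
        elif i == max_errors:
            # A negative max_errors never equals an index, so no limit applies
            break
        # Map a level name such as "ERROR" to the logging constant logging.ERROR
        log.log(getattr(logging, level_name), message)
        logged += 1
    return logged


entries = [
    ("ERROR", "Element 'Foo': This element is not expected."),
    ("ERROR", "Element 'Bar': Missing child element(s)."),
    ("WARNING", "Element 'Baz': Some warning."),
]
print(report_errors(entries, "Structure", max_errors=2))  # 2
print(report_errors(entries, "Structure"))  # 3
```

Returning the count is only a convenience for checking the sketch; the function in the diff returns a boolean instead.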


def _extracted_zipball(version: Version) -> Path:
5 changes: 2 additions & 3 deletions sdmx/reader/xml/common.py
@@ -77,9 +77,8 @@ def __init__(self, reader, elem, cls_hint=None):
            # class above
            target_cls = cls_hint

        # DEBUG
        # if target_cls is None:
        #     print(f"{info = }")
        if target_cls is None:
            raise ValueError(f"Unable to determine target class for {info}", info)

        self.maintainable = issubclass(target_cls, common.MaintainableArtefact)

Expand Down
33 changes: 26 additions & 7 deletions sdmx/reader/xml/v21.py
@@ -331,10 +331,11 @@
@end(
    """
    com:AnnotationTitle com:AnnotationType com:AnnotationURL com:None com:URN com:Value
    mes:DataSetAction :ReportPeriod md:ReportPeriod mes:DataSetID mes:Email mes:ID
    mes:Test mes:Timezone str:CodelistAliasRef str:DataType str:Email str:Expression
    str:NullValue str:OperatorDefinition str:PersonalisedName str:Result
    str:RulesetDefinition str:Telephone str:URI str:VtlDefaultName str:VtlScalarType
    mes:DataSetAction :ReportPeriod md:ReportPeriod mes:DataSetID mes:Email mes:Fax
    mes:ID mes:Telephone mes:Test mes:Timezone mes:URI mes:X400 str:CodelistAliasRef
    str:DataType str:Email str:Expression str:NullValue str:OperatorDefinition
    str:PersonalisedName str:Result str:RulesetDefinition str:Telephone str:URI
    str:VtlDefaultName str:VtlScalarType
    """
)
def _text(reader, elem):
@@ -403,7 +404,23 @@
        # In a StructureMessage
        cls_hint = reader.model.DataStructureDefinition

    reader.push(QName(elem).localname, reader.reference(elem, cls_hint))
    try:
        ref = reader.reference(elem, cls_hint)
    except ValueError as e:
        # Handle references to known non-standard classes; see
        # https://github.com/khaeru/sdmx/issues/180
        info = e.args[1]
        if info["package"] == "publicationtable":
            log.warning(
                "Cannot resolve reference to non-SDMX class "
                f"'{info['package']}.{info['class']}'"
            )
            # Push the dict of reference info, in case the user wants to make use of it
            ref = info
        else:
            raise

    reader.push(QName(elem).localname, ref)
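The new `except ValueError` branch depends on the raising side passing the reference information as a second argument to `ValueError`, so that the handler can read it back from `e.args[1]`. The pattern can be sketched without `sdmx`; `KNOWN`, `find_target_class`, `resolve_or_fallback`, and the example package and class values other than `"publicationtable"` are hypothetical placeholders.

```python
import logging

log = logging.getLogger("reference_sketch")

# Hypothetical lookup table standing in for real class resolution
KNOWN = {("datastructure", "DataStructure"): "DataStructureDefinition"}


def find_target_class(info: dict) -> str:
    """Resolve `info`, or raise ValueError carrying `info` as its second argument."""
    try:
        return KNOWN[(info["package"], info["class"])]
    except KeyError:
        raise ValueError(f"Unable to determine target class for {info}", info) from None


def resolve_or_fallback(info: dict):
    """Return the resolved class, or `info` itself for a known non-SDMX package."""
    try:
        return find_target_class(info)
    except ValueError as e:
        details = e.args[1]  # the dict passed as the second argument above
        if details["package"] == "publicationtable":
            log.warning(
                "Cannot resolve reference to non-SDMX class "
                f"'{details['package']}.{details['class']}'"
            )
            return details
        raise  # any other unresolved reference is still an error


print(resolve_or_fallback({"package": "datastructure", "class": "DataStructure"}))
# DataStructureDefinition
print(resolve_or_fallback({"package": "publicationtable", "class": "Example"}))
# {'package': 'publicationtable', 'class': 'Example'}
```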


@end("com:Annotation")
@@ -745,10 +762,12 @@

@end("mes:Contact str:Contact", only=False)
def _contact_end(reader, elem):
    contact = model.Contact(
    contact = common.Contact(
        email=reader.pop_all("Email"),
        fax=reader.pop_all("Fax"),
        telephone=reader.pop_single("Telephone"),
        uri=reader.pop_all("URI"),
        email=reader.pop_all("Email"),
        x400=reader.pop_all("X400"),
    )

    add_localizations(contact.name, reader.pop_all("Name"))
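The handler above collects `Email`, `Fax`, `URI`, and `X400` values with `pop_all(...)`, that is, as lists. A rough analogue using the standard library's `xml.etree.ElementTree` is sketched below; it is not how the `sdmx` reader works internally, the helper `collect_contact` is hypothetical, and the contact values are made-up placeholders. The namespace is the one bound to the `mes:` prefix in SDMX-ML 2.1 messages.

```python
import xml.etree.ElementTree as ET

# SDMX-ML 2.1 message namespace (the "mes:" prefix in the handlers above)
MES = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"

SNIPPET = f"""
<mes:Contact xmlns:mes="{MES}">
  <mes:Telephone>+1 555 0100</mes:Telephone>
  <mes:Fax>+1 555 0101</mes:Fax>
  <mes:X400>c=XX;a=EXAMPLE</mes:X400>
  <mes:URI>https://example.org</mes:URI>
  <mes:Email>a@example.org</mes:Email>
  <mes:Email>b@example.org</mes:Email>
</mes:Contact>
"""


def collect_contact(elem: ET.Element) -> dict[str, list[str]]:
    """Group the text of each child element by its local name."""
    fields: dict[str, list[str]] = {}
    for child in elem:
        local = child.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        fields.setdefault(local, []).append((child.text or "").strip())
    return fields


contact = collect_contact(ET.fromstring(SNIPPET.strip()))
print(contact["Email"])  # ['a@example.org', 'b@example.org']
print(contact["X400"])  # ['c=XX;a=EXAMPLE']
```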
30 changes: 21 additions & 9 deletions sdmx/source/__init__.py
@@ -1,5 +1,7 @@
import importlib.resources
import json
import logging
import re
from dataclasses import dataclass, field
from enum import Enum
from importlib import import_module
@@ -15,6 +17,8 @@
if TYPE_CHECKING:
    import sdmx.rest.common

log = logging.getLogger(__name__)

#: Data sources registered with :mod:`sdmx`.
sources: dict[str, "Source"] = {}

@@ -250,6 +254,22 @@
    sources[id] = SourceClass(**_info)


def get_source(id: str) -> Source:
    """Return the Source with the given `id`.

    An exact match is tried first; otherwise `id` is matched case-insensitively.
    :class:`KeyError` is raised if no source matches.
    """
    try:
        return sources[id]
    except KeyError:
        # Try to find a case-insensitive match of the whole ID, not just a prefix
        for k, v in sources.items():
            if re.fullmatch(re.escape(k), id, flags=re.IGNORECASE):
                log.info(f"Return {v} as a case-insensitive match for source id={id!r}")
                return v
        raise


def list_sources():
    """Return a sorted list of valid source IDs.

@@ -260,15 +280,7 @@

def load_package_sources():
    """Discover all sources listed in :file:`sources.json`."""
    try:
        ref = importlib.resources.files("sdmx").joinpath("sources.json")
    except AttributeError:  # Python <3.9
        from copy import copy

        with importlib.resources.path("sdmx", "sources.json") as path:
            ref = copy(path)

    with ref.open("rb") as f:
    with importlib.resources.files("sdmx").joinpath("sources.json").open("rb") as f:
        for info in json.load(f):
            add_source(info)
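The lookup order used for source IDs can be illustrated with a standalone sketch: try the exact ID first, then compare whole IDs case-insensitively, and re-raise `KeyError` if nothing matches. Comparing whole IDs, rather than checking whether a registered ID matches the start of the requested one, matters when IDs share a prefix, as `ABS` and `ABS_JSON` do. `REGISTRY` and `lookup` are hypothetical names with placeholder values, not part of the `sdmx` API.

```python
import logging

log = logging.getLogger("source_sketch")

# Hypothetical registry standing in for the `sources` dict; values are placeholders
REGISTRY = {"ABS": "abs", "ABS_JSON": "abs-json", "WB": "wb"}


def lookup(source_id: str) -> str:
    """Return the entry for `source_id`: exact match first, then case-insensitive."""
    try:
        return REGISTRY[source_id]
    except KeyError:
        for key, value in REGISTRY.items():
            # Compare whole IDs, so "abs_json" cannot match the shorter "ABS"
            if key.casefold() == source_id.casefold():
                log.info("Return %r as a case-insensitive match for %r", value, source_id)
                return value
        raise  # re-raise the original KeyError


print(lookup("WB"))  # wb
print(lookup("abs_json"))  # abs-json
```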
