
Commit 864cb2b

Version 0.42 - add search functionality, tests and documentation
1 parent 296042d commit 864cb2b

8 files changed

+520
-279
lines changed

.idea/workspace.xml

Lines changed: 284 additions & 251 deletions

README.md

Lines changed: 21 additions & 4 deletions
@@ -17,6 +17,7 @@ For more about the purpose and design philosophy, please visit [HDX Python Libra
 - [Operations on HDX Objects](#operations-on-hdx-objects)
 - [Dataset Specific Operations](#dataset-specific-operations)
 - [Working Example](#working-example)
+- [ACLED Example](#acled-example)
 
 ## Usage
 The library has detailed API documentation:
@@ -44,7 +45,7 @@ The first task is to create an API key file. By default this is assumed to be ca
 
 To include the HDX Python library in your project, pip install the line below or add the following to your `requirements.txt` file:
 
-git+git://github.com/ocha-dap/hdx-python-api.git@v0.41#egg=hdx-python-api
+git+git://github.com/ocha-dap/hdx-python-api.git@v0.42#egg=hdx-python-api
 
 If you get errors, it is probably the dependencies of the cryptography package that are missing eg. for Ubuntu: python-dev, libffi-dev and libssl-dev. See [cryptography dependencies](https://cryptography.io/en/latest/installation/#building-cryptography-on-linux)
 
@@ -67,7 +68,7 @@ Let's start with a simple example that also ensures that the library is working
 source test/bin/activate
 4. Install the HDX Python library:
 
-pip install git+git://github.com/ocha-dap/hdx-python-api.git@v0.41#egg=hdx-python-api
+pip install git+git://github.com/ocha-dap/hdx-python-api.git@v0.42#egg=hdx-python-api
 5. If you get errors, it is probably the [dependencies of the cryptography package](#installing-the-library)
 6. Launch python:
 
@@ -92,7 +93,11 @@ Let's start with a simple example that also ensures that the library is working
 
 dataset['dataset_date'] = '06/25/2016'
 dataset.update_in_hdx()
-12. Exit and remove virtualenv:
+12. You can search for datasets on HDX:
+
+datasets = Dataset.search_in_hdx(configuration, 'ACLED')
+print(datasets)
+13. Exit and remove virtualenv:
 
 exit()
 deactivate
@@ -212,6 +217,12 @@ You can read an existing HDX object with the static `read_from_hdx` method whi
 
 dataset = Dataset.read_from_hdx(configuration, 'DATASET_ID_OR_NAME')
 
+You can search for datasets and resources in HDX using the `search_in_hdx` method which takes a configuration and a query parameter and returns the a list of objects of the appropriate HDX object type eg. `list[Dataset]` eg.
+
+datasets = Dataset.search_in_hdx(configuration, 'QUERY')
+
+The query parameter takes a different format depending upon whether it is for a [dataset](http://lucene.apache.org/core/3_6_0/queryparsersyntax.html) or a [resource](http://docs.ckan.org/en/ckan-2.3.4/api/index.html#ckan.logic.action.get.resource_search).
+
 You can create an HDX Object, such as a dataset, resource or gallery item by calling the constructor with a configuration, which is required, and an optional dictionary containing metadata. For example:
 
 from hdx.data.dataset import Dataset
@@ -354,6 +365,12 @@ Create a file `my_code.py` and copy into it the code below:
 
 You can then fill out the function `generate_dataset` as required.
 
+## ACLED Example
+
 A complete example can be found here: [https://github.com/mcarans/hdxscraper-acled-africa](https://github.com/mcarans/hdxscraper-acled-africa)
 
-In particular, take a look at the files `run.py`, `acled_africa.py` and the `config` folder.
+In particular, take a look at the files `run.py`, `acled_africa.py` and the `config` folder.
+
+The ACLED scraper creates a dataset in HDX for [ACLED realtime data](https://data.humdata.org/dataset/acled-conflict-data-for-africa-realtime-2016) if it doesn't already exist, populating all the required metadata. It then creates resources that point to urls of [Excel and csv files for Realtime 2016 All Africa data](http://www.acleddata.com/data/realtime-data-2016/) (or updates the links and metadata if the resources already exist). Finally it creates a gallery item that points to these [dynamic maps and graphs](http://www.acleddata.com/visuals/maps/dynamic-maps/).
+
+The first iteration of the ACLED scraper was written without the HDX Python library and it became clear looking at this and previous work by others that there are operations that are frequently required and which add unnecessary complexity to the task of coding against HDX. Simplifying the interface to HDX drove the development of the Python library and the second iteration of the scraper was built using it. With the interface using HDX terminology and mapping directly on to datasets, resources and gallery items, the ACLED scraper was faster to develop and is much easier to understand for someone inexperienced in how it works and what it is doing. The challenge with ACLED is that sometimes the urls that the resources point to have not been updated and hence do not work. In this situation, the extensive logging and transparent communication of errors is invaluable and enables action to be taken to resolve the issue as quickly as possible. The static metadata for ACLED is held in human readable files so if it needs to be modified, it is straightforward. This is another feature of the HDX Python library that makes putting data programmatically into HDX a breeze.
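
The README change in this file notes that the query string takes a different format for datasets (Lucene query syntax) than for resources (CKAN's `field:term` form). A minimal sketch of building each kind of string follows. The helpers `dataset_query` and `resource_query` are hypothetical names introduced only for illustration and are not part of the HDX Python library.

```python
from typing import Iterable


def dataset_query(terms: Iterable[str]) -> str:
    """Hypothetical helper: AND terms together in Lucene style.

    Terms containing spaces are wrapped in double quotes so that they
    are treated as a phrase.
    """
    quoted = ['"%s"' % term if ' ' in term else term for term in terms]
    return ' AND '.join(quoted)


def resource_query(field: str, term: str) -> str:
    """Hypothetical helper: build a CKAN resource_search 'field:term' query."""
    return '%s:%s' % (field, term)


if __name__ == '__main__':
    print(dataset_query(['ACLED', 'conflict data']))
    print(resource_query('name', 'acled'))
```

The resulting strings would then be passed as the second argument of `Dataset.search_in_hdx` or `Resource.search_in_hdx` respectively.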

hdx/data/dataset.py

Lines changed: 43 additions & 11 deletions
@@ -6,7 +6,6 @@
 """
 import logging
 from os.path import join
-
 from typing import Any, List, Optional
 
 from hdx.configuration import Configuration
@@ -47,7 +46,8 @@ def actions() -> dict:
             'show': 'package_show',
             'update': 'package_update',
             'create': 'package_create',
-            'delete': 'package_delete'
+            'delete': 'package_delete',
+            'search': 'package_search'
         }
 
     def __setitem__(self, key: Any, value: Any) -> None:
@@ -255,6 +255,20 @@ def read_from_hdx(configuration: Configuration, identifier: str) -> Optional['Da
             return dataset
         return None
 
+    def _dataset_create_resources_gallery(self) -> None:
+        """Creates resource and gallery item objects in dataset
+        """
+
+        if 'resources' in self.data:
+            self.old_data['resources'] = self._copy_hdxobjects(self.resources, Resource)
+            self.separate_resources()
+        if self.include_gallery:
+            success, result = self._read_from_hdx('gallery', self.data['id'], 'id', GalleryItem.actions()['list'])
+            if success:
+                self.data['gallery'] = result
+                self.old_data['gallery'] = self._copy_hdxobjects(self.gallery, GalleryItem)
+                self.separate_gallery()
+
     def _dataset_load_from_hdx(self, id_or_name: str) -> bool:
         """Loads the dataset given by either id or name from HDX
 
@@ -267,15 +281,7 @@ def _dataset_load_from_hdx(self, id_or_name: str) -> bool:
 
         if not self._load_from_hdx('dataset', id_or_name):
             return False
-        if 'resources' in self.data:
-            self.old_data['resources'] = self._copy_hdxobjects(self.resources, Resource)
-            self.separate_resources()
-        if self.include_gallery:
-            success, result = self._read_from_hdx('gallery', self.data['id'], GalleryItem.actions()['list'])
-            if success:
-                self.data['gallery'] = result
-                self.old_data['gallery'] = self._copy_hdxobjects(self.gallery, GalleryItem)
-                self.separate_gallery()
+        self._dataset_create_resources_gallery()
         return True
 
     def check_required_fields(self, ignore_fields: List[str] = list()) -> None:
@@ -422,3 +428,29 @@ def delete_from_hdx(self) -> None:
             None
         """
         self._delete_from_hdx('dataset', 'id')
+
+    @staticmethod
+    def search_in_hdx(configuration: Configuration, query: str) -> List['Dataset']:
+        """Searches for datasets in HDX
+
+        Args:
+            configuration (Configuration): HDX Configuration
+            query (str): Query
+
+        Returns:
+            List[Dataset]: List of datasets resulting from query
+        """
+
+        datasets = []
+        dataset = Dataset(configuration)
+        success, result = dataset._read_from_hdx('dataset', query, 'q')
+        if result:
+            count = result.get('count', None)
+            if count:
+                for datasetdict in result['results']:
+                    dataset = Dataset(configuration)
+                    dataset.old_data = dict()
+                    dataset.data = datasetdict
+                    dataset._dataset_create_resources_gallery()
+                    datasets.append(dataset)
+        return datasets
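
One point that may deserve a second look in `search_in_hdx`: the loop is guarded by `if result:`, but in this commit `_read_from_hdx` returns `(False, "<message>")` when the object is not found. A non-empty message string would pass that guard and then fail on `.get`. The sketch below shows a stricter guard that checks `success` first. `extract_results` is a hypothetical standalone helper written for illustration, not code from the library, and it works on plain tuples and dicts so that it can run without HDX.

```python
from typing import Any, Dict, List, Tuple, Union


def extract_results(response: Tuple[bool, Union[Dict[str, Any], str]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: pull result dicts out of a (success, result) pair."""
    success, result = response
    # On failure the second element is an error message string, not a dict,
    # so check the success flag (and the type) before calling dict methods.
    if not success or not isinstance(result, dict):
        return []
    # A count that is missing or zero means there is nothing to iterate over.
    if not result.get('count'):
        return []
    return list(result.get('results', []))


if __name__ == '__main__':
    print(extract_results((True, {'count': 1, 'results': [{'name': 'acled'}]})))
    print(extract_results((False, 'q=ajyhgr: not found!')))
```

Under this assumption, a not-found response yields an empty list instead of an `AttributeError`, which matches the behaviour the new test expects for a query with no hits.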

hdx/data/hdxobject.py

Lines changed: 15 additions & 10 deletions
@@ -82,29 +82,34 @@ def update_json(self, path: str):
         """
         self.data = load_json_into_existing_dict(self.data, path)
 
-    def _read_from_hdx(self, object_type: str, id_field: str, action: Optional[str] = None) -> Tuple[bool, dict]:
-        """Checks if the hdx object exists in HDX.
+    def _read_from_hdx(self, object_type: str, value: str, fieldname: Optional[str] = 'id',
+                       action: Optional[str] = None) -> Tuple[bool, dict]:
+        """Makes a read call to HDX passing in given parameter.
 
         Args:
             object_type (str): Description of HDX object type (for messages)
-            id_field (str): HDX object identifier
-            action (Optional[str]): Replacement CKAN url to use. Defaults to None.
+            value (str): Value of HDX field
+            fieldname (Optional[str]): HDX field name. Defaults to id.
+            action (Optional[str]): Replacement CKAN action url to use. Defaults to None.
 
         Returns:
             (bool, dict): (True/False, HDX object metadata/Error)
         """
-        if not id_field:
-            raise HDXError("Empty %s identifier!" % object_type)
+        if not value:
+            raise HDXError("Empty %s value!" % object_type)
         if action is None:
-            action = self.actions()['show']
+            if fieldname == 'query' or fieldname == 'q':
+                action = self.actions()['search']
+            else:
+                action = self.actions()['show']
         try:
-            result = self.hdxpostsite.call_action(action, {'id': id_field},
+            result = self.hdxpostsite.call_action(action, {fieldname: value},
                                                   requests_kwargs={'auth': self.configuration._get_credentials()})
             return True, result
         except NotFound as e:
-            return False, "%s not found!" % id_field
+            return False, "%s=%s: not found!" % (fieldname, value)
         except Exception as e:
-            raise HDXError('HTTP Get failed when trying to read: %s=%s' % (fieldname, value)) from e
+            raise HDXError('HTTP Get failed when trying to read: %s=%s' % (fieldname, value)) from e
 
     def _load_from_hdx(self, object_type: str, id_field: str) -> bool:
         """Helper method to load the HDX object given by identifier from HDX
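
The reworked `_read_from_hdx` picks the CKAN action from the field name when no explicit action is given: `q` or `query` selects the search action, and anything else falls back to show. The selection rule can be sketched on its own as below. `choose_action` is a hypothetical free function used only to isolate that rule, and the two dictionaries are trimmed copies of the `actions()` mappings shown in this commit.

```python
from typing import Dict, Optional


def choose_action(actions: Dict[str, str], fieldname: str = 'id',
                  action: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the action selection in _read_from_hdx."""
    # An explicitly supplied action always wins.
    if action is not None:
        return action
    # Search parameters ('q' for datasets, 'query' for resources) map to search.
    if fieldname in ('query', 'q'):
        return actions['search']
    # Any other field name is treated as a lookup of a single object.
    return actions['show']


# Trimmed copies of the mappings returned by Dataset.actions() and Resource.actions().
dataset_actions = {'show': 'package_show', 'search': 'package_search'}
resource_actions = {'show': 'resource_show', 'search': 'resource_search'}

if __name__ == '__main__':
    print(choose_action(dataset_actions, 'q'))
    print(choose_action(resource_actions, 'query'))
    print(choose_action(dataset_actions))
```

Keying the choice on the field name keeps the call sites short, at the cost of coupling the parameter name to the endpoint. Passing `action` explicitly, as the gallery lookup does, avoids that coupling.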

hdx/data/resource.py

Lines changed: 25 additions & 2 deletions
@@ -3,7 +3,6 @@
 """Resource class containing all logic for creating, checking, and updating resources."""
 import logging
 from os.path import join
-
 from typing import Optional, List
 
 from hdx.configuration import Configuration
@@ -35,7 +34,8 @@ def actions() -> dict:
             'show': 'resource_show',
             'update': 'resource_update',
             'create': 'resource_create',
-            'delete': 'resource_delete'
+            'delete': 'resource_delete',
+            'search': 'resource_search'
         }
 
     def update_yaml(self, path: str = join('config', 'hdx_resource_static.yml')) -> None:
@@ -113,6 +113,29 @@ def delete_from_hdx(self) -> None:
         """
         self._delete_from_hdx('resource', 'id')
 
+    @staticmethod
+    def search_in_hdx(configuration: Configuration, query: str) -> List['Resource']:
+        """Searches for resources in HDX
+
+        Args:
+            configuration (Configuration): HDX Configuration
+            query (str): Query
+
+        Returns:
+            List[Resource]: List of resources resulting from query
+        """
+
+        resources = []
+        resource = Resource(configuration)
+        success, result = resource._read_from_hdx('resource', query, 'query')
+        if result:
+            count = result.get('count', None)
+            if count:
+                for resourcedict in result['results']:
+                    resource = Resource(configuration, resourcedict)
+                    resources.append(resource)
+        return resources
+
     def create_datastore(self) -> None:
         """TODO"""
         pass

setup.py

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 
 setup(
     name='hdx-python-api',
-    version='0.41',
+    version='0.42',
     packages=find_packages(exclude=['ez_setup', 'tests', 'tests.*']),
     url='http://data.humdata.org/',
     license='PSF',

tests/hdx/data/test_dataset.py

Lines changed: 41 additions & 0 deletions
@@ -12,6 +12,7 @@
 from hdx.data.dataset import Dataset
 from hdx.data.hdxobject import HDXError
 from hdx.utilities.dictionary import merge_two_dictionaries
+from hdx.utilities.loader import load_yaml
 
 
 class MockResponse:
@@ -83,6 +84,7 @@ def json(self):
     'solr_additions': '{"countries": ["Algeria", "Zimbabwe"]}',
     'dataset_date': '06/04/2016'}
 
+searchdict = load_yaml(join('fixtures', 'search_results.yml'))
 
 def mockshow(url, datadict):
     if 'show' not in url and 'related_list' not in url:
@@ -113,6 +115,28 @@ def mockshow(url, datadict):
                         '{"success": false, "error": {"message": "Not found", "__type": "Not Found Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=dataset_show"}')
 
 
+def mocksearch(url, datadict):
+    if 'search' not in url and 'related_list' not in url:
+        return MockResponse(404,
+                            '{"success": false, "error": {"message": "TEST ERROR: Not search", "__type": "TEST ERROR: Not Search Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+    if 'related_list' in url:
+        result = json.dumps(TestDataset.gallery_data)
+        return MockResponse(200,
+                            '{"success": true, "result": %s, "help": "http://test-data.humdata.org/api/3/action/help_show?name=related_list"}' % result)
+    result = json.dumps(searchdict)
+    if datadict['q'] == 'ACLED':
+        return MockResponse(200,
+                            '{"success": true, "result": %s, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}' % result)
+    if datadict['q'] == '"':
+        return MockResponse(404,
+                            '{"success": false, "error": {"message": "Validation Error", "__type": "Validation Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+    if datadict['q'] == 'ajyhgr':
+        return MockResponse(200,
+                            '{"success": true, "result": {"count": 0, "results": []}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+    return MockResponse(404,
+                        '{"success": false, "error": {"message": "Not found", "__type": "Not Found Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+
+
 class TestDataset():
     dataset_data = {
         'name': 'MyDataset1',
@@ -290,6 +314,15 @@ def mockreturn(url, data, headers, files, allow_redirects, auth):
 
         monkeypatch.setattr(requests, 'post', mockreturn)
 
+    @pytest.fixture(scope='function')
+    def search(self, monkeypatch):
+        def mockreturn(url, data, headers, files, allow_redirects, auth):
+            datadict = json.loads(data.decode('utf-8'))
+            return mocksearch(url, datadict)
+
+        monkeypatch.setattr(requests, 'post', mockreturn)
+
+
     @pytest.fixture(scope='class')
     def configuration(self):
         hdx_key_file = join('fixtures', '.hdxkey')
@@ -460,3 +493,11 @@ def test_add_update_delete_gallery(self, configuration, post_delete):
         dataset.delete_galleryitem('NOTEXIST')
         dataset.delete_galleryitem('d59a01d8-e52b-4337-bcda-fceb1d059bef')
         assert len(dataset.gallery) == 0
+
+    def test_search_in_hdx(self, configuration, search):
+        datasets = Dataset.search_in_hdx(configuration, 'ACLED')
+        assert len(datasets) == 10
+        datasets = Dataset.search_in_hdx(configuration, 'ajyhgr')
+        assert len(datasets) == 0
+        with pytest.raises(HDXError):
+            Dataset.search_in_hdx(configuration, '"')
