This repository has been archived by the owner on Aug 19, 2020. It is now read-only.

Commit

Merge pull request #252 from mlhenderson/ga_download
0.4.1
mlhenderson committed Sep 23, 2016
2 parents 6753107 + 7b0606e commit 1f4d0f7
Showing 13 changed files with 35 additions and 811 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -23,7 +23,7 @@ before_script:
- printf "Build dir = $TRAVIS_BUILD_DIR\n"

script:
-  - coverage run `which nosetests` -c nose.cfg -c nose-local.cfg
+  - coverage run `which nosetests` -c nose.cfg -c nose-local.cfg --verbosity=3
- coverage report --skip-covered

env:
79 changes: 2 additions & 77 deletions README.md
@@ -1,4 +1,4 @@
-**Version**: 0.1.0
+**Version**: 0.4.1

[![Join the chat at https://gitter.im/kbase/data_api](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/kbase/data_api?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

@@ -19,7 +19,7 @@ Master branch status

Develop branch status
[![Build Status](https://travis-ci.org/kbase/data_api.svg?branch=develop)](https://travis-ci.org/kbase/data_api)
-[![Coverage Status](http://codecov.io/github/kbase/data_api/coverage.svg?branch=develop)](http://codecov.io/github/kbase/data_api?branch=master)
+[![Coverage Status](http://codecov.io/github/kbase/data_api/coverage.svg?branch=develop)](http://codecov.io/github/kbase/data_api?branch=develop)
![Coverage Graph](http://codecov.io/github/kbase/data_api/branch.svg?branch=develop&time=1y)

##### Table of Contents
@@ -187,18 +187,6 @@ The sub-repository will be cloned in the directory `test_resources`.
Starting the Redis instance:

redis-server redis.conf

-## Starting the Data API services
-
-Services can be started using the data_api_start_service.py script, which is in your path from a virtualenv install.
-
-    data_api_start_service.py --config deployment.cfg --service taxon --port 9101
-    data_api_start_service.py --config deployment.cfg --service assembly --port 9102
-
-You can add a --kbase_url argument to indicate which service targets and configs from deployment.cfg to use.
-For instance, to set the services to use local files and assume a running Redis instance:
-
-    data_api_start_service.py --config deployment.cfg --service assembly --port 9102 --kbase_url=dir_cache

The available targets are:

@@ -209,30 +197,6 @@ The sub-repository will be cloned in the directory `test_resources`.
- **dir_cache** : Use local test files, assume local Redis caching
- **dir_nocache** : Use local test files, do not attempt to cache using Redis

-### Service logging
-
-The Data API service logging is controlled by a file named, by default, "logging.conf", in the same directory as the configuration file from the `--config` option. You can choose another file with the `--log-config` option. If no explicit configuration is given, and the default file is not present, some basic default logging configuration will be used. Failure to open or parse a file provided with the `--log-config` option will cause the program to stop and exit.
-
-For example:
-
-* `data_api_start_service.py --service genome_annotation --config deployment.cfg` will look for a file `logging.conf` in the current directory.
-* `data_api_start_service.py --service genome_annotation --config deployment.cfg --log-config ./logging.conf` is equivalent to the previous command.
-* `data_api_start_service.py --service genome_annotation --config deployment.cfg --log-config /tmp/logging-test/logging.yaml` will configure logging from the named file in "/tmp/logging-test" instead of looking for "logging.conf" in the current directory. If that file does not exist, the program will stop with an error.
-
-The configuration file is formatted using either the (older) ConfigParser format used by `logging.config.fileConfig` or the (newer) YAML format that is parsed and fed to `logging.config.dictConfig`; the format is auto-detected via trial and error. See the [logging.config module](https://docs.python.org/2.7/library/logging.config.html) documentation for details on the formats of these files. The `logging.conf` example in the root of the repository demonstrates how to configure output to a file and console, and control the level of logging for each service.
-
-The output log format is largely controlled by the configuration file. Some of the messages have additional standardized formatting applied to the part in the `%(message)s` format code, i.e. the free-form string part of the log message. This standardized formatting is encoded the `doekbase.util.log_{event,start,end}` functions. (Note: This formatting can be altered, too, by changing the global `_MESSAGE_FORMAT` dict of the module; but this is not recommended). The basic idea is to reduce the free-form part to a simple name for the event, and put everything else in key/value pairs with a standard format. Also, messages at the beginning and the end of something (in cases where that makes sense) should be named so that they are easily matched. For example, the pair of log messages for the service starting and stopping look like this:
-
-    2016-02-18 05:34:39,260 [INFO] doekbase.data_api.annotation.genome_annotation.service.driver: start_service.begin | host='',port=9103
-    # ...
-    2016-02-18 05:35:01,910 [INFO] doekbase.data_api.annotation.genome_annotation.service.driver: start_service.end (22.649431) | host='',port=9103
-
-Note the '.begin' and '.end' event suffixes, as well as the "|" separating the message into name from the key/value pairs of the values. The number in parentheses on the '.end' event is the duration in seconds since the corresponding '.start'. For a message that is not part of a begin/end pair, the format is the same:
-
-    2016-02-18 05:34:39,137 [INFO] doekbase.data_api_start_service: activating REDIS | host=localhost,port=6379
-
-Not all, in fact right now not even most, of the messages from the services have this standard format extension. But the goal is to slowly convert them. (We'll see how that goes!)
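The begin/end logging convention described in the section above (removed from the README by this commit) can be sketched in a few lines. This is a hypothetical re-creation for illustration only: `format_event`, `log_start`, and `log_end` are made-up names, not the actual `doekbase.util` functions.

```python
import logging
import time

logging.basicConfig(
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    level=logging.INFO)
log = logging.getLogger("doekbase.data_api.example")

def format_event(event, suffix, duration=None, **kv):
    """Build the free-form message part: event name with a '.begin'/'.end'
    suffix, optional duration in parentheses, '|', then key=value pairs."""
    pairs = ",".join("%s=%r" % (k, kv[k]) for k in sorted(kv))
    timing = " (%f)" % duration if duration is not None else ""
    return "%s.%s%s | %s" % (event, suffix, timing, pairs)

def log_start(event, **kv):
    log.info(format_event(event, "begin", **kv))
    return time.time()

def log_end(event, t0, **kv):
    log.info(format_event(event, "end", duration=time.time() - t0, **kv))

t0 = log_start("start_service", host="", port=9103)
log_end("start_service", t0, host="", port=9103)
```

The point of the convention is that `grep start_service.begin` and `grep start_service.end` trivially pair up, and the part after `|` is machine-parseable.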

## Testing

To verify all Data API code with local tests.
@@ -243,54 +207,15 @@ Not all, in fact right now not even most, of the messages from the services have

Install the [Test data](README.md#test-data)

-Start each of the API services:
-
-    data_api_start_service.py --config deployment.cfg --service taxon --port 9101 &
-    data_api_start_service.py --config deployment.cfg --service assembly --port 9102 &
-    data_api_start_service.py --config deployment.cfg --service genome_annotation --port 9103 &

Run nosetests from the data_api source directory, which will use the test data:

nosetests -c nose.cfg -c nose-local.cfg -s doekbase.data_api

-### TR;DL testing
-
-An easier way to run the tests is to run the script `run-tests-local.sh` at the top level of the directory.
-
-In addition to performing steps listed above, this does some sanity checks and also regenerates the Thrift
-server and client stubs (in case you changed these). It also runs the Thrift preprocessor (see below).

### JavaScript tests

For the JavaScript API, all the code and tests live under `jslib`. See the README in that directory for more details.


-### Example narratives
-
-Retrieving and counting genomic features with a local data API client for a [GenomeAnnotation object] (https://narrative-ci.kbase.us/narrative/ws.3413.obj.1)
-Retrieving and counting genomic features with direct data API access for a [GenomeAnnotation object] (https://narrative-ci.kbase.us/narrative/ws.3292.obj.1)
-A [table] of genome properties for all genomes belonging to a taxon (https://narrative-ci.kbase.us/narrative/ws.3524.obj.1)
-Panel of data quality plots for [GenomeAnnotation and Assembly objects] (https://narrative-ci.kbase.us/narrative/ws.3413.obj.1)
-
-## Thrift preprocessor
-
-For convenience, we have added a simple Thrift preprocessor. There is a driver program in the `bin` directory called `data_api_preprocess_thrift`.
-This calls functions in the `doekbase.data_api.thrift_include` module to do the work.
-"Why? Dear God, why?" you ask, since Thrift already has an include mechanism. Yes, it does, but this ends up complicating the namespace and adding files. And we just didn't feel
-like dealing with that today. We wanted a dead-simple literal include mechanism.
-So, what you do is add this line to your spec:
-
-    #%include api_shared
-
-in your API's Thrift spec, and this special comment will get replaced with:
-
-    #%include api_shared
-    <contents of api_shared.thrift>
-    #%endinclude api_shared
-
-In the driver script, you can specify include directories in which to look for the .thrift files to include. This is invoked automatically as part of rebuilding the
-clients and servers, with the include directories being the current directory and the `thrift/specs/common` directory.
-
-Note that includes can include other things (this is in fact what `api_shared.thrift` does), etc.

# FIN
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.4.0
+0.4.1
Binary file added lib/__init__.pyc
29 changes: 21 additions & 8 deletions lib/doekbase/data_api/annotation/genome_annotation/api.py
@@ -646,21 +646,34 @@ def __init__(self, services, token, ref):

    def get_taxon(self, ref_only=False):
        from doekbase.data_api.taxonomy.taxon.api import TaxonAPI
+       possible_ref = self.get_data_subset(path_list=["taxon_ref"])
+
+       if "taxon_ref" in possible_ref and possible_ref["taxon_ref"]:
+           taxon_ref = possible_ref["taxon_ref"]
+       else:
+           taxon_ref = self.ref

        if ref_only:
-           return self.ref
+           return taxon_ref
        else:
-           return TaxonAPI(self.services, token=self._token, ref=self.ref)
+           return TaxonAPI(self.services, token=self._token, ref=taxon_ref)

    def get_assembly(self, ref_only=False):
        from doekbase.data_api.sequence.assembly.api import AssemblyAPI

-       contigset_ref = self.get_data_subset(path_list=["contigset_ref"])["contigset_ref"]
+       possible_refs = self.get_data_subset(path_list=["contigset_ref", "assembly_ref"])
+
+       if "contigset_ref" in possible_refs and possible_refs["contigset_ref"]:
+           assembly_ref = possible_refs["contigset_ref"]
+       elif "assembly_ref" in possible_refs and possible_refs["assembly_ref"]:
+           assembly_ref = possible_refs["assembly_ref"]
+       else:
+           raise AttributeError("No assembly reference found!")

        if ref_only:
-           return contigset_ref
+           return assembly_ref
        else:
-           return AssemblyAPI(self.services, self._token, ref=contigset_ref)
+           return AssemblyAPI(self.services, self._token, ref=assembly_ref)

    def get_feature_types(self):
        feature_types = []
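The new ref-resolution logic in `get_taxon`/`get_assembly` boils down to a preference order with a loud failure. A standalone sketch (`resolve_assembly_ref` is a hypothetical helper for illustration, not part of the API):

```python
def resolve_assembly_ref(possible_refs):
    """Prefer the legacy contigset_ref, fall back to assembly_ref,
    and raise when neither key is present with a non-empty value."""
    for key in ("contigset_ref", "assembly_ref"):
        # mirrors the diff: the key must exist *and* be truthy
        if possible_refs.get(key):
            return possible_refs[key]
    raise AttributeError("No assembly reference found!")
```

This lets callers keep working with older objects that only carry `contigset_ref` while newer objects supply `assembly_ref`.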
@@ -1358,11 +1371,11 @@ def get_feature_ids(self, filters=None, group_by="type"):

        if group_by == "type" or "type_list" in filters:
            limited_keys.append("type")
-       elif group_by == "region" or "region_list" in filters:
+       if group_by == "region" or "region_list" in filters:
            limited_keys.append("locations")
-       elif group_by == "alias" or "alias_list" in filters:
+       if group_by == "alias" or "alias_list" in filters:
            limited_keys.append("aliases")
-       elif group_by == "function" or "function_list" in filters:
+       if group_by == "function" or "function_list" in filters:
            limited_keys.append("function")

        paths = ['features/*/' + k for k in limited_keys]
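The `elif` → `if` change above is behavioral, not cosmetic: with independent `if` tests, every matching filter contributes its key instead of only the first match. A minimal sketch of the corrected behavior (`collect_group_keys` is a hypothetical standalone version of the logic):

```python
def collect_group_keys(filters, group_by="type"):
    """Each condition is checked independently, so several filter
    lists can contribute keys in the same call."""
    limited_keys = []
    if group_by == "type" or "type_list" in filters:
        limited_keys.append("type")
    if group_by == "region" or "region_list" in filters:
        limited_keys.append("locations")
    if group_by == "alias" or "alias_list" in filters:
        limited_keys.append("aliases")
    if group_by == "function" or "function_list" in filters:
        limited_keys.append("function")
    return limited_keys

# group_by="type" satisfies the first test, but region_list is still honored;
# the old elif chain stopped after "type" and silently dropped "locations".
print(collect_group_keys({"region_list": ["chr1"]}, group_by="type"))
# -> ['type', 'locations']
```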
12 changes: 9 additions & 3 deletions lib/doekbase/data_api/cache.py
@@ -208,8 +208,14 @@ def get_data_subset(self, parent_method, path_list=None):
        # creator function, currying path_list arg.
        def creator():
            return parent_method(path_list=path_list)
-       # get from cache, or create
-       data = self.cache_get_or_create(key, creator)
+
+       # if there is no caching, call the creator directly
+       # NOTE: this was to avoid undesirable behavior when the result was empty
+       if self.cache_class == NullCache:
+           data = creator()
+       else:
+           # get from cache, or create
+           data = self.cache_get_or_create(key, creator)
        #self._stats.end_event('cache.get_data_subset', self._key)
        return data

@@ -231,7 +237,7 @@ def cache_get_or_create(self, key, creator):
        try:
            data = self._cache.get_or_create(key, creator)

-           if data:
+           if data is not None:
                break
        except redis.BusyLoadingError:
            _log.warn('Redis is busy, sleep for 0.1s and try again')
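Both cache.py changes guard against the same pitfall: a legitimately empty result is falsy in Python, so a truthiness check (`if data:`) cannot tell "empty answer" apart from "no answer yet". A minimal illustration (hypothetical values):

```python
# A subset query can legitimately return an empty container.
empty_subset = {}  # e.g. nothing found at the requested path_list

# Old check: `if data:` treats an empty result like a cache miss,
# causing needless re-creation or retry loops for valid empty answers.
assert not empty_subset

# New check: only None means "no result"; an empty dict is a real answer.
assert empty_subset is not None
```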
2 changes: 1 addition & 1 deletion lib/doekbase/data_api/core.py
@@ -22,7 +22,7 @@

# Version

-DATA_API_VERSION = "0.4.0"
+DATA_API_VERSION = "0.4.1"

def version():
return DATA_API_VERSION
