-
Notifications
You must be signed in to change notification settings - Fork 93
Conversation
Allows peers to be added by CLI Lists peers at peers/list Modifies models to have tables for announcements and peers Stub out some tests
Each announcement's timestamp is recorded, as well as the host it came from
4923fe9
to
6a66b93
Compare
Flake fixes Remove prints
Codecov Report
@@ Coverage Diff @@
## master #1556 +/- ##
==========================================
- Coverage 85.95% 85.93% -0.03%
==========================================
Files 34 36 +2
Lines 7519 7783 +264
Branches 947 967 +20
==========================================
+ Hits 6463 6688 +225
- Misses 858 890 +32
- Partials 198 205 +7
Continue to review full report at Codecov.
|
Safely check for attributes
Improve error handling for malformed peers Remove id from peer, primary key is the URL Fix attributes setting bug Add peers to test imports
fcc4c94
to
7c82e2d
Compare
Remove peer adding from compliance as its part of initial peers list Flake fixes Fix tests
Handle empty page token
S3 is down and so Travis is not working... |
Add build to test data
9c7ac89
to
5b4914a
Compare
Fix peers list paging Turn down client verbosity for initial announce
This code is ready for review @dcolligan @ejacox . I can tell you the ugliest part is probably how announcements are handled. As we continue to develop the use cases on how we will work with announcements I expect to clean this up. For now, you can add a peer To test the announcement functionality, Finally, you can observe the protocol version the service provides using For a quick demo of using a client to discover an unknown peer: https://gist.github.com/david4096/8b5cdbfb3bfb00538ad0d4d4b302020e |
Exciting! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a message from the server; "Couldn't announce to initial peer init() got an unexpected keyword argument 'verbosity'".
Are the peer functions in the abstract data repo for testing only? Can you mark them as such if so.
Why is the initialization done in the frontend? Should this initialization code be put in the Peer class?
Otherwise, it works for me and the functionality that is there looks good.
Thanks @ejacox good questions. First, that first one is a bug, the announce on init isn't currently working since I was trying to suppress its output incorrectly. Second, those methods in AbstractDataRepository are only used during testing, yes, I'll make that more clear. I hope we can remove this class entirely as it's in-memory model isn't useful. Third, it does look a bit ugly to have that code in frontend and I don't like the idea of using frontend as a catch-all for operations. In a couple of cases I needed to have the Considering that, I would like to move it to another module and isolate how |
I would say move what you can and otherwise mark it as being in the wrong place. We can refactor it later. |
Updated Warning: Because of flask's use of the SimpleHTTPServer when running in debug mode, if you set the service itself as an initial peer it will result in a broken pipe. There isn't a great way to set the host the announcement is coming from, as this requires one to have found the public internet address of the service. To do that you need to make a request out, look at the response and then set the host. We could do this by including in the Then, when initiating a peer connection one first makes a If two peers run behind a NAT and peer using their local IPs (192.168.1.1 and 2), and one peer exposes its service to the wider internet at a public IP (72.X.X.X), then a peer on the public internet might get a peer that doesn't resolve for them when asking for its peer list (192.168.1.1). This is both a feature and a problem. It makes it easy to create private networks and for systems designers to create network architectures that suit their use case. It also makes it easy to create network architectures that quickly break assumptions of the overall network topology. A further complication is that though the request will originate from some host, the service itself might be running at a different port or path. When a potential peer is formulating its announce request, it will have to be able to sensibly identify the path and port it is running from. When running under Apache or nginx, at least, we don't have good way of determining the path a WSGI application might be running from. I don't think it's a good idea to require that the API runs at the top-level of a path. We want folks to be able to flexibly add the software at any path in their existing infrastructure. I think to support dynamic peer resolution under Apache/nginx, we'll have to be able to receive configuration details about the publicly visible port and path. |
Although 1kgenomes is on a released version and so doesn't have the
Compliance PR here ga4gh/compliance#247 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, I have a few minor things that I want to see changed, but overall I agree with the design.
import ga4gh.schemas.protocol as protocol | ||
|
||
|
||
def isUrl(urlString): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there isn't some way to do this with urllib
or urlparse
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and if there isn't, this is general enough to be pushed down into a lower level... at least put a TODO, we don't need to put it in, say, common
presently...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, instead of a regex you can just catch an exception from urllib
, I'll do that.
http://grokbase.com/t/python/python-list/018fex83ty/check-url-simply
On closer inspection I found that the urlparse method was too lenient for valid addresses and that in practice some regex is needed. I added it and kept the regex for now. We could definitely move it out, added an issue for it here #1594
ga4gh/server/datamodel/peers.py
Outdated
return regex.match(urlString) | ||
|
||
|
||
class Peer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should always use new-style classes, which have object
as a superclass
ga4gh/server/datarepo.py
Outdated
import ga4gh.server.exceptions as exceptions | ||
|
||
import repo.models as m | ||
import ga4gh.server.repo.models as m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use single-letter variable names; models
here is fine instead
ga4gh/server/datarepo.py
Outdated
def removePeer(self, url): | ||
""" | ||
Remove peers by URL. | ||
:return: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete :return:
ga4gh/server/network/__init__.py
Outdated
@@ -0,0 +1,76 @@ | |||
""" | |||
Provides methods to initialize the server's peer to peer connections. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete blank line
ga4gh/server/network/__init__.py
Outdated
import os | ||
import requests | ||
|
||
from ga4gh.server import datamodel, exceptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the canonical import style: datamodel and exceptions should be on their own line, and in the idiom import x.y as z
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had you in mind when I wrote that line @dcolligan :). I do like that we have uniform import statements, I just find this syntax nicer to read. We'd have to do it all at once of course. I'll change it back but continue to try to convince you that we could clean up the code base by switching to this style.
582 characters on 10 lines
import ga4gh.server.datamodel.datasets as datasets
import ga4gh.server.datamodel.ontologies as ontologies
import ga4gh.server.datamodel.reads as reads
import ga4gh.server.datamodel.references as references
import ga4gh.server.datamodel.variants as variants
import ga4gh.server.datamodel.sequence_annotations as sequence_annotations
import ga4gh.server.datamodel.continuous as continuous
import ga4gh.server.datamodel.bio_metadata as biodata
import ga4gh.server.datamodel.genotype_phenotype as genotype_phenotype
import ga4gh.server.datamodel.rna_quantification as rna_quantification
Might become 169 characters on 3 lines
from ga4gh.server.datamodel import ontologies, reads, \
references, variants, sequence_annotations, continuous, \
biodata, genotype_phenotype, rna_quantification
I think it might actually be more PEPpy?
http://stackoverflow.com/questions/15011367/python-module-import-single-line-vs-multi-line
This style also is highly readable but doesn't earn you any LOC, 10 lines 199 characters.
from ga4gh.server.datamodel import (
ontologies,
reads,
references,
variants,
sequence_annotations,
continuous,
biodata,
genotype_phenotype,
rna_quantification)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another consideration is searching through the code... it is much easier to grep for 'import ga4gh.server.datamodel.variants' to find all the places it is imported than... I'm not even sure what regex one would write for the alternatives.
ga4gh/server/network/__init__.py
Outdated
"file named 'initial_peers.txt' " | ||
"to {}".format(os.getcwd())) | ||
ret = [] | ||
# Remove lines that start with a hash or are empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps another future candidate for a common
method... we do this a lot with our config files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ga4gh/server/network/__init__.py
Outdated
""" | ||
|
||
try: | ||
textFile = open(filePath, 'r') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easier to do:
with open(filePath) as textFile:
lines = textFile.readlines()
@@ -88,7 +88,7 @@ def testDirectory(self): | |||
with self.assertRaises(exceptions.RepoInvalidDatabaseException): | |||
repo.open(datarepo.MODE_READ) | |||
|
|||
def testNonexistantFile(self): | |||
def testNonexistentFile(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always friggin spell this wrong. Thanks.
Thanks @dcolligan for your review! All good suggestions, I'll get to work on them. I am also going to add functionality for listing and clearing announcements using the CLI. |
Add positive check in test of https
Use readline and with in place of try, except Change import statement
This PR implements the peer service ga4gh/ga4gh-schemas#806 . It does not add any new dependencies,
but makes explicit the dependency the server has on the client for performing the announce request(switched to using requests, which was already a dependency).This could easily be replaced by using the requests module directly.It is meant to present the minimum functionality to support implementing the schemas PR. Coordinated testing amongst nodes is not addressed. Atlhough interesting, making peer to peer connections and hostname resolution is not addressed.
Peers can be announced to a server by sending a
AnnouncePeerRequest
to/announce
that looks like{peer: {uri: "http://localhost"}}
. These will be added as a timestamped row in the registry so they can be evaluated for being added as potential peers.Peers are managed via the repo manager,
ga4gh_repo add-peer http://1kgenomes.ga4gh.org
will add that peer to the registry. These peers are exposed via thepeers/list
endpoint (without rebooting the server).In order to support the feature of not reloading the server, some design decisions that expose the SQL of the data repository were made. My hope is that with this PR will help begin to form an opinion on how to separate the messages that require some compound ID (variant), with others that can use auto-incrementing IDs (dataset, peer, referenceset, and so on).
To gather more details about the AnnouncePeerRequest before logging it, the request is handled directly instead of via
handleFlaskPostRequest
breaking some design patterns.TODO
Add repo manager CLI methods for listing and clearning announcements.Repo manager documentationConfiguration setting documentationAdd an initial announce to peers at start.Tested against 1kgenomes, will send an announce packet at server start for any initial peers listed. Setting the URL to announce is still a challenge because of the different configurations we work under.Log the user-agent of theAnnouncePeerRequest
.A JSON with a smattering of extra fields about the announce request are stored to help detect spam.The submitter IP and user agent are stored in the database along with the announce request.Protocol version should be set automatically.Make peer and announcement datamodel objects.Add end to end tests that use the client (send announcement, list peers, get info).Add "initialpeers" file that is read at startup.