This repository has been archived by the owner on Apr 17, 2023. It is now read-only.

Take advantage of the Catalog API #199

Closed
mssola opened this issue Jul 6, 2015 · 6 comments · Fixed by #260

Comments

@mssola
Collaborator

mssola commented Jul 6, 2015

Right now Portus' database is populated with the data sent by distribution via the webhook API. Glitches inside of Portus could cause the contents of its database to drift from what is actually inside the registry.

The next version of distribution (not yet released) is going to provide a search API that we can use to periodically sync our database with the real contents of the registry.

Portus will call the remote API exposed by the registry, parse the response and populate the database accordingly. We already have a client library that can be used to talk with the registry. Note that the registry's code is already in master.
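The exact client code isn't pinned down in this thread, so here is a hedged sketch: assuming the registry answers `GET /v2/_catalog` with a JSON object listing its repositories (as in the distribution proposal), the sync boils down to parsing that list and diffing it against what Portus already knows. The helper names below are hypothetical.

```ruby
require "json"

# Parse a catalog response body such as {"repositories":["busybox", ...]}.
def repositories_from_catalog(body)
  JSON.parse(body).fetch("repositories", [])
end

# Repositories present in the registry but missing from our database are
# the ones a periodic sync would need to create.
def missing_repositories(catalog_body, known)
  repositories_from_catalog(catalog_body) - known
end

response = '{"repositories":["busybox","opensuse/portus"]}'
missing_repositories(response, ["busybox"]) # => ["opensuse/portus"]
```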

@lsamayoa
Contributor

Catalog API was already merged

@mssola
Collaborator Author

mssola commented Aug 3, 2015

I just assigned myself to denote that I'm already doing some research on this. Sorry for the inconvenience @lsamayoa :)

@mssola
Collaborator Author

mssola commented Aug 3, 2015

Since PR #241, I've decided that this issue should also serve as a discussion about how we can implement this feature in the cleanest way.

@mssola
Collaborator Author

mssola commented Aug 3, 2015

In my opinion, there are two strategies around this:

  1. We can use an existing gem that deals with background processes, like Sidekiq, to periodically perform the sync if needed (e.g. every 5 minutes). Another similar approach, proposed by @lsamayoa, is to use crono. Crono has the benefit that it does not require Redis to work.
  2. We can write a standalone program for that. A simple script or application dedicated just for this.

It might seem crazy at first, but I'd go for the second option. There are two main reasons why I wouldn't go for the first one:

  • I'm really skeptical when it comes to background jobs in Ruby, mainly due to the limitations of Ruby (MRI at least) on concurrency.
  • In my mind, what the Rails application has to implement is the web UI and the authorization service. Anything beyond that is out of the scope of the application itself, in my opinion. Therefore I would write an independent script/application that deals with this.

That being said, I can see why the crono gem can be appealing, but at first glance I don't like that it touches the database to persist the state of the running job. Moreover, we don't need to worry about using platform-dependent stuff.
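The core of option 2 would be small either way. A hedged sketch, with all names hypothetical and a plain array standing in for the real database: a standalone script is just a reconcile step run in a loop.

```ruby
# Reconcile the database with the repositories reported by the registry:
# create entries that are missing and prune entries that no longer exist.
def reconcile!(db, registry_repos)
  (registry_repos - db).each { |r| db << r }     # create missing rows
  db.reject! { |r| !registry_repos.include?(r) } # prune stale rows
  db
end

# The driver of the standalone script would then be as simple as:
#   loop { reconcile!(db, fetch_catalog); sleep 300 }
db = ["old/repo", "busybox"]
reconcile!(db, ["busybox", "opensuse/portus"])
db.sort # => ["busybox", "opensuse/portus"]
```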

@lsamayoa
Contributor

lsamayoa commented Aug 3, 2015

I don't think creating a standalone script would be a good idea though; I don't think we should reinvent the wheel. I would advise using ActiveJob so that multiple backends are available (crono/sidekiq/etc.), and leaving the scheduling to consumers; maybe just make a rake task available to sync the app.

With crono, background jobs are processed in a different process, just like with sidekiq.

Sidenote: crono does not use platform-dependent stuff.
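The shape of that suggestion can be sketched as follows. ActiveJob itself needs Rails, so this stdlib-only stand-in only mirrors its API; the class names and injected arguments are hypothetical, kept as parameters so the sketch stays testable.

```ruby
# Minimal stand-in for ActiveJob's interface: the queue backend (crono,
# sidekiq, inline, ...) would be a deployment choice, not hard-coded here.
class ApplicationJob
  def self.perform_now(*args)
    new.perform(*args)
  end
end

class CatalogSyncJob < ApplicationJob
  # In the real job, `catalog` would come from the registry client and
  # `db` from ActiveRecord; both are injected here for illustration.
  def perform(catalog, db)
    (catalog - db).each { |repo| db << repo }
    db
  end
end

# A `rake portus:sync` task could then simply run:
CatalogSyncJob.perform_now(["busybox", "new/repo"], ["busybox"])
# => ["busybox", "new/repo"]
```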


@flavio
Member

flavio commented Aug 4, 2015

Right now Portus builds its knowledge of the registry by processing the notification messages sent by the registry. This works fine and has the advantage of being basically real time.

However there are at least two big limitations:

  • Registry's notification messages could be lost or there could be temporary problems processing them on the Portus side. That would cause the data to be lost forever.
  • There's no way to add Portus to an already existing registry and build Portus' knowledge from scratch.

This is where the catalog API can help us. Since I'm more interested in solving these issues, I think a solution like crono could be fine. We can still rely on the notification messages for real-time feedback, and achieve data consistency by running some "catalog-sync" code on a daily basis. Using crono (or something similar) has the big advantage of keeping the deployment and maintenance work low, because there's no new moving part like Redis involved.
The same "catalog-sync" can be used to bootstrap Portus' database from scratch.

Going back to Redis, registry's master code has optional support for it. We should look more into that and understand if the in memory alternative is good enough for production usage. If using Redis is the recommended way for production usage, then we should not penalize solutions like sidekiq (after all everybody will already have to deal with Redis).

Talking about sidekiq and similar: I think they could be useful to schedule background jobs as a response to some notifications sent by the registry. IMO Portus can keep creating new repositories/tags using the information attached to the notification message; at the same time, it can schedule a background job that fetches and processes more data about the new repository/tag.

The first objective of Portus is to act as an authentication service for the Docker registry. Hence the push of a Docker image should be processed immediately, to allow other users to access it. The details we can get by processing the image manifest have a lower priority, so they can be extracted in the background, even with a little delay. However, we should discuss that here.
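The two-phase flow described above can be sketched in a few lines. Everything here is hypothetical (the struct, the event shape, the queue): the notification creates the tag immediately so pushes are usable right away, while the manifest work is deferred to a job queue.

```ruby
# Immediate phase: record the repository/tag from the notification payload.
# Deferred phase: enqueue a job that will fetch manifest details later.
Tag = Struct.new(:repository, :name, :details)

def handle_notification(event, db, job_queue)
  db << Tag.new(event[:repository], event[:tag], nil) # real time, no details yet
  job_queue << event[:repository]                     # background manifest work
end

db, queue = [], []
handle_notification({ repository: "busybox", tag: "latest" }, db, queue)
db.first.name # => "latest"
queue         # => ["busybox"]
```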

mssola added a commit to mssola/Portus that referenced this issue Aug 12, 2015
In recent commits the RegistryClient was not only fixed, but it also gained the `catalog` method. With this commit we take advantage of this new method by calling it in a new crono job.

For managing and launching this new job we use the `crono` gem as suggested by
@lsamayoa.

Fixes SUSE#199
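For reference, crono jobs are wired up through a `cronotab.rb` file. A hedged sketch of how the job from this commit could be scheduled; the class name and interval are hypothetical, and `10.minutes` assumes ActiveSupport is loaded, as it is in a Rails app:

```ruby
# cronotab.rb -- hypothetical sketch, not the actual Portus cronotab.
class CatalogJob
  def perform
    # Call RegistryClient#catalog and reconcile the database here.
  end
end

Crono.perform(CatalogJob).every 10.minutes
```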