-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved redis performance in large keyspace, added pagination, and auto-expiration #119
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
module Gush | ||
class IndexWorkflowsByCreatedAtAndExpiresAt < Gush::Migration | ||
def self.version | ||
1 | ||
end | ||
|
||
def up | ||
redis.scan_each(match: "gush.workflows.*").map do |key| | ||
id = key.sub("gush.workflows.", "") | ||
workflow = client.find_workflow(id) | ||
|
||
ttl = redis.ttl(key) | ||
redis.persist(key) | ||
workflow.jobs.each { |job| redis.persist("gush.jobs.#{id}.#{job.klass}") } | ||
|
||
client.persist_workflow(workflow) | ||
client.expire_workflow(workflow, ttl.positive? ? ttl : -1) | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
module Gush | ||
class Migration | ||
def migrate | ||
return if migrated? | ||
|
||
up | ||
migrated! | ||
end | ||
|
||
def up | ||
# subclass responsibility | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we consider raising a |
||
raise NotImplementedError | ||
end | ||
|
||
def version | ||
self.class.version | ||
end | ||
|
||
def migrated? | ||
redis.sismember("gush.migration.schema_migrations", version) | ||
end | ||
|
||
private | ||
|
||
def migrated! | ||
redis.sadd("gush.migration.schema_migrations", version) | ||
end | ||
|
||
def client | ||
@client ||= Client.new | ||
end | ||
|
||
def redis | ||
Gush::Client.redis_connection(client.configuration) | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,10 @@ def self.find(id) | |
Gush::Client.new.find_workflow(id) | ||
end | ||
|
||
def self.page(start=0, stop=99, order: :asc) | ||
Gush::Client.new.workflows(start, stop, order: order) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally, I do prefer to have 'all positional or all kw' args unless I have a really strong reason not to. Mixing them can be confusing as you make more contexts for usage. I think you do a good job handling the differences in the code, just a style thought. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was mirroring the redis zrange argument style here (though one difference is that that method doesn't have default values for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a preference here, so up to you 👍 |
||
end | ||
|
||
def self.create(*args, **kwargs) | ||
flow = new(*args, **kwargs) | ||
flow.save | ||
|
@@ -185,6 +189,7 @@ def to_hash | |
total: jobs.count, | ||
finished: jobs.count(&:finished?), | ||
klass: name, | ||
job_klasses: jobs.map(&:class).map(&:to_s).uniq, | ||
status: status, | ||
stopped: stopped, | ||
started_at: started_at, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry a bit about using the #to_f value for this. That value is explicitly an approximation (of the rational representation), and doing math with floats is notoriously ... not fun.
We could try and do something using the integer (ie
#to_i
) and sub-second (ie#usec
or#nsec
) values themselves to avoid any approximation and float messiness down the road. The first thing that comes to mind would be some concatenated integer/BigDecimal representation. This kind of approach would even let us be explicit about the precision of our value here, which I do like. The limit being just how big a number Redis will let us put in there :)This might be 'not worth it' right now (which is a fine answer) but I do worry we might set up some nasty surprises later with a float here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redis uses floating point numbers for sorted scores. I think the rationale for using them here is:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, the Redis side is a float regardless. The imprecision of the estimate for the Time#to_f value should be at the nanosecond (or at worst maybe 10s of nanoseconds) level so it's unlikely that we would get any ordering wonkiness unless our throughput here was extremely high (or extremely distributed etc). Even then the ordering being off may or may not be significant. Can definitely see this being a tradeoff that works well for this code.
The other concerns would really be implementation side for anyone doing some calculation based off the Gush expire_at values in their code but they should take care to check things before they do math with Time floats anyways :)
Thanks for replying!