Automate updating versions.list file #1400

Closed
sonalkr132 opened this issue Aug 30, 2016 · 4 comments · Fixed by #2403

Comments

@sonalkr132
Member

The http://rubygems.org/versions file is meant to be updated every month. As the versions.list file gets older, it becomes more and more expensive to calculate the versions we need to append to it.
We cache the calculated versions. Memcached has a 1MB limit on value size, so we compress the value before storing it in memcached if it is larger than 500KB. However, it is preferable never to reach the state where we need to compress the data, because we then pay the price of decompression on every cache hit.
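A minimal sketch of the store path described above, assuming "500KB" means 500 * 1024 bytes and using stdlib Zlib. The method names `prepare_cache_value` and `read_cache_value` are hypothetical; in the app this is handled by the cache client, whose actual options and thresholds may differ.

```ruby
require "zlib"

# Limits quoted in this issue; method names below are hypothetical.
MEMCACHED_MAX_VALUE_BYTES = 1_048_576   # memcached item size limit (1MB)
COMPRESS_THRESHOLD_BYTES  = 500 * 1024  # assumed meaning of "500KB"

# Returns [payload, compressed?] ready to hand to a cache client.
def prepare_cache_value(value)
  if value.bytesize > COMPRESS_THRESHOLD_BYTES
    payload = Zlib::Deflate.deflate(value)
    compressed = true
  else
    payload = value
    compressed = false
  end

  # Even compressed data can outgrow the memcached limit.
  if payload.bytesize > MEMCACHED_MAX_VALUE_BYTES
    raise ArgumentError, "value too large for memcached (#{payload.bytesize} bytes)"
  end

  [payload, compressed]
end

# Reverses prepare_cache_value; the inflate step is the extra cost paid on every hit.
def read_cache_value(payload, compressed)
  compressed ? Zlib::Inflate.inflate(payload).force_encoding(Encoding::UTF_8) : payload
end
```

The check runs on the payload after compression, because that is what memcached actually stores.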

@indirect
Member

I would like to instrument the size of the calculated versions so that we can see how it changes over time. If we did not have this problem, I would recommend updating versions.list only once every 6-24 months, because every time we change it, every user has to download the entire ~10MB file again from scratch. On a slow connection, that takes a very long time. :/
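One way to get that instrumentation, sketched under assumptions: a hypothetical helper `versions_size_metric` (not rubygems.org code) that formats a key=value line with the byte size of the calculated value, to be logged each time the value is computed and graphed by whatever collects the logs.

```ruby
require "time"

# Hypothetical helper: formats one metric line with the byte size of the
# calculated versions value. bytesize (not length) is used because the
# memcached limit is measured in bytes, and gem names may be multibyte.
def versions_size_metric(value, at: Time.now.utc)
  "versions_append_bytesize=#{value.bytesize} at=#{at.getutc.iso8601}"
end

# Example usage at the point where the value is computed:
#   value = compute_versions_to_append   # hypothetical existing computation
#   puts versions_size_metric(value)
```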

@sonalkr132
Member Author

sonalkr132 commented Aug 31, 2016

Here is an estimate of the value size (the data to append) in bytes, based on the data dump of 2016-08-22.
Memcached limit: 1048576 bytes.
Value size for different dates (the value holds the versions created after that date):

| Date | Without compression (bytes) | With compression (bytes) | Read of compressed value (ms) |
|---|---:|---:|---:|
| 2016-04-22 | 4869446 | 1203480 | - (exceeds limit) |
| 2016-05-22 | 3604758 | 917240 | 334.24 |
| 2016-06-22 | 2363388 | 639711 | 189.21 |
| 2016-07-22 | 1159236 | 319872 | 105.26 |
| 2016-07-26 | 1022712 | 283015 | 98.98 |
| 2016-08-01 | 794487 | 221078 | 55.70 |

Reading an uncompressed value of 794487 bytes takes 73.78 milliseconds, less than half the time spent reading a compressed value of comparable size (the 917240-byte compressed value took 334.24 ms).
The time spent reading the compressed value was measured with:

    require "benchmark"

    # Time a single cache read of the precomputed versions value
    time = Benchmark.realtime do
      versions_after_date = Rails.cache.read('versions')
    end

    puts "#{time * 1000} milliseconds"
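The measurement above includes memcached and network latency. A rough local sketch to isolate only the decompression part, assuming Zlib is the compression in use and using synthetic lines as a stand-in for real versions data (`inflate_ms` is a hypothetical name):

```ruby
require "zlib"

# Average milliseconds spent in Zlib inflate alone, with no cache or network.
def inflate_ms(deflated, iterations: 10)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { Zlib::Inflate.inflate(deflated) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  elapsed * 1000 / iterations
end

# Synthetic stand-in for versions data: "name version checksum" lines (~1MB).
sample = (1..20_000).map { |i| "gem#{i} 1.0.#{i} #{format('%032x', i)}\n" }.join
deflated = Zlib::Deflate.deflate(sample)

puts "raw #{sample.bytesize} bytes, compressed #{deflated.bytesize} bytes"
puts "inflate: #{inflate_ms(deflated).round(2)} ms"
```

Absolute timings depend on the machine and data, so this only indicates how much of a compressed cache hit is spent decompressing.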

It looks like, without compression, we will need to update versions.list (or create a new chunk) in less than a month; with compression, we get a little over 3 months.
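The estimate can be checked against the table with a linear projection, assuming the value grows at a roughly constant rate between the first and last rows (2016-04-22 and 2016-08-01). `days_until_limit` is a hypothetical helper.

```ruby
require "date"

MEMCACHED_LIMIT = 1_048_576

# Days of growth that fit under the limit, given two (date, bytes) rows.
def days_until_limit(d0, bytes0, d1, bytes1, limit = MEMCACHED_LIMIT)
  days = (d1 - d0).to_i                       # 101 days for these two rows
  bytes_per_day = (bytes0 - bytes1).to_f / days
  limit / bytes_per_day
end

uncompressed_days = days_until_limit(Date.new(2016, 4, 22), 4_869_446, Date.new(2016, 8, 1), 794_487)
compressed_days   = days_until_limit(Date.new(2016, 4, 22), 1_203_480, Date.new(2016, 8, 1), 221_078)

puts "uncompressed: ~#{uncompressed_days.round} days" # ~26 days
puts "compressed:   ~#{compressed_days.round} days"   # ~108 days
```

About 26 days without compression and about 108 days (roughly 3.5 months) with it, which agrees with the estimate above.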

@sonalkr132
Member Author

@dwradcliffe

- A rake task generates a timestamped file and uploads it to S3.
- A deploy task then syncs that file to all the servers.
- The deploy task could simply run the rake task, so the whole process is one button.
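A sketch of the generation step only, assuming the compact index layout served at rubygems.org/versions (a `created_at:` header, a `---` separator, then one `name versions info_checksum` line per gem). The method names, the input hash shape, the sort order, and the file-name pattern are all hypothetical; the S3 upload and server sync are left out.

```ruby
require "time"

# Hypothetical: builds the body of a fresh versions.list snapshot.
# gems: { "rack" => { versions: ["1.0.0", "1.0.1"], info_checksum: "..." } }
def build_versions_list(gems, created_at: Time.now.utc)
  lines = ["created_at: #{created_at.getutc.iso8601}", "---"]
  gems.sort.each do |name, data|
    lines << "#{name} #{data[:versions].join(',')} #{data[:info_checksum]}"
  end
  lines.join("\n") + "\n"
end

# Hypothetical timestamped file name, so each generated snapshot is distinct.
def versions_list_filename(created_at)
  "versions.list.#{created_at.getutc.strftime('%Y%m%d%H%M%S')}"
end
```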

@sonalkr132
Member Author

In the snippet below, you can see that txp has `is_gfx: >= 0.2` as a dependency. However, the is_gfx gem didn't exist when txp 0.1 was created, so the info content computed for that version at creation time did not include the is_gfx dependency.
When is_gfx was pushed a few seconds later, we updated the dependency record from `unresolved_name` to refer to an actual `rubygem_id`. As a result, the info content changed after the `info_checksum` on the txp version had already been set.

    2.6.5 :028 > puts CompactIndex.info(GemInfo.new("txp").compact_index_info)
    ---
    0.1 nis_gfx:>= 0.2|checksum:6169d5757f2d9f6754c40abae013a1f36de0897d6410260bb5db6108cf065376,ruby:>= 1.9
    2.6.5 :029 > Digest::MD5.hexdigest("---\n0.1 nis_gfx: >= 0.2|checksum:6169d5757f2d9f6754c40abae013a1f36de0897d6410260bb5db6108cf065376,ruby:>= 1.9\n")
     => "87ac45462702677b546ca4ef787d8a25"
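A simplified illustration of why the checksum goes stale: `info_checksum` is an MD5 of the info content, so adding the resolved dependency later changes the digest. `info_body` is a hypothetical helper that only approximates the compact index info line layout.

```ruby
require "digest/md5"

# Hypothetical approximation of one compact index info line:
# "VERSION DEP:REQ,...|checksum:SHA,ruby:REQ" under a "---" header.
def info_body(version, deps, checksum, ruby_req)
  dep_str = deps.map { |name, req| "#{name}:#{req}" }.join(",")
  "---\n#{version} #{dep_str}|checksum:#{checksum},ruby:#{ruby_req}\n"
end

gem_sha = "6169d5757f2d9f6754c40abae013a1f36de0897d6410260bb5db6108cf065376"

# At push time is_gfx did not exist yet, so the dependency was not rendered.
before = info_body("0.1", {}, gem_sha, ">= 1.9")
# After is_gfx was pushed, the resolved dependency appears in the info content.
after = info_body("0.1", { "is_gfx" => ">= 0.2" }, gem_sha, ">= 1.9")

stored_info_checksum  = Digest::MD5.hexdigest(before) # recorded when txp 0.1 was created
current_info_checksum = Digest::MD5.hexdigest(after)  # MD5 of the info content served now
puts stored_info_checksum == current_info_checksum    # => false
```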

Unlike "fixing" the dependency record (from `unresolved_name` to a real `rubygem_id`) in a hook, we can't "fix" the version's `info_checksum` with a similar process. The versions.list file is append-only, and changing an `info_checksum` in the middle of the file would cause an ETag/checksum mismatch between the client's cached copy and the server's response. If we did "fix" the `info_checksum` after the fact, clients would have to download the entire file again whenever someone pushes a version affected by this issue.
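A simplified model (not Bundler's actual client code) of why an append-only file matters: a client can fetch only the bytes past its cached copy, much like an HTTP Range request, and that is only valid while the server's file still starts with the cached bytes. `sync_versions` is a hypothetical name.

```ruby
# Returns how the client would update its cached copy of the versions file.
def sync_versions(cached, server_body)
  if server_body.start_with?(cached)
    # Prefix unchanged: fetch only the new tail.
    tail = server_body.byteslice(cached.bytesize..-1)
    { mode: :append, bytes_transferred: tail.bytesize, body: cached + tail }
  else
    # A line in the middle changed, so the cached copy is invalid
    # and the whole file has to be downloaded again.
    { mode: :full, bytes_transferred: server_body.bytesize, body: server_body }
  end
end
```

Rewriting a single `info_checksum` mid-file pushes every client into the `:full` branch, which is the ~10MB download mentioned earlier in this thread.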

We are hoping to resolve this issue by automating the versions.list file update.
