Standardize on base64 encoded, SHA-256 hashed, normalized URIs as Redis keys #522
Reading https://twitter.com/joepie91/status/1218867444489641985 made me think that we might want something other than MD5 here. We only need uniqueness, not security, and speed matters. I'm not sure what the best algorithm would be. On my Mac, openssl has these implemented:
Probably we can just swap out MD5 for SHA-256: `crypto.createHash('sha256').update('some-string').digest('base64')`
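A minimal sketch of that swap in Node, assuming the key is built from an already-normalized URI string. It uses `'sha256'`, the algorithm name Node lists in `crypto.getHashes()`; `hashUri` and the example URL are illustrative, not taken from the project.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash a (normalized) URI and base64-encode the digest.
function hashUri(uri, algorithm = 'sha256') {
  return crypto.createHash(algorithm).update(uri).digest('base64');
}

const url = 'https://example.com/feed.xml';
const md5Key = hashUri(url, 'md5'); // 16-byte digest -> 24 base64 characters
const sha256Key = hashUri(url);     // 32-byte digest -> 44 base64 characters

console.log(md5Key.length, sha256Key.length); // 24 44
```

Base64 output can contain `+`, `/` and `=`; that should be fine for Redis, since Redis keys are binary-safe strings.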
This will also fix a bug in our current implementation: we keep duplicating feeds in Redis. The fix in #547 really exposes it, since the number of feeds that have to get processed keeps doubling, tripling, etc. So we should probably do this sooner rather than later.
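A rough illustration of why deterministic keys should stop the duplication, assuming the duplicates come from the same feed being added again on each run. `feedKey` is a hypothetical helper, and a `Map` stands in for Redis.

```javascript
const crypto = require('crypto');

// Hypothetical helper: the same URL always produces the same key.
function feedKey(url) {
  const hash = crypto.createHash('sha256').update(url).digest('base64');
  return `t:feed:${hash}`;
}

const incoming = [
  'https://example.com/feed.xml',
  'https://example.com/feed.xml', // the same feed arriving a second time
];

// Appending to a list grows on every run...
const list = [];
for (const url of incoming) list.push(url);

// ...while writing under a deterministic key overwrites the existing entry.
const store = new Map(); // stand-in for Redis
for (const url of incoming) store.set(feedKey(url), { url });

console.log(list.length, store.size); // 2 1
```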
I'll try to have a PR by the end of today or tomorrow morning.
Closes #522: Standardize on base64 encoded, SHA-256 hashed, normalized URIs as Redis keys
What would you like to be added:
Our `Feed` (see #520) and `Post` objects need to live in Redis. I've been doing a bit of research on the best way to do this, and here is what I suggest:

- `Feed` is a Redis Hash stored using the feed's URL as its key. Let's use normalize-url to clean it up a bit (so we don't have slight variations leading to duplication) and then hash it using MD5 and base64 encode it (see below). We should also be namespacing our keys. I suggest `t:feed:<base64 encoded MD5 hash>`, where `t` = Telescope.
- `Post` is a Redis Hash stored using the article's `guid`, which is (likely) a URI or URL. Let's hash using MD5 and then base64 encode it. I suggest `t:post:<base64 encoded MD5 hash>`.

To hash and encode these URIs in Node, do this:

We'd also need a way to compare a URL with a key, so we can check if a URI is already in the database.
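A sketch of the proposed key scheme and the URL-to-key check. Assumptions: it uses SHA-256 (per the issue title) rather than MD5; `normalize` is a small hypothetical stand-in for the normalize-url package, built on Node's WHATWG `URL` and covering only a few of its rules; `feedKey`, `postKey` and `isFeedStored` are illustrative names; and a `Map` stands in for Redis. Because a hash can't be reversed, checking whether a URL is stored means deriving its key and testing whether that key exists (Redis `EXISTS`).

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for normalize-url. Node's URL parser already
// lowercases the scheme and host and drops default ports; this also drops
// the fragment, sorts query parameters and strips a trailing slash.
function normalize(uri) {
  const u = new URL(uri);
  u.hash = '';
  if (u.search) u.searchParams.sort();
  if (u.pathname.length > 1 && u.pathname.endsWith('/')) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}

// SHA-256 digest of a string, base64 encoded.
function hashUri(uri) {
  return crypto.createHash('sha256').update(uri).digest('base64');
}

// Namespaced keys: t = Telescope.
function feedKey(url) {
  return `t:feed:${hashUri(normalize(url))}`;
}

function postKey(guid) {
  // A guid is only *likely* a URL, so it is hashed as-is.
  return `t:post:${hashUri(guid)}`;
}

// Hashes are one-way: to ask "is this URL stored?", rebuild its key and
// check for that key (EXISTS in Redis; Map#has here).
function isFeedStored(store, url) {
  return store.has(feedKey(url));
}

const store = new Map(); // stand-in for Redis
store.set(feedKey('https://Example.com/blog/feed/#top'), { url: 'https://example.com/blog/feed' });

console.log(isFeedStored(store, 'https://example.com/blog/feed')); // true
console.log(isFeedStored(store, 'https://example.com/other'));     // false
```

Post GUIDs are hashed without normalizing here, since a `guid` is not guaranteed to be a URL and `new URL()` throws on strings it can't parse.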
Why would you like this to be added:
I noticed this post by Redis' creator suggesting this approach for storing URLs.