Standardize on base64 encoded, SHA-256 hashed, normalized URIs as Redis keys #522

Closed
humphd opened this issue Dec 17, 2019 · 4 comments · Fixed by #565
Labels: area: redis (Redis Database related), type: enhancement (New feature or request)

Comments

@humphd
Contributor

humphd commented Dec 17, 2019

What would you like to be added:

Our Feed (see #520) and Post objects need to live in Redis. I've been doing a bit of research on the best way to do this, and here is what I suggest:

  1. A Feed is a Redis Hash stored using the feed's URL as its key. Let's use normalize-url to clean it up a bit (so we don't have slight variations leading to duplication) and then hash it using MD5 and base64 encode it (see below). We should also be namespacing our keys. I suggest t:feed:<base64 encoded MD5 hash>, where t=Telescope.
  2. A Post is a Redis Hash stored using the article's guid, which is (likely) a URI or URL. Let's hash using MD5 and then base64 encode it. I suggest t:post:<base64 encoded MD5 hash>.

To hash and encode these URIs in node, do this:

const crypto = require('crypto');

// Hash the given URL with MD5 and return the digest base64 encoded
function toBase64(url) {
  return crypto.createHash('md5').update(url).digest('base64');
}

We'd also need a way to compare a URL with a key, so we can check if a URI is already in the database.
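
Since the hash is one-way, a key can't be turned back into a URL, so one possible way to do this comparison is to recompute the key from the URL and compare the two strings. Here is a minimal sketch, assuming the t:feed: namespace suggested above and a URL that has already been normalized. The names `feedKey` and `matchesKey` are hypothetical, not existing project code:

```javascript
const crypto = require('crypto');

// Hypothetical helper: build the namespaced key for an already-normalized feed URL.
function feedKey(url) {
  const hash = crypto.createHash('md5').update(url).digest('base64');
  return `t:feed:${hash}`;
}

// Hypothetical helper: a URL "matches" a key if hashing it produces that key.
function matchesKey(url, key) {
  return feedKey(url) === key;
}

const key = feedKey('https://example.com/feed');
console.log(matchesKey('https://example.com/feed', key)); // true
console.log(matchesKey('https://example.com/other', key)); // false
```

Checking whether a URI is already in the database then comes down to asking Redis whether the recomputed key exists.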

Why would you like this to be added:

I noticed this post by Redis' creator suggesting this approach for storing URLs.

@humphd
Contributor Author

humphd commented Jan 20, 2020

Reading https://twitter.com/joepie91/status/1218867444489641985 made me think that we might want something other than MD5 here. We just need keys that are unique; security isn't a consideration, but speed is. I'm not sure what the best algorithm would be. On my Mac, openssl has these implemented:

openssl dgst --help
unknown option '--help'
options are
-c              to output the digest with separating colons
-r              to output the digest in coreutils format
-d              to output debug info
-hex            output as hex dump
-binary         output in binary form
-sign   file    sign digest using private key in file
-verify file    verify a signature using public key in file
-prverify file  verify a signature using private key in file
-keyform arg    key file format (PEM)
-out filename   output to filename rather than stdout
-signature file signature to verify
-sigopt nm:v    signature parameter
-hmac key       create hashed MAC with key
-mac algorithm  create MAC (not neccessarily HMAC)
-macopt nm:v    MAC algorithm parameters or key
-gost-mac       to use the gost-mac message digest algorithm
-streebog512    to use the streebog512 message digest algorithm
-streebog256    to use the streebog256 message digest algorithm
-md_gost94      to use the md_gost94 message digest algorithm
-md4            to use the md4 message digest algorithm
-md5            to use the md5 message digest algorithm
-md5-sha1       to use the md5-sha1 message digest algorithm
-ripemd160      to use the ripemd160 message digest algorithm
-sha1           to use the sha1 message digest algorithm
-sha224         to use the sha224 message digest algorithm
-sha256         to use the sha256 message digest algorithm
-sha384         to use the sha384 message digest algorithm
-sha512         to use the sha512 message digest algorithm
-whirlpool      to use the whirlpool message digest algorithm

openssl enc --help
usage: enc -ciphername [-AadePp] [-base64] [-bufsize number] [-debug]
    [-in file] [-iv IV] [-K key] [-k password]
    [-kfile file] [-md digest] [-none] [-nopad] [-nosalt]
    [-out file] [-pass arg] [-S salt] [-salt]

 -A                 Process base64 data on one line (requires -a)
 -a                 Perform base64 encoding/decoding (alias -base64)
 -bufsize size      Specify the buffer size to use for I/O
 -d                 Decrypt the input data
 -debug             Print debugging information
 -e                 Encrypt the input data (default)
 -in file           Input file to read from (default stdin)
 -iv IV             IV to use, specified as a hexadecimal string
 -K key             Key to use, specified as a hexadecimal string
 -md digest         Digest to use to create a key from the passphrase
 -none              Use NULL cipher (no encryption or decryption)
 -nopad             Disable standard block padding
 -out file          Output file to write to (default stdout)
 -P                 Print out the salt, key and IV used, then exit
                      (no encryption or decryption is performed)
 -p                 Print out the salt, key and IV used
 -pass source       Password source
 -S salt            Salt to use, specified as a hexadecimal string
 -salt              Use a salt in the key derivation routines (default)
 -v                 Verbose

Valid ciphername values:

 -aes-128-cbc              -aes-128-cbc-hmac-sha1    -aes-128-cfb
 -aes-128-cfb1             -aes-128-cfb8             -aes-128-ctr
 -aes-128-ecb              -aes-128-gcm              -aes-128-ofb
 -aes-128-xts              -aes-192-cbc              -aes-192-cfb
 -aes-192-cfb1             -aes-192-cfb8             -aes-192-ctr
 -aes-192-ecb              -aes-192-gcm              -aes-192-ofb
 -aes-256-cbc              -aes-256-cbc-hmac-sha1    -aes-256-cfb
 -aes-256-cfb1             -aes-256-cfb8             -aes-256-ctr
 -aes-256-ecb              -aes-256-gcm              -aes-256-ofb
 -aes-256-xts              -aes128                   -aes192
 -aes256                   -bf                       -bf-cbc
 -bf-cfb                   -bf-ecb                   -bf-ofb
 -blowfish                 -camellia-128-cbc         -camellia-128-cfb
 -camellia-128-cfb1        -camellia-128-cfb8        -camellia-128-ecb
 -camellia-128-ofb         -camellia-192-cbc         -camellia-192-cfb
 -camellia-192-cfb1        -camellia-192-cfb8        -camellia-192-ecb
 -camellia-192-ofb         -camellia-256-cbc         -camellia-256-cfb
 -camellia-256-cfb1        -camellia-256-cfb8        -camellia-256-ecb
 -camellia-256-ofb         -camellia128              -camellia192
 -camellia256              -cast                     -cast-cbc
 -cast5-cbc                -cast5-cfb                -cast5-ecb
 -cast5-ofb                -chacha                   -des
 -des-cbc                  -des-cfb                  -des-cfb1
 -des-cfb8                 -des-ecb                  -des-ede
 -des-ede-cbc              -des-ede-cfb              -des-ede-ofb
 -des-ede3                 -des-ede3-cbc             -des-ede3-cfb
 -des-ede3-cfb1            -des-ede3-cfb8            -des-ede3-ofb
 -des-ofb                  -des3                     -desx
 -desx-cbc                 -gost89                   -gost89-cnt
 -gost89-ecb               -id-aes128-GCM            -id-aes192-GCM
 -id-aes256-GCM            -rc2                      -rc2-40-cbc
 -rc2-64-cbc               -rc2-cbc                  -rc2-cfb
 -rc2-ecb                  -rc2-ofb                  -rc4
 -rc4-40                   -rc4-hmac-md5

@humphd
Contributor Author

humphd commented Jan 20, 2020

Probably we can just swap out MD5 for SHA-256:

crypto.createHash('sha256').update('some-string').digest('base64')
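
To make the swap concrete, here is a rough sketch of what the key builders could look like with SHA-256, using the t:feed: and t:post: namespaces proposed earlier in this issue. The helper names `hash`, `feedKey`, and `postKey` are hypothetical, and the algorithm is spelled 'sha256', the name Node's crypto module accepts for this digest:

```javascript
const crypto = require('crypto');

// Hash a string with SHA-256 and base64 encode the digest.
function hash(input) {
  return crypto.createHash('sha256').update(input).digest('base64');
}

// Hypothetical key builders for the two namespaces.
const feedKey = (url) => `t:feed:${hash(url)}`;
const postKey = (guid) => `t:post:${hash(guid)}`;

// A SHA-256 digest is 32 bytes, which base64 encodes to 44 characters,
// so every key is the 7 character prefix plus 44 characters.
console.log(feedKey('https://example.com/feed').length); // 51
console.log(postKey('https://example.com/post/1').length); // 51
```

The keys are longer than with MD5 (44 encoded characters instead of 24), but they stay fixed-length no matter how long the original URI is.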

@humphd changed the title from "Standardize on base64 encoded, MD5 hashed, normalized URIs as Redis keys" to "Standardize on base64 encoded, SHA-256 hashed, normalized URIs as Redis keys" on Jan 20, 2020
@humphd
Contributor Author

humphd commented Jan 20, 2020

One bug in our current implementation that this will fix is that we keep duplicating feeds in Redis. The fix in #547 really exposes it, since the feeds that have to get processed keep doubling, tripling, etc. So we should probably do this sooner rather than later.
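
As an illustration of why hashed, normalized keys prevent the duplication, here is a small sketch. It makes two assumptions that are not project code: a Map stands in for Redis, and the built-in URL parser stands in for normalize-url (it only lowercases the scheme and host and drops default ports, which is enough for this example):

```javascript
const crypto = require('crypto');

// Stand-in for normalize-url (an assumption for this sketch).
function normalize(url) {
  return new URL(url).href;
}

// Build the namespaced key from the normalized URL.
function feedKey(url) {
  const hash = crypto.createHash('sha256').update(normalize(url)).digest('base64');
  return `t:feed:${hash}`;
}

// A Map stands in for Redis; each entry plays the role of a feed Hash.
const store = new Map();

function addFeed(name, url) {
  // Writing to the same key overwrites the entry instead of adding a new one.
  store.set(feedKey(url), { name, url });
}

addFeed('Example', 'https://example.com/feed');
addFeed('Example', 'HTTPS://EXAMPLE.COM:443/feed');
console.log(store.size); // 1
```

Both variations of the URL normalize to the same string, so they hash to the same key and the feed is stored only once.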

@manekenpix
Member

I'll try to have a PR by the end of today or tomorrow morning.

cindyorangis added a commit that referenced this issue Jan 22, 2020
Closes #522: Standardize on base64 encoded, SHA-256 hashed, normalized URIs as Redis keys