Casefold when processing email addresses #374
Merged
Changes from 29 commits
32 commits
087bd4a add casefolding to bind and associated functions (H-Shay)
0988bdd add casefolding to necessary functions + lints (H-Shay)
d80191a add changelog (H-Shay)
3227de7 fix broken terms test (H-Shay)
5950186 draft migration script (H-Shay)
80a33f3 draft test for migration script (H-Shay)
406c9ac add send email and email test (H-Shay)
d20c2bc function for updating global table added (H-Shay)
4449f00 lints (H-Shay)
9b363c5 requested changes (H-Shay)
a99a18e lints (H-Shay)
568b603 fix lint fail (H-Shay)
775f0ec update tests + misc requested changes (H-Shay)
7ae6d36 extract casefold logic to function and add to affected files (H-Shay)
5d3ecc8 update email template and associated test (H-Shay)
76ad036 lints (H-Shay)
e23232a requested changes (H-Shay)
3871005 rename file, rework to be script, rework tests to test script (H-Shay)
1f8ebfe update tests and dry run/no email versions (H-Shay)
69f654d update tests + scripts (H-Shay)
4b83b25 lints (H-Shay)
ac08265 add ability to call script from command line and update tests (H-Shay)
91d98d3 remove deleted files (H-Shay)
ec585a3 refine commandline invocation, add print statement (H-Shay)
ef40611 lints (H-Shay)
e774fa1 requested changes (H-Shay)
83560a4 requested changes + lints (H-Shay)
eb5ad93 requested changes (H-Shay)
f1b4f9e lints (H-Shay)
969dda4 Actually let me just do that myself - doesn't have to wait till Shay … (babolivier)
b6b95c6 Merge branch 'matrix-org:main' into casefold (H-Shay)
96eee48 lints (H-Shay)
@@ -0,0 +1 @@
Case-fold email addresses when binding to MXIDs or performing look-ups. Contributed by H. Shay.
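
For illustration, a minimal sketch of what case-folding an address means in Python. The helper name `casefold_email` is hypothetical and is not taken from this PR; it only wraps the built-in `str.casefold()`.

```python
def casefold_email(address: str) -> str:
    """Hypothetical helper: return the case-folded form of an email address."""
    return address.casefold()


# Addresses that differ only by case collapse to the same key.
assert casefold_email("Alice@Example.COM") == "alice@example.com"

# casefold() is more aggressive than lower(): "ß" folds to "ss".
assert "Straße@example.de".lower() == "straße@example.de"
assert casefold_email("Straße@example.de") == "strasse@example.de"
```

Using `casefold()` rather than `lower()` means that addresses which differ only in such characters are also treated as the same address.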
@@ -0,0 +1,37 @@
Date: %(date)s
From: %(from)s
To: %(to)s
Message-ID: %(messageid)s
Subject: %(subject_header_value)s
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="%(multipart_boundary)s"

--%(multipart_boundary)s
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline

Hello,

We’ve recently improved how people discover your Matrix account.
In the past, identity services did not take capitalization into account when creating and storing Matrix IDs. We’ve now updated this behavior so anyone can find you, no matter how your email is capitalized. As part of this recent update, the duplicate Matrix ID %(mxid)s is no longer associated with this e-mail address.

No action is needed on your part. This doesn’t affect any passwords or password reset options on your account.


About Matrix:

Matrix.org is an open standard for interoperable, decentralised, real-time communication
over IP, supporting group chat, file transfer, voice and video calling, integrations to
other apps, bridges to other communication systems and much more. It can be used to power
Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere
you need a standard HTTP API for publishing and subscribing to data whilst tracking the
conversation history.

Matrix defines the standard, and provides open source reference implementations of
Matrix-compatible Servers, Clients, Client SDKs and Application Services to help you
create new communication solutions or extend the capabilities and reach of existing ones.

Thanks,

Matrix
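
The `%(name)s` placeholders in this template follow Python's %-style mapping substitution. Below is a minimal sketch of how such a template could be filled in, assuming the substitutions are supplied as a dict. The values are made up, and this is not Sydent's `sendEmail` implementation.

```python
# A shortened stand-in for the template above.
template = (
    "To: %(to)s\n"
    "Subject: %(subject_header_value)s\n"
    "\n"
    "The duplicate Matrix ID %(mxid)s is no longer associated "
    "with this e-mail address.\n"
)

# Every placeholder in the template needs a matching key;
# a missing key raises KeyError.
substitutions = {
    "to": "alice@example.com",
    "subject_header_value": "MatrixID Update",
    "mxid": "@alice:example.org",
}

message = template % substitutions
print(message)
```

Note that the value under `"mxid"` has to be the actual Matrix ID. Passing the literal string `"mxid"` would produce an email that names no account.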
@@ -0,0 +1,258 @@
#!/usr/bin/env python
# Copyright 2021 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import json
import os
import sqlite3
import sys
from typing import Any, Dict, List, Tuple

import signedjson.sign

from sydent.sydent import Sydent, parse_config_file
from sydent.util import json_decoder
from sydent.util.emailutils import sendEmail
from sydent.util.hash import sha256_and_url_safe_base64
from tests.utils import ResolvingMemoryReactorClock


def calculate_lookup_hash(sydent, address):
    cur = sydent.db.cursor()
    pepper_result = cur.execute("SELECT lookup_pepper from hashing_metadata")
    pepper = pepper_result.fetchone()[0]
    combo = "%s %s %s" % (address, "email", pepper)
    lookup_hash = sha256_and_url_safe_base64(combo)
    return lookup_hash


def update_local_associations(
    sydent, db: sqlite3.Connection, send_email: bool, dry_run: bool
):
    """Update the DB table local_threepid_associations so that all stored
    emails are casefolded, and any duplicate mxids associated with the
    given email are deleted.

    :return: None
    """
    cur = db.cursor()

    # newest associations come first, so the first entry per casefolded
    # address is the one that is kept
    res = cur.execute(
        "SELECT address, mxid FROM local_threepid_associations WHERE medium = 'email' "
        "ORDER BY ts DESC"
    )

    # a dict that associates an email address with corresponding mxids and lookup hashes
    associations: Dict[str, List[Tuple[str, str, str]]] = {}

    # iterate through selected associations, casefold email, rehash it, and add to
    # associations dict
    for address, mxid in res.fetchall():
        casefold_address = address.casefold()

        # rehash email since hashes are case-sensitive
        lookup_hash = calculate_lookup_hash(sydent, casefold_address)

        if casefold_address in associations:
            associations[casefold_address].append((address, mxid, lookup_hash))
        else:
            associations[casefold_address] = [(address, mxid, lookup_hash)]

    # list of arguments to update db with
    db_update_args: List[Tuple[str, str, str, str]] = []

    # list of addresses whose rows are to be deleted
    to_delete: List[Tuple[str]] = []

    # list of (mxid, address) pairs to email, letting them know the mxid has been deleted
    mxids: List[Tuple[str, str]] = []

    for casefold_address, assoc_tuples in associations.items():
        db_update_args.append(
            (
                casefold_address,
                assoc_tuples[0][2],
                assoc_tuples[0][0],
                assoc_tuples[0][1],
            )
        )

        if len(assoc_tuples) > 1:
            # Iterate over all associations except for the first one, since we've already
            # processed it.
            for address, mxid, _ in assoc_tuples[1:]:
                to_delete.append((address,))
                mxids.append((mxid, address))

    # iterate through the mxids and send email, let's only send one email per mxid
    if send_email and not dry_run:
        # mxids that have already been emailed
        processed_mxids = []

        for mxid, address in mxids:
            if mxid in processed_mxids:
                continue
            else:
                templateFile = sydent.get_branded_template(
                    "none",
                    "migration_template.eml",
                    ("email", "email.template"),
                )

                sendEmail(
                    sydent,
                    templateFile,
                    address,
                    {"mxid": mxid, "subject_header_value": "MatrixID Update"},
                )
                processed_mxids.append(mxid)

    print(
        f"{len(to_delete)} rows to delete, {len(db_update_args)} rows to update in local_threepid_associations"
    )

    if not dry_run:
        if len(to_delete) > 0:
            cur.executemany(
                "DELETE FROM local_threepid_associations WHERE address = ?", to_delete
            )

        if len(db_update_args) > 0:
            cur.executemany(
                "UPDATE local_threepid_associations SET address = ?, lookup_hash = ? WHERE address = ? AND mxid = ?",
                db_update_args,
            )

        # We've finished updating the database, committing the transaction.
        db.commit()


def update_global_associations(
    sydent, db: sqlite3.Connection, send_email: bool, dry_run: bool
):
    """Update the DB table global_threepid_associations so that all stored
    emails are casefolded, the signed association is re-signed and any duplicate
    mxids associated with the given email are deleted.

    :return: None
    """

    # get every row where the local server is origin server and medium is email
    origin_server = sydent.server_name
    medium = "email"

    cur = db.cursor()
    res = cur.execute(
        "SELECT address, mxid, sgAssoc FROM global_threepid_associations WHERE medium = ? "
        "AND originServer = ? ORDER BY ts DESC",
        (medium, origin_server),
    )

    # dict that maps a casefolded email address to tuples of original address,
    # mxid, lookup hash, and signed association
    associations: Dict[str, List[Tuple[str, str, str, str]]] = {}

    # iterate through selected associations, casefold email, rehash it, re-sign the
    # associations and add to associations dict
    for address, mxid, sg_assoc in res.fetchall():
        casefold_address = address.casefold()

        # rehash the email since hash functions are case-sensitive
        lookup_hash = calculate_lookup_hash(sydent, casefold_address)

        # update signed associations with new casefolded address and re-sign
        sg_assoc = json_decoder.decode(sg_assoc)
        sg_assoc["address"] = address.casefold()
        sg_assoc = json.dumps(
            signedjson.sign.sign_json(
                sg_assoc, sydent.server_name, sydent.keyring.ed25519
            )
        )

        if casefold_address in associations:
            associations[casefold_address].append(
                (address, mxid, lookup_hash, sg_assoc)
            )
        else:
            associations[casefold_address] = [(address, mxid, lookup_hash, sg_assoc)]

    # list of arguments to update db with
    db_update_args: List[Tuple[Any, str, str, str, str]] = []

    # list of addresses whose rows are to be deleted
    to_delete: List[Tuple[str]] = []

    for casefold_address, assoc_tuples in associations.items():
        db_update_args.append(
            (
                casefold_address,
                assoc_tuples[0][2],
                assoc_tuples[0][3],
                assoc_tuples[0][0],
                assoc_tuples[0][1],
            )
        )

        if len(assoc_tuples) > 1:
            # Iterate over all associations except for the first one, since we've already
            # processed it.
            for address, mxid, _, _ in assoc_tuples[1:]:
                to_delete.append((address,))

    print(
        f"{len(to_delete)} rows to delete, {len(db_update_args)} rows to update in global_threepid_associations"
    )

    if not dry_run:
        if len(to_delete) > 0:
            cur.executemany(
                "DELETE FROM global_threepid_associations WHERE address = ?", to_delete
            )

        if len(db_update_args) > 0:
            cur.executemany(
                "UPDATE global_threepid_associations SET address = ?, lookup_hash = ?, sgAssoc = ? WHERE address = ? AND mxid = ?",
                db_update_args,
            )

        db.commit()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Casefold email addresses in database")
    parser.add_argument(
        "--no-email", action="store_true", help="run script but do not send emails"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="run script but do not send emails or alter database",
    )

    parser.add_argument("config_path", help="path to the sydent configuration file")

    args = parser.parse_args()

    # exit early if the config file the user gave us doesn't exist
    if not os.path.exists(args.config_path):
        print(f"The config file '{args.config_path}' does not exist.")
        sys.exit(1)

    config = parse_config_file(args.config_path)

    reactor = ResolvingMemoryReactorClock()
    sydent = Sydent(config, reactor, False)

    update_global_associations(sydent, sydent.db, not args.no_email, args.dry_run)
    update_local_associations(sydent, sydent.db, not args.no_email, args.dry_run)
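
The script recomputes `lookup_hash` because the hash input contains the address verbatim, so two spellings of the same address hash differently. The stdlib-only sketch below illustrates this. It assumes that `sha256_and_url_safe_base64` amounts to SHA-256 followed by unpadded URL-safe base64, which this diff does not confirm. The function name `sketch_lookup_hash` and the pepper value are made up.

```python
import base64
import hashlib


def sketch_lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Assumed equivalent of the hashing step: SHA-256 of
    "<address> <medium> <pepper>", encoded as unpadded URL-safe base64."""
    combo = "%s %s %s" % (address, medium, pepper)
    digest = hashlib.sha256(combo.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


pepper = "example-pepper"  # made-up value, for illustration only

mixed = sketch_lookup_hash("Alice@Example.com", "email", pepper)
folded = sketch_lookup_hash("Alice@Example.com".casefold(), "email", pepper)
lower = sketch_lookup_hash("alice@example.com", "email", pepper)

# The original spelling hashes differently from the casefolded one,
# so a hash stored for the old spelling would no longer match look-ups.
print(mixed != folded)  # True
print(folded == lower)  # True
```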
I think some parts of this paragraph are confusing, e.g.:
- Isn't the issue that they did take capitalisation (btw I think case might be a better word?) into account and we don't want them to?
- The Matrix ID might still be associated with the email address, just not the email address with this case.
Generally, I think this should be looked at by our wordsmiths before we can use it on vector.im and matrix.org, so I don't think fixing it should block this PR as long as we remember to ping the right people internally.
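
To make the second point concrete, here is a small sketch of the "keep the newest association per casefolded address" rule against an in-memory SQLite table. The table `assoc` and its columns are a simplified stand-in, not the real Sydent schema, and the rows are made up.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE assoc (address TEXT, mxid TEXT, ts INTEGER)")
db.executemany(
    "INSERT INTO assoc VALUES (?, ?, ?)",
    [
        ("BOB@EXAMPLE.COM", "@bobby:example.org", 0),
        ("Bob@Example.com", "@bob:example.org", 1),
        ("bob@example.com", "@bob:example.org", 2),  # newest row
    ],
)

# Group rows by casefolded address, newest first.
groups = defaultdict(list)
rows = db.execute("SELECT address, mxid FROM assoc ORDER BY ts DESC")
for address, mxid in rows:
    groups[address.casefold()].append((address, mxid))

# The first (newest) row per group is kept; the rest are dropped.
kept = {folded: entries[0] for folded, entries in groups.items()}
dropped = [entry for entries in groups.values() for entry in entries[1:]]

# Dropped rows whose mxid is still bound to the address via the kept row.
still_bound = [
    (address, mxid)
    for address, mxid in dropped
    if kept[address.casefold()][1] == mxid
]
print(still_bound)
```

In this example `@bob:example.org` loses the `Bob@Example.com` row but stays bound through `bob@example.com`, so telling that user the Matrix ID "is no longer associated with this e-mail address" would be misleading. Only `@bobby:example.org` actually loses its association.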