-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Check field types in database emulation #4862
Conversation
There is currently a problem with a table that has an index on a certain field, but is trying to insert values of a type not matching that index. It fails in Dynamo, which doesn't allow that. However, the local emulation layer doesn't catch that, because it doesn't know anything about types that need to be checked. Add a required `types` argument to `Table`, which must declare at least the key fields (of the table itself and of the indexes), and can optionally declare the types of other fields. It will be an error to insert values of the wrong type into the database. This may look like an easy feature, but it has some edge cases you need to be aware of: * First of all, it relies on users declaring the indexes in code; they can choose NOT to declare the indexes in code, and there may still be an error after deploying. * Second of all: just because you declare a type on a fields saying what may be inserted into a table going forward, this can not make any guarantees on data already in the table. Don't use this feature to make any assumptions on data you read out of the table, only use it to sanity-check new data going into it! We may decide this change is not worth it... (for example, it would not have caught the current problem, as the index was there in Dynamo but not declared in code, so this feature wouldn't have caught it).
for more information, see https://pre-commit.ci
Let's see how badly this breaks on the Cypress tests... :) |
Not sure anybody really wants or needs this, so I'll be closing it. |
'created': int, | ||
'classes': Optional(SetOf(str)), | ||
'keyword_language': Optional(str), | ||
'password': str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this!!! I think this makes a lot of sense. More typing annotations on our DB would be a huge win. I think the work now would be to adapt the definitions of all the tables. I can handle that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏 that would be great, thanks! You probably also have a better view into what the currently expected data contract is
for more information, see https://pre-commit.ci
I am picking up #4678 so it would be nice if we can merge this 🙏 |
@@ -88,7 +88,7 @@ def times(): | |||
return int(round(time.time())) | |||
|
|||
|
|||
DEBUG_MODE = False | |||
DEBUG_MODE = not os.getenv('NO_DEBUG_MODE') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it for now, but maybe this is not the cleanest way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this always set it to True
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, it's what we are already doing:
Lines 2908 to 2912 in 6e7573d
# Start the server on a developer machine. Flask is initialized in DEBUG mode, so it | |
# hot-reloads files. We also flip our own internal "debug mode" flag to True, so our | |
# own file loading routines also hot-reload. | |
no_debug_mode_requested = os.getenv('NO_DEBUG_MODE') | |
utils.set_debug_mode(not no_debug_mode_requested) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the better solution is probably to control the initialization order of all these things better, by making the tables class members of Database and initializing it when the app is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense too.
Oh shit, yes that makes sense. |
website/dynamo.py
Outdated
def is_valid(self, value): | ||
if not isinstance(value, dict): | ||
return False | ||
first_type = list(self.inner.keys())[0] | ||
if isinstance(first_type, InstanceOf): | ||
return all(first_type.is_valid(k) and self.inner.get(first_type).is_valid(v) for k, v in value.items()) | ||
else: | ||
return all(validator.is_valid(value.get(key)) for key, validator in self.inner.items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rico, what do you think about this? I changed that you now can specify str
in the left-hand side of the validator dictionary. This way we can be a little more generic in our type checks, like this one:
'sorted_adventures': RecordOf({
str: ListOf(RecordOf({
'name': str,
'from_teacher': bool
}))
It's important to note that you can only use str
and no other type, since that would go against Dynamo naming rules either way. This is just a way to relax the type checking a little bit without needing to use dict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "record" (a set of known fields, with known and potentially different types for every value) is a different kind of type than a "map" (can be any field, but the types are the same).
Examples:
person = {
"name": "Bertus",
"age": "67"
}
favorite_restaurant_in_city = {
"Oegstgeest": "Suzie's & Van Aken",
"Amsterdam": "Ron's Gastropub",
}
In Python, you might use a dict
to represent both of these, but the type-of-type is very different!
They are:
person :: { "name" -> str, "age" -> int }
# but
favorite_restaurant_in_city :: { str - > str }
So I wouldn't overload RecordOf
in this way, because it looks like it will just get confusing. Also, it would allow you to construct this... now what does that mean?
RecordOf({
str: str,
"age": int
})
I would make a MapOf(str, str)
or DictOf(str, str)
or something.
But you have to explain the differences very clearly, because I can see if even you are confused about it, there is a good chance other people will also not realize the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make a MapOf(str, str) or DictOf(str, str) or something.
Hmm I think this would make sense! In essence what I'm trying to do is relax the checking a bit, so another name for that type of validation makes a ton of sense.
So I wouldn't overload RecordOf in this way, because it looks like it will just get confusing. Also, it would allow you to construct this... now what does that mean?
RecordOf({ str: str, "age": int })
I thought of that! If you use str
in the left hand side, then you can't use a named field (and all other classes aren't allowed)
Lines 1677 to 1680 in dd20d55
if str in validator_dict.keys() and len(validator_dict.keys()) > 1: | |
raise ValueError(f'If you specify str as part of the validation, you cant define more type checks.\ | |
You dindt do that for {validator_dict}') | |
type_dict = {} |
But you have to explain the differences very clearly, because I can see if even you are confused about it, there is a good chance other people will also not realize the difference
Yes, I need to add something to the wiki and to the comments.
Seems like adding an element to a set is giving errors, for example adding a student to a class gives this:
|
You're right, I made a mistake. Fixed it and added a test. |
@rix0rrr I think I'm finished tweaking the PR! Added types for all of the tables and also added two new type validators: |
❤️ Love it! Instead of Still not sure about the initialization of Feel free to ship this at any point (since I'm the author, you need to be the shipper 😉 ) |
Damn, you're good at coming out with names!
I think it's best to tackle that in a new PR, so ship this one and then take your suggestion of explicitly initializing the tables in a new PR. @Felienne wants this PR ready because it'll help her with the live documentation of the tables.
Understood 🫡 |
Thanks for taking this across the finish line, and fixing my gigantic fuckup of the code not actually doing anything 🫣🫠😆 |
Hahaha the core idea was so cool already! Thanks a lot for that!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go! And hope nothing breaks !
Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork). |
There is currently a problem with a table that has an index on a certain field, but is trying to insert values of a type not matching that index. It fails in Dynamo, which requires that all values used as index keys are of a single declared type (either string, number or binary).
However, the local emulation layer doesn't catch that, because it doesn't know anything about types that need to be checked.
Add a required
types
argument toTable
, which must declare at least the key fields (of the table itself and of the indexes), and can optionally declare the types of other fields. It will be an error to insert values of the wrong type into the database.This may look like an easy feature, but it has some edge cases you need to be aware of:
We may decide this change is not worth it... (for example, it would not have caught the current problem, as the index was there in Dynamo but not declared in code, so this feature wouldn't have caught it).