-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load correctly the delegated Targets objects hierarchy #1052
Conversation
Thanks, Theodora! I have just one concern: how does this handle the possibility that Alice and Bob may delegate to Charlie using different sets and thresholds of keys from each other? |
💯, 🎉, 🥳, @sechkova!!
@trishankatdatadog, the current reference implementation already does not support such "promiscuous" delegations (see #589 and for a more general description of the issue #660). |
Got it. Gods, I hate that term, let's find another name, please... |
Agreed. Promiscuous has negative connotations. In theupdateframework/specification#96 Marina used the term multi-role delegations |
Thanks, Josh. I don't think the problem is specific to multi-role delegations. I believe the idea is that Alice and Bob, on entirely different parts of the graph, can delegate to Charlie using entirely different thresholds and keys. I don't know what's a good, succinct word for this, except to say something like "delegator-dependent keys for a delegatee" (rolls off the tongue, I know). |
Yes, multi-role delegations as described in TAP 3 are something else. It's a one-to-many delegation where the delegated-to roles need to agree on a target. Promiscuous delegations are many-to-many delegations. |
This looks good to me. There's some nice clean-ups of the modified code in here too, thanks @sechkova |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this great patch, @sechkova! This is just a partial review that touches repository_lib.py
only. I'll follow up with the rest tomorrow.
Btw. I am aware that the things I flagged were already there before, but since they are in your diff, I'm asking you to fix them. :P
tuf/repository_lib.py
Outdated
@@ -811,6 +811,59 @@ def import_ed25519_privatekey_from_file(filepath, password=None): | |||
return private_key | |||
|
|||
|
|||
|
|||
def get_delegations_filenames(metadata_directory, consistent_snapshot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Would you mind calling the function something like get_delegated_role_metadata_filenames
just so that it's clear that delegation != role
? (see #660) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or maybe get_non_top_level_metadata_filenames
? 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
35f72a3 I renamed both get_*_metadata_filenames ... 🤷
tuf/repository_lib.py
Outdated
continue | ||
|
||
# Skip top-level roles, only interested in delegated roles now that the | ||
# top-level roles have already been loaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the comment, the "now that the top-level roles have already been loaded" part is not in the scope of this function anymore and given the function name it should be clear what's happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in here: da09a22
tuf/repository_lib.py
Outdated
# Keep a store of metadata previously loaded metadata to prevent re-loading | ||
# duplicate versions. Duplicate versions may occur with | ||
# 'consistent_snapshot', where the same metadata may be available in | ||
# multiples files (the different hash is included in each filename). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment has a stray "metadata" (first line) and is a bit outdated, as consistent role metadata does not use a hash-prefix anymore but a version number (last line). A comment like this still might be helpful, but maybe rather above, where we define loaded_metadata
and call sorted(..., reverse=True)
.
What do you think about something like this directly above the for loop?
# Iterate over role metadata files, sorted by their version-number prefix, with
# more recent versions first, and only add the most recent version of any
# (non top-level) metadata to the list of returned filenames. Note that there
# should only be one version of each file, if consistent_snapshot is False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also done in da09a22 , thanks for suggesting the text which I directly reused ;)
I still left one line comment for the lazy ones who won't scroll up to the start of the for loop.
tuf/repository_lib.py
Outdated
continue | ||
|
||
filenames[metadata_name] = metadata_path | ||
loaded_metadata.append(metadata_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just
if metadata_name not in filenames:
filenames[metadata_name] = metadata_path
Should be a lot faster (O(1) vs. O(n)), less code and less memory. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing, here it is da09a22
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the excellent work, @sechkova, you're quite the magician! :) Please consider improving the code comment as suggested inline (and my review comments from yesterday). Otherwise this LGTM!
tuf/repository_tool.py
Outdated
# top-level targets: [('role1', 'targets'), ('role2', 'targets'), ... ] | ||
roleinfo = tuf.roledb.get_roleinfo('targets', repository_name) | ||
for role in roleinfo['delegations']['roles']: | ||
delegations.append([role['name'], 'targets']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: The comment suggests delegations are list of tuples, but the code makes it a list of lists. Please change either one. IMO tuples are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And tuples they are!
tuf/repository_tool.py
Outdated
# Append the next level delegations to the list: | ||
# the 'delegated' role becomes the 'delegating' | ||
for delegation in metadata_object['delegations']['roles']: | ||
delegations.append([delegation['name'], rolename]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is cool stuff! But I admit it took me a while to understand what's happening. You're adding delegations to the list of delegations as you iterate over them, thus traversing the tree of delegations. :)
Out of curiosity, is there a name for this pattern?
At any rate, we should add a big comment that describes it above the outer loop definition (L3123). And we should definitely point out that cycles in the delegation graph, will make this an infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call it a custom breadth-first search 😁
I tried to improve it after you pointed out the potential infinite loop.
The loaded delegations were still kept in the list of delegations anyway so the options to avoid repetitions seem to be:
if x is in list
- O(n)- replace the list with a dictionary for O(1) lookup but dictionaries are guaranteed to be ordered only in Python 3.x so the traversal won't work
- use a
deque
for traversing the delegations (keep only the next in line) +set
for storing already loaded ones
If this seems unnecessary overhead since cycles in the delegations are not likely to happen, we can keep only the deque
and get rid of the set
+ adding a comment.
Hopefully the new comments help to make things look less magical 🙂
(and this is the new commit c79ea55)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I think it just became even more magical. 😆
Regarding the options:
Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.
The need for deque
is not fully obvious to me. Are you doing this to save memory? If so, I suppose that's fine. Maybe you can leave a note about the motivation. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.
It could be better to warn during the creation of the delegation ... but I think currently there is no mechanism to detect that so ... better late than never?
Teodora is very good! |
Update load_repository() function to load the delegations metadata starting from 'targets' and traversing downwards the delegated roles in order to load correctly the delegations hierarchy. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add a tests case checking if delegated Targets() objects are loaded correctly. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Remove unnecessary list keeping track of loaded file names and rewrite outdated comments. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
tuf/repository_tool.py
Outdated
targets_object = targets_objects['targets'] | ||
targets_objects[metadata_name] = new_targets_object | ||
targets_objects['targets'].add_delegated_role(rolename, | ||
new_targets_object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off-topic:
I find it extremely odd that the top-level Targets
object does not make a difference between roles it delegates to directly and indirectly (via other roles).
tuf/repository_tool.py
Outdated
# Append the next level delegations to the list: | ||
# the 'delegated' role becomes the 'delegating' | ||
for delegation in metadata_object['delegations']['roles']: | ||
delegations.append([delegation['name'], rolename]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I think it just became even more magical. 😆
Regarding the options:
Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.
The need for deque
is not fully obvious to me. Are you doing this to save memory? If so, I suppose that's fine. Maybe you can leave a note about the motivation. :)
tuf/repository_lib.py
Outdated
continue | ||
|
||
# Skip top-level roles, only interested in delegated roles. | ||
if metadata_name in ['root', 'snapshot', 'targets', 'timestamp']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this kind of pattern in enough places that I think it would be worthwhile to add a constant for the list of top-level role names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Maybe we can cherry-pick 7100dc3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I'd seen that somewhere already, and the same author too. 😄
# top-level roles have already been loaded. | ||
if metadata_name in ['root', 'snapshot', 'targets', 'timestamp']: | ||
# A deque used to keep the next delegation to be loaded | ||
delegations = deque() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the use of a deque here, can you explain the rationale a bit?
AFAICT you're just using it as a FIFO, at which point a list with pop(0)
would work just as well and would be a bit more familiar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answering both of your comments @lukpueh @joshuagl :
Yes, @joshuagl got it right, the idea is to use a FIFO and continuously append the next delegations while removing(pop(0)/popleft()
) the loaded one (unlike in my first implementation where the delegations were only appended and never removed from the list).
@joshuagl my reasoning about list vs deque as a FIFO
Using only a FIFO structure would be enough if there were no potential cycles in the graph, but since what we want to achieve is traverse the graph AND avoid loops, we need one ordered structure + one fast lookup structure, hence the deque+set (or one that can do both which would be the dict in Python 3.x).
And for a completeness of this discussion ...
On the other hand, up to now we don't have exact constraints in load_repository()
regarding the graph traversal and we can as well use a list as a stack (LIFO) which will result in a depth-first-search type of traversal.
I'm open to suggestions about less magical code or more pythonic way of implementing this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @joshuagl got it right, the idea is to use a FIFO and continuously append the next delegations while removing(pop(0)/popleft()) the loaded one (unlike in my first implementation where the delegations were only appended and never removed from the list).
Sure, but your first approach already implemented a FIFO queue, didn't it? The only difference that I saw is that the list grows in your first approach but does not in your second.
# Go from left to right, adding to the right
↓
| 0 |
↓
| 0 | 1 |
↓
| 0 | 1 | 2 |
# Pop from left, add to the right
↓
| 0 |
↓
| 1 |
↓
| 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hence the deque+set (or one that can do both which would be the dict in Python 3.x).
Haven't checked, but would collections.OrderedDict
have the desired properties cross-version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, up to now we don't have exact constraints in load_repository() regarding the graph traversal and we can as well use a list as a stack (LIFO) which will result in a depth-first-search type of traversal.
AFAIU, when loading the delegation graph into memory it really shouldn't matter if we do depth or breadth first. This is only important when verifying targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hence the deque+set (or one that can do both which would be the dict in Python 3.x).
Haven't checked, but would
collections.OrderedDict
have the desired properties cross-version?
Probably it would do the job, I think I had overlooked it because it sounded outdated from the perspective of the newer Python versions ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO really any of the approaches discussed here (FIFO or LIFO with list + set, deque + set or OrderedDict) is absolutely fine, I think I would even have been okay with not guarding against infinite loops, as long as it is documented. :)
So what do you think about keeping it as it is and adding another comment about saving memory by using a deque + leftpop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- warning the user when a loop is encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- warning the user when a loop is encountered.
Is this related to the sentence above ("okay with not guarding against infinite loops") but appearing as a bullet? 😁
Otherwise, yep, it works for me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, this was meant to be a +
as in:
... what do you think about adding another comment ... plus warning the user when a loop is encountered?
Replace the list used for the delegations graph traversal with a deque and use a set to store already loaded roles and avoid loops in case of cycles in the graph. Improve comments and readability. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Use the top-level targets object to reference already loaded delegated targets instead of storing them in an additional dictionary in load_repository(). Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Rename repository_lib.get_metadata_filenames() and get_delegations_filenames() to better match their functionality and tuf terminology. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add TOP_LEVEL_ROLES as a global variable in roledb. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇 Thanks for addressing all the comments! :)
Fixes issue #: #1045
Description of the changes being introduced by the pull request:
load_repository()
function to load the delegations metadata starting from 'targets' and traversing downwards the delegated roles in order to load correctly the delegations hierarchy.metadata_directory
files in a separate functionget_delegations_filenames()
in a similar way in which top level metadata files are loaded byget_metadata_filenames()
.test_load_repository
.Please verify and check that the pull request fulfills the following
requirements: