Handling slightly different URLs to same project #27
Sorry, with a swamp of GitHub notifications and a conference last week, I totally missed these... The SM assumes (or will; I haven't dealt with it just yet) that there is a canonical expression of a URI underlying each URL, and stores all such URLs in the single location designated by that URI. I don't have it handy, but there's a standard transform for git, at least, that derives this URI; @technosophos pointed me to one in Deis a while back. There's an express assumption here that you can't have different branches or tags from different URLs corresponding to the same URI. That's not a capability that basic git services provide, at least. Though...gitolite might? Is that what you're referring to by "private git installs"? Also, do bzr or hg, or one of their major hosting platforms, allow that? Even so, because the cache's location is configurable, and is managed using the current user's permissions, I'm not sure that a 'leak' is so problematic here. But I can see how this might be a problem. Examples of how and where it actually comes up (basically, what VCS hosting circumstances) would help me assess.
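A minimal Go sketch of the kind of URL-to-URI transform described above, assuming git remotes arrive either as scheme URLs (`https://`, `ssh://`, ...) or in scp-like `user@host:path` form. `CanonicalURI` is a hypothetical helper, not the Deis transform or the SM's actual code; it simply drops scheme, user, port, and a trailing `.git`, and lowercases the host.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// scpLike matches scp-style git remotes such as "git@github.com:foo/bar".
var scpLike = regexp.MustCompile(`^([A-Za-z0-9_.-]+)@([A-Za-z0-9_.-]+):(.+)$`)

// CanonicalURI (hypothetical) reduces a git remote to a scheme-, user- and
// port-free "host/path" form, so equivalent spellings compare equal.
func CanonicalURI(remote string) (string, error) {
	var host, path string
	if m := scpLike.FindStringSubmatch(remote); m != nil {
		// scp-like syntax: user@host:path
		host, path = m[2], m[3]
	} else {
		u, err := url.Parse(remote)
		if err != nil {
			return "", err
		}
		if u.Host == "" {
			return "", fmt.Errorf("no host in %q", remote)
		}
		// Hostname() drops any ":port" suffix.
		host, path = u.Hostname(), u.Path
	}
	path = strings.Trim(path, "/")
	path = strings.TrimSuffix(path, ".git")
	if path == "" {
		return "", fmt.Errorf("no path in %q", remote)
	}
	return strings.ToLower(host) + "/" + path, nil
}

func main() {
	for _, r := range []string{
		"https://github.com/foo/bar",
		"git@github.com:foo/bar",
		"ssh://git@github.com/foo/bar.git",
	} {
		c, err := CanonicalURI(r)
		fmt.Println(r, "=>", c, err)
	}
}
```

All three spellings collapse to `github.com/foo/bar`, which is exactly the property that makes a single shared on-disk location possible, and also exactly what would go wrong if two spellings could expose different refs.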
also, I'm sorta loath to link #10 b/c it's just my braindump, but...well, it's where I sorta braindumped through this. Might be helpful for discussion.
I've only got a moment, so I'll come back to this later, but... gitolite does have access rules for branches. This is one case, and I do know of people using gitolite. I need to do some digging for other cases.
Rawr. Indeed they do - thanks for the link. OK, reading through those docs...just shooting from the hip, it seems like there are two, not mutually exclusive scenarios to consider:
The latter case is the one that unequivocally requires a different place on disk...and, really, is kind of a violation of the concept of a URI in the first place - if the same properties have different values, then it's a different resource. The former case - which, if I'm grokking correctly, is the one gitolite's access controls enable - is more like just seeing some of the resource's properties. And it actually might not be a reason to keep the URIs separate. Here's my thinking: While vsolver itself doesn't internally enforce that these URLs are coming from a manifest file, or that that file is committed to version control and being shared by a team, there's basically no reason to use a
That's all really just a way of saying that there's no point in distributing a manifest (which is explicitly intended to be as widely shared as the containing repository) to people who won't be able to use that manifest. Really, I think this is actually the same problem as distributing a manifest with a totally private repository, just with slightly different, more uneven failure modes. What's salient here is that providing a different on-disk storage location for a different URI doesn't actually solve a problem. The cache isn't enforced to be per-user, but that's the expectation, so unless you're doing sub-user level multi-tenancy (in which case, not our problem), if the cache is already there from a different URI, it's because the user had legit access. There are a bunch more rabbit holes I could chase, but it already seems like the sort of thing people would have to intentionally screw up in order to really have it go wrong. OH SHIT. Lemme invert it real quick: gitolite's actually not a great example, because gitolite over ssh won't have a per-user URL. HTTP would, but it doesn't matter, because that means sameness of URL is irrelevant as a predictor of whether a URI will provide a complete, incomplete, or empty resource set. Like I said, this is just shooting from the hip, but that last bit in particular feels pretty definitive. If I've missed something, let me know...but I think the real decider here is whether we can find a case where different URLs for the same URI produce branches/tags pointing at different revisions. If so, we definitely need separate storage.
Two quick things...
OK, but as my previous line of thought laid out, permission-level differences don't seem like a good reason to separate otherwise-equivalent URIs. If you can think of some holes in that line of thinking...
So we risk getting into a really pedantic discussion here, but...well, I'm not sure how meaningful it is to call those "different URIs." The spec states that a URI indicating a protocol "does not imply that use of these URIs will result in access to the resource via the named protocol." And it's certainly quite clear that the goal is to "allow uniform semantic interpretation of common syntactic conventions across different types of resource identifiers." This doesn't require that we consider URLs/URIs that differ only by protocol as pointing to the same resource. But it does mean that's a valid way to look at it, which makes it a question of whether the application we're working with looks at it that way. Git, as far as I know, does expect them to be equivalent, though it's certainly possible to configure a git hosting service that violates that constraint. It seems that helm, at least, figures that normalizing is safe for git. idk about hg/bzr/svn, but I'd be very surprised if they didn't try to maintain this invariant. This is why I'm looking for examples of a hosting service that actually does violate the equivalence between different URIs whose identifier component is the same.
It's certainly safer, but it's more complex for vsolver than for glide. We have to maintain a permanent cache, access to atomic parts of that cache, metadata about that cache, and still make the code therein available for cross-project analysis. Doing this would require keeping all local clones in a separate directory structure based on full URL, then dynamically moving them into place when (cross-project) analysis is required. That then incurs new cleanup requirements, which also incurs new startup sanity-checking requirements. And more complications for any attempts to convert the There's a decent chance this all ends up being necessary at some point. But it's a lot of extra complexity right now, all to solve a problem that still strikes me as remote.
I've changed my mind on this. While I still think the above rationale applies (that is, I don't find the different-access-over-different-protocols argument convincing), it's now pretty much moot. My reason for not wanting to do this was because I was hoping we could keep a complete, correct We need, for example, https://github.com/sdboyer/semver to be able to reside temporarily at So, being that keeping repos on a separate path and moving them into place on That said, I don't think I'll make e.g.,
Don't you control the directory you scan for dependencies? For example, when a reference to
tl;dr - my thinking's updated since even that last comment, and i don't think repo movement is necessary at all, as you say. yep yep, i do control the dir, which is why the current approach was OK back when we discussed in mid-May. what i was worried about was the cross-package analysis (#67) - so, for example, needing to have both that worry was based on the assumption that the type analysis would ultimately come back to however, having now banged out the package tree parsing logic and spent a lot of time in the guts of
I'll need to look again but I think you can use
you can, if you're walking the directory tree yourself and call e.g. the problem i was anticipating was that, when jumping between packages, anything it was an imaginary problem, though, because no lib exists that does this sort of type checking, afaik. so, i have to write it anyway (while hoisting a bunch of private code out of the toolchain). writing it myself means i'll be in control of what directory corresponds to what import path - thus, no problem.
I'm actually going to move this out of MVP. While it's what I'm actively progressing towards, it's necessarily part of a much larger set of changes (#83) which aren't strictly necessary to move forward. It'll mean a bit of an interface hiccup for glide when I do get that merged, but the overall effect shouldn't be too significant.
With #83 in, this should now be all wrapped up.
How does the source manager handle the case of two URLs pointing to the same location? For example:
https://github.com/foo/bar
git@github.com:foo/bar
This case can arise in a global cache where two different projects specify the same location two different ways.
In some systems (e.g., private git installs) there may be differences in the available branches (e.g., private branches) and the access that entails. We don't want that to leak.
How does the source manager handle this so it can be used in the `GOPATH` and for scanning?
Glide handles this by creating a key for the cache that's based on the URI. So the two examples above would be two different cache entries.