Proposal: Introduce CKAN-meta-Legacy #1975

Open
Dazpoet opened this issue Jan 8, 2017 · 26 comments
Labels
Enhancement New features or functionality

Comments

@Dazpoet
Member

Dazpoet commented Jan 8, 2017

I saw this brought up by @Ruedii on irc and thought I should chime in.

Problem

CKAN-meta master is getting to be a pretty big download (8+MB) and when unzipped it takes up 40+MB on disk (I'm on Win10 running NTFS with whatever the standard settings are) even though the real size is just some 13MB.

This potentially makes CKAN run slower as the size will only continue to increase over time.

Proposed solution

Trim the size of CKAN-meta master by moving metadata for older versions of KSP to a legacy metadata archive (CKAN-meta-Legacy) based on some breaking point ksp_version = X.

How do we choose the breaking point?

I think at this point the breaking point should be ksp 1.1.3 so that anything with a ksp_version < 1.1 in the metadata should be moved to a legacy archive. I base this on the fact that RO is not yet out for 1.2.x and neither is FAR which probably means that a lot of players are still stuck running 1.1.3 while waiting for key mods. Using RO as a measuring stick is probably pretty smart since it has a rather comprehensive depends and recommends list and is normally pretty stringent when it comes to not releasing before it's ready. To me it seems it is often one of the last mods to come out.

But we are the COMPREHENSIVE Kerbal Archive Network, will we become just the KAN now?!

Obviously not, we are not removing metadata, just moving it. Introducing a Legacy option shouldn't be harder than moving all metadata with ksp_version < 1.1 to a new repository (or is a branch better?) named e.g. CKAN-meta-Legacy and then adding it to the repositories list.

This can probably be done easily by a smart person and regexp, however I am not that smart person or this would've been a PR rather than an issue.
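For illustration, here is a minimal Python sketch of how that selection might be scripted without regexp. It assumes each .ckan file is JSON carrying the `ksp_version`, `ksp_version_min` and `ksp_version_max` fields, and that a release counts as legacy when the highest KSP version it claims is below the breaking point. The helper names `parse_version` and `is_legacy` are made up for this example and are not part of any CKAN tool.

```python
import json


def parse_version(text, width=3):
    """Turn '1.1' into (1, 1, 0) so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # pad short versions with zeros
    return tuple(parts)


def is_legacy(metadata, cutoff="1.1"):
    """Return True if the highest supported KSP version is below cutoff.

    Assumption: releases marked 'any', or with no upper bound at all,
    stay in the main repository.
    """
    highest = metadata.get("ksp_version_max") or metadata.get("ksp_version")
    if highest is None or highest == "any":
        return False
    return parse_version(highest) < parse_version(cutoff)


# Hypothetical metadata, not a real mod.
sample = json.loads('{"identifier": "ExampleMod", "ksp_version": "0.90"}')
print(is_legacy(sample))  # True
```

A real migration would walk the repository, call something like `is_legacy` on every parsed file, and move the matching files. The "any" case is deliberately left in place here, which is the open question raised under "Problems with this proposal".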

What about the users running ksp version < future breaking point when the Legacy archive is huge?

They'll just have to download the whole shebang. This user group will probably be relatively small and well-prepared for the potential problems that their choice creates.

Problems with this proposal

Mods with "ksp_version" : "any" and a huge amount of releases

I have no idea how to resolve this, I don't even know if it's a real problem just yet but it might be in the future. For these keeping Y versions of backlog might be enough. Might require some manual checking every now and then but I hope it won't be a gigantic issue at this point.

Someone will have to do it...

...and I don't know how. Hoping someone finds this worthwhile though and knows a good way of doing it :)

Users still running e.g. 0.90 will suddenly need an additional step to reach their mods

This is probably mostly a communication issue but needs to be handled seriously. I propose that a change like this one occurs near or directly after a CKAN release so that we can at least give users with autoupdate a heads up that it's coming. Hopefully the impact will be low though since I doubt we have a lot of users still running KSP < 1.1, or at least I hope so!

@politas
Member

politas commented Jan 8, 2017

Perhaps we have another repository for "ksp_version" : "any" mods? I very much like this idea. I've been getting concerned about the size of the repo, and it's obvious that we need to split it up at some stage.

Perhaps we could use the builds.json in each repository to define the versions of KSP it covers, and we can then add some code to the "Add an install" procedure to inform users if the KSP version in the install they have added is not covered by their current repository selections.
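A rough Python sketch of that check, under stated assumptions: each repository's builds.json is taken to have a top-level "builds" object mapping build numbers to KSP version strings, and the function names `covering_repos` and `coverage_warning` are invented for this example. The build numbers in the usage lines are placeholders, and this is not how the CKAN client is written.

```python
def covering_repos(install_version, repo_builds):
    """Return the names of repositories whose builds.json lists the version.

    repo_builds maps a repository name to its parsed builds.json content.
    """
    return sorted(
        name
        for name, builds in repo_builds.items()
        if install_version in builds.get("builds", {}).values()
    )


def coverage_warning(install_version, repo_builds):
    """Return a message for the user, or None if some repository covers it."""
    if covering_repos(install_version, repo_builds):
        return None
    return f"KSP {install_version} is not covered by any selected repository."


# Placeholder data: build numbers here are invented for the example.
selected = {
    "current": {"builds": {"1001": "1.2.2"}},
    "legacy": {"builds": {"1000": "0.90.0"}},
}
print(covering_repos("0.90.0", selected))
print(coverage_warning("1.0.5", selected))
```

The "Add an install" step would call something like `coverage_warning` once with the detected KSP version and show the message if one comes back.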

For mod versions that cross repository boundaries, I can't see a fundamental problem with having duplicated ckan files across repos. That shouldn't add hugely to the size.

Maybe we should maintain CKAN-meta as a complete repo, and have some process that duplicates ckans into relevant repos. That doesn't sound too complex.

@techman83
Member

I wonder if there is a way to do it without splitting the repo?

@politas
Member

politas commented Jan 9, 2017

I suppose we could build a separate .tar.gz file for each KSP version including all the ckans that support that version without having to have separate repos?
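To make the idea concrete, here is a small Python sketch that builds one .tar.gz per KSP version from a single set of ckans. The assumptions: compatibility is read from `ksp_version`, `ksp_version_min` and `ksp_version_max`, versions are matched at major.minor level only, and a file with no version fields counts as "any". The names `major_minor`, `supports` and `build_archive` are hypothetical.

```python
import io
import tarfile


def major_minor(version):
    """Reduce '1.1.3' to (1, 1) so comparisons ignore the patch level."""
    parts = [int(p) for p in version.split(".")]
    return tuple((parts + [0])[:2])


def supports(metadata, target):
    """True if the release claims compatibility with the target KSP version."""
    exact = metadata.get("ksp_version")
    if exact == "any":
        return True
    wanted = major_minor(target)
    if exact is not None:
        return major_minor(exact) == wanted
    low = metadata.get("ksp_version_min")
    high = metadata.get("ksp_version_max")
    return ((low is None or major_minor(low) <= wanted)
            and (high is None or wanted <= major_minor(high)))


def build_archive(ckans, target):
    """Return the bytes of a .tar.gz holding every entry that supports target.

    ckans maps an archive member name to (metadata dict, raw file bytes).
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in sorted(ckans):
            metadata, raw = ckans[name]
            if supports(metadata, target):
                info = tarfile.TarInfo(name=name)
                info.size = len(raw)
                tar.addfile(info, io.BytesIO(raw))
    return buffer.getvalue()
```

Calling `build_archive(ckans, "1.1.3")` once per supported KSP version would give the per-version downloads, while the full repository archive stays available for older clients.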

@techman83
Member

That'd be doable, though we'd have to subscribe to the webhooks and process them ourselves. The question is which is the saner option?

One Repo:

  • Pros
    • No repo split
    • Would be automated
    • We have a lot of the code already to do it
    • Future proof
  • Cons
    • Needs to be automated
    • Adds an external dependency to the metadata production
    • The code we have is Perl based and I'm the only Perl dev

Split Repo:

  • Pros
    • No external dependencies
  • Cons
    • Involves messing with the repos and metadata
    • Not very future proof

@politas
Member

politas commented Jan 10, 2017

One Repo votes:

I'll give a thumbs up to One Repo, with the additional note that I know a little Perl and would be happy to learn more in the process of adding this, if you have the time to do most of the Repo-side coding. As I see it, there's an additional pro that we should be able to implement it while the existing system is still working, so we've got clear stages.

@politas
Member

politas commented Jan 10, 2017

Split Repo votes:

And I'll give a thumbs down to Split Repo, because it seems like adding technical debt without ultimately solving the problem.
On the other hand, Split Repo does mean we're putting a cap on the size of a single repo, though we're two orders of magnitude below hitting GitHub's Repo max size suggestion.

@Dazpoet
Member Author

Dazpoet commented Jan 10, 2017

As someone who has already been through the split repo thing once, I didn't particularly like it since it made issue reports very VERY annoying to deal with. If it's possible to do in one repo I'd much prefer that tbh.

@ayan4m1 ayan4m1 added the Enhancement New features or functionality label Jan 13, 2017
@Ruedii

Ruedii commented Jan 20, 2017 via email

@politas
Member

politas commented Jan 21, 2017

Hi @Ruedii, thanks for getting involved!

As I said, I fear that the split repo model with an arbitrary divide without some logic behind it adds technical debt to the project without providing an ultimate solution.

The Single repo model I am thinking of is that all the .ckan files are in a single repository, but we create a separate .tar.gz file of .ckan files for each KSP version (whether we break it up by major, minor, patch or build level is a question we have to decide), as well as the full .tar.gz of the entire repository. Then we change the default repository rules in the client to point to the relevant .tar.gz file for the KSP version. Older versions of CKAN would not be affected, they would still see the full repo. We can implement the multiple tar.gz files and people can utilise them manually until we implement an elegant solution in the client.

@Ruedii

Ruedii commented Jan 21, 2017 via email

@politas
Member

politas commented Jan 21, 2017

If we split by Major version, we'll have one for all the 0.x.y and one for all the 1.x.y. And pretty much the only crossover will be the "any"s.

@techman83
Member

I'd probably go with a config file, define our split points. Anything not defined or set as 'any' will end up in the "current". Future splits will be a matter of updating the config file in the repository and the code should take care of the rest.

I actually don't think it would be ludicrous amounts of code, a lot of the infrastructure is already in place to achieve it.

@politas
Member

politas commented Jan 22, 2017

Oh yeah, there's not a lot of work to do. The client already has multiple repository support. I'm not sure how it handles duplication between repos, but I suspect given issues we've had in the past that anything with the same name will get merged into a single line in the modlist. (It really ought to be by identifier, but I guess that's trickier).
I think 1.0 is a good split point. @techman83, are you cool with sorting out the separate .tar.gzs? I like the idea of a config file to define the split. Could we use the existing builds.json file and add a new field per entry to say which .tar.gz that version should go into?

@Ruedii

Ruedii commented Jan 28, 2017 via email

@politas
Member

politas commented Jan 29, 2017

If we stick with a single Repo and separate tar.gz files, that's not an issue.

@Ruedii

Ruedii commented Jan 29, 2017 via email

@techman83
Member

@politas I have some ideas on how to achieve it. When I get some spare cycles I'll have a crack at it.

@Ruedii I'm not sure what benefit combining them all into one file would give us. It certainly would add more complexity to the process.

@techman83
Member

I've opened a PR in KSP-CKAN/NetKAN-bot#58, I need to write a full description of how it works. But the basic design is:

  • CKAN-meta will have a 'releases.json' in which we can define release boundaries
  • Releases will be pushed to Orphaned Branches, which will automatically be available in the same manner as any other branch.tar.gz
  • The releases will bail out if an attempt is made to orphan to Master or Staging
  • We can redefine the boundaries in the future, however CKANs will not be removed from existing orphaned branches. So the process would be to alter the releases.json, remove the relevant branches and run the gen-releases --all again (which takes ~11 seconds currently).

From my offline testing so far:

current!CKAN-meta> find . -name *.ckan|wc -l
4828
current!CKAN-meta> checkout middle
Switched to branch 'middle'
middle!CKAN-meta> find . -name *.ckan|wc -l
4566
middle!CKAN-meta> checkout legacy 
Switched to branch 'legacy'
legacy!CKAN-meta> find . -name *.ckan|wc -l
237
legacy!CKAN-meta> checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
master u=!CKAN-meta> find . -name *.ckan|wc -l
9631

4828+4566+237 = 9631

I'll fix up the broken travis tests (they all pass locally though) and do a more thorough write up in the PR. But feel free to have a look and make suggestions. For reference, current's lower boundary is set to '1.1.0'.

Example releases.json (order of the array is important as 'any' goes into the first entry):

{
    "releases": [
        {
            "lower": "1.1.0",
            "name": "current"
        },
        {
            "lower": "0.90.0",
            "name": "middle",
            "upper": "1.1.0"
        },
        {
            "name": "legacy",
            "upper": "0.90.0"
        }
    ]
}
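As a reading aid, here is a small Python sketch of how a file like the one above could be interpreted. It is not the NetKAN-bot code (that is Perl and lives in the PR). It assumes that "lower" is inclusive and "upper" is exclusive, that the first matching entry wins, and that 'any' or an unmatched version falls into the first entry. The names `as_tuple` and `release_for` are invented for the example.

```python
import json

RELEASES = json.loads("""
{
    "releases": [
        {"lower": "1.1.0", "name": "current"},
        {"lower": "0.90.0", "name": "middle", "upper": "1.1.0"},
        {"name": "legacy", "upper": "0.90.0"}
    ]
}
""")["releases"]


def as_tuple(version, width=3):
    """Turn '0.90' into (0, 90, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def release_for(ksp_version, releases):
    """Pick the release name for a KSP version string, 'any', or None."""
    if ksp_version is None or ksp_version == "any":
        return releases[0]["name"]  # 'any' goes into the first entry
    version = as_tuple(ksp_version)
    for release in releases:
        lower = release.get("lower")
        upper = release.get("upper")
        if lower is not None and version < as_tuple(lower):
            continue  # below this release's inclusive lower bound
        if upper is not None and version >= as_tuple(upper):
            continue  # at or above this release's exclusive upper bound
        return release["name"]
    return releases[0]["name"]  # assumption: unmatched versions go to the first entry


for v in ("1.2.2", "1.0.5", "0.25", "any"):
    print(v, "->", release_for(v, RELEASES))
```

With half-open ranges a boundary version such as 1.1.0 lands in exactly one release, which matches the counts above adding up to the master total.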

@politas
Member

politas commented Mar 12, 2017

"Legacy" seems a little light. What numbers do you get if you set

"current" - 1.2.0 - *
"middle" - 1.0.0 - 1.2.0
"legacy" - * - 1.0.0

@techman83
Member

Oh that was just an example. It's about right, we didn't have many mods below 0.90.0 as 0.90.0 was where things started to really take off mod wise. We probably don't need to split 3 ways either, it was more of an example of what's possible.

@politas
Member

politas commented Mar 13, 2017

I think a lot of < 0.90 mods have just been lost when Kerbalstuff shut down, too. Most ARR mod makers didn't bother to load up their old releases, even if they did move to another hosting platform.

@Gryffen1971

Here is a suggestion:
How about having CKAN remove all unnecessary entries from the registry.json file? What I mean by that is, when CKAN updates the repositories it uses, have it keep only the entries for the latest version(s) of Kerbal Space Program. As of right now my registry.json file is around 22,529 KB. Removing the entries that are for earlier versions of Kerbal Space Program will reduce the file size. I will attach my json file for reference, if needed.

@politas
Member

politas commented Oct 13, 2018

@Gryffen1971, if we process the whole repo and then purge the non-relevant mod versions then:

  • we actually make everything a little slower, as it has to spend time deciding which mod versions to drop, and
  • users will be unable to install incompatible mod versions, and that ability is overwhelmingly popular

@Gryffen1971

@politas, thanks for informing me about that. I had forgotten that it would take longer for it to purge the information that is non-relevant, and you're right, it would slow down the process. Skipped that one in my thought process. Thanks for reminding me about that.

@Ruedii

Ruedii commented Oct 14, 2018

For incompatible mod versions I would recommend adding access to the second repository.

This is also why I recommended only using a split.

The version I would use as the barrier for the split is either 0.8 or 0.9. This is because this was where several major changes were put in the KSP code. These versions also mark the move from the "early access" state to the "pre-release" state.

@Ruedii

Ruedii commented Oct 19, 2018

Oh, as a note, I recommend the following implementation:

  1. Items are added into "CKAN Full" repository when added.
  2. A build script is run automatically to push them to the various versioned repositories.

I would actually put the following repositories:
CKAN Current: Only the past few versions included. (Currently either 1.4+, 1.3+ or 1.2+)
CKAN Recent: Fairly far back, a good two-point split excluding CKAN Current. (Currently 1.0+)
CKAN Legacy: Anything not in CKAN Recent. Lots of old stuff.


6 participants