[Feature] Increase the package size limit on NuGet.org from 250 MB #9473
Comments
I'm going to move this to NuGet/NuGetGallery since the 250 MB restriction is enforced on NuGet.org, not by client tooling.
@AsakusaRinne, storing ML models on NuGet.org hasn't been fully thought through or designed for from the service and tooling standpoint. Generally speaking, our software works best for our primary use case: .NET packages containing a relatively small payload. Currently, packages in the 150 MB+ range account for less than 0.1% of NuGet.org.

[Chart: current distribution of package sizes on NuGet.org, all sizes in MB]

I use this to demonstrate how NuGet.org is being used today. More than 95% of all packages are less than 5 MB, i.e. very far from the current limit of 250 MB. There are a number of things that don't work as well for very large packages, especially around upload and download reliability.
All in all, the NuGet tooling (both server and client) doesn't work well for large packages, especially in non-ideal network conditions. For us to ship large package support and have it be a great experience for the majority of our users, I think we'd need to do some work to improve our software. If we simply increase the limit to a much higher value (for example 1 GB), my team would undoubtedly get even more reports and customer support requests about package upload or download issues, and the best mitigation we'd be able to offer in that case is "try again" or "use a faster internet connection". For users, especially our less experienced or less technical users, this isn't a very helpful answer and would lead to frustration.

That being said, the current limit was selected several years ago, so the majority of our users may be in a better position to handle a larger limit now. I don't know of any data to support this theory in our user cohort, but I think it's a safe assumption. There could be a middle ground where we make no changes to our software and simply increase the limit by a modest amount. Even this would require some testing to ensure our backend service can handle the change effectively. It's hard to know what that new limit should be; we'd need to select it confidently and avoid any take-backs (i.e. reverting the change due to unexpected problems).

In the short term, you can consider working around the problem by downloading the ML model (or, generally, any large file that can't fit inside the package) in an MSBuild target (docs). I've seen the Uno.Wasm.Bootstrap package do this to download an extra toolchain that lives outside of the package. Here's an example: Uno.Wasm.Bootstrap.targets, UnoInstallSDKTask.cs. You will of course need to host the data file yourself, but this can ease the installation process by automating it at build time. Importantly, a download at build time may be seen as unexpected behavior, so be sure to document it at the top of your package README and description so package consumers are not surprised by this flow (as mentioned in our recent blog post on unexpected behavior). Again, this is only a workaround and will need to be implemented and maintained by the package author.

Another workaround would be to host your own package feed that contains these packages and instruct your users to add your "large package" or "ML data" feed as an additional package source in their client tooling. For more information, see Hosting your own NuGet feeds.

We'll leave this issue open since it is indeed unresolved, and we'll gauge its priority w.r.t. our other work based on upvotes.
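For readers who want a concrete starting point, here is a minimal sketch of that download-at-build-time workaround. The target name, URL, and file name are hypothetical; the only real API used is MSBuild's built-in DownloadFile task (available since MSBuild 15.8), and this is a simplified pattern rather than Uno.Wasm.Bootstrap's actual implementation:

```xml
<!-- Hypothetical MyPackage.targets shipped in the package's build/ or
     buildTransitive/ folder, so it is imported into consuming projects. -->
<Project>
  <!-- Runs before Build, and only if the model file is not already present. -->
  <Target Name="DownloadLargeModel"
          BeforeTargets="Build"
          Condition="!Exists('$(MSBuildThisFileDirectory)model.bin')">
    <!-- DownloadFile is a built-in MSBuild task (15.8+); SkipUnchangedFiles
         avoids re-downloading when the destination is already up to date. -->
    <DownloadFile SourceUrl="https://example.com/models/model.bin"
                  DestinationFolder="$(MSBuildThisFileDirectory)"
                  SkipUnchangedFiles="true" />
  </Target>
</Project>
```

Placing the `.targets` file under `build/` (or `buildTransitive/`) with the package ID as its file name is what makes NuGet import it automatically into consuming projects.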
Thanks a lot for your answer. :) I'll try MSBuild or other workarounds.
@AsakusaRinne There is an undocumented mechanism for this (the runtime/RID graph). I can't find much documentation on it, but this blog might be a good start: https://natemcmaster.com/blog/2016/05/19/nuget3-rid-graph/ I've updated the parent comment to help collect upvotes. Thank you for filing this issue!
Note that https://github.com/dotnet/TorchSharp has, as far as I know, the same issue with NuGet package size limits: the NVIDIA cuDNN DLLs are too large (a single DLL alone exceeds the limit). You can see this in https://www.nuget.org/packages/libtorch-cuda-11.7-win-x64/
After learning from the suggestions here, a small tip for others with the same problem: rather than configuring everything in the repo, I chose to make the NuGet package manually.
@AsakusaRinne is that package public? Would be great to have the example for reference 😊
Sure, here it is: SciSharp.Tensorflow.Redist-Linux-GPU.2.11.0. The package depends on four other packages: primary, fragment1, fragment2, and fragment3. The fragment packages contain the fragments of the large file, while the primary package carries the logic to put them back together. When the project is built, the fragments are merged back into the original binary.

If anyone else faces the same problem in the future, there's a simple way to re-use our approach:

1. Split the large binary into fragments that each fit under the size limit.
2. Pack each fragment into its own package, plus a primary package that depends on all of the fragment packages.
3. In the primary package, include an MSBuild targets file that concatenates the fragments back into the original file at build time (see the sketch after this comment).

After these steps, the package will be ready to publish (but test it locally first). It's a plain approach, but it's simple to implement. I hope it helps.
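To make the merge step concrete, here is a minimal sketch of what such a targets file might look like. The target name, fragment layout, and paths are hypothetical (though libtensorflow.so matches the package above), and the `cat` command assumes a Linux build host, which fits a Linux-GPU package:

```xml
<!-- Hypothetical targets file in the primary package: reassembles the large
     native library from the restored fragment packages before Build runs. -->
<Project>
  <Target Name="MergeLibraryFragments"
          BeforeTargets="Build"
          Condition="!Exists('$(OutputPath)libtensorflow.so')">
    <ItemGroup>
      <!-- Fragment files as laid out by the fragment packages (paths assumed);
           the part names must sort in the correct concatenation order. -->
      <LibFragment Include="$(MSBuildThisFileDirectory)fragments/libtensorflow.so.part*" />
    </ItemGroup>
    <MakeDir Directories="$(OutputPath)" />
    <!-- Concatenate the fragments back into the original binary (Linux host). -->
    <Exec Command="cat @(LibFragment->'%(FullPath)', ' ') > $(OutputPath)libtensorflow.so" />
  </Target>
</Project>
```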
Just throwing out some ideas: the ZIP format (which .nupkg files use) supports "volumes", which enable splitting larger archives into smaller chunks. That might be a potential way to enable larger packages without requiring large files to be downloaded in one go. However, it would likely require a protocol update (as well as support from both client and server) and would raise backwards-compatibility issues.
FYI, regarding "Improve handling of native packages (Support RID specific dependencies)": NuGet/Home#10571 discusses issues related to this and shows how https://www.nuget.org/packages/libclang is packaged via the runtime.json mechanism. "Should runtime.* packages be listed in NuGet.org?" (dotnet/core#7568) similarly discusses issues around this and points to the https://www.nuget.org/packages/Microsoft.NETCore.App package, which has multiple runtime-specific "sub-packages": https://www.nuget.org/packages?q=Microsoft.NETCoreApp.Runtime

For libclang, the runtime.json looks like this:

```json
{
  "runtimes": {
    "linux-arm64": {
      "libclang": {
        "libclang.runtime.linux-arm64": "16.0.6"
      }
    },
    "linux-x64": {
      "libclang": {
        "libclang.runtime.linux-x64": "16.0.6"
      }
    },
    "osx-arm64": {
      "libclang": {
        "libclang.runtime.osx-arm64": "16.0.6"
      }
    },
    "osx-x64": {
      "libclang": {
        "libclang.runtime.osx-x64": "16.0.6"
      }
    },
    "win-arm64": {
      "libclang": {
        "libclang.runtime.win-arm64": "16.0.6"
      }
    },
    "win-x64": {
      "libclang": {
        "libclang.runtime.win-x64": "16.0.6"
      }
    },
    "win-x86": {
      "libclang": {
        "libclang.runtime.win-x86": "16.0.6"
      }
    }
  }
}
```
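To illustrate how this plays out for a consumer, here is a sketch of a project that references only the small primary package. The package ID and version are taken from the runtime.json above; the rest of the project file is a hypothetical minimal setup. At restore time, NuGet's runtime graph maps the project's RuntimeIdentifier to the matching runtime-specific package:

```xml
<!-- Hypothetical consumer project: only the lightweight "primary" libclang
     package is referenced directly. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <!-- With a RID set, restore walks the runtime graph above and pulls in
         libclang.runtime.linux-x64 as the native payload; other RIDs'
         payloads are never downloaded. -->
    <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="libclang" Version="16.0.6" />
  </ItemGroup>
</Project>
```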
Hi folks, I'm late to this one, although I've had various conversations about it over the last few years. I propose that we double the package size limit on NuGet.org for a number of reasons: to accommodate larger libraries, make distribution easier, reduce the need for splitting packages, and remain competitive.

These reasons should be enough to consider this change as an interim measure while other products come to market to solve specific problems like hosting large AI models and distributing them appropriately. The cost is that we won't have great answers to the scaling challenges Joel mentioned earlier in this thread. I think the majority of people would be okay with that, given that the benefit outweighs the cost right now.
A greater package size limit would also be helpful for my packaging of Zig toolsets: https://www.nuget.org/packages?q=vezel.zig.toolsets

Right now, I publish a package per build machine RID (not target RID!). I'm hoping to combine these packages into a single package one day, as it would significantly simplify the user experience and allow the toolset packages to be used in more scenarios. But as you can see, each package is already ~75 MB, so combining 10 of them immediately runs into the package size limit.
To add another example: the same problem exists with the TensorFlow bindings for .NET Android in GooglePlayServices.
Leveraging NuGet to host packages that include ML models would simplify a few workflows.
NuGet Product(s) Affected
NuGet.exe, Other/NA
Current Behavior
NuGet packages have a size limit of 250 MB.
Desired Behavior
Raise the limit to 500 MB or higher.
Additional Context
As mentioned in NuGet/Home#6208 (comment), several years have passed, and it's now easy for machine learning packages to exceed 250 MB. For example, I'm one of the authors of Tensorflow.NET, and its Linux CUDA binary reaches 400 MB (after NuGet compression). It's a big inconvenience if users have to download the binary themselves and find the right place to put it.
Please 👍 or 👎 this comment to help us with the direction of this feature & leave as much feedback/questions/concerns as you'd like on this issue itself and we will get back to you shortly.