Proposal: Enable globalization invariant mode for all runtime images #1877
Comments
My main concerns would be:
|
|
I don't think this is a good idea. From what I am seeing, almost 90% of users will need to turn off invariant mode and install the needed ICU packages. I have seen issues where users had invariant mode turned on and ran into problems whose cause was not easy for them to figure out. |
I get all of this feedback, however, Alpine usage is growing. What do we do when half of our pulls are Alpine? Would that change the dynamic? We don't have data on whether people use Alpine images as is or add ICU on top. This is the best we have: https://github.com/search?q=ENV+DOTNET_SYSTEM_GLOBALIZATION_INVARIANT+false&type=Code My motivation is to enable pay-for-play, at the possible expense of extra work and some confusion. Is 12MB worth it? Yes. This win has been valuable for Alpine, and I no longer want Debian and Ubuntu to have asymmetry with Alpine. That asymmetry isn't justified. |
I'm missing why that's relevant. There are many other differences between the distros, no? |
I believe we would need to make changes to invariant mode to make this viable:
Is there a way to get distribution between English vs. non-English speaking countries for usage of our Alpine images? My hypothesis is that our Alpine images are used relatively less in non-English speaking countries. |
True. But this isn't one of them. It's an arbitrary choice we made for one distro.
Great thought. I'll see if we have any information that can at least point us in that direction. First, producing container images for a platform is very hard. Since Docker has a single line of inheritance, you have to make a variety of trade-offs. In general, it makes sense to decide up front what you value and then use that value orientation for every single decision. Otherwise, you end up with something that has a bunch of interesting characteristics but is "blah" in aggregate. Clearly, we've decided that size is our #1 metric. In short, you have the following three choices, pick two:
We value those attributes in that order.
This is a great point. Even if 90% of users needed invariant mode disabled, I'd still have this plan. I'm focused on building a competitive product that makes .NET a great choice for those 10% of users that need the smallest size possible. I think of this topic as being directly connected to Jan's form factors doc. Based on the way Docker works, we certainly could create multiple sets of images that effectively implement multiple form factors, but we're not going to. We're going to do one, and it's going to focus on getting images smaller and smaller. We're going to make this change. We just need to decide when. Let's make invariant mode better. I hadn't thought of wasm being aligned with invariant mode. |
We strike a balance between these attributes by having Ubuntu-, Debian- and Alpine-based images. Why do we have all 3 instead of just 1? I believe it is because the Ubuntu and Debian ones are easier to use than the Alpine one. |
@tarekgh Can you give some examples? Are these things we'd be able to work around within the runtime itself? @jkotas had mentioned allowing case conversion of non-ASCII characters. If we carried this data it would only be a few KB. But if common scenarios require customers to install ICU anyway then I have a hard time justifying us carrying around our own copy of the data. @richlander Is this part of a larger effort to shrink size-on-disk for the Alpine distro? I've had some offline conversations with folks re: having "fast" (but large) and "small" (but slower) versions of our code paths. The idea is that we'd ifdef in whichever one was appropriate for the target platform. I haven't done significant analysis on how much footprint this would save overall so I don't know if it's worth pursuing. |
Ha! I wish we could have just one. The short version is this:
It's amazing to reason about pull behavior across Docker and APT, as two examples. The patterns are super different and the OSes people prefer (in aggregate) are super different. And what people value in those modalities is super different. For example, we see pretty much constant pulls in Docker, day in, day out. For APT, we see a huge surge of pulls in the first 36 hours after a release, and then back to a much lower constant set of pulls after about 5 days. |
No, it is specifically not that. We already did that, starting with Alpine with .NET Core 2.1. This is about applying that same win to Debian and Ubuntu.
ICU is 30MB+ (uncompressed). It's worth talking about ways to avoid it. We don't necessarily need to ship those data files in the runtime. We could download them for the Docker scenario. We download plenty of things today, at docker build time, and are happy to add more if there is value. Also, we shouldn't be making optimization choices around small numbers of KBs to the product in isolation. On the runtime team, we blow those away with our crossgen choices (in either direction). For example, we used partial crossgen in 3.0 to save about 10MB in container images. We can pay for your data file cost with change we find behind the couch. We have a bunch more crossgen work planned for 5.0. We don't have any insight on size impact yet. |
@GrabYourPitchforks -- It would be awesome to have this information:
|
It'd be nice to make invariant mode more developer-friendly. At the same time, we are also talking with @danmosemsft's team about how to make globalization support more configurable, which could help here as well. The current setup, where you get either no globalization or full-blown ICU, is not enough for the growing number of form factors and scenarios .NET is targeting. |
To answer @GrabYourPitchforks question:
One example: a user reported that resource lookup was not working on one of their machines while working fine on others. The user had no idea about invariant mode and didn't know what was wrong. Resource lookup depends on the culture parent chain, which of course is not provided in invariant mode, so the lookup fails to find the right resources. |
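To make the parent-chain dependency concrete, here is a minimal sketch that walks the fallback chain for a culture. The class and method names (`ParentChainDemo`, `ParentChain`) and the choice of `fr-CA` are illustrative assumptions, not anything from the runtime or this thread. With ICU installed one would expect a chain along the lines of `fr-CA -> fr -> (invariant)`; in invariant mode the specific cultures either carry no real parent data or, on newer runtimes, may not be creatable at all, which is why the sketch catches `CultureNotFoundException`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ParentChainDemo
{
    // Walks culture -> parent -> ... and stops at the invariant culture
    // (empty name). This mirrors the order in which resource lookup falls back.
    public static List<string> ParentChain(CultureInfo culture)
    {
        var names = new List<string>();
        // The depth limit is only a safety guard for this sketch.
        for (int depth = 0; depth < 16 && culture.Name.Length > 0; depth++)
        {
            names.Add(culture.Name);
            culture = culture.Parent;
        }
        names.Add("(invariant)");
        return names;
    }

    public static void Main()
    {
        try
        {
            // "fr-CA" is an arbitrary example culture (assumption).
            var chain = ParentChain(CultureInfo.GetCultureInfo("fr-CA"));
            Console.WriteLine(string.Join(" -> ", chain));
        }
        catch (CultureNotFoundException ex)
        {
            // Some runtime configurations refuse to create non-invariant
            // cultures when invariant mode is on.
            Console.WriteLine($"Culture unavailable: {ex.Message}");
        }
    }
}
```

Running this on the failing machine and on a working one would show quickly whether the two see different culture data.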
@richlander anything that involves non-linguistic case comparison will work. Consider the following examples.
// In Invariant mode, returns "MAñANA" <-- note the 'ñ' was left unchanged
// Under ICU / NLS, returns "MAÑANA"
// Under invariant mode with our own casing data, returns "MAÑANA"
string result = "mañana".ToUpperInvariant();
// In Invariant mode, returns false
// Under ICU / NLS, returns true
// Under invariant mode with our own casing data, returns true
bool areEqual = string.Equals("mañana", "MAÑANA", StringComparison.OrdinalIgnoreCase);
By carrying our own casing data, we can determine that
This does not include support for normalization or linguistic comparisons. Consider the following examples.
// In Invariant mode, returns false
// Under ICU / NLS, returns true
// Under invariant mode with our own casing data, returns false
bool areEqual = string.Equals("ss", "ß", StringComparison.InvariantCulture);
// In Invariant mode, returns false
// Under ICU / NLS, returns true
// Under invariant mode with our own casing data, returns false
bool areEqual = string.Equals("encyclopaedia", "encyclopædia", StringComparison.InvariantCulture);
For servers this is generally OK. Most server applications deal with things like identifiers, usernames, filenames, paths, etc.; so they should only ever be using
For clients this is a bit more problematic. A client app would want localization and would want culture-aware textual analysis. If I visit https://en.wikipedia.org/wiki/Encyclopedia and CTRL-F and type "encyclopædia" into my browser's search box, I want it to find both "encyclopædia" and "encyclopaedia" on the page. Something like this would require the full power of ICU / NLS.
Servers that need to display data in a localized fashion also fall under this latter category. If the visitor is browsing from the United States, I want to display pricing using the U.S. currency symbol (
Does this help clarify the scenarios a bit? |
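The comparisons in this comment can be run together as a small probe. This is a sketch under stated assumptions: `GlobalizationProbe` and `LooksInvariant` are hypothetical names, and the detection is a heuristic (it checks whether `en-US` reports the generic currency sign, or cannot be created), not an official API. The results are printed rather than asserted because they depend on the runtime version and on whether ICU / NLS is present; later runtimes may handle casing in invariant mode differently from what is described here.

```csharp
using System;
using System.Globalization;

public static class GlobalizationProbe
{
    // Heuristic only (assumption): with real culture data, en-US uses "$";
    // with invariant data it reports the generic currency sign U+00A4.
    public static bool LooksInvariant()
    {
        try
        {
            var enUs = CultureInfo.GetCultureInfo("en-US");
            return enUs.NumberFormat.CurrencySymbol == "\u00A4";
        }
        catch (CultureNotFoundException)
        {
            // Some configurations refuse to create specific cultures
            // when invariant mode is enabled.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine($"Looks invariant: {LooksInvariant()}");

        // Non-linguistic casing and ordinal comparison.
        Console.WriteLine("mañana".ToUpperInvariant());
        Console.WriteLine(string.Equals("mañana", "MAÑANA",
            StringComparison.OrdinalIgnoreCase));

        // Linguistic comparisons that need full collation data.
        Console.WriteLine(string.Equals("ss", "ß",
            StringComparison.InvariantCulture));
        Console.WriteLine(string.Equals("encyclopaedia", "encyclopædia",
            StringComparison.InvariantCulture));

        // Currency formatting with the invariant culture always uses U+00A4.
        Console.WriteLine(9.99m.ToString("C", CultureInfo.InvariantCulture));
    }
}
```

Comparing the output of the same binary in an image with ICU and in one without is a quick way to see which of these scenarios an application actually depends on.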
This isn't being considered for 5.0 but is something we are interested in driving post-5.0. |
Linking to dotnet/runtime#37349 for awareness: IDN functionality behaves differently in invariant mode, which can lead to wrong behavior in the parts of the networking stack that depend on IDN. |
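A quick way to see whether an application is exposed to the IDN difference is to run a few host names through `System.Globalization.IdnMapping` in both configurations and compare. The sketch below makes no claim about what invariant mode returns; `IdnProbe`, `ToAsciiOrError`, and the sample host names are illustrative assumptions.

```csharp
using System;
using System.Globalization;

public static class IdnProbe
{
    // Converts a host name to its ASCII (Punycode) form, or reports the
    // error text when the mapping rejects the input.
    public static string ToAsciiOrError(string host)
    {
        try
        {
            return new IdnMapping().GetAscii(host);
        }
        catch (ArgumentException ex)
        {
            return $"error: {ex.Message}";
        }
    }

    public static void Main()
    {
        // Sample inputs (assumptions): one plain ASCII name and two
        // non-ASCII names that differ only in case.
        var hosts = new[] { "example.com", "bücher.example", "BÜCHER.example" };
        foreach (var host in hosts)
        {
            Console.WriteLine($"{host} -> {ToAsciiOrError(host)}");
        }
    }
}
```

If the non-ASCII rows differ between an ICU-enabled image and an invariant one, code paths that resolve or compare internationalized host names deserve a closer look.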
Related (TZData): https://twitter.com/funcshawnal/status/1271825184589152256?s=21 |
Note that having TZData is not related to enabling globalization invariant mode. TZData is a separate package that has to be installed to get time zone support. |
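Because time zone data is a separate dependency, it can be checked separately from ICU. A minimal sketch, assuming a Linux container and using hypothetical names (`TimeZoneProbe`, `TryFindZone`); the zone ID `America/New_York` is just an example:

```csharp
using System;

public static class TimeZoneProbe
{
    // Returns true and the zone when the ID resolves; otherwise false and UTC.
    public static bool TryFindZone(string id, out TimeZoneInfo zone)
    {
        try
        {
            zone = TimeZoneInfo.FindSystemTimeZoneById(id);
            return true;
        }
        catch (TimeZoneNotFoundException)
        {
            // Typically means the time zone database is not installed
            // or the ID is unknown.
            zone = TimeZoneInfo.Utc;
            return false;
        }
        catch (InvalidTimeZoneException)
        {
            // The data was found but could not be parsed.
            zone = TimeZoneInfo.Utc;
            return false;
        }
    }

    public static void Main()
    {
        bool found = TryFindZone("America/New_York", out TimeZoneInfo zone);
        Console.WriteLine(found
            ? $"Found {zone.Id}, base UTC offset {zone.BaseUtcOffset}"
            : "Time zone data not available; falling back to UTC");
    }
}
```

An image can therefore end up in any of the four combinations of ICU present or absent and tzdata present or absent, which is part of why the two problems feel similar to users.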
Great point. It's not directly related, as you say. My point is that it is a near-neighbor problem, with similar characteristics and UX. I'd like to start an early 6.0 proposal along the lines of Jan's comment. We should include tzdata in that. I was just talking to the wasm team about this. They expressed that they are struggling with ICU (significantly more than the Docker scenario) and would appreciate a better solution for 6.0 that doesn't require ICU. Cool? |
Is there more info here? What are they struggling with in ICU? In general, it is good that we start on a 6.0 proposal now, as you mentioned, so we have enough time to react to the needed changes. Yes, cool :-) |
Same reason ... size impact. Size constraints of wasm are like 10x more restrictive than containers. More concretely, the wasm team is slicing and dicing ICU itself to reduce size. This isn't a great model. Mono libraries have NLS-style in-product tables/data (actually stale data copied from ICU), but the wasm project is leaving that behind since it is moving to corefx. |
@richlander - This has been dormant for a while now. Any thoughts on this for .NET 8? |
+@steveisok @lewing to advise whether they are still running into the size problems. @mthalman are you running into any issues because of the size? |
We pull in icu from dotnet/icu, so I do not think our workloads would be negatively impacted. @lewing ? |
This would also be impacted by whatever outcome we have from #4162. If we have a distroless Alpine offering, then we may want to make different choices with the full version of Alpine, like including icu. |
We're no longer pursuing this. |
Proposal: Enable globalization invariant mode for all runtime images
We propose to reduce runtime images by ~12MB (compressed; ~31MB uncompressed) by no longer installing the ICU package for Debian- and Ubuntu-based images, and instead relying on globalization invariant mode by default. On Linux, the .NET runtime and libraries depend on ICU for globalization behaviors (sorting, time zones, currency symbols, date formats, ...). We already enable globalization invariant mode and do not install ICU with Alpine runtime images.
We propose to (A) take advantage of this size improvement for Debian and Ubuntu images, and (B) make .NET images symmetric across Linux distros. In short, we like what we did for Alpine, but no longer want Alpine to be a special case.
All Linux-based .NET SDK images will continue to contain ICU. For example, Alpine .NET SDK images contain ICU, even though Alpine runtime images do not. As a point of policy for SDK images, we value UX over size, and intend for SDK images to provide a "batteries included" model. This is, in part, because it is more inconvenient, for users, to add packages to SDK images for some scenarios. This is a tradeoff, as it adds an unfortunate point of asymmetry between runtime and SDK images, but one that we believe is warranted.
We made an analogous change in #1848 where we removed a Debian- and Ubuntu-specific layer that Alpine did not have. After that change, Debian and Ubuntu SDK images are smaller, and the layering across .NET SDK images for Linux distros is now the same.
Context
As part of the .NET Core 2.0 release, we created globalization invariant mode. This feature, when enabled, removes any dependence on external libraries for globalization information by using the invariant behavior for all globalization-sensitive APIs (like sorting, understanding time zones and writing currency symbols). For many applications, this mode is a win because they are not dependent on globalization concepts and behaviors.
This new mode was developed at the same time as we added support for the Alpine Linux distro. The Alpine project is known for publishing small container images, and we wanted to do everything we could to make Alpine-based .NET Core images small. We decided to take advantage of globalization invariant mode and not install ICU in Alpine images by default, and instead let users who need globalization enable it for themselves. This seemed like a great trade-off at the time, and we haven't heard any negative feedback on it. We have however heard that many people are happy with .NET Alpine images, and have seen their usage grow considerably.
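For readers who need globalization on an Alpine-based image, the opt-in amounts to installing ICU and switching the invariant setting off, in the same spirit as the `ENV DOTNET_SYSTEM_GLOBALIZATION_INVARIANT` lines found by the code search linked earlier in the thread. The fragment below is a sketch, not the official recipe: the base image tag and the application name `app.dll` are assumptions, and package names can vary between Alpine versions.

```dockerfile
# Base image tag is an assumption; use the runtime tag your application targets.
FROM mcr.microsoft.com/dotnet/runtime:8.0-alpine

# Install ICU so the runtime can load real culture data.
RUN apk add --no-cache icu-libs

# Turn globalization invariant mode off for this image.
ENV DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=false

WORKDIR /app
COPY . .
# app.dll is a placeholder name for the published application.
ENTRYPOINT ["dotnet", "app.dll"]
```

Under the proposal, the same two steps (install the distro's ICU package, set the variable to `false`) would apply to Debian and Ubuntu runtime images as well, with the package manager swapped accordingly.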
Size details
We built the dotnetapp sample a few different ways and published the results at richlander/dotnetapp. The tags listing provides the compressed sizes. The same images are displayed below, with uncompressed size information.
Legend: