Remove case insensitivity by changing the approach towards style insensitivity #547

Open
IgorGRBR opened this issue Jan 5, 2024 · 45 comments

Comments

@IgorGRBR

IgorGRBR commented Jan 5, 2024

Abstract

Instead of removing underscores and forcing every letter to lowercase, replace every occurrence of underscore followed by any-case letter with uppercase letter in snake_case identifiers, and keep camelCase identifiers as-is.

Motivation

The current style insensitivity algorithm causes problems for compound words that can be split in more than one way: once reduced to the internal all-lowercase form, such identifier names are indistinguishable (gameStop vs gamesTop, index vs inDEX, superb_owl vs super_bowl, island vs isLand, etc).
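As a minimal sketch of the collision described above, assuming the current rule amounts to "keep the first character, then drop underscores and lowercase the rest" (`normalizeCurrent` is a hypothetical name, not a compiler proc):

```nim
import std/strutils

# Sketch of the current comparison rule: the first character is kept as
# written, every other character is lowercased and underscores are dropped.
proc normalizeCurrent(s: string): string =
  for i, c in s:
    if i == 0:
      result.add c
    elif c != '_':
      result.add c.toLowerAscii

echo normalizeCurrent("gameStop")    # gamestop
echo normalizeCurrent("gamesTop")    # gamestop
echo normalizeCurrent("superb_owl")  # superbowl
echo normalizeCurrent("super_bowl")  # superbowl
```

Both spellings of each pair end up as the same string, so the word boundary the author wrote is gone by the time identifiers are compared.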

Description

I propose that identifiers store their names in camelCase (or snake_case, or any other form that keeps the words inside an identifier separate), since the case of a letter or an underscore clearly delimits the words used in that name. This should resolve the issue presented above by allowing separate words to be represented properly.

Conversion between styles would preserve the words by forcing any letter preceded by an underscore to be uppercase when converting from snake_case to camelCase. When converting from camelCase to snake_case, just reverse the process.
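A rough sketch of the snake_case to camelCase direction, under the assumption that nothing else about the identifier is touched (`toCamel` is a hypothetical helper name used only for illustration):

```nim
import std/strutils

# Hypothetical canonicalization: every "_x" becomes "X"; all other
# characters, including existing uppercase letters, are kept as written.
proc toCamel(s: string): string =
  var upperNext = false
  for c in s:
    if c == '_':
      upperNext = true
    elif upperNext:
      result.add c.toUpperAscii
      upperNext = false
    else:
      result.add c

echo toCamel("game_stop")   # gameStop
echo toCamel("games_top")   # gamesTop
echo toCamel("superb_owl")  # superbOwl
echo toCamel("super_bowl")  # superBowl
```

With this form stored internally, game_stop and gameStop still compare equal, while gameStop and gamesTop no longer do.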

Code Examples

No response

Backwards Compatibility

Macros will break if they generate all-lowercase identifiers, but other than that I'm not aware of any other backwards-incompatible changes this would introduce to Nim.

@IgorGRBR IgorGRBR changed the title Remove case insensitivity by slightly changing approach towards style insensitivity Remove case insensitivity by changing the approach towards style insensitivity Jan 5, 2024
@ASVIEST

ASVIEST commented Jan 5, 2024

In my opinion it would be better to add style skins (again), but in a more generalized form:

# possible syntax
import strutils

proc processIdent(s: string): string =
  var afterWhitespace = false
  for i in 0..<s.len - 1:
    if s[i] == '_' and s[i + 1] != '_': result.add s[i + 1].toUpperAscii
    elif not afterWhitespace: result.add s[i]
    afterWhitespace = s[i] == '_' and s[i + 1] != '_'
  if not afterWhitespace: result.add s[^1]

proc sameIdent(a, b: string): bool =
  processIdent(a) == processIdent(b)

import std/syntaxskins # it contains also default skins
const mySkin = initSyntaxSkin(sameIdent = sameIdent)
{.syntaxSkin: mySkin.}

It also solves the problem with C wrappers (you can disable case insensitivity for the wrapper module and use the case-sensitive wrapper from other modules in a case-sensitive manner).
You also keep the benefits of case insensitivity in other modules.
But it is really hard to implement, especially executing pragmas during parsing (for style skins in general).
Ideally the compiler should also support JIT compilation for the VM.

about interaction between modules:

# gl.nim
import std/syntaxskins
{.syntaxSkin: caseSensive.} # or wrapperDefault, etc...
const GL_FLOAT* = 0x1406 # exported so that code.nim can use it
type GLfloat* = float32

# code.nim
import gl
# we have default syntax skin
let x = GL_FLOAT # external idents are used with their module's sameIdent logic
proc test(x: GLfloat) = discard
# but non-external identifiers follow the normal logic
import snake_case_module
var y = weirdName # in snake_case_module it is weird_name

@IgorGRBR
Author

IgorGRBR commented Jan 5, 2024

Style skins seem like an interesting idea, but they are a bit off-topic here, as this RFC tries to address the issue of multiple identifiers with different compound words being treated as the same identifier because the algorithm that converts them to an IR can't handle such cases.

If your snake_case_module contains both weird_name and weirdname, Nim will still treat it as the same. Sure, you could just add {.syntaxSkin: caseSensive.} into snake_case_module, but I think it would be better to make Nim behave properly by default.

@metagn
Contributor

metagn commented Jan 5, 2024

> Macros will break if they generate all-lowercase identifiers, but other than that I'm not aware of any other backwards-incompatible changes this would introduce to Nim.

Isn't the suggestion to remove case insensitivity? That is definitely breaking to code which uses different casings.

@ASVIEST

ASVIEST commented Jan 5, 2024

Perhaps you still need to add a rule:
letters between two capital letters must be capitalized
GlFloat -> GLFloat
this is needed for something like this
Gl_float -> GlFloat -> GLFloat

It solves @metagn cases
newHtmlParser (originally written as newHtmlParser) -> newHTMLParser
new_html_parser -> newHtmlParser -> newHTMLParser
so new_html_parser match newHTMLParser

@ASVIEST

ASVIEST commented Jan 5, 2024

> > Macros will break if they generate all-lowercase identifiers, but other than that I'm not aware of any other backwards-incompatible changes this would introduce to Nim.
>
> Isn't the suggestion to remove case insensitivity? That is definitely breaking to code which uses different casings.

My understanding is that it is about changing the case insensitivity rules, not removing them.

@metagn
Contributor

metagn commented Jan 5, 2024

Sorry, I was wrong about newHTMLParser, the suggestion is to transform from snake case to case sensitive camel case, we already accept that newHTMLParser and newHtmlParser are different in camel case so they should also be different in snake case.

@ASVIEST

ASVIEST commented Jan 5, 2024

> Sorry, I was wrong about newHTMLParser, the suggestion is to transform from snake case to case sensitive camel case, we already accept that newHTMLParser and newHtmlParser are different in camel case so they should also be different in snake case.

newHTMLParser and newHtmlParser are the same
(when stylecheck off)

@metagn
Contributor

metagn commented Jan 5, 2024

With this proposal, they no longer become the same.

@ASVIEST

ASVIEST commented Jan 5, 2024

> Perhaps you still need to add a rule:
> letters between two capital letters must be capitalized
> GlFloat -> GLFloat
> this is needed for something like this
> Gl_float -> GlFloat -> GLFloat
>
> It solves @metagn cases
> newHtmlParser (originally written as newHtmlParser) -> newHTMLParser
> new_html_parser -> newHtmlParser -> newHTMLParser
> so new_html_parser match newHTMLParser

this rule lets it be the same
and should maintain differences between proposed cases.

@ASVIEST

ASVIEST commented Jan 5, 2024

> import std/syntaxskins
> {.syntaxSkin: caseSensive.} # or wrapperDefault, etc...

BTW, now I think that sameIdent shouldn't be in syntaxSkin, it should be another pragma like {.cmpIdent: sth.} (because it's not just about one module, it's about all its private/public idents).
And this is easier to implement than it seems, because the idents are compared in sem via getIdent (https://github.com/nim-lang/Nim/blob/devel/compiler/idents.nim)

@IgorGRBR
Author

IgorGRBR commented Jan 5, 2024

I think it would be better to convert every uppercase letter in a continuous sequence of uppercase letters to lowercase, except the first and the last ones, since a single identifier can have many words and (for me, at least) it's not clear whether the compiler should alternate between upper and lower case or make everything between the first and last uppercase letter uppercase, and what effects that may cause.

With this logic newHTMLParser becomes newHtmlParser, GLFloat -> GlFloat, Gl_float, etc.
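A small sketch of that rule taken literally, assuming it is applied to each uppercase run independently (`lowerInnerCaps` is a made-up name for illustration):

```nim
import std/strutils

# Sketch: inside each run of consecutive uppercase letters, lowercase
# everything except the first and the last letter of the run.
proc lowerInnerCaps(s: string): string =
  result = s
  var i = 0
  while i < s.len:
    if s[i].isUpperAscii:
      # find the end of the uppercase run that starts at i
      var j = i
      while j + 1 < s.len and s[j + 1].isUpperAscii:
        inc j
      # s[i..j] is one run; only its interior is touched
      for k in i + 1 ..< j:
        result[k] = s[k].toLowerAscii
      i = j + 1
    else:
      inc i

echo lowerInnerCaps("newHTMLParser")  # newHtmlParser
echo lowerInnerCaps("GLFloat")        # GlFloat
```

Note that a run at the very end of an identifier keeps its final capital under this literal reading, so a trailing `INET` would come out as `IneT`.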

What do you think?

@ASVIEST

ASVIEST commented Jan 5, 2024

Maybe it's better, but I don't see much difference.
It's worth noting that this resolves this case anyway:

getISLandFrame -> getIsLandFrame
get_is_land_frame -> getIsLandFrame # match but shouldn't
# must be valid only:
get_isl_and_frame -> getIslAndFrame # no match but should

NOTE: for uppercase variant result is same
this can happen, although not often.

# l - lowercase letter
# u - uppercase letter
# (l or u)* - zero and more matches
# (l or u)+ - one and more matches

# string with pattern:
l*   u+  l+  u+  l* # in any case it will be the same

Ideally, "and" must be camel case (And), but we can't detect it

For getISLAndFrame it works as expected

@Araq
Member

Araq commented Jan 5, 2024

I thought about making Nim case sensitive but allowing for a .insensitive pragma that applies for imported symbols. But it looks unreasonably hard to implement with unknown effects for template instantiations etc.

@IgorGRBR
Author

IgorGRBR commented Jan 5, 2024

> getISLandFrame -> getIsLandFrame
> get_is_land_frame -> getIsLandFrame # match but shouldn't
> # must be valid only:
> get_isl_and_frame -> getIslAndFrame # no match but should
>
> NOTE: for uppercase variant result is same this can happen, although not often.

I'm not sure I understand your examples correctly. get_is_land_frame and getIsLandFrame match and should match (both in current state of Nim and with proposed changes from this RFC). I don't see why get_isl_and_frame and getIslAndFrame wouldn't match either (after both RFC changes and lowercasing letters between 2 capitals).

> For getISLAndFrame it works as expected

getISLAndFrame is problematic if you want to convert letters between 2 capitals to uppercase, because both it and getIslandFrame will be converted to getISLANDFrame.

@ASVIEST

ASVIEST commented Jan 5, 2024

all ident matching and case insensitivity works via getIdent proc, if we add rules object for ident, we can

getISLandFrame

getISLandFrame this is a crazy example:
words: get, ISL, and, Frame
not words: get, IS, Land, Frame
but we can’t determine this in any way, so we can just leave it like this

getISLAndFrame will not be converted to getISLANDFrame it will be converted to getIslAndFrame
because we scan I - uppercase, get A and see that next is underscore, then going to next and trying scanning but yes with underscores it should be simpler.

@metagn
Contributor

metagn commented Jan 5, 2024

This discussion flared up again because some people want to do this.

type Foo = object
const FOO = 123

Before this, tons of people complained about not being able to use underscores on their own, like __init__ or _member. This concern has not been addressed by any of the suggestions in this thread.

For any kind of distinction that style insensitivity in Nim removes, there are people that use that distinction in other languages, to the point that it supposedly stops them from using Nim. This has not changed since 8 years ago when I found out about Nim, and basically nothing has been done about it (I think the Foo vs foo distinction came before). Honestly the idea that the convenience style insensitivity provides is worth losing users over is outrageous.

But it's still a feature that was promised to be part of Nim, hence 100+ people voted to keep it in #456. Removing it wouldn't be a development of Nim, it would create a new language called "Nim with complete identifier sensitivity". Many people would prefer or feel better about the original Nim over this new language. On the other end, it'll only be one thing potential new users/companies won't care about anymore while they find 20 other reasons not to use Nim. It's not a great compromise.

I understand the desire here to find middle ground, but changing style insensitivity means people who are used to it will have to learn the more complex new rules, and only some of the people who are against the current rules will be accommodated while the rest will be equally placated and driven away due to the extra complexity.

Instead, we can look at what exactly it is that the opponents of style insensitivity want that isn't possible in current Nim, and see if any of it is incompatible with what the proponents want that current Nim allows. If I had to say, it would be like:

Opponents:

  • Define different symbols with names under different casings/underscore insertions
  • Be able to refer to these symbols with at least the exact same identifier style

Proponents:

  • Refer to existing symbols under potentially different casings (openarray and openArray, newHtmlParser/newHTMLParser/newHTMLparser, at least for the sake of not breaking code) and underscore insertions
  • Have this work backwards, i.e. swapping the definition name and the usage name should continue to work

Notice the lack of overlap: the opponents don't care if we access the symbols with a different style than the definition as they're used to it just erroring, the proponents (probably) don't care if we define different symbols that would be considered the same under style insensitivity as, again, this would error in current Nim.


Consider this design (ignore the complexity for the Nim compiler for now):

Allow defining style sensitive different symbols.

let foo = 1
let fO_o = 2

Allow referring to them with their exact definition style.

echo foo # 1
echo fO_o # 2

If we refer to them with a different style, error only if more than 1 symbol matches the current style, otherwise allow the reference.

echo fOo # error: ambiguous between foo and fO_o
let bar = 3
echo bA_r # 3
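One way to picture the lookup for plain (non-overloaded) symbols is the sketch below. It is not compiler code: `resolve`, `normalize` and `LookupKind` are hypothetical names, and the loose comparison is assumed to be the current rule (first character as written, the rest lowercased with underscores removed).

```nim
import std/strutils

type
  LookupKind = enum
    lkNotFound, lkFound, lkAmbiguous

# Loose comparison key: first character as written, the rest lowercased
# with underscores removed.
proc normalize(s: string): string =
  for i, c in s:
    if i == 0: result.add c
    elif c != '_': result.add c.toLowerAscii

# Hypothetical lookup: an exact spelling always wins; otherwise a loose
# match is accepted only when exactly one defined symbol fits.
proc resolve(defined: openArray[string], used: string): (LookupKind, string) =
  for d in defined:
    if d == used:
      return (lkFound, d)
  var matches: seq[string]
  for d in defined:
    if normalize(d) == normalize(used):
      matches.add d
  if matches.len == 1:
    result = (lkFound, matches[0])
  elif matches.len > 1:
    result = (lkAmbiguous, "")
  else:
    result = (lkNotFound, "")

let syms = ["foo", "fO_o", "bar"]
echo resolve(syms, "fO_o")[1]  # fO_o
echo resolve(syms, "fOo")[0]   # lkAmbiguous
echo resolve(syms, "bA_r")[1]  # bar
```

The exact-match pass is what lets foo and fO_o coexist; the second pass is what keeps existing style-insensitive code such as bA_r working.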

For routines defined with different styles (probably can be generalized to any overloaded symbol), if a style is used that exactly matches some of the definitions, only consider those as overloads. If a style is used that doesn't match any definition: if every definition has the same style, consider every overload, otherwise error due to ambiguous style.

proc foo(x: int): int = x * 2
proc fO_o(x: int): int = x * 3

echo foo(1) # 2
echo fO_o(1) # 3
echo fOo(1) # error: ambiguous style between foo and fO_o

proc bar(x: int): int = x + 1
proc bar[T](x: T): T = x

echo bar(1) # 2
echo bA_r(1) # 2
echo bAr("abc") # abc

This part compromises on the style insensitivity camp because it doesn't let us swap the usage with any definition and have it still work in every case. But considering --styleCheck:usages exists maybe this isn't such a big issue. There is an alternative that makes less sense but treats each camp equally, on ambiguous style we can match every overload.

proc foo(x: int): int = x * 2
proc fO_o(x: int): int = x * 3

echo foo(1) # 2
echo fO_o(1) # 3
echo fOo(1) # error: both foo and fO_o match

proc bar(x: int): int = x + 1
proc bA_r[T](x: T): T = x # notice weaker match but different style

echo bar(1) # 2
echo bA_r(1) # 1
echo bAr(1) # 2

If this design is sound, then we have something that can make everyone happy, and we can iterate on it with regards to addressing complexity in the Nim compiler; or we can formulate other designs in the same way. But again, we can't compromise too much since Nim already promises style insensitivity and has for a long time.

@Araq
Member

Araq commented Jan 5, 2024

> This discussion flared up again because some people want to do this.

No benefits are known for SCREAMING constants, so this particular case is completely irrelevant for me personally.

@ASVIEST

ASVIEST commented Jan 6, 2024

I personally don't like the fact that foo and fO_o both compile. But I think the idea of this RFC is quite logical, because it's strange to make gameStop == gamesTop. In my view we need case insensitivity, but restricted in the places where names have different meanings. The purpose of this RFC is to bring some clarity to case insensitivity. Yes, maybe it also needs additional rules for the equality of foo, fO_o, Foo and FOO, or different rules in general.

BTW I've never had a case where when using pure Nim, insensitivity gets in the way. However when wrapping C libraries it happens quite often, e.g. OpenGL, v4l2, libnx..... And I think you could add a pragma to specify case sensitivity: on | off, the same rules would apply for a module and identifiers from that module imported from other modules as case sensitivity in a module with a pragma. I find it quite useful, plus after these C-api wrappers there is likely to be a high-level api for nim.

@IgorGRBR
Author

IgorGRBR commented Jan 6, 2024

I also don't like the idea of having foo and fO_o in the same codebase, but mostly because in the context of Nim these things would be the same identifier, probably referring to the same thing in the same context, which is a huge code smell (that current Nim doesn't forbid anyway). And I do think that having style insensitivity for identifiers be allowed only if user's chosen style doesn't create any ambiguity when resolving identifier names is an interesting idea, but it's a bit out of scope of this RFC and is probably too complex for both camps to be happy with.

The purpose of this RFC is to highlight a problem with the current implementation of style insensitivity. People, when using snake_case or camelCase, don't place underscores and uppercase letters randomly, but tend to use them as delimiters for words inside of a single identifier name. This information is simply lost when Nim parses these names into IR, and in some cases this can create ambiguity amongst 2 different identifiers. And notice that people are mostly discussing snake_case and camelCase when talking about style insensitivity, and not styles like SCREAMINGCASE or nocase, so I personally don't find any value in a system that allows me to do C_R_A_Z_Y, sIlLy stufflikethis, but doesn't let me have new_freedom and new_free_dom in the same context. I consider that for styles that Nim realistically supports, case insensitivity is an improper tool to achieve insensitivity amongst them.

If people find genuine issues with proposed changes that would break their workflow, I'd be happy to hear about those and have a discussion. But from what I can tell, these changes shouldn't affect anyone's code (that doesn't smell already) and shouldn't prohibit people from using their favorite style.

@metagn
Contributor

metagn commented Jan 6, 2024

> People, when using snake_case or camelCase, don't place underscores and uppercase letters randomly

No matter what algorithm we come up with some code will still break. In this sense, people do place underscores and uppercase letters randomly. There is also the problem that the simpler the algorithm is, the more code will break.

If you don't want foo and fO_o to compile when defined in the same module you can easily add a --styleCheck:definitions and tune it further so differences like foo and fo_o are allowed but not foo and fOo, or whatever arbitrary casing scheme you come up with. Again we are dealing with things that are straight up not possible with the current compiler, we can easily restrict it further later.

There is a case which compiles in current Nim:

# a.nim
proc foo*(x: int): int = x * 2
# b.nim
proc fO_o*(x: int): int = x * 3
# c.nim
import a, b

Trying to use either foo will error here due to overload ambiguity but changing one of them to x: T would cause a subtle behavior difference.

For this case, we can add another warning, for whenever we use a symbol when a symbol with another style was available.

These warnings need to be opt in though because they will complain that people who prefer style sensitivity are using style sensitivity.

@IgorGRBR
Author

IgorGRBR commented Jan 6, 2024

> If you don't want foo and fO_o to compile when defined in the same module you can easily add a --styleCheck:definitions and tune it further so differences like foo and fo_o are allowed but not foo and fOo, or whatever arbitrary casing scheme you come up with.

From what I understand, --styleCheck simply disallows ambiguous styles for identifiers, but it doesn't change how identifiers represented internally, so it doesn't solve the problem. The problem is that I do want foo and fOo to compile (like it currently does in Nim), but I want them to represent 2 different identifiers (unlike what it currently does), the same way I want foo and f_oo to compile and be 2 different things. Just to reiterate - the issue here is identifier equality, not what style I want to restrict my codebase to.

I don't see how this is not possible to do in current compiler, since it requires a change in a compiler part that is already there and is responsible for current behavior in the first place (unless compiler, for some reason, is built around the fact that all identifiers internally are lowercase (except maybe for the first letter) and is now tightly coupled to this implementation detail).

> There is also the problem that the simpler the algorithm is, the more code will break.

If someone wants to use my fooBar as foo_bar, I see no issue with that, but what value do we gain from someone writing my fooBar as f_o_o_BaR? Yes, the proposed algorithm and proposed changes to it will break someone's code, if all they do is SCREAMINTHEIRCODEBASES, orpretendlikeunderscoresdontexist, or ins_ert u_ndersc_ore_s rand___oml_y w_it_h no th_oug_ht o___r re_as_on. Nim currently supports these cases, but the question is, should it? I would be very surprised if out of 100+ people that wanted Nim to keep style insensitivity, there would be anyone who writes fooBar as f_o_o_BaR. If they do, then once again, I'd love to hear their thoughts and reasoning. All I see is a genuine problem with current approach: just because someone decided they want to turn my GL_BYTE into Glbyte in their codebase, I'm not allowed to have more direct C lib bindings, like GL_BYTE and GLbyte and watch out for compounds (which, admittedly, is not a huge issue, but is still a non-issue in any other programming language). Thus I want you to maybe consider "No, Nim should have more strict rules as to how identifier names are translated amongst the styles we support" as the answer.

Once again, if I'm missing something, and there is a genuine case where the proposed changes would break an otherwise reasonable code that doesn't adhere to either snake or camel case (or some mixture of both), then I'd like to see such an example (that isn't foobarbaz).

> There is a case which compiles in current Nim:

This will also compile with proposed changes, but will not be ambiguous, because you would be able to call foo with foo, and call fO_o with fO_o, fOO or f_o_o.

@ZoomRmc

ZoomRmc commented Jan 6, 2024

> the idea that the convenience style insensitivity provides is worth losing users over is outrageous.

The idea that the convenience style insensitivity is critical for deciding on using or ditching the language is hilarious. 🤷

@Araq
Member

Araq commented Jan 6, 2024

I'm sure that's the rule that has been proposed here, but let me spell it out to see if there is disagreement:

"Two identifiers are equal if they have separators in the same positions. A separator is either the underscore or an invisible one at a case switch (lowercase followed by uppercase or vice versa). The first character is always case sensitive and other characters are not. An underscore cannot be followed by an underscore."

This implies:

foo != foO
fo_o != foo
fo_o == foO
NoSmoking == No_smoking
GlFloat == Gl_float
AbCD == Ab_cd
HtmlParser == Html_Parser
HTMLParser == HTMLP_arser # wrong, but that's life

This is pretty intuitive, keeps style insensitivity's benefits and tooling can easily provide a transition path.

@kuchta

kuchta commented Jan 6, 2024

@Araq What about "A separator is either the underscore or an invisible one at a case switch" (when lowercase followed by uppercase or one left if uppercase followed by lowercase). IMHO HTMLParser is a more common style than HTMLparser so that

HtmlParser == Html_Parser
HTMLParser == HTML_Parser

@metagn
Contributor

metagn commented Jan 6, 2024

> From what I understand, --styleCheck simply disallows ambiguous styles for identifiers

I am suggesting a new option entirely that would go with my proposal, it would just go under the styleCheck umbrella.

> it doesn't change how identifiers represented internally, so it doesn't solve the problem.

Yes, internally all identifiers are stored in a style insensitive way. You are suggesting to store them under a new scheme (no?), I am saying changing the scheme is going to cause more pain.

> The problem is that I do want foo and fOo to compile ... I want them to represent 2 different identifiers ... I want foo and f_oo to compile and be 2 different things

I don't want to indulge misunderstandings further here. The compiler, on a technical level, should be able to respect any different strings. The same way the compiler would allow foo and fOo or foo and f_oo to coexist, it shouldn't suddenly complain when fooBar and foo_bar coexist; we should be able to let the user decide which pairs make sense (ideally). We can do this in the frontend without changing the semantics of the language, which is why I mentioned the possibility of an option under the styleCheck umbrella. We can 1. let the user know when they defined the same thing in camel case and snake case (who would do this?), this is what I meant by --styleCheck:definitions, again we can let the user specify that for example only ambiguity between snake case and camel case is bad, with some option name like --styleCheck:definitions:snakeCamelCase; 2. let the user know when they wrote something that matches the style of one identifier but could pass for another under style insensitivity, maybe under an option name like --styleCheck:alternatives. Sorry that I rambled here, I just wanted to clear up any misunderstandings, feel free to ignore if there isn't anything to discuss

> I don't see how this is not possible to do in current compiler, since it requires a change in a compiler part that is already there

I meant that you cannot define both foo and fO_o using the current compiler, "the current language" might have been more appropriate.

> If someone wants to use my fooBar as foo_bar, I see no issue with that, but what value do we gain from someone writing my fooBar as f_o_o_BaR?

I don't mean to be rude, but this is not an argument for your suggestion, it's a defense under the hypothetical argument "we should allow f_o_o_BaR". The point isn't that f_o_o_BaR makes sense, it's that what makes sense was decided years or decades ago, we would just be continuing the cycle by trying to attack f_o_o_BaR. Stuff like this has plagued discussion on this issue for so many years (haha Nim allows f_o_o_BaR, haha Python errors at runtime when you don't capitalize one letter).

> just because someone decided they want to turn my GL_BYTE into Glbyte in their codebase, I'm not allowed to have more direct C lib bindings, like GL_BYTE and GLbyte

I mean, this is the user's fault, there's nothing honest about changing GL_BYTE to Glbyte.


> The idea that ... style insensitivity is critical for deciding on using or ditching the language is hilarious.

Yeah but are we really in a position to bargain against people who think this? We are only so influential.


> A separator is either the underscore or an invisible one at a case switch

I did not interpret the proposal like this at all, it's not what "convert snake case to camel case" usually means. The comment:

> I think it would be better to convert every uppercase letter in a continuous sequence of uppercase letters to lowercase, except the first and the last ones

Implies that HTMLParser becomes HtmlParser = Html_parser. What I thought the proposal was initially suggesting and what each algorithm that googling "convert camel case to snake case" gives does is interpret HTMLParser as H_t_m_l_parser.

It's interesting though, and I'm not saying these conversion schemes don't make sense (if this was not clear) but I can't say that changing to a new scheme is a path we should be pursuing. To repeat, changing the scheme would:

  • inevitably break code (I know, lame)
  • require Nim users to get used to it
  • require Nim learners to understand it
  • probably not make people who hate style insensitivity feel better to a significant degree

If we are at the point that these are acceptable compromises then we can go nuts but I still have doubts.


Final note: the majority use of style insensitivity is not the simultaneous use of camel case and snake case, most people adhere to NEP1 which says to use camel case. The majority use is cases like GC_ref or openArray or builtin where it's not clear in what configuration to separate the words.

@Araq
Member

Araq commented Jan 6, 2024

> probably not make people who hate style insensitivity feel better to a significant degree

I agree but that's a (good!) argument against any proposal like this altogether. For now I'm interested in exploring the proposal, we can always reject it later. Of course, you can also say "waste of time" already.

@Zectbumo

Zectbumo commented Jan 6, 2024

I like where this is going. Does this proposal handle all uppercase identifiers like this?
NOW_HERE = NowHere
NOWHERE = Nowhere
I'm happy to see that NOW_HERE != NOWHERE

@Araq
Member

Araq commented Jan 6, 2024

@kuchta
> What about "A separator is either the underscore or an invisible one at a case switch" (when lowercase followed by uppercase or one left if uppercase followed by lowercase).

Is that the same as "separator between uppercase letters AB if B is followed by a lowercase letter?"

@IgorGRBR
Author

IgorGRBR commented Jan 6, 2024

> I'm sure that's the rule that has been proposed here, but let me spell it out to see if there is disagreement

The ruleset is almost correct. It's just that a separator is either an underscore or a case switch from lowercase to uppercase (and not vice-versa). Also, like @metagn pointed out, a continuous sequence of uppercase letters gets converted into a separate word, except for the last character in such sequence, so people can use newHTMLParser and newHtmlParser interchangeably (but not newHTMLparser, that would be equivalent to newHtmLparser!).

@Zectbumo

Zectbumo commented Jan 6, 2024

So AF_INET would be AfIneT?

@IgorGRBR
Author

IgorGRBR commented Jan 6, 2024

Perhaps an exception should be made for the last character in the sequence if it also happens to be the last character in the identifier.

So AF_INET would be AfInet.
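Putting the last few comments together, the rule set could be sketched roughly as below. This is only an illustration under stated assumptions: `splitWords` and `sameIdent` are hypothetical names, digits are treated like lowercase letters, and the "last capital of a run starts a new word" step is applied only when a lowercase letter follows, which is one way to get the end-of-identifier exception.

```nim
import std/strutils

# Sketch: split an identifier into lowercase words. A word boundary is
#  - an underscore,
#  - a switch from lowercase (or digit) to uppercase,
#  - the last capital of an uppercase run, if a lowercase letter follows.
proc splitWords(ident: string): seq[string] =
  var cur = ""
  for i, c in ident:
    if c == '_':
      if cur.len > 0:
        result.add cur
        cur = ""
      continue
    let prevLower = i > 0 and (ident[i - 1].isLowerAscii or ident[i - 1].isDigit)
    let prevUpper = i > 0 and ident[i - 1].isUpperAscii
    let nextLower = i + 1 < ident.len and ident[i + 1].isLowerAscii
    if c.isUpperAscii and cur.len > 0 and (prevLower or (prevUpper and nextLower)):
      result.add cur
      cur = ""
    cur.add c.toLowerAscii
  if cur.len > 0:
    result.add cur

# Two identifiers match if their first characters are identical
# (case sensitive) and they consist of the same sequence of words.
proc sameIdent(a, b: string): bool =
  a.len > 0 and b.len > 0 and a[0] == b[0] and splitWords(a) == splitWords(b)

doAssert sameIdent("newHTMLParser", "newHtmlParser")
doAssert sameIdent("newHTMLparser", "newHtmLparser")
doAssert sameIdent("AF_INET", "AfInet")
doAssert not sameIdent("gameStop", "gamesTop")
```

Under these assumptions GL_BYTE and GLbyte stay distinct (words `gl, byte` versus `g, lbyte`), while NOW_HERE matches NowHere and NOWHERE matches Nowhere.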

@TTSKarlsson

Maybe we could let the user decide what names the variables have, instead of having each variable name have x^n number of name permutations? It would make it easier for everyone, no inconsistencies. More power to the user.

@IgorGRBR
Author

> Maybe we could let the user decide what names the variables have, instead of having each variable name have x^n number of name permutations? It would make it easier for everyone, no inconsistencies. More power to the user.

Sorry, didn't notice your comment earlier. Can you elaborate more on this? People here have already suggested using "style skins", and we already discussed their issues.

@TTSKarlsson

> Sorry, didn't notice your comment earlier. Can you elaborate more on this? People here have already suggested using "style skins", and we already discussed their issues.

I should probably read up on these style skins first, maybe they solve the issue already.

@lou15b

lou15b commented Dec 6, 2024

Sorry for the late comment; I found this RFC when I searched for why the new Roadmap (Issue #556) doesn't mention fixing the style sensitivity issue for Version 3. Now I see why.

Anyway, here are my thoughts on the matter.

To me the aim of this discussion should be: how to facilitate the writing of code that is as easy as possible for people other than the original author to read and understand. It should not be about accommodating every individual's particular coding taste.

For this reason I am against "style skins". That idea goes in the opposite direction from code readability - in order to understand someone else's code, one must first understand the style that is specified by the skin being used. Adding hoops to jump through when reading code is not my idea of a step forward.

A major component of clear, understandable code is the use of descriptive identifiers, each of which may comprise several words. The situation we are faced with is that there are two main conventions/religions for separating words within an identifier:

  • Use an underscore as a word separator (snake_case)
    • This suffers the disadvantage of longer identifiers (i.e. more typing, longer lines of code).
  • Use the occurrence of a character case change to indicate the beginning of a new word (camelCase/HumpCase)
    • This suffers from some ambiguities - for example when the identifier includes acronym(s).

From what I have read to date, it seems that the only thing that the proponents of the above two conventions have in common is unwillingness to use the other.

Given that, the goal of this discussion should be to find a way to reconcile the above two conventions in a way that promotes the writing of easily readable code.

So the challenge is to define a small, unambiguous set of simple rules for separating words within an identifier. Those rules must allow use of either case change or underscore as a word separator, such that identifiers that are comprised of the same sequence of words would be deemed to be the same - whether underscores are used, or case change, or some combination. That is, the compiler's internal representations of them are identical.

There is one further necessary requirement. In general, an FFI may use a totally different, arbitrary convention. In order to facilitate interactions with an FFI, it must be possible to write identifiers whose internal representation is exactly as typed.

To that end, building on the discussion so far in this RFC, I propose the following set of rules. They are in decreasing order of precedence, and are in addition to the current constraints on identifiers as defined in the Manual (case sensitivity of the first character, restrictions on the use of underscore).

  • Any sequence of consecutive embedded digits (i.e. numeric characters) is a single, separate, word.
  • An underscore ('_') with a digit both immediately before and immediately after is ignored.
  • Except for the previous rule, the character following an underscore is the beginning of the next word.
  • A case change from one character to the next (lower to upper or upper to lower) indicates the beginning of a new word. The new word begins at the upper case character.

Finally, overriding the above rules:

  • Any identifier enclosed by backticks (`) has an internal representation exactly as it is typed.

Applying the above rules to the examples mentioned in previous comments, plus a few more, gives:

foo != foO
fo_o != foo
fo_o == foO
NoSmoking == No_smoking
GlFloat == Gl_float
AbCD == Ab_cd
HTMLParser == HTML_parser == HtmlParser == Html_parser
GC_ref == GcRef == Gc_ref
NOW_HERE == NowHere == Now_here
NOWHERE == Nowhere
AF_INET == AfInet == Af_inet
row26column5 == row26Column5 == row_26_column_5
foo12_345 == foo12345 == foo_12345
`GL_TEXTURE_MAG_FILTER` != gl_texture_mag_filter
`GL_TEXTURE_MAG_FILTER` != GlTextureMagFilter
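
The rules above can be sketched as follows, in Python, as one possible reading rather than a reference implementation. The names `split_words` and `internal_key` are hypothetical, and the choice of a tagged tuple as the "internal representation" is an assumption made only so that backticked identifiers can never collide with ordinary ones. The first character keeps its case, following the existing constraint from the Manual mentioned above.

```python
import re


def split_words(ident: str) -> list[str]:
    """Split an identifier into words according to the four proposed rules."""
    # Rule 2: an underscore with a digit on both sides is ignored.
    s = re.sub(r"(?<=\d)_(?=\d)", "", ident)
    words, current = [], ""
    for i, ch in enumerate(s):
        if ch == "_":
            # Rule 3: the character after an underscore begins the next word.
            if current:
                words.append(current)
            current = ""
            continue
        prev = s[i - 1] if i > 0 else ""
        nxt = s[i + 1] if i + 1 < len(s) else ""
        boundary = (
            # Rule 1: a run of digits is a separate word.
            ch.isdigit() != prev.isdigit()
            # Rule 4: a case change starts a word at the uppercase character.
            or (ch.isupper() and (prev.islower() or nxt.islower()))
        )
        if boundary and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


def internal_key(ident: str) -> tuple[str, str]:
    """Comparison key: identifiers are 'the same' when their keys are equal."""
    if len(ident) >= 2 and ident[0] == "`" and ident[-1] == "`":
        # Overriding rule: backticked identifiers are kept exactly as typed.
        return ("raw", ident[1:-1])
    joined = "_".join(split_words(ident)).lower()
    # Keep the case of the first character, lowercase everything else.
    return ("words", ident[:1] + joined[1:])
```

For example, `internal_key("HTMLParser")` and `internal_key("Html_parser")` are both `("words", "Html_parser")`, while `internal_key("NOWHERE")` is `("words", "Nowhere")` and therefore differs from the key of `NOW_HERE`.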

@Araq
Member

Araq commented Dec 6, 2024

@lou15b For Nim 3 I am thinking of "just make it case sensitive but add {.snake_case.} and {.camelCase.} pragmas" which can be used for an entire module affecting all the imported symbols to adhere to the requested style.

@ASVIEST

ASVIEST commented Dec 6, 2024

@lou15b For Nim 3 I am thinking of "just make it case sensitive but add {.snake_case.} and {.camelCase.} pragmas" which can be used for an entire module affecting all the imported symbols to adhere to the requested style.

I think an {.insensitive.} pragma is also needed, because of C wrappers.
A question about the implementation: would we need to compute three hashes per identifier for ident ID matching?

@Araq
Member

Araq commented Dec 7, 2024

@ASVIEST I haven't thought about it much, but in the new implementation there would be no hashing beyond what is used for identical strings. Instead, on import there would be "try to import FooBar. Failed? Ok, so try FOO_BAR then".

@lou15b

lou15b commented Dec 7, 2024

@Araq

just make it case sensitive

I think this would considerably simplify a number of things, not the least of which is to eliminate all the bike-shedding about naming conventions from discussions about the Nim language. I'm all for it.

add {.snake_case.} and {.camelCase.} pragmas which can be used for an entire module affecting all the imported symbols to adhere to the requested style.

I'm not sure what this means. If it means "convert all imported symbols to the specified style" then there must be rules defined for word separation, in order to properly convert those symbols.

@Araq
Member

Araq commented Dec 7, 2024

If it means "convert all imported symbols to the specified style" then there must be rules defined for word separation, in order to properly convert those symbols.

Sure but it doesn't look hard, lower-followed-by-upper or underscore is a word separation.

@lou15b

lou15b commented Dec 7, 2024

True. And one can always fall back on the default - use the symbols as originally typed.

@Araq
Member

Araq commented Dec 7, 2024

There is an issue though, if I import fooBar as foo_bar and then the module later also adds foo_bar my code changes its meaning... Probably not too problematic though as a module that exports both fooBar and later foo_bar is not realistic. Or maybe within a module we can enforce that the used style for declared identifiers uses camelCase or underscores consistently.
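
The consistency idea in the last sentence could be checked roughly like this. It is a Python sketch with hypothetical names (`style_of`, `is_consistent`); the classification into "snake", "camel", "mixed", and "neutral" is an assumption made for illustration, not an existing compiler check.

```python
import re


def style_of(name: str) -> str:
    """Classify one declared identifier by the word separators it uses."""
    has_underscore = "_" in name
    has_hump = re.search(r"[a-z][A-Z]", name) is not None
    if has_underscore and has_hump:
        return "mixed"
    if has_underscore:
        return "snake"
    if has_hump:
        return "camel"
    return "neutral"  # single word: compatible with either style


def is_consistent(declared: list[str]) -> bool:
    """True if the module sticks to one separator style throughout."""
    styles = {style_of(n) for n in declared}
    if "mixed" in styles:
        return False
    return not ({"snake", "camel"} <= styles)
```

A module declaring both `fooBar` and `foo_bar` would be rejected by such a check, which rules out the situation where a later `foo_bar` silently changes what an importer's `foo_bar` refers to.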

@ASVIEST

ASVIEST commented Dec 7, 2024

There is an issue though, if I import fooBar as foo_bar and then the module later also adds foo_bar my code changes its meaning... Probably not too problematic though as a module that exports both fooBar and later foo_bar is not realistic. Or maybe within a module we can enforce that the used style for declared identifiers uses camelCase or underscores consistently.

Shouldn't it be in stylecheck? And I think that you can write

import foo {.insensitive.}

Then the names should not collide.

@Araq
Member

Araq commented Dec 8, 2024

@ASVIEST I didn't bind the insensitive to a particular import as you did so your solution escaped me.

@ASVIEST

ASVIEST commented Dec 9, 2024

@ASVIEST I didn't bind the insensitive to a particular import as you did so your solution escaped me.

I came up with this quite a long time ago when I was playing with the compiler. The idea was to add a pragma {.sensitivity: sense.} that has several uses.
In the module:

# a.nim
{.sensitivity: sense.}
const GL_FLOAT* = 0x1406
type GLfloat* = float32

const megabyte* = 1024*8
import a # behaves by default like import a {.sensitivity: sense.}, i.e. identifiers that would otherwise collide (GL_FLOAT, GLfloat) are matched case-sensitively; the rest stay insensitive.
assert GL_FLOAT ==  0x1406 # valid
assert GLfloat is float32 # valid
assert megabyte == 1024*8 #valid

Besides partial, there are also sense and insense in this design (although I think insense can be made partial).

# b.nim
import a {.sensitivity: sense.} 
var megaByte* = "1024 * 8" #no error
# c.nim
import a {.sensitivity: sense.} # or partial
import b {.sensitivity: sense.} # or partial
assert megaByte == "1024 * 8"
assert megabyte == 1024*8

An import with a sensitivity pragma defines how the imported identifiers are matched within the scope of the current module.
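
To make the three modes concrete, here is a small Python model of symbol lookup against a single module's exports. It is only one reading of the design above: `nim_normalize` mimics Nim's current identifier comparison (first character case-sensitive, remaining characters compared case-insensitively with underscores ignored), while `lookup` and its `mode` argument are hypothetical and do not correspond to an existing compiler API.

```python
def nim_normalize(name: str) -> str:
    """Nim-style comparison key: keep the first character, then lowercase
    the rest and drop underscores."""
    return name[:1] + name[1:].replace("_", "").lower()


def lookup(name: str, exports: list[str], mode: str = "partial") -> list[str]:
    """Return the exported symbols that `name` matches under `mode`.

    sense   - exact spelling only
    insense - any export with the same normalized form (more than one
              result means the reference is ambiguous)
    partial - exact spelling for exports that collide with each other after
              normalization, insensitive matching for everything else
    """
    if mode == "sense":
        return [e for e in exports if e == name]
    matches = [e for e in exports if nim_normalize(e) == nim_normalize(name)]
    if mode == "partial" and len(matches) > 1:
        return [e for e in matches if e == name]
    return matches
```

With exports `GL_FLOAT`, `GLfloat`, and `megabyte`, the partial mode resolves `GL_FLOAT` and `GLfloat` only by their exact spelling but still accepts `megaByte` for `megabyte`, whereas the sense mode rejects `megaByte`.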
