Rebol syntax and space significance #2094

Open
rebolbot opened this issue Nov 17, 2013 · 42 comments

Comments

@rebolbot
Collaborator

Submitted by: Ladislav

One of the Rebol design principles is to respect existing standards where doing so is advantageous. Space significance is one of the standards respected by Rebol (space significance belongs to typographic standards). Examples 1 to 2 (including 1a to 1d) demonstrate space significance in Rebol. The Rebol User's Guide mentions the significance of space in Rebol, stating:

"White-space is used in general for delimiting"

Advantages of space significance:

  • The rule assigning significance to space is simple.
  • Space significance makes the language syntactically richer, allowing it to be more expressive than languages where space is insignificant. Being able to use words like a+b from example 1 helps users be more creative and flexible, and makes newly introduced words more understandable. Example 2 demonstrates that space significance is used in Rebol to distinguish a single three-element path from a two-element path followed by a single slash and a paren. It also demonstrates that space significance allows paths containing paren expressions to be part of Rebol syntax. Without space significance, the source code could not be interpreted as containing Rebol paths. Instead (as in C), both syntax variants would be interpreted as two ways of writing an expression containing two division operators.
  • Space significance is a typographic standard meant to enhance readability of the text for humans. See, e.g., http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Brackets_and_parentheses
  • Compliance with standards makes the language convenient for people accustomed to those standards.

However, Rebol is not consistent in attributing significance to space, as examples 3 to 11 demonstrate. I am not the only user who classifies the behaviour demonstrated by these exceptional examples as "unexpected" (see also #2088).

Disadvantages of space-insignificant exceptional behaviour:

  • Complicates the space significance rule by introducing unexpected behaviour (exceptions).
  • Exceptions can be shown to introduce inconsistencies, in the sense that the MOLD and LOAD functions behave incompatibly, not "understanding" each other. Observing the incompatible behaviour of MOLD and LOAD, we have to conclude that there is a bug in the language design.
  • Makes the language syntactically poorer and less expressive, preventing it from expressing useful information.
  • Ignores the information given by the user. If a user writes the text in two forms, once separating some parts of the text by whitespace and on another occasion writing the parts together, the information provided by the user should be respected by acknowledging the difference.

The main disadvantage of the exceptional behaviour is that the exceptions "steal" syntax that would otherwise be valid and that is sometimes already produced by the MOLD function. This syntax (which the LOAD function "ignores") is useful, e.g., for paths that already exist and work in Rebol, as example 5 demonstrates.

; example 1
; in Rebol space is significant
>> not-equal? [a+b] [a + b]
== true

; example 1a
; in Rebol space is significant
>> not-equal? [#{}] [# {}]
== true

; example 1b
; in Rebol space is significant
>> not-equal? [#" "] [# " "]
== true

; example 1c
; in Rebol space is significant
>> not-equal? [%""] [% ""]
== true

; example 1d
; in Rebol space is significant
>> not-equal? [#[false]] [# [false]]
== true

; example 2
; in Rebol space is significant
>> not-equal? [a/(b)/(c)] [a/(b) / (c)]
== true

; example 3
; bug, MOLD and LOAD give different "answers"
>> not-equal? [/a/b] [/a /b]
== false

; example 4
; bug, MOLD and LOAD give different "answers"
>> not-equal? [#a/b] [#a /b]
== false

; example 5
; bug, MOLD and LOAD give different "answers"
>> not-equal? [(a)/(b)] [(a) / (b)]
== false
; to demonstrate why this is a problem, notice that
>> reduce [to path! [(a) (b)]]
== [(a)/(b)]
; which is the Rebol syntax produced by the MOLD function
; however, the LOAD function treats the space as "insignificant", so
>> equal? reduce [to path! [(a) (b)]] [(a)/(b)]
== false

; example 6
; bug
>> not-equal? [[][]] [[] []]
== false

; example 7
; bug
>> not-equal? ["a"b] ["a" b]
== false

; example 8
; bug, MOLD and LOAD give different "answers"
>> not-equal? [[a]/(b)] [[a] / (b)]
== false

; example 9
; bug
>> not-equal? [[a]b] [[a] b]
== false

; example 10
; bug
>> [%{a/b}]
== [%"" "a/b"]

; example 11
; bug
>> not-equal? [<a>b] [<a> b]
== false

CC - Data [ Version: r3 master Type: Bug Platform: All Category: Syntax Reproduce: Always Fixed-in:none ]

@rebolbot
Collaborator Author

Submitted by: maxim

I think some datatypes still need more checks after manipulation to make sure they remain valid (like all the checks added to word creation).

Some of your examples may simply need a clarification of what constitutes a separator in Rebol; with that, they are easily understood. There is a bit of formalism in this area which remains to be written, I think.

@rebolbot
Collaborator Author

Submitted by: Gregg

Ladislav, are you just saying that you want to require REBOL input to match the output in your examples? e.g., [(b)/(c)] would mean something different than [(b) / (c)]? If so, I disagree. Number 8 is the most confusing, because of the new empty file behavior, but I don't see how the others violate typographic principles. A parenthetical statement is what it is, whether or not I forget to put spaces around it.

@rebolbot
Collaborator Author

Submitted by: Ladislav

" [(b)/(c)] would mean something different than [(b) / (c)]? " - of course! see example 5! In it, [(b)/(c)] should mean something completely different than [(b) / (c)]!

@rebolbot
Collaborator Author

Submitted by: Ladislav

"A parenthetical statement is what it is, whether or not I forget to put spaces around it." - that does not hold in Rebol, since in Rebol space is significant, as demonstrated in example 2.

@rebolbot
Collaborator Author

Submitted by: Gregg

Regarding paren statements, what I'm saying is that (and I don't know what "typographic standards" you're referring to, specifically), I can choose (at my discretion(read whim)), whether or not to use spaces around parens, without changing the meaning. If my parenthetical statement is a list of things like ((apple) (orange)(banana) ), that's OK, too. Of course, there may be meaning read into it, associating (read whim) closely with "discretion" or (orange) with (banana).

How this appears to me is that you're conflating the lexical form with runtime type information. Given [(b)/(c)] and [(b) / (c)], and knowing REBOL's rules, I expect both to produce the same result. However, if I know that [(b)/(c)] is a path, I don't expect the same result. And I do understand that reflection needs that extra information to work correctly. As you have pointed out many times in the past, things can look the same and not be the same.

Maybe I don't understand the need or value behind this, because I don't know what standards it is not compliant with and it has never caused me problems in the past.

@rebolbot
Collaborator Author

Submitted by: Ladislav

"I can choose (at my discretion(read whim)), whether or not to use spaces around parens, without changing the meaning" - that is not true:

a/(b)/(c)

is not equivalent to

a/(b) / (c)

@rebolbot
Collaborator Author

Submitted by: Ladislav

"I don't know what "typographic standards" you're referring to, specifically" - I mean the typographic standards establishing how space characters shall be used around parentheses. Such standards exist in our country and in Europe in general. Typographic standards exist in the US as well, making the expressions

(a)b

and

(a) b

typographically inequivalent. The typographic standard is that if you mean something like

(a) b

then you must not omit the space if not wanting to change the meaning of the expression.

@rebolbot
Collaborator Author

Submitted by: Gregg

The point in my message was that I used arbitrary spacing around parens, and you understood my meaning just fine. :) Changing the spacing around any of the parens in my message would not change a thing, would it?

Please point me to an actual standard. I don't know of any. If we are to be standards compliant, we need an exact standard to adhere to, yes? And examples of why the distinction is important and useful (outside REBOL) would help.

@rebolbot
Collaborator Author

Submitted by: Ladislav

"The point in my message was that I used arbitrary spacing around parens, and you understood my meaning just fine" - well, I demonstrated that you can not use arbitrary spacing around parens in Rebol, since

a/(b)/c

gives you something else than

a/(b) / (c)

If you do not know that, you may fall victim to an illusion that you are using arbitrary spacing. However, the examples demonstrate that it is just an illusion.

Also, for example, in C you can omit spaces in

a + b

, but that does not work in Rebol, where

a+b

is not equivalent to

a + b

@rebolbot
Collaborator Author

Submitted by: Gregg
I know spacing is not arbitrary in REBOL. I also know that REBOL tries to be relaxed, rather than strict, in some cases. That's why I asked for examples outside of REBOL.

Are the rules murky (unclear) in the path! versus leading-series examples above? For someone new to REBOL, quite possibly. Will requiring spaces between closing and opening square brackets ([[][]] vs [[] []]) break a lot of my code if the meaning is different (or an error is thrown)? Yes. :)

Again, please point me to a standard and tell me why the distinction is useful (concrete examples work best for me).

@rebolbot
Collaborator Author

Submitted by: Ladislav
"Will requiring spaces between closing and opening square brackets ([[][]] vs [[] []]) break a lot of my code if the meaning is different (or an error is thrown)? Yes. :)" - bad luck. However, R3 is still alpha and this is an important amendment to make. You should not have assumed space insignificance when having the sentence describing space as significant in the documentation.

It is possible to make exceptions to this rule, but I demonstrated why it is questionable. The problem with exceptions is that MOLD and LOAD are becoming incompatible, making language design inconsistent (bug-prone), inflexible, irreflexive and unnecessarily limited.

Due to these reasons it is considerably better to remain consistent in R3 and respect the rule that space is significant in Rebol.

@rebolbot
Collaborator Author

Submitted by: Gregg
"You should not have assumed space insignificance when having the sentence describing space as significant in the documentation."

The original REBOL guide says this: White-space is used in general for delimiting (for separating symbols). These are also general delimiters:
{ } " ; /

And Carl does it, too. ;)

I can't very well vote to follow a standard I can't read.

@rebolbot
Collaborator Author

Submitted by: Ladislav
" These are also general delimiters:" - I acknowledge that the non-white-space characters mentioned are delimiters (marking for example the contents of blocks, parens, strings, comments, ...), but they are actually not general delimiters:

  • For example, / is essentially not a delimiter, a general delimiter even less so; when used next to any non-ws character it is a part of the syntactic element as in a/b where it is used in a path, %a/b where it is used in a file, http://www.rebol.com where it is used in a URL, /a where it is used in a refinement or // where it is used in a word
  • ( and ) are not general delimiters; they just mark the start and end of paren contents. See example 2, demonstrating that the space preceding an opening parenthesis and the space following a closing parenthesis are significant, allowing a paren to be an element of a path in Rebol syntax
  • [ is not a general delimiter either, it just commonly marks the start of block contents; when following the # character, it is just a part of the syntactic element
  • " is not a general delimiter, it just commonly marks the start of a short string; when the starting " character follows the # character or the % character it is just a part of the syntactic element (char or file)
  • { is not a general delimiter, it just marks the start of string contents; when following the # character it is just a part of the syntactic element (binary!)
  • ; is not a general delimiter, it just marks the start of a comment to end of line

To sum up, these non-whitespace characters are not general delimiters in Rebol.
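The prefix rule in the list above (a # or % immediately followed by an opening delimiter forms one literal, while the same characters separated by a space are two values) can be sketched as a tiny lookup. This is a hypothetical Python illustration, not Rebol's scanner: GLUED_PREFIXES and glued_kind are made-up names, and the table only covers the four cases shown in examples 1a to 1d.

```python
from typing import Optional

# Hypothetical table built only from examples 1a-1d; not Rebol's real scanner.
GLUED_PREFIXES = {
    "#{": "binary!",
    '#"': "char!",
    '%"': "file!",
    "#[": "construction syntax",
}


def glued_kind(source: str) -> Optional[str]:
    """Return the literal kind if `source` starts with a prefix character
    immediately followed by an opening delimiter; otherwise None."""
    return GLUED_PREFIXES.get(source[:2])


if __name__ == "__main__":
    for text in ["#{}", "# {}", '#" "', '# " "', '%""', '% ""']:
        # Glued forms map to a kind; spaced forms map to None.
        print(repr(text), glued_kind(text))
```

With a space inserted, the first two characters no longer match any table entry, which mirrors how examples 1a to 1c read as two separate values.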

@rebolbot
Collaborator Author

Submitted by: Ladislav
I took the time to find some text in English mentioning parentheses and their relation to whitespace in typography.

See http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Brackets_and_parentheses

I think that it can be used as an inspiration. Notice the text: " An opening bracket should be preceded by a space, except in unusual cases; for example, when it is preceded by ... a portion of a word" - if you replace "word" by "syntactic element" you get exactly the sense discussed in this ticket. For example, if wanting to write

a/(b) / [c]

you should never remove the spaces, unless you want to obtain

a/(b)/[c]

, i.e. a different syntax

@rebolbot
Collaborator Author

Submitted by: Gregg
Thanks very much for taking the time to find a resource. I will try to read it soon and comment.

It also occurred to me that this could affect markup handling quite a bit, correct? That is, "test" would be different than " test ".

@rebolbot
Collaborator Author

Submitted by: Ladislav
Your example is not well suited for this page, as it looks. However, the strings you wrote obviously are not equal...

@rebolbot
Collaborator Author

Submitted by: maxim
I agree with most of your arguments or reasoning, BUT:

To me the syntax is not defined by separators, which is why Rebol's lexical space is so rich. We can't even say that space is a separator, because strings allow it, i.e., we are not forced to escape it as in path! datatypes.

The syntax is based on sequences of bytes which terminate by their pattern. As you know, we cannot/should not tokenize Rebol by using "separators". This is a strength of Rebol, not so much mathematically as in spirit. Trying to describe Rebol to others requires us to use terms others understand, but we are not forced to follow their ideals.

@rebolbot
Copy link
Collaborator Author

Submitted by: maxim
I will not use syntax in which I am forced to use spaces between each use of paren or block characters. It would be extremely annoying.

The rules for a block start at the block marker and end when the block pattern ends. The rules for a construct start at the construct marker, and so on. That is how one defines Rebol. It's an intrinsic part of Rebol. It is also an intrinsic part of PARSE.

Each datatype has its own syntax which is not forced to be symmetric with others. Rebol is about being pragmatic.

Apart from extending some types with additional syntax, I do not think we should start making sweeping changes such as trying to force spaces everywhere. We must not mix up simplicity and consistency. Rebol syntax is not supposed to be 100% consistent; it is supposed to be simple in the majority use case.

@rebolbot
Collaborator Author

Submitted by: Ladislav
Paraphrasing:

"I will not use syntax in which I am forced to use spaces between each operand and operator. It would be extremely annoying." - my note is that this is how Rebol was designed.

@rebolbot
Collaborator Author

Submitted by: maxim
I think I didn't use the right words. I should have said I "understand", not "agree".

I don't give much weight to "typographical standards". This is not a newspaper, it's a programming language.

@rebolbot
Collaborator Author

Submitted by: Ladislav
"I don't give much weight to "typographical standards". This is not a newspaper, it's a programming language." - The purpose of typographic standards is to keep the text human-readable. You may prefer unreadable syntax, but I do not mind what kind of text I read, I prefer it to be readable anyway.

Also notice that I do not propose that whitespace has to precede, e.g., an opening paren. Instead I do propose that if an opening paren is preceded by a non-whitespace character, the syntax has some special meaning.

@rebolbot
Collaborator Author

Submitted by: rgchris
My inclination is to insist on spaces between values, with the exception of two blocks/parens, and with no space required at the beginning/end inside blocks/parens. Even though spaces are not always needed to discern the separation of two values, e.g. two tag! values: [], omitting them, both in code and data (I know, one and the same), lends itself to syntax that can be unclear to the point of obfuscation; it is also inconsistent and creates complexity in describing the language for both authoring and parsing.

@rebolbot
Collaborator Author

Submitted by: Ladislav
"with the exception of two blocks/parens" - for readability, consistency and possible future improvements it is better to put in the space as well when describing two separate blocks, parens, etc.

"no space required at the beginning/end inside blocks/parens" - yes, that is the standard; the start of the contents is marked by the opening bracket, no additional character is needed, similarly for the end of the contents

@rebolbot
Collaborator Author

Submitted by: rgchris
@ladislav, I'd almost agree, but there are some cases, such as function definition or other code controls, where the space is actually a distraction:

func [foo][print foo foo]

either something [
this
][
that
]

@rebolbot
Collaborator Author

Submitted by: Ladislav
As I said, space outside of parens, brackets, etc., is standard (typographic standard, and also Rebol standard valid when there is no bug), unless we want to have some special syntax.

The standard syntax is:

func [foo] [print foo foo]

Making the SPEC and BODY argument separate values similarly as in more symbolic

func my-spec my-body

There is no need or intention to "glue" the SPEC and BODY arguments together, similarly to

either something [
this
] [
that
]

, where the TRUE-BLOCK and FALSE-BLOCK arguments are separate values not needing to be visibly "glued together".

Please note that the standard does allow the case when there is a non-whitespace character preceding the opening bracket or the case when there is non-whitespace character following the closing bracket. However, these are reserved for special syntax, when a block (or paren) is actually a part of a larger syntactic element (a part of a path or something else).

@rebolbot
Collaborator Author

Submitted by: maxim
Ladislav, please don't start berating other people's opinions when all you have is your own. Don't start the "I am better at logic" game here. This is ALL opinion. There may be a majority on one side or another, but it's all opinion.

It is only your opinion that "Space significance is one of the standards (...) respected by Rebol". You bring along a false presumption that spaces have a special significance when they do not. The proof is that there are quite a few places in the language where spaces have NO significance (which is exactly what you show; there are others).

I'd really like it if we stopped trying to change things which are not broken in the language. There is no overall separator in Rebol because it's not a token language. Each type has a pattern and when that pattern stops, that type ends. there is NO presumption that there should be a space after any of the elements which constitute rebol code. Never have, never should be.

The only reason there are spaces is because at some points, you need to end one pattern and start a new one.

As for things like "You may prefer unreadable syntax, but I do not mind what kind of text I read, I prefer it to be readable anyway.": if it is your opinion that adding spaces is better for readability, then write code that way. Much of what you attempt to change, I don't find hard to read at all.

What you are trying to promote here is the same kind of detail which makes other languages tiresome to use. Do I really want to start having damn "syntax" errors every other time I write a line of rebol? DEFINITELY NOT.

Are there ambiguities? yes. Are paths a little screwed up or incomplete in their syntax or error reporting right now? yeah. But its not a problem of the whole language.

What's next? Forcing an end-of-expression character (like ; in C), or adding commas and parens around all of them... 'cause you know these REALLY help readability, right?

@rebolbot
Collaborator Author

Submitted by: abolka
"Each type has a pattern and when that pattern stops, that type ends. there is NO presumption that there should be a space after any of the elements which constitute rebol code."

This claim is demonstrably false, both in theory and in practice.

In fact, most of Rebol's lexical elements indeed must be followed by EOF or a delimiter such as a whitespace character after the element; other characters besides whitespace which are currently attributed delimiting quality in R3 are: " ( ) / ; [ ] { }
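To make the delimiter claim concrete, here is a toy scanner sketch in Python. It is not Rebol's LOAD: the delimiter set is simply the list given above (whitespace plus " ( ) / ; [ ] { }), and the names DELIMITERS, is_terminated, scan_word and words are hypothetical. It shows why a+b scans as a single word while a + b scans as three.

```python
# Toy model only: NOT Rebol's scanner. The delimiter characters are taken
# from the list above; whitespace is checked separately via str.isspace().
DELIMITERS = set('"()/;[]{}')


def is_delimiter(ch: str) -> bool:
    return ch.isspace() or ch in DELIMITERS


def is_terminated(source: str, end: int) -> bool:
    """True if a token ending just before index `end` is followed by
    end of input, whitespace, or one of the delimiter characters."""
    return end >= len(source) or is_delimiter(source[end])


def scan_word(source: str, start: int) -> tuple[str, int]:
    """Greedily consume non-delimiter characters starting at `start`."""
    end = start
    while end < len(source) and not is_delimiter(source[end]):
        end += 1
    return source[start:end], end


def words(source: str) -> list[str]:
    """Split `source` into word-like tokens, skipping whitespace."""
    result, i = [], 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        word, i = scan_word(source, i)
        if not word:
            raise ValueError(
                f"delimiter {source[i]!r} at index {i} is outside this toy's scope")
        result.append(word)
    return result


if __name__ == "__main__":
    print(words("a+b"))    # ['a+b']
    print(words("a + b"))  # ['a', '+', 'b']
```

Because + is not in the delimiter set, the scanner keeps consuming characters, which is the mechanism behind example 1.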

@rebolbot
Collaborator Author

Submitted by: Ladislav
Max:
"I'd really like it if we stopped trying to change things which are not broken"
"Are paths ... screwed up ...? yeah"
"But its not a problem..."

Aha, "screwed up" is "not a problem" because it is "not broken". Have to disagree.

@rebolbot
Collaborator Author

Submitted by: maxim
"EOF or a delimiter such as a whitespace character after" yep... a delimiter... not a space. Something that isn't in the pattern of the currently loading datatype.

What I am saying is that space is not a special delimiter. It's just one of the characters which are used as delimiters, not a token delimiter of the language as a whole. There are different end delimiters for different types, based on their lexical qualities.

Let's not try to add additional syntax when the current rules are pretty obvious and clear.

Some types might benefit from better lexical parsing (my personal pet peeve is the /this/is/a/list/of/refinements ambiguous syntax). In this case, all we need is to state that "/" is an invalid separator for refinement! and that specific issue is solved (it would raise an invalid refinement error). We don't need to start tacking on sweeping assumptions that spaces have any special significance.

a block doesn't require any special character after its end marker to know the block ended. you can stick ANY (even EOF) character after it, and if that starts a valid rebol syntax, it will be loaded. there is no need for a space after the end block marker, which you can consider a delimiter in its own right, but it is part of the block syntax itself.

As you know, you cannot just say load "[ 1 2 3", so from the block's point of view "]" is more than just a delimiter, but to other types, it becomes one. There is no point in forcing a space anywhere current delimiters already do their job.

@rebolbot
Collaborator Author

Submitted by: maxim
Lad, come on, I hope you meant your last quote as a joke, I really do.
cause cutting up pieces of sentences and putting them back to back out of context is well ... creative, and totally irrelevant :-) !?

the only thing that needs work here is to raise a few more errors when tackling paths and other types which allow the "/" as a delimiter when they really shouldn't.

We could extend paths in many ways, though it would have to be obvious that some of these would not be valid in DO. It would also require very specific documentation of what occurs with binding.

@rebolbot
Collaborator Author

Submitted by: Ladislav
"a block doesn't require any special character after its end marker to know the block ended" - it looks like I did not do a good enough job explaining what this ticket is about, even though I already suspected that I might have made the ticket too big. Trying my luck to see whether this helps:

Nobody says that a block requires any special character after its end marker to know that the block ended. However, the (special or not) character after the block end marker (if any) has to be examined and there actually are two fundamentally distinct possibilities:

  • either the block is (in a sense) "visually glued" to something becoming a part of some bigger "whole" (syntactic element or construct) containing the block
  • or the block is not "visually glued" to anything and thus, is separate from an eventual (if any) subsequent or preceding value in the source text

For example, if there is a character immediately following the block end marker and it happens to be a slash character, then the slash character is not a block end marker, since no additional block end marker is needed. Let's see an example produced by MOLD function:

>> s: mold to path! [[a] b]
== "[a]/b"

In this case the block actually is an element of a path, i.e., it is "visually glued" into the path. The "visual gluing effect" in the MOLD result string shall be attributed to the fact that the slash character is immediately following the block end marker, i.e., there is no space between the slash character and the block end marker.

The problem is that the LOAD function is incompatible ("disagreeing") with MOLD, adding a (significant, unsolicited and wrongly implied) whitespace character and producing the result:

>> load s
== [[a] /b]

, which, due to the above reasons is not a correct interpretation of the LOAD argument string.

Summing up, block end marker undoubtedly separates any value inside block from any value outside. However, block end marker does not and by the existing typographic standards cannot separate the block from any subsequent element if no separating character is present.
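The MOLD/LOAD disagreement above can be reproduced in a toy model. The Python below is hypothetical and is not Rebol's real LOAD or MOLD; it only knows words, refinements, blocks and paths. With glue=False, a / immediately after a closing ] starts a new refinement (the current LOAD behaviour shown above); with glue=True, it continues a path whose first element is the block (the interpretation proposed in this ticket). Only the second setting reads back what the toy mold produces.

```python
# Toy model of the "[a]/b" case; not Rebol's LOAD or MOLD.
# Values are tuples: ("word", name), ("refinement", name),
# ("block", [values]) and ("path", [values]).

def scan_name(src, i):
    start = i
    while i < len(src) and not src[i].isspace() and src[i] not in "[]/":
        i += 1
    if i == start:
        raise SyntaxError(f"unexpected input at index {i}")
    return src[start:i], i


def parse_atom(src, i, glue):
    if i >= len(src):
        raise SyntaxError("unexpected end of input")
    if src[i] == "[":
        items, i = parse_seq(src, i + 1, glue, closer="]")
        return ("block", items), i
    if src[i] == "/":
        name, i = scan_name(src, i + 1)
        return ("refinement", name), i
    name, i = scan_name(src, i)
    return ("word", name), i


def parse_value(src, i, glue):
    first, i = parse_atom(src, i, glue)
    # A "/" directly after a word always continues a path. Directly after a
    # block it does so only when glue=True (the proposed behaviour).
    if i < len(src) and src[i] == "/" and (
            first[0] == "word" or (glue and first[0] == "block")):
        parts = [first]
        while i < len(src) and src[i] == "/":
            part, i = parse_atom(src, i + 1, glue)
            parts.append(part)
        return ("path", parts), i
    return first, i


def parse_seq(src, i, glue, closer=None):
    items = []
    while i < len(src):
        if src[i].isspace():
            i += 1
        elif src[i] == closer:
            return items, i + 1
        else:
            value, i = parse_value(src, i, glue)
            items.append(value)
    if closer is not None:
        raise SyntaxError(f"missing {closer}")
    return items, i


def load(src, glue):
    return parse_seq(src, 0, glue)[0]


def mold(value):
    kind, payload = value
    if kind == "word":
        return payload
    if kind == "refinement":
        return "/" + payload
    if kind == "block":
        return "[" + " ".join(mold(v) for v in payload) + "]"
    return "/".join(mold(v) for v in payload)  # path


if __name__ == "__main__":
    path = ("path", [("block", [("word", "a")]), ("word", "b")])
    text = mold(path)
    print(text)                              # [a]/b
    print(load(text, glue=False) == [path])  # False: read back as [a] /b
    print(load(text, glue=True) == [path])   # True
```

In the toy, the only difference between the two settings is whether the character right after ] is inspected before deciding that the block value is complete; "[a] /b" (with a space) loads as two values either way.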

@rebolbot
Collaborator Author

Submitted by: Ladislav
To demonstrate a "special benefit" case consider e.g. the heredoc syntax for strings, which, all of a sudden, can be made available, e.g., in the form:

my-string{this is the string contents}, the string has not ended yet, " { {{{}}}}}}}}, only now comes the end}my-string

, since such syntax would be invalid otherwise.

@rebolbot
Collaborator Author

Submitted by: Gregg
After some chat on AltMe, I'm not convinced that these changes are all for the better. I support the idea of improving paths, to either support or throw errors when creating path values with non-word leading values. I do not support "auto-gluing" blocks and strings together if there is no space between successive closing and opening brackets.

@rebolbot
Collaborator Author

Submitted by: Ladislav
Amendments to path syntax and the possibility of introducing the above heredoc string syntax are not the only advantages of space significance. In the future, it may be advantageous to define new syntax for vectors or other Rebol values. In this sense, space significance looks like a reasonable way forward.

@rebolbot
Collaborator Author

rebolbot commented Mar 1, 2014

Submitted by: BrianH
I don't remember if there was a ticket for this opinion, but...

In this argument, all attempts to cause syntax changes in Rebol by blaming MOLD are not valid, because we can fix MOLD.

There are several datatypes for which it is possible to construct values which, when rendered in the normal syntax of that type, would not be interpreted by LOAD as being the same value (ignoring binding issues and offset or cyclic references). These are usually structure types containing values that in literal syntax would have syntax conflicts or character set limitations.

I think that for every such datatype, it is possible to know all such combinations ahead of time. It's not completely random, it's a finite set of conflicts that through our own testing we should be able to get a list of. Off the top of my head, combinations of path syntax with certain datatypes (notably dates, and a wide variety of types in the first position), and character set limitations with most word types. There are probably more exceptions, but they're probably a finite set.

I think that in those specific cases, MOLD should generate construction syntax. And if any of those types in particular don't support construction syntax, support for that syntax needs to be added. If we are concerned about not unnecessarily exposing offsets we don't want to, construction syntax of path types can skip the offsets when called with MOLD instead of MOLD/all. But it absolutely needs to be loadable as a value or collection of values of the same types that were molded. This is not a LOAD problem, it's a MOLD problem.

As a bonus, if we fix MOLD of word types, we can get rid of the character set limitations of words. Words of arbitrary characters won't load properly when expressed as normal literals, but they will have a way to load with construction syntax, so we don't have to consider that error useful anymore.
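The rule proposed above (emit the normal literal only if LOAD would read it back as the same single value, otherwise fall back to construction syntax) can be sketched generically. This Python is a hypothetical illustration: toy_load just splits on whitespace, standing in for LOAD, and the #[word! "..."] notation is an assumed stand-in, not confirmed R3 construction syntax for words.

```python
from typing import Callable, List


def mold_or_construct(value: str,
                      mold: Callable[[str], str],
                      load: Callable[[str], List[str]],
                      construct: Callable[[str], str]) -> str:
    """Return mold(value) if load() reads it back as exactly [value];
    otherwise return construct(value)."""
    text = mold(value)
    if load(text) == [value]:
        return text
    return construct(value)


def toy_mold(word: str) -> str:
    return word  # normal literal syntax: the word's own characters


def toy_load(text: str) -> List[str]:
    return text.split()  # toy LOAD: whitespace-separated words only


def toy_construct(word: str) -> str:
    # Hypothetical construction-syntax notation, for illustration only.
    return '#[word! "' + word + '"]'


if __name__ == "__main__":
    print(mold_or_construct("a+b", toy_mold, toy_load, toy_construct))  # a+b
    print(mold_or_construct("a b", toy_mold, toy_load, toy_construct))  # #[word! "a b"]
```

A word containing a space cannot survive the round trip as a normal literal, so the sketch falls back to the construction form, which matches the "bonus" case for arbitrary-character words described above.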

@rebolbot
Collaborator Author

rebolbot commented Mar 1, 2014

Submitted by: BrianH

Now, given that we have a serious MOLD problem, let's ignore that for a moment.

We might want to change the whitespace significance of certain sequences of characters in our syntax for reasons which have nothing to do with MOLD, but instead because it would let us add new semantic capabilities to Rebol code. Allowing literal non-construction expression of a particular combination of values makes it possible to express that combination in our source code, without resorting to construction syntax. So it may be the case that we may want to tweak things in some cases because it would result in an expression that would be semantically useful, particularly to the DO dialect.

For instance, example 5. The only escaping mechanism we have for normally-evaluated paths is parens, and parens in paths are not pretty to look at, but for that they work pretty well. The biggest problem is that we can't escape the first value. That's very semantically limiting to us. Because of that limitation we have to convert calls to retrieved or constructed functions, with refinements expressed, into calls to APPLY or DO TO PATH!. If we were to allow example 5 with Ladislav's desired interpretation, that would let us use (awkward) paths-with-parens syntax instead of (much more awkward) positional APPLY or explicit code generation. So, changing example 5 would be a win for us.

Now, as for example 4. We don't have a path type that has literal syntax and which evaluates to itself in normal inline evaluation. All of the path types evaluate with side effects. That means that we don't have a path type that can be used as a pure symbol. For words, we have the issue type to serve the role of a pure symbol. So, for the types of situations where we need a symbol that won't have side effects in normal evaluation, we can use issues, which is what makes them so useful for overlaying a separate evaluation process, such as a preprocessor. However, such preprocessors could benefit from having operations that have options, like regular function calls do, so the lack of a corresponding issue path type limits their semantics. So, changing example 4 by creating a new issue path type could be a win for us, even if we fake it by just making it a regular path! with an issue as the first element. As long as issue-paths, real or fake, evaluated to themselves with no side effects, we could use them.

Now, for the rest of the examples, we gain no benefit from making those distinctions. In some cases (like example 3) we might benefit from triggering a syntax error (because refinement paths are a really bad idea which might be worth making a permanent error just on principle). But in other cases, it doesn't do any harm to be flexible in what we accept.

The only reasons we should trigger syntax errors for these are if we want to make sure they are never used (like commas), or if we have a particular use for that syntax variant planned for the future. But beware: for any syntax variant you want to reserve for future use, the remaining examples here are ambiguous to a human reader, so they would be poor choices for those future features. The current flexible behavior has the benefit of making an obvious choice for the ambiguous situation, and of precluding the other interpretation that makes it ambiguous in the first place.

As for heredoc syntax, didn't you have a ticket for that, Ladislav? The syntax you came up with for that in #1194 was great, and didn't need any of these changes. What happened to that?

@rebolbot
Collaborator Author

rebolbot commented Mar 6, 2015

Submitted by: Ladislav
"In this argument, all attempts to cause syntax changes in Rebol by blaming MOLD are not valid, because we can fix MOLD. " - certainly, it is possible to fix MOLD. However, I am not blaming MOLD. I am saying that there are incompatibilities between LOAD and MOLD, and MOLD is not always the one to blame.

Also, I see fixing LOAD/MOLD as a syntax change.

@rebolbot
Collaborator Author

rebolbot commented Mar 6, 2015

Submitted by: Ladislav
"As a bonus, if we fix MOLD of word types, we can get rid of the character set limitations of words." - that is a "backwards" proposal - Carl strived to make sure only "legal" words are created. Therefore, no construction syntax for words is necessary.

The construction syntax for paths is not very friendly, I do not think people want to use it; the problem is that it has significant typing overhead.

@rebolbot
Collaborator Author

rebolbot commented Mar 6, 2015

Submitted by: fork
I generally support adhering more strongly to the idea that spaces should be significant.

New users often ask why a/b is different from a / b. The mantra is that by sticking to the notion of "words separated by spaces", there is more freedom in carving up the lexical space. By not saying "slash always means divide" there are URLs and file paths and other things that can be encoded as their own literal types, vs. as strings.

Pursuant to this, I think spaces should be required around strings and tags. There may be interesting meanings for xxx{yyy} that have yet to be thought of, while there is very little measurable value in having that as a "shorthand" for xxx {yyy}. I'd argue it's a detriment: allowing it lets people write harder-to-read code while stealing potentially interesting lexical space.

However, I feel that one should be careful with brackets (and likely parentheses, although they may be different). If these two cases are equal:

while[...][...] <=> while[...] [...]

I do not think it's good for that to be different from these two equal cases (if they would be equal):

while [...][...] <=> while [...] [...]

The language is a bit confusing as it stands, and this does not have a "Quality" feel.

One might push for an "Outer Space Proposal", where brackets must be spaced and not crunched as ][. If it were done uniformly, it might be all right, but I suspect that will be an uphill battle. (It will also be a big blow to Rebmu.)

So regardless of how the other space significance issues fall out, I would not be in support of while[...][...] and while [...][...] being two different-but-legal things.

@rebolbot
Collaborator Author

Submitted by: MarkI
I can see word[block] becoming a syntax extension (or syntax error).
I can even see word[block-1][block-2] potentially getting the same treatment.
What I cannot see is bare [block-1][block-2] ever being different from [block-1] [block-2].
My current favourite argument is this: whatever it could mean is not worth the saving of one character.
That is to say, if you must have it, you shouldn't mind writing it like ?[block-1][block-2], and I assert that that would in fact be much clearer.
Same for parens, or mixes of parens and blocks, of course.
Unfortunately, the Manual of Style is of no help here: "Avoid adjacent sets of brackets."
By a very similar argument, I'd be against bare "string-1""string-2" being any different from "string-1" "string-2", whether or not {} was used instead of "".
I agree that these would be exceptional cases that "break" space significance; but I disagree that these cases are questionable, and I am aware of and quite unconcerned about the fact that they will mold out differently than they load in. Lots of things do that in Rebol. Who would say that spaces after the opening bracket of a block should be a syntax error/extension? Or having no leading zero on a fraction? Those also disappear, or appear, upon molding.

@rebolbot
Collaborator Author

Submitted by: fork
I said above:

"regardless of how the other space significance issues fall out, I would not be in support of while[...][...] and while [...][...] being two different-but-legal things."

But after @earl challenged that, and I re-analyzed my discomfort with this, I've decided to take that back. I perhaps underestimated the adjustment that would occur in perception if people were not conditioned to seeing foo[... or foo( style patterns except in certain limited contexts. If they were rare and sufficiently different, and people were trained early on that while[...][...] gives back an error (not a special "block modifier"), and while [...][...] is okay... it might create new patterns of thought about atomicity.

That perception would be helped by the absence of casual usages of foo{... and foo<.... It would increase the "standout" nature of when a leading item did not have a space before a block or paren. If it stood out, then it would convey the intended meaning. And if things like <foo><bar> or {foo}{bar} were disallowed, that would even further draw the eye to unspaced cases being "the exception to watch for".

So I now align myself with @earl (and any others) who have been suggesting a plan for respecting space significance with four exceptions: ][, ](, )[, )(. These would be equivalent to ] [, ] (, ) [, ) ( respectively.

Although a tweak like this might not have the pleasure of an "absolutist formalism", it is solidly defined and doesn't seem to have any "holes" in a mechanical sense. It also seems to permit the cases that people find hardest to accept about what a "totally Outer-Space" proposal would break. Also, nothing stops anyone--in their own code--from always using the spaced version.
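To illustrate how mechanical the four-exception rule is, here is a small Python model of it. This is not Rebol's actual scanner, and the function name unspaced_openers is hypothetical. It is a sketch under simplifying assumptions: it looks only at [ and ( characters, ignores strings, comments and construction syntax, and treats a preceding / as part of a path such as a/(b) rather than as an adjacency case.

```python
# Toy model of the proposed rule: an opening [ or ( must be preceded by
# whitespace, except for the four exempt pairs ][ ]( )[ )( which are read
# as if a space separated them. Assumption: only brackets and parens are
# modeled; strings, braces, tags and construction syntax are ignored.

EXEMPT_PAIRS = {"][", "](", ")[", ")("}
OPENERS = "[("


def unspaced_openers(src: str) -> list[tuple[int, str]]:
    """Return (index, two-char pair) for every opener glued to the
    preceding character that the proposed rule would reject."""
    flagged = []
    for i in range(1, len(src)):
        ch, prev = src[i], src[i - 1]
        if ch not in OPENERS:
            continue
        # Spaced openers and directly nested openers ("[[", "([") are fine.
        if prev.isspace() or prev in OPENERS:
            continue
        # Assumption: a paren or block right after "/" is a path element.
        if prev == "/":
            continue
        pair = prev + ch
        if pair in EXEMPT_PAIRS:
            continue  # one of the four exceptions, treated as spaced
        flagged.append((i, pair))
    return flagged


if __name__ == "__main__":
    for sample in ["while [x][y]", "while[x][y]", "print(1)", "x: a/(b)"]:
        print(repr(sample), unspaced_openers(sample))
```

Under this model, while[x][y] is flagged at index 5 (the e[ pair) while while [x][y] passes, which is the distinction described above: a word glued to a block stands out as the exception, but block-to-block crunching stays legal.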

From where I'm sitting right now, it seems a good compromise.

@rebolbot
Collaborator Author

rebolbot commented Apr 1, 2015

Submitted by: Gregg
-4 +1

New thoughts about atomicity +1

error? any [word[] word{} word()] (until we know what to do with it) +.75

<foo><bar> and {foo}{bar} being disallowed -1

This is tough. We have parts of speech, and we have punctuation. I also want quite a bit of freedom in formatting in order to communicate my thoughts and intent. Not having seen the SO chat on this, it would be great if there were a summary of that.

I don't want to resist change for the sake of resisting change, and I'm all for consistency. My hesitance stems more from "space significance" without regard to punctuation being the goal. -4 considers this, and I think Carl did as well in Rebol's design. We can't separate the two. Or at least I don't think we should.

Questions:

  1. What concrete problem(s) does this solve?
  2. What are the things we need to talk about in our code/data/messages, and where is the maximal benefit on the "strictness" line in how we express them?
