Rebol syntax and space significance #2094
Comments
Submitted by: maxim I think some datatypes still need more checks after manipulation to make sure they remain valid (like all the checks added to word creation). Some of your examples are easily understood and may simply need clarification of what constitutes a separator in Rebol. There is a bit of formalism in this area which remains to be written, I think. |
Submitted by: Gregg Ladislav, are you just saying that you want to require REBOL input to match the output in your examples? e.g., [(b)/(c)] would mean something different than [(b) / (c)]? If so, I disagree. Number 8 is the most confusing, because of the new empty file behavior, but I don't see how the others violate typographic principles. A parenthetical statement is what it is, whether or not I forget to put spaces around it. |
Submitted by: Ladislav " [(b)/(c)] would mean something different than [(b) / (c)]? " - of course! see example 5! In it, [(b)/(c)] should mean something completely different than [(b) / (c)]! |
Submitted by: Ladislav "A parenthetical statement is what it is, whether or not I forget to put spaces around it." - that does not hold in Rebol since in Rebol space is significant as demonstrated in example 2 |
Submitted by: Gregg Regarding paren statements, what I'm saying is that (and I don't know what "typographic standards" you're referring to, specifically), I can choose (at my discretion(read whim)), whether or not to use spaces around parens, without changing the meaning. If my parenthetical statement is a list of things like ((apple) (orange)(banana) ), that's OK, too. Of course, there may be meaning read into it, associating (read whim) closely with "discretion" or (orange) with (banana). How this appears to me is that you're conflating the lexical form with runtime type information. Given [(b)/(c)] and [(b) / (c)], and knowing REBOL's rules, I expect both to produce the same result. However, if I know that [(b)/(c)] is a path, I don't expect the same result. And I do understand that reflection needs that extra information to work correctly. As you have pointed out many times in the past, things can look the same and not be the same. Maybe I don't understand the need or value behind this, because I don't know what standards it is not compliant with and it has never caused me problems in the past. |
Submitted by: Ladislav "I can choose (at my discretion(read whim)), whether or not to use spaces around parens, without changing the meaning" - that is not true: a/(b)/(c) is not equivalent to a/(b) / (c) |
Submitted by: Ladislav "I don't know what "typographic standards" you're referring to, specifically" - I mean the typographic standards establishing how space characters shall be used around parentheses. Such standards exist in our country and in Europe in general. Typographic standards exist in US as well, making the expressions (a)b and (a) b typographically inequivalent. The typographic standard is that if you mean something like (a) b then you must not omit the space if not wanting to change the meaning of the expression. |
Submitted by: Gregg The point in my message was that I used arbitrary spacing around parens, and you understood my meaning just fine. :) Changing the spacing around any of the parens in my message would not change a thing, would it? Please point me to an actual standard. I don't know of any. If we are to be standards compliant, we need an exact standard to adhere to, yes? And examples of why the distinction is important and useful (outside REBOL) would help. |
Submitted by: Ladislav "The point in my message was that I used arbitrary spacing around parens, and you understood my meaning just fine" - well, I demonstrated that you cannot use arbitrary spacing around parens in Rebol, since a/(b)/c gives you something else than a/(b) / (c). If you do not know that, you may fall victim to the illusion that you are using arbitrary spacing; the examples demonstrate that it is just an illusion. Also, for example, in C you can omit the spaces in a + b, but that does not work in Rebol, where a+b is not equivalent to a + b. |
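The a+b case from the comment above can be checked at a console. This is only a minimal sketch, assuming an R3 (or R2) interpreter; the words a and b and their values are illustrative, and no output is quoted because it may differ between builds.

```rebol
a: 1
b: 2

; With spaces, the expression is three separate values and evaluates as an addition.
print a + b

; LOAD of the spaced text yields a block; its length shows how many values were scanned.
spaced: load "a + b"
print length? spaced

; Without spaces, the same characters are scanned as a single value.
glued: load "a+b"
print type? glued
```

Comparing the length of the first LOAD result with the type of the second shows whether the interpreter treats the space as significant here.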
Submitted by: Gregg Are the rules murky (unclear) in the path! versus leading-series examples above? For someone new to REBOL, quite possibly. Will requiring spaces between closing and opening square brackets ([[][]] vs [[] []]) break a lot of my code if the meaning is different (or an error is thrown)? Yes. :) Again, please point me to a standard and tell me why the distinction is useful (concrete examples work best for me). |
Submitted by: Ladislav It is possible to make exceptions to this rule, but I demonstrated why it is questionable. The problem with exceptions is that MOLD and LOAD are becoming incompatible, making language design inconsistent (bug-prone), inflexible, irreflexive and unnecessarily limited. Due to these reasons it is considerably better to remain consistent in R3 and respect the rule that space is significant in Rebol. |
Submitted by: Ladislav
To sum up, these non-whitespace characters are not general delimiters in Rebol. |
Submitted by: Ladislav See http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Brackets_and_parentheses I think that it can be used as an inspiration. Notice the text: " An opening bracket should be preceded by a space, except in unusual cases; for example, when it is preceded by ... a portion of a word" - if you replace "word" by "syntactic element" you get exactly the sense discussed in this ticket. For example, if wanting to write a/(b) / [c] you should never remove the spaces, unless you want to obtain a/(b)/[c] , i.e. a different syntax |
Submitted by: Gregg It also occurred to me that this could affect markup handling quite a bit, correct? That is, "test" would be different than " test ". |
Submitted by: Ladislav |
Submitted by: maxim To me the syntax is not defined by separators, which is why Rebol's lexical space is so rich. We can't even say that space is a separator, because strings allow it, i.e. we are not forced to escape them as in path! datatypes. The syntax is based on sequences of bytes which terminate by their pattern. As you know, we cannot/should not tokenize Rebol by using "separators". This is a strength of Rebol, not so much mathematically as in spirit. Trying to describe Rebol to others requires us to use terms others understand, but we are not forced to follow their ideals. |
Submitted by: maxim The rules for a block start at the block marker and end when the block pattern ends. The rules for a construct start at the construct marker, and so on. That is how one defines Rebol. It's an intrinsic part of Rebol. It is also an intrinsic part of PARSE. Each datatype has its own syntax, which is not forced to be symmetric with others. Rebol is about being pragmatic. Apart from extending some types for additional syntax, I do not think we should start making sweeping changes such as trying to force spaces everywhere. We must not mix up simplicity and consistency. Rebol syntax is not supposed to be 100% consistent; it is supposed to be simple in the majority use case. |
Submitted by: Ladislav "I will not use syntax in which I am forced to use spaces between each operand and operator. it would be extremely annoying." - my note is that this is how Rebol was designed |
Submitted by: maxim I don't give much weight to "typographical standards". This is not a newspaper, it's a programming language. |
Submitted by: Ladislav Also notice that I do not propose that whitespace has to precede, e.g., an opening paren. Instead I do propose that if an opening paren is preceded by a non-whitespace character, the syntax has some special meaning. |
Submitted by: rgchris |
Submitted by: Ladislav "no space required at the beginning/end inside blocks/parens" - yes, that is the standard; the start of the contents is marked by the opening bracket, no additional character is needed, similarly for the end of the contents |
Submitted by: rgchris func [foo][print foo foo]
either something [
this
][
that
] |
Submitted by: Ladislav The standard syntax is: func [foo] [print foo foo] which makes the SPEC and BODY arguments separate values, just as in the more symbolic func my-spec my-body. There is no need or intention to "glue" the SPEC and BODY arguments together, similarly as in either something [
this
] [
that
] , where the TRUE-BLOCK and FALSE-BLOCK arguments are separate values not needing to be visibly "glued together". Please note that the standard does allow the case when there is a non-whitespace character preceding the opening bracket or the case when there is non-whitespace character following the closing bracket. However, these are reserved for special syntax, when a block (or paren) is actually a part of a larger syntactic element (a part of a path or something else). |
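The point that SPEC and BODY are two separate values can be sketched as follows. This is a minimal illustration assuming an R2 or R3 console; the names my-spec, my-body, f and g are illustrative only, and no output is quoted for the final comparison because the behaviour of the unspaced form is exactly what this ticket debates.

```rebol
my-spec: [foo]
my-body: [print foo foo]

; Symbolic form: SPEC and BODY are passed as two separate block values.
f: func my-spec my-body

; Literal form with a space between the two blocks.
g: func [foo] [print foo foo]

f "hello"
g "hello"

; The thread indicates that interpreters of the time also accept the
; unspaced form; comparing the two LOAD results shows whether they differ.
print equal? load "[foo][print foo foo]" load "[foo] [print foo foo]"
```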
Submitted by: maxim It is only your opinion that "Space significance is one of the standards (...) respected by Rebol". You bring along a false presumption that spaces have a special significance when they do not. The proof is that there are quite a few places in the language where spaces have NO significance (which is exactly what you show; there are others). I'd really like it if we stopped trying to change things which are not broken in the language. There is no overall separator in Rebol because it's not a token language. Each type has a pattern, and when that pattern stops, that type ends. There is NO presumption that there should be a space after any of the elements which constitute Rebol code. There never has been, and there never should be. The only reason there are spaces is that at some points you need to end one pattern and start a new one. You say things like "You may prefer unreadable syntax, but I do not mind what kind of text I read, I prefer it to be readable anyway." If it is your opinion that adding spaces is better for readability, then write code that way. I don't find many things hard to read at all which you attempt to promote. What you are trying to promote here is the same kind of detail which makes other languages tiresome to use. Do I really want to start having damn "syntax" errors every other time I write a line of Rebol? DEFINITELY NOT. Are there ambiguities? Yes. Are paths a little screwed up or incomplete in their syntax or error reporting right now? Yeah. But it's not a problem of the whole language. What's next? Forcing an end-of-expression character (like ; in C), or adding commas and parens around all of them... because you know these REALLY help readability, right? |
Submitted by: abolka This claim is demonstrably false, both in theory and in practice. In fact, most of Rebol's lexical elements indeed require to be followed by EOF or a delimiter such as a whitespace character after the element; other characters besides whitespace which are currently attributed delimiting quality in R3 are: " ( ) / ; [ ] { } |
Submitted by: Ladislav Aha, "screwed up" is "not a problem" because it is "not broken". Have to disagree. |
Submitted by: maxim What I am saying is that space is not a special delimiter. It's just one of the characters which is used as a delimiter, not a token delimiter of the language as a whole. There are different end delimiters for different types, based on their lexical qualities. Let's not try to add additional syntax when the current rules are pretty obvious and clear. Some types might benefit from better lexical parsing (my personal pet peeve is the /this/is/a/list/of/refinements ambiguous syntax). In this case, all we need is to state that "/" is an invalid separator for refinement! and that specific issue is solved (it would raise an invalid refinement error). We don't need to start tacking on sweeping assumptions that spaces have any special significance. A block doesn't require any special character after its end marker to know the block ended. You can stick ANY character (even EOF) after it, and if that starts a valid Rebol syntax, it will be loaded. There is no need for a space after the end block marker, which you can consider a delimiter in its own right, but it is part of the block syntax itself. As you know, you cannot just say load "[ 1 2 3" , so from the block's point of view "]" is more than just a delimiter, but to other types it becomes one. There is no point in forcing a space anywhere current delimiters already do their job. |
Submitted by: maxim the only thing that needs work here is to raise a few more errors when tackling paths and other types which allow the "/" as a delimiter when they really shouldn't. We could extend paths in many ways, though it would have to be obvious that some of these would not be valid in DO. It would also require very specific documentation of what occurs with binding. |
Submitted by: Ladislav Nobody says that a block requires any special character after its end marker to know that the block ended. However, the (special or not) character after the block end marker (if any) has to be examined and there actually are two fundamentally distinct possibilities:
For example, if there is a character immediately following the block end marker and it happens to be a slash character, then the slash character is not a block end marker, since no additional block end marker is needed. Let's see an example produced by the MOLD function:
>> s: mold to path! [[a] b]
== "[a]/b"
In this case the block actually is an element of a path, i.e., it is "visually glued" into the path. The "visual gluing effect" in the MOLD result string shall be attributed to the fact that the slash character is immediately following the block end marker, i.e., there is no space between the slash character and the block end marker. The problem is that the LOAD function is incompatible ("disagreeing") with MOLD, adding a (significant, unsolicited and wrongly implied) whitespace character and producing the result:
>> load s
== [[a] /b]
which, due to the above reasons, is not a correct interpretation of the LOAD argument string. Summing up, the block end marker undoubtedly separates any value inside the block from any value outside. However, the block end marker does not and by the existing typographic standards cannot separate the block from any subsequent element if no separating character is present. |
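The round trip described in the comment above can be reproduced step by step. This is a minimal sketch assuming an R3 console of the period under discussion; the variable names p, s and v are illustrative, and the only results referred to are the ones already quoted in that comment.

```rebol
p: to path! [[a] b]     ; a two-element path whose first element is a block
s: mold p               ; the comment above reports "[a]/b" for this string
v: load s               ; the comment above reports [[a] /b] for this value

print mold s
print mold v

; Compare the types before and after the round trip: if LOAD returned a
; block rather than a path, MOLD and LOAD disagree for this value.
print path? p
print path? v
```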
Submitted by: Ladislav
, since such syntax would be invalid otherwise. |
Submitted by: Gregg |
Submitted by: Ladislav |
Submitted by: BrianH In this argument, all attempts to cause syntax changes in Rebol by blaming MOLD are not valid, because we can fix MOLD. There are several datatypes for which it is possible to construct values which, when rendered in the normal syntax of that type, would not be interpreted by LOAD as being the same value (ignoring binding issues and offset or cyclic references). These are usually structure types containing values that in literal syntax would have syntax conflicts or character set limitations. I think that for every such datatype, it is possible to know all such combinations ahead of time. It's not completely random, it's a finite set of conflicts that through our own testing we should be able to get a list of. Off the top of my head, combinations of path syntax with certain datatypes (notably dates, and a wide variety of types in the first position), and character set limitations with most word types. There are probably more exceptions, but they're probably a finite set. I think that in those specific cases, MOLD should generate construction syntax. And if any of those types in particular don't support construction syntax, support for that syntax needs to be added. If we are concerned about not unnecessarily exposing offsets we don't want to, construction syntax of path types can skip the offsets when called with MOLD instead of MOLD/all. But it absolutely needs to be loadable as a value or collection of values of the same types that were molded. This is not a LOAD problem, it's a MOLD problem. As a bonus, if we fix MOLD of word types, we can get rid of the character set limitations of words. Words of arbitrary characters won't load properly when expressed as normal literals, but they will have a way to load with construction syntax, so we don't have to consider that error useful anymore. |
Submitted by: BrianH Now, given that we have a serious MOLD problem, let's ignore that for a moment. We might want to change the whitespace significance of certain sequences of characters in our syntax for reasons which have nothing to do with MOLD, but instead because it would let us add new semantic capabilities to Rebol code. Allowing literal non-construction expression of a particular combination of values makes it possible to express that combination in our source code, without resorting to construction syntax. So it may be the case that we may want to tweak things in some cases because it would result in an expression that would be semantically useful, particularly to the DO dialect. For instance, example 5. The only escaping mechanism we have for normally-evaluated paths is parens, and parens in paths are not pretty to look at, but for that they work pretty well. The biggest problem is that we can't escape the first value. That's very semantically limiting to us. Because of that limitation we have to convert calls to retrieved or constructed functions, with refinements expressed, into calls to APPLY or DO TO PATH!. If we were to allow example 5 with Ladislav's desired interpretation, that would let us use (awkward) paths-with-parens syntax instead of (much more awkward) positional APPLY or explicit code generation. So, changing example 5 would be a win for us. Now, for example 4. We don't have a path type that has literal syntax and which evaluates to itself in normal inline evaluation. All of the path types evaluate with side effects. That means that we don't have a path type that can be used as a pure symbol. For words, we have the issue type to serve the role of a pure symbol. So, for the types of situations where we need a symbol that won't have side effects in normal evaluation, we can use issues; which is what makes them so useful for overlaying a separate evaluation process, such as a preprocessor. 
However, such preprocessors could benefit from having operations that have options, like regular function calls do, so the lack of a corresponding issue path type limits their semantics. So, changing example 4 by creating a new issue path type could be a win to us, even if we fake it by just making it a regular path! with an issue as the first element. As long as issue-paths, real or fake, evaluated to themselves with no side effects, we could use them. Now, for the rest of the examples, we gain no benefit from making those distinctions. In some cases (like example 3) we might benefit from triggering a syntax error (because refinement paths are a really bad idea which might be worth making a permanent error just on principle). But in other cases, it doesn't do any harm to be flexible in what we accept. The only reasons we should trigger syntax errors for these is if you wanted to explicitly make sure they never get used (like commas), or if you have a particular use for that syntax variant planned for the future. But beware, for any syntax variants you want to reserve for future use, the rest of the examples here are mentally ambiguous and so would be poor choices for those future features. The current flexible behavior has the benefit of making an obvious choice for the ambiguous situation, and of precluding the other interpretation that makes it ambiguous in the first place. As for heredoc syntax, didn't you have a ticket for that, Ladislav? The syntax you came up with for that in #1194 was great, and didn't need any of these changes. What happened to that? |
Submitted by: Ladislav Also, I see fixing LOAD/MOLD as a syntax change. |
Submitted by: Ladislav The construction syntax for paths is not very friendly, I do not think people want to use it; the problem is that it has significant typing overhead. |
Submitted by: fork New users often ask why Pursuant to this I think spaces should be required around strings and tags. There may be interesting meanings for However, I feel that one should be careful with brackets (and likely parentheses, although they may be different). If these two cases are equal: while[...][...] <=> while[...] [...] I do not think it's good for that to be different from these two equal cases (if they would be equal): while [...][...] <=> while [...] [...] The language is a bit confusing as it stands, and this does not have a "Quality" feel. One might push for an "Outer Space Proposal", where brackets must be spaced and not crunched as So regardless of how the other space significance issues fall out, I would not be in support of |
Submitted by: MarkI |
Submitted by: fork "regardless of how the other space significance issues fall out, I would not be in support of But after @earl challenged that, and I re-analyzed my discomfort with this, I've decided to take that back. I perhaps underestimated the adjustment that would occur in perception if people were not conditioned to seeing That perception would be helped by the absence of casual usages of So I now align myself with @earl (and any others) who have been suggesting a plan for respecting space significance with four exceptions: Although a tweak like this might not have the pleasure of a "absolutist formalism", it is solidly defined and doesn't seem to have any "holes" in a mechanical sense. It also seems to permit the cases that people find hardest to accept about what a "totally Outer-Space" proposal would break. Also, nothing stops anyone--in their own code--from always using the spaced version. From where I'm sitting right now, it seems a good compromise. |
Submitted by: Gregg New thoughts about atomicity +1 error? any [word[] word{} word()](until we know what to do with it) +.75 ``and This is tough. We have parts of speech, and we have punctuation. I also want quite a bit of freedom in formatting in order to communicate my thoughts and intent. Not having seen the SO chat on this, it would be great if there were a summary of that. I don't want to resist change for the sake of resisting change, and I'm all for consistency. My hesitance stems more from "space significance" without regard to punctuation being the goal. -4 considers this, and I think Carl did as well in Rebol's design. We can't separate the two. Or at least I don't think we should. Questions:
|
Submitted by: Ladislav
One of the Rebol design principles is to respect existing standards where it is of advantage. Space significance is one of the standards (space significance belongs to typographic standards) respected by Rebol. Examples 1 to 2 demonstrate space significance in Rebol. The Rebol user's guide mentions the significance of space in Rebol, stating:
"White-space is used in general for delimiting"
Advantages of space significance:
However, Rebol is not consistent in attributing significance to space as examples 3 to 11 demonstrate. I am not the only user who classifies the behaviour demonstrated by the exceptional examples "unexpected" (see also #2088).
Disadvantages of space-insignificant exceptional behaviour:
The main disadvantage of the exceptional behaviour is that the exceptions "steal" the (otherwise valid and sometimes already produced by the MOLD function) syntax. The syntax ("ignored" by the LOAD function) is useful, e.g., for paths that already exist and work in Rebol. This is demonstrated in example 5.
CC - Data [ Version: r3 master Type: Bug Platform: All Category: Syntax Reproduce: Always Fixed-in:none ]