make S"..." and "..." throw errors identically #107

StefanKarpinski · 2011-07-11T01:24:10Z

We're close to this, but the error messaging of these two forms is still quite different: the cases in which errors occur are (nearly) the same, but the way errors are shown is not. For maximum seamlessness between the two forms, all errors should appear the same.

vtjnash · 2013-02-01T06:06:25Z

@StefanKarpinski when do these throw an error? and what is an example of the output in each case?

JeffBezanson · 2013-03-08T04:20:37Z

I guess this is an example:

    julia> "\xz"
    ERROR: syntax: invalid escape sequence

    julia> E"\xz"
    ERROR: \x used with no following hex digits

I almost wonder if escape sequences and interpolation should just be features of all strings, period, and custom string literals are just a way to invoke a macro on that parser output. For interpolation, you'd get multiple arguments, some of which are strings. This has the advantage of being really predictable (the escaping rules never change), plus custom string literals can support interpolation, and you don't have to know about calling unescape or interp_parse. \ and " are unavoidably metacharacters (for example L"\" is never going to work), so why not just add $ to that list?
Or if there were no interpolation, it would be even better, but let's not gild the lily :)

cc @nolta

StefanKarpinski · 2013-03-08T04:24:01Z

Regular expressions would not fare well under such a plan.

JeffBezanson · 2013-03-08T04:35:07Z

All you have to do is escape $ and \. I'm not even sure it's unequivocally good that regexes don't support escape sequences right now.

pao · 2013-03-08T04:52:17Z

Not sure what you're getting at, Jeff. PCRE supports escapes just fine:

    julia> m = match(r"\012", "\012")
    RegexMatch("\n")

JeffBezanson · 2013-03-08T04:55:12Z

Oh, I saw this:

    julia> r"\u2200"
    ERROR: compile: PCRE does not support \L, \l, \N{name}, \U, or \u at position 2 in "\\u2200"

So we are dealing with 2 escape syntaxes, which is also not great.

pao · 2013-03-08T04:57:58Z

Ahh, fair enough.

StefanKarpinski · 2013-03-08T05:32:15Z

Ah, yeah, that's a good point. Slashes, however, still present a problem as far as I can tell. How can you possibly make r"\d" work?

StefanKarpinski · 2013-03-08T05:32:39Z

Backslashes, rather.

JeffBezanson · 2013-03-08T05:56:34Z

Under this design r"<string>" would always be equivalent to Regex("<string>") with the exact same text in both cases. So you would write r"\\d".
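To make the trade-off concrete, here is a small sketch in present-day Julia (an assumption, since the thread predates 1.0). It shows what the proposed equivalence would require: the pattern text \d has to be spelled with a doubled backslash wherever ordinary string escaping applies, as it already must be when the Regex constructor is called on a standard literal. The variable names are illustrative only.

```julia
# With a standard string literal the parser consumes one layer of escaping,
# so the regex pattern \d must be written with a doubled backslash.
via_constructor = Regex("\\d")

# The r"..." literal passes the backslash through untouched.
via_literal = r"\d"

# Both hold the same two-character pattern text: a backslash and a 'd'.
same_pattern = via_constructor.pattern == via_literal.pattern

# Both match a digit.
matches_digit = occursin(via_constructor, "7") && occursin(via_literal, "7")
```

Under the proposal, the second spelling would stop working and only the doubled form would remain.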

StefanKarpinski · 2013-03-08T06:06:44Z

Honestly, I feel that's unacceptable.

JeffBezanson · 2013-03-08T06:50:38Z

In that case it might help just to get rid of the str macros that produce strings, and only use custom string literals for different types of objects. The idea then is that custom string literals are different enough not to want the same features that normal string literals have.

Meanwhile, b_str and I_str are now special cases in the parser since they support interpolation, so they are no longer legit string macros. But maybe b_str isn't as useful as we thought; we haven't used it anywhere (though maybe somebody has).

So you see I find this messy and unsatisfying. True, r"\\d" is very annoying, but there are no deep weird fiddly bugs involved. However, we could improve the situation without changing the language by removing all str macros except r_str and v_str.

One telling flaw is that a str macro can't end with a single backslash. Another one, AFAICT, is that there is no way to put a " in a regex literal without also inserting a backslash. Granted those cases are rarer than \d, but they are why I prefer to have one universal escaping scheme and stick with it. Nobody knows what r"\"" means.

Come to think of it, maybe this is a regression?

    julia> r"\""
    r"\\""

If these things don't support escapes, then we should say you just can't have embedded quote characters, and you get whatever text is before the next " no matter what.

pao · 2013-03-08T13:58:44Z

> Another one, AFAICT, is that there is no way to put a " in a regex literal without also inserting a backslash.

That could be fixed by allowing triple-quoting for regexes, which would be useful with the m suffix anyway.

StefanKarpinski · 2013-03-08T15:32:30Z

There are a few separate issues at play here for non-standard string literals:

1. The parser handling quote escaping (good).
2. The parser handling general escaping (bad).
3. The parser handling interpolation (mixed).

The reason the L"\"" thing is weird is the same reason you can't put a single " into a regex literal without a backslash: the parser knows to ignore escaped quotes for parsing but doesn't remove them. This is not really right and there's no fully correct way around it; the solution is that the parser should handle a layer of escaping. This is slightly tricky to make so it interacts well with additional layers of escaping, but it can be done.

The issue of how regular expressions handle things like r"\u2200" is just a bug. I passed things through to PCRE as-is because it seemed to work and I didn't really know what else to do at the time. The correct behavior is to provide a full, correct translation from surface escapes to escapes and/or raw unicode data that PCRE understands. This is actually an argument against the parser handling escaping, rather than for it, since otherwise we'll never have the flexibility and power to get this right.

We've come around to the idea that the parser should handle interpolation in standard string literals, which I think is good, and has nothing to do with general escaping in standard or non-standard string literals. When x"\a$b\c" is encountered, this could be parsed as a call to @x_str L"\a" b L"\c" so that the parser handles everything dealing with quoting but the varargs @x_str macro still gets to handle escaping (typically using the escape_string utility function). This would even allow interpolation into regular expressions, although, of course, that would entail recompilation on every execution. Non-standard string literals that don't handle interpolation at all could simply emit a clear error message if you try to do interpolation.
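The varargs idea can be sketched in present-day Julia, with some loud assumptions: the macro name @x_pieces is hypothetical, the call is written out by hand because the parser does not perform this rewrite, and the string pieces are standard literals with doubled backslashes standing in for the raw L"..." pieces described above. The macro unescapes the string pieces itself and leaves the interpolated expressions to be evaluated in the caller's scope.

```julia
# Hypothetical varargs macro: string pieces are unescaped by the macro,
# while every other argument is treated as an interpolated expression.
macro x_pieces(parts...)
    args = map(parts) do p
        p isa String ? unescape_string(p) : esc(p)
    end
    # Concatenate the unescaped pieces and interpolated values at run time.
    return :(string($(args...)))
end

b = 42

# Stands in for x"\t$b\n": the pieces arrive still escaped (backslash + letter).
s = @x_pieces "\\t" b "\\n"
```

A macro that does not support interpolation could instead throw an error when it receives any argument that is not a String.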

The main downside of doing interpolation for all string literals is that you can't have non-standard literals where $ is just a character. That's especially problematic for regular expressions where $ already has a meaning and is extremely common. Perl handles this in classic fashion by cleverly disambiguating the two cases for you in the parser. We could try to do something like that, but it would get pretty insane.

I want to point out that my original proposal was that the parser handle string interpolation for standard string literals while no interpolation would be done for non-standard string literals. I'm not clear on why that's not an acceptable solution. The only counterargument I've seen anywhere is b"" strings, which do binary interpolation in their current form, but as @JeffBezanson pointed out, are not used anywhere.

nolta · 2013-03-08T21:20:48Z

This might be a band-aid, but it appears we can fix the r"\u2200" problem by adding JAVASCRIPT_COMPAT to the list of DEFAULT_OPTS in regex.jl:

    julia> s = "\u2200"
    "∀"

    julia> ismatch(r"\u2200", s)
    true

StefanKarpinski · 2013-03-08T21:33:16Z

Well, that's certainly better than failing :-)

JeffBezanson · 2013-03-08T22:17:55Z

Thanks, Mike, for fixing that.

I don't see why it's bad for the parser to handle escaping in normal strings. It's part of the lexical syntax. There is also escaping in char literals, which there aren't macros for.

In custom string literals, the options are (1) have no parser escaping of quotes at all, or (2) make the sequences \\ and \" special, for inserting a single backslash or a quote. So you could still have r"\d", but r"\\" would be interpreted by the parser.

StefanKarpinski · 2013-03-08T22:22:59Z

I have no problem with the parser handling escaping in normal strings or characters. That's completely sensible. I was exclusively talking about non-standard strings, and I was arguing for option (2), making sure that it plays well with subsequent layers of unescaping. That still doesn't address the interpolation issue.

JeffBezanson · 2013-03-08T22:27:29Z

Ok, so r"\\" will be a regex of a single backslash (which gives a PCRE error, which is fine)? This is fine with me.

For interpolation, let's just get rid of b_str and friends. Or we could keep only b_str (since it yields an array, not a string), but not have it support interpolation. The less interpolation the better :)

StefanKarpinski · 2013-03-08T22:33:57Z

Sure, that seems fine. As long as r"\d" works. I haven't entirely thought through how the parser unescaping needs to interact with the macro unescaping though. Seems like you have a handle on it.

StefanKarpinski · 2013-03-08T22:34:18Z

Sorry.

vtjnash · 2013-03-09T10:30:31Z

so, how are these supposed to be written now:

    const path_separator    = "\\"
    const path_separator_re = r"[/\\]+"
    const path_absolute_re  = r"^(?:\w+:)[/\\]"
    const path_directory_re = r"(?:^|[/\\])\.{0,2}$"
    const path_dir_splitter = r"^(.*?)([/\\]+)([^/\\]*)$"
    const path_ext_splitter = r"^((?:.*[/\\])?(?:\.|[^/\\\.])[^/\\]*?)(\.[^/\\\.]*|)$"

    function splitdrive(path::String)
        m = match(r"^(\w+:|\\\\\w+\\\w+|\\\\\?\\UNC\\\w+\\\w+|\\\\\?\\\w+:|)(.*)$", path)
        m.captures[1], m.captures[2]
    end

JeffBezanson · 2013-03-09T10:48:02Z

There are two options: replace \\ with \\\\, or change the parser again so that only \" is special in custom string literals. The second option is better for the code here, and its only cost is that a regex literal can't end in a backslash.
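For comparison, Julia 1.x (an assumption about the reader's version; the thread describes the 2013 parser) behaves much like the second option with one refinement: a run of backslashes is interpreted only when it immediately precedes a quote, so a literal can still end in a backslash. A small sketch using the raw"..." literal, which returns the text a string macro receives:

```julia
# In non-standard string literals a backslash is special only when it
# (or a run of backslashes) immediately precedes a quote character.
digit    = raw"\d"     # backslash kept as-is: 2 characters
quoted   = raw"\""     # \" yields a lone quote: 1 character
doubled  = raw"a\\b"   # not before a quote, so both backslashes are kept: 4 characters
trailing = raw"\\"     # two backslashes before the closing quote yield one backslash
```

With this rule the path regexes above keep their meaning, since none of their backslash runs sit directly before a quote.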

JeffBezanson · 2013-03-09T11:24:02Z

OK, for now I have set it up so that those regexes will continue to work as before.

mgkuhn · 2022-08-03T13:29:02Z

The “band-aid” fixed the match(r"\u2200", "\u2200") example simply because that is where JavaScript and Julia syntax overlap, but it fails already at minor variations where they don't, such as match(r"\U102200", "\U102200"). It just partially masks the fact that in the first string the \ is a PCRE escape character, whereas in the second string it is a Julia string-literal escape character. On the other hand, it now confuses users of the \x syntax in PCRE, which was disabled by adding the PCRE.JAVASCRIPT_COMPAT regex compile flag (or today PCRE.ALT_BSUX as it's called in PCRE2). See #46137.
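The two escaping layers can be seen without compiling any regex literal. The sketch below assumes Julia 1.x and uses raw"..." to show the text a string macro such as r"..." would receive; the variable names are illustrative only. It also shows one workaround that does not depend on PCRE's escape syntax: build the pattern from a standard literal, so that Julia decodes the escape before PCRE sees it.

```julia
# In a non-standard literal the text \u2200 reaches the macro as six characters;
# raw"..." is used here so that no regex is compiled.
pattern_text = raw"\u2200"

# In a standard literal Julia itself decodes the escape to one character.
decoded = "\u2200"

# Building the regex from a standard literal sidesteps PCRE's escape syntax,
# because the pattern already contains the character itself.
re = Regex("\U102200")
found = occursin(re, "\U102200")
```

The cost of this workaround is the one discussed earlier in the thread: regex escapes such as \d then need a doubled backslash.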

ghost assigned StefanKarpinski Jul 11, 2011

nolta added a commit that referenced this issue Mar 8, 2013

fix r"\u2220" bug mentioned in #107

7909e3d

StefanKarpinski closed this as completed Mar 8, 2013

StefanKarpinski reopened this Mar 8, 2013

JeffBezanson closed this as completed in 7123ad3 Mar 8, 2013

iceblue25 mentioned this issue Sep 14, 2015

build fail on windows casued by typo in source code #5426

Closed

kleinhenz mentioned this issue Nov 7, 2016

RFC: add raw_str macro for raw strings with no interpolation/unescaping #19254

Closed

kleinhenz mentioned this issue Jan 6, 2017

RFC: add raw_str macro for strings with no interpolation/unescaping #19900

Merged

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018

Fix #107

be55b83

StefanKarpinski added a commit that referenced this issue Feb 8, 2018

rename hash-sha1 to git-tree-sha1 (#107)

d96e19b

vchuravy mentioned this issue Feb 8, 2018

use git subtree to graft Pkg3, etc. into stdlib #25942

Closed

jiaqiwang969 mentioned this issue Jan 11, 2021

Darwin/ARM64 tracking issue #36617

Closed

31 tasks

bryannagle mentioned this issue Apr 1, 2021

Julia 1.6 Dependency "Snappy" Fails to build on Alpine Linux #40299

Closed

LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Oct 11, 2021

Fix JuliaLang#106, fix JuliaLang#107

6aaf697

senji77 mentioned this issue Nov 4, 2021

Error using pyplot #42934

Closed

mgkuhn mentioned this issue Aug 3, 2022

Regex bug: Unicode hex ranges not supported #46137

Open

tpresser570 mentioned this issue Jul 1, 2023

Cannot use DifferentialEquations on Mac M2 - ARM chip #50382

Closed

Keno pushed a commit that referenced this issue Oct 9, 2023

Merge pull request #107 from JuliaDebug/teh/fix_105

f92487c

Handle :cfunction exprs. Fixes #105

udesou pushed a commit to udesou/julia that referenced this issue Nov 22, 2023

bugfix: don't set pool_live_bytes to zero at the end of GC (JuliaLang…

6afe64e

…#107)

NHDaly pushed a commit that referenced this issue May 22, 2024

bugfix: don't set pool_live_bytes to zero at the end of GC (#107)

b351973
