msaboff/proposal-regexp-dotall-flag

 
 

ECMAScript proposal: s (dotAll) flag for regular expressions

Status

This proposal is in stage 3 of the TC39 process.

Motivation

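In regular expression patterns, the dot . matches a single character, regardless of what character it is. In ECMAScript, there are two exceptions to this:

  • . doesn’t match astral characters. Setting the u (unicode) flag fixes that.
  • . doesn’t match line terminator characters.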

ECMAScript recognizes the following line terminator characters:

  • U+000A LINE FEED (LF) (\n)
  • U+000D CARRIAGE RETURN (CR) (\r)
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR

However, there are more characters that, depending on the use case, could be considered as newline characters:

  • U+000B VERTICAL TAB (\v)
  • U+000C FORM FEED (\f)
  • U+0085 NEXT LINE

This makes the current behavior of . problematic:

  • By design, it excludes some newline characters, but not all of them, which often does not match the developer’s use case.
  • It’s commonly used to match any character, which it doesn’t do.

The proposal you’re looking at right now addresses the latter issue.
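
To make the distinction concrete, the following sketch checks which of the characters listed above a plain . matches. The helper name dotMatches is made up for this illustration. The anchors ^ and $ are used without the m flag, so the pattern has to match the whole one-character string.

```javascript
// Characters ECMAScript treats as line terminators: `.` does not match these.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];
// Newline-like characters that are NOT line terminators: `.` matches these.
const otherNewlineLike = ['\v', '\f', '\u0085'];

// Illustrative helper (not part of the proposal): does a lone `.` match `ch`?
const dotMatches = (ch) => /^.$/.test(ch);

const terminatorResults = lineTerminators.map(dotMatches);
// → [false, false, false, false]
const otherResults = otherNewlineLike.map(dotMatches);
// → [true, true, true]
```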

Developers wishing to truly match any character, including these line terminator characters, cannot use .:

/foo.bar/.test('foo\nbar');
// → false

Instead, developers have to resort to cryptic workarounds like [\s\S] or [^]:

/foo[^]bar/.test('foo\nbar');
// → true
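
As a quick check, the sketch below runs both workarounds against each of the four line terminators. The sample strings and variable names are made up for illustration.

```javascript
// Build one sample string per ECMAScript line terminator.
const terminators = ['\n', '\r', '\u2028', '\u2029'];
const samples = terminators.map((lt) => `foo${lt}bar`);

// A plain `.` matches none of the samples.
const plainDotMatchesAny = samples.some((s) => /foo.bar/.test(s));
// → false

// Both workarounds match every sample.
const whitespaceClassMatchesAll = samples.every((s) => /foo[\s\S]bar/.test(s));
// → true
const emptyNegatedClassMatchesAll = samples.every((s) => /foo[^]bar/.test(s));
// → true
```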

Since the need to match any character is quite common, other regular expression engines support a mode in which . matches any character, including line terminators.

  • Engines that support constants to enable regular expression flags implement DOTALL or SINGLELINE/s modifiers.
    • Java supports Pattern.DOTALL.
    • C# and VB support RegexOptions.Singleline.
    • Python supports both re.DOTALL and re.S.
  • Engines that support embedded flag expressions implement (?s).
  • Engines that support regular expression flags implement the flag s.

Note the established tradition of naming these modifiers s (short for singleline) and dotAll.

One exception is Ruby, where the m flag (Regexp::MULTILINE) also enables dotAll mode. Unfortunately, we cannot do the same thing for the m flag in JavaScript without breaking backwards compatibility.

Proposed solution

We propose the addition of a new s flag for ECMAScript regular expressions that makes . match any character, including line terminators.

/foo.bar/s.test('foo\nbar');
// → true

High-level API

const re = /foo.bar/s; // Or, `const re = new RegExp('foo.bar', 's');`.
re.test('foo\nbar');
// → true
re.dotAll
// → true
re.flags
// → 's'
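
The following sketch extends the example above with a few additional checks. It assumes an engine that implements this proposal. The names supportsDotAll and describeRegExp are hypothetical helpers, and the feature check relies on the RegExp constructor throwing a SyntaxError for flags it does not recognize.

```javascript
// Hypothetical helper: detect support at run time. The constructor is used
// because a literal such as /./s fails to parse in engines lacking the flag.
function supportsDotAll() {
  try {
    return new RegExp('.', 's').dotAll === true;
  } catch (error) {
    return false;
  }
}

// Hypothetical helper: summarize how a pattern behaves with the given flags.
// Assumes the engine accepts the flags passed in.
function describeRegExp(pattern, flags) {
  const re = new RegExp(pattern, flags);
  return {
    flags: re.flags,
    dotAll: re.dotAll,
    matchesAcrossNewline: re.test('foo\nbar'),
  };
}

const supported = supportsDotAll();

// Flags are reported in a fixed order, regardless of how they were written.
const withS = describeRegExp('foo.bar', 'si');
// → { flags: 'is', dotAll: true, matchesAcrossNewline: true }

// Without the s flag, the accessor is false and `.` keeps its old meaning.
const withoutS = describeRegExp('foo.bar', 'i');
// → { flags: 'i', dotAll: false, matchesAcrossNewline: false }
```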

FAQ

What about backwards compatibility?

The meaning of existing regular expression patterns isn’t affected by this proposal, since the new s flag is required to opt in to the new behavior.

How does dotAll mode affect multiline mode?

This question might come up since the s flag stands for singleline, which seems to contradict m / multiline — except it doesn’t. This is a bit unfortunate, but we’re just following the established naming tradition in other regular expression engines. Picking any other flag name would only cause more confusion. The accessor name dotAll gives a much better description of the flag’s effect. For this reason, we recommend referring to this mode as dotAll mode rather than singleline mode.

Both modes are independent and can be combined: multiline mode only affects the anchors ^ and $, and dotAll mode only affects the dot (.).

When both the s (dotAll) and m (multiline) flags are set, . matches any character while still allowing ^ and $ to match, respectively, just after and just before line terminators within the string.
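
The independence of the two flags can be sketched as follows, assuming an engine that implements this proposal. The sample string and variable names are made up for illustration.

```javascript
const twoLines = 'first\nsecond';

// Anchors: only the m flag lets ^ and $ match around a line terminator.
const anchorsWithoutM = /^second$/.test(twoLines);
// → false
const anchorsWithM = /^second$/m.test(twoLines);
// → true

// Dot: only the s flag lets . match a line terminator.
const dotWithM = /first.second/m.test(twoLines);
// → false
const dotWithS = /first.second/s.test(twoLines);
// → true

// This pattern needs both behaviors at once: . must consume the \n,
// while $ and ^ must match just before and just after it.
const onlyS = /^first$.^second$/s.test(twoLines);
// → false
const onlyM = /^first$.^second$/m.test(twoLines);
// → false
const bothFlags = /^first$.^second$/sm.test(twoLines);
// → true
```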

Specification

Implementations
