Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StringFormatted, PrintFormatted, PrintToFormatted #2200

Merged
merged 1 commit into from
Mar 29, 2018

Conversation

ChrisJefferson
Copy link
Contributor

@ChrisJefferson ChrisJefferson commented Feb 23, 2018

This PR adds a simple method of formatting strings, strongly inspired by Python's "format" function. It is probably easiest to demonstrate by example:

# {} goes through a list in turn
gap> Format("I have {} cats and {} dogs", [3,2]);
"I have 3 cats and 2 dogs"
# Or use {<number>} to take a specific element from a list
gap> Format("I have {2} cats and {1} dogs", [3,2]);
"I have 2 cats and 3 dogs"
# Or use {<string>} to pick elements of a record
gap> Format("I have {apple} cats and {pear} dogs", rec(apple:=3, pear:=2);
"I have 3 cats and 2 dogs"
# Use {{ and }} to actually print a { or a }
gap> Format("I want an {{x and a {x}", rec(x := 3));
"I want an {x and a 3"

There is also a FormatTo variant, which prints to a stream.

I've been using this for a while, as I find building these types of simple format strings is quite painful in GAP. There is some other stuff from Python (padding, formatting of floating points), I haven't implemented yet, and might, but I thought first I'd check if this function seemed generally interesting to people.

Copy link
Member

@markuspf markuspf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the problems I see coming is that

lib/string.gi Outdated

InstallGlobalFunction(FormatTo, function(stream, s, args)
local pos, len, outstr, nextbrace, endbrace, id, argcounter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a (good) reason why you don't use args...?

There is the option to pass a record, but that can be handled even in the presence of args...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I started off with a list, but there isn't really a good reason for that. I'll switch to args...

lib/string.gi Outdated

InstallGlobalFunction(FormatTo, function(stream, s, args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

lib/string.gi Outdated
if Int(id) < 1 or Int(id) > Length(args) then
ErrorNoReturn("out of bounds in Format -- asked for {",Int(id),"} when there are only ",Length(args), " arguments");
fi;
AppendTo(stream, String(args[Int(id)]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String is problematic because not many objects support String and just return <object>, also one could argue that String builds up a buffer, whereas PrintTo or AppendTo let the object's implementor decide what to do.

why not just args[Int(id)]?

There is of course the question whether you want to support String, Display and View as format strings...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #1829

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to pick which we want (string, display or view), but if you just pass in strings then of course they'll be printeed fine (I could check IsString to avoid the extra call), but it seems a pain to make people call String if they just want to pass in say a permutation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about DisplayString (they generally don't compose very well) but one way or another it could be useful to be able to include either the ViewString or the String of an object, depending on what you want the formatted string for. Could be an option encoded somehow in the format-string, or a second function FormatView (or some better name).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with names elsewhere the function should ideally be a noun describing the result, rather than a verb which suggests it modifies its argument. We have FormattedString but it's marked obsolete, so could possibly be resused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re String/Display/View: In a printf style function, I'd handle that with different placeholders in the format string, say: Format("%{display} vs. %{view} vs. %{string}", myObj, myObj, myObj); to get a comparison of the three strings.

Looking at https://pyformat.info, I guess the "pythonic" way would be to use a "value conversion" suffix, like Format("{0!d} vs. {0!v}. vs {0!s}", myObj). If we default to using String, then there could also be something to force "raw" output, i.e. no conversion. Or conversely, we could default to not using String (as @markuspf suggested), but still allow people to enable it via a value conversion suffix.

@markuspf
Copy link
Member

I think we want this, but it might need some additional thought...

@frankluebeck
Copy link
Member

I have never missed such a function but can see that people find this useful. I'm only a bit concerned about its proposed name. I had a vague plan to propose "Format" as a name for some improved version of String which gets a second argument like '"Text", '*HTML", "LaTeX" (and maybe further arguments). Some version of this exists in Jean Michel's version of GAP3.

Would it be ok for you to change the name? Say, ComplementedString?

@ChrisJefferson
Copy link
Contributor Author

ChrisJefferson commented Mar 2, 2018

I am happy to change the name. I started with Format just because that is what Python uses, and I based my inspiration on that. I will think about other possible names.

PrintFormatted and PrintFormattedTo (or even.. Printf and PrintfTo, although it's not the same style as C printf?)

@fingolfin
Copy link
Member

While I don't mind PrintFormatted, I find the argument @frankluebeck raises as to why Format is problematic somewhat strange -- essentially it seems a case of "calling dibs" on a name?!? I.e. you seem to say that there is a possibility you might (or not) one day submit code which may perhaps want to use the name Format for another function?

That's not an argument I'd expect. Perhaps what is meant here is that "Jean Michel's version of GAP3" has a function Format, and the desire here is to have a similar function with the same name in GAP 4, which I'd find somewhat more compelling. Albeit not very much.

Moreover, the function in this PR seems to me (quite subjectively, of course) more useful to a wider audience than a function which produces HTML or LaTeX output (I can't really judge the latter, though, as I don't really know what it's supposed to be). So I'd rather have this here under the name Format than some function to produce HTML/LaTeX whatever which I likely would never use.

I am not so happy with the name PrintFormatted, because it suggests to me that output will be sent to stdout, not into a string. That is something I find a bit weird in the current PR, too, though; i.e. that Formatted returns a string, while FormattedTo prints into a stream. I'd find FormattedString (new name for Format), PrintFormatted (does not exist in this PR right now) and PrintFormattedTo (new name for FormatTo) more logical then. (This also matches an earlier suggestion by @stevelinton).

@fingolfin
Copy link
Member

Of course, as I just realize, those names kind of clash with GAPDoc's PrintFormattedString sigh

@frankluebeck
Copy link
Member

@fingolfin: I don't understand what you want to say with your lengthy comment. I certainly didn't give a strong argument and this is not the place to discuss another function.

But I think there is a point that Format might be a compact name for more general functionality and should not be used up for the specialized string substitution function proposed here.

Of course, if people disagree we will find another name for future functions.

In the context of PrintFormattedString the Formatted means that the line breaks should not be changed while the string is printed.

@ChrisJefferson
Copy link
Contributor Author

I've now gone for StringFormatted, PrintFormatted and PrintToFormatted (which are, hopefully obviously, designed to mirror String, Print and PrintTo, with a formatting option).

@codecov
Copy link

codecov bot commented Mar 25, 2018

Codecov Report

Merging #2200 into master will increase coverage by 0.01%.
The diff coverage is 96.73%.

@@            Coverage Diff             @@
##           master    #2200      +/-   ##
==========================================
+ Coverage   72.65%   72.66%   +0.01%     
==========================================
  Files         478      478              
  Lines      246633   246726      +93     
==========================================
+ Hits       179187   179284      +97     
+ Misses      67446    67442       -4
Impacted Files Coverage Δ
lib/string.gi 47.87% <96.73%> (+6.56%) ⬆️
src/hpc/traverse.c 95.75% <0%> (-0.59%) ⬇️
src/hpc/threadapi.c 37.41% <0%> (ø) ⬆️
hpcgap/lib/package.gi 68.48% <0%> (+0.04%) ⬆️
lib/package.gi 69.89% <0%> (+0.04%) ⬆️
hpcgap/lib/hpc/stdtasks.g 38.87% <0%> (+0.25%) ⬆️
lib/record.gi 69.82% <0%> (+0.86%) ⬆️
hpcgap/lib/hpc/queue.g 69.59% <0%> (+3.19%) ⬆️

lib/string.gd Outdated
## <Func Name="PrintToFormatted" Arg='stream, string, data...'/>
##
## <Description>
## These functions format an input string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they, though? Don't they rather take a format specification, and a bunch of input data, and use the format string to format the data, and turn that into a string (resp. print it to a stream)?

Looking at Python's str.format:

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument. [... example ...] See Format String Syntax for a description of the various formatting options that can be specified in format strings.

Perhaps this can serve as some inspiration?

lib/string.gd Outdated
## <Ref Func="PrintToFormatted"/> appends the formatted string to <A>stream</A>,
## which can be either an output stream or a filename.
## <P/>
## <A>data</A> is a list of values to printed, using the formatting rules below:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to be printed"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think this can be confusing: the user might think that the second (resp third, in case of PrintToFormatted) argument should be a list, containing the values. Perhaps like this:

All arguments after <A>string</A> form the list <A>data</A> of values to be printed, ...

lib/string.gd Outdated
## <A>string</A> is treated as a normal string, except for occurrences
## of <C>{</C> and <C>}</C>, which follow special rules, as follows:
##
## The contents of <C>{ }</C> form a small language. The format is <C>{id!format}</C>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"form a small language" -> is that helpful the reader? What helpful information is it meant to convey? I can't think of any right now, to me it sounds mostly scary. But of course that may be highly subjective!

lib/string.gd Outdated
## </Item>
## </List>
##
## The <C>format</C> decides how the variable is printed. <C>format</C> must be one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"decides" -> "determines"

lib/string.gd Outdated
## <Mark>An integer <C>i</C></Mark> <Item>
## Take the <C>i</C>th element of <A>data</A>.
## </Item>
## <Mark>No variable</Mark> <Item>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "No variable" mean here? Perhaps "Not given" or, better, "No id given" ?

lib/string.gd Outdated
## <Mark>No variable</Mark> <Item>
## Take the <C>j</C>th element of <A>data</A>, where <C>j</C> is the
## number of occurrences of <C>{}</C> in the string, including this one,
## up to this point.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this also count {!format}?

lib/string.gd Outdated
## </Item>
## <Mark>A string <C>str</C></Mark> <Item>
## Take the <C>str</C> member of the first element of <A>data</A>, which
## must be a record.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have "definitions" before using them, in this case: saying that the first entry must be a record before makign use of it (by taking about its "str member"). How about this:

If this is used, then the first element of data must be a record r; in this case, the value r.(str) is taken.

@ChrisJefferson
Copy link
Contributor Author

This is now (I think) fixed up.

lib/string.gd Outdated
## Each replacement field contains a numeric or positional argument,
## describing the element of <A>data</A> to replace the braces with.
##
## There are three Formatting functions, which differ only in how they
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting -> formatting ?

lib/string.gd Outdated
## inserted into <A>string</A>, using the following formatting rules:
## <P/>
## <A>string</A> is treated as a normal string, except for occurrences
## of <C>{</C> and <C>}</C>, which follow special rules, as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, re-reading this now, it occurred to me that it's unclear how one can insert braces into the format string that are not interpolated away? In Python this can be done via double braces {{ and }}.

UPDATE: after looking at the example below, and studying the code, it seem you support this, too! Great! But it's not documented, is it? Should be.... ;-)

lib/string.gd Outdated
##
## <Description>
## These functions perform a string formatting operation.
## They accept an input string, which can contain fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "format string" instead of "input string"?

lib/string.gd Outdated
## <Description>
## These functions perform a string formatting operation.
## They accept an input string, which can contain fields
## to be replaced, which are denoted with {}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds as if only {} is allowed.

How about this (and then you can remove or shorten the mention of data later on):

They accept a format string, followed by an arbitrary number of additional arguments forming a list data.
The format string can contain replacement fields delimited by braces {}.

lib/string.gd Outdated
## </Item>
## <Mark>No id given</Mark> <Item>
## Take the <C>j</C>th element of <A>data</A>, where <C>j</C> is the
## number of occurrences of <C>{}</C> in the string with no id given,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

... the number of previous/prior/preceding replacement fields without an id ...

BTW, this deviates from what Python does; there, either no replacement field has an id, or all must have one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I mis-interpreted what python did. I think I'll switch to this, as it seems simpler.

lib/string.gi Outdated
pos := 0;

if not (IsOutputStream(stream) or IsString(stream)) or not IsString(s) then
ErrorNoReturn("Usage: PrintToFormatted(<stream>, <string>, <args>...)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace args by data to match documentation

lib/string.gi Outdated

InstallGlobalFunction(PrintToFormatted, function(stream, s, args...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace args by data to match documentation? (Or vice versa, adjust the documentation)

lib/string.gi Outdated

if toprint.id = "" then
if Length(args) < argcounter then
ErrorNoReturn("out of bounds in StringFormatted -- used ",argcounter," {} when there are only ",Length(args), " arguments");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change "{}" in this error message to "replacement fields without id"

lib/string.gi Outdated
AppendTo(stream, ViewString(var));
elif toprint.type = "d" then
AppendTo(stream, DisplayString(var));
else ErrorNoReturn("Invalid formatting option: '", toprint.type, "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a third name: after "format" and "type" this is now called "formatting option".

"SymmetricGroup( [ 3 .. 5 ] ) Alt( [ 1, 3 .. 5 ] ) <object>\n"
gap> StringFormatted("{a!s} {b!v} {c!d}", rec(a := r1, b := r2, c := r3));
"SymmetricGroup( [ 3 .. 5 ] ) Alt( [ 1, 3 .. 5 ] ) <object>\n"
gap> StringFormatted("{a!x}", rec(a := r1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it difficult to read this test file, and to see what's there -- so if I wanted to add more tests, I'd struggle to decide where to do so.

Perhaps consider structuring this test file a bit by inserting some empty lines to separate tests into "logical" groups? (Well, empty lines followed by a comment, to suppress whitespace diffs in the test; like this:

gap> StringFormatted("{1}", rec());
"rec(  )"

# test format options
gap> r1 := SymmetricGroup([3..5]);;
gap> r2 := AlternatingGroup([1,3,5]);;

lib/string.gd Outdated
##
## A single brace can be outputted by doubling, so <C>{{</C> in the format string
## produces <C>{</C> and <C>}}</C> produces <C>}</C>.
## </P>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax error, change </P> to <P/>

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I am pretty much happy now. One XML syntax error, and two minor phrasing suggestions left.

lib/string.gd Outdated
## <Description>
## These functions perform a string formatting operation.
## They accept a format string, which can contain replacement fields
## which are delimated by braces {}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delimated -> delimited

lib/string.gd Outdated
## which can be either an output stream or a filename.
## <P/>
## The arguments after <A>string</A> form a list <A>data</A> of the values to be
## inserted into <A>string</A>, using the following formatting rules:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: the values are not inserted into string (which indeed can be immutable). Perhaps

... form a list data of values used to substitute the replacement fields in string, according to the following formatting rules

This PR adds a simple method of formatting strings, strongly
inspired by Python's "format" function. These 3 functions
mirror Sting, Print and PrintTo.
@fingolfin fingolfin added kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements kind: new feature labels Mar 29, 2018
Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No complaints from me this time around :-)

@fingolfin fingolfin changed the title Add a simple Format function Add StringFormatted, PrintFormatted, PrintToFormatted Mar 29, 2018
@fingolfin fingolfin merged commit a29a129 into gap-system:master Mar 29, 2018
@ChrisJefferson ChrisJefferson deleted the format branch June 11, 2018 19:02
@fingolfin fingolfin added the release notes: added PRs introducing changes that have since been mentioned in the release notes label Sep 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements kind: new feature release notes: added PRs introducing changes that have since been mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants