Generate optimal formatters at compile time #613

Closed
vitaut opened this issue Nov 24, 2017 · 31 comments

@vitaut
Contributor

vitaut commented Nov 24, 2017

As pointed out by Louis Dionne in #546:

Not only could you perform the safety checks at compile-time, but you could also generate optimal code by parsing everything at compile-time.

This can make the format API as efficient as the write API and eliminate the need for the latter.

@daltonhildreth

Perhaps I'm misunderstanding this issue, but is this even possible right now?

From my brief attempts at playing around, it would seem you need constexpr function parameters to do this. You would somehow need to split a string in a function at compile time, yet I know of no valid way to achieve this:

constexpr void split(constexpr const char* s) {
    // I'm not super familiar with fmt's internals/API
    // count_replacements would be constexpr and count the number of {}'s, or similar.
    std::array<const char*, count_replacements(s) + 1> text;
    std::array<fmt::?, count_replacements(s)> formatting;
    //...
}

The problem is this also needs to be able to run at run-time, which makes no sense for the template of std::array (or C arrays).

Perhaps this is possible with odd manipulation of template character packs? I haven't figured out a proof-of-concept for that, though.

@vitaut
Contributor Author

vitaut commented Oct 22, 2018

is this even possible right now?

It should be possible.

it would seem you need constexpr function parameters to do this

This is correct and {fmt} emulates them with some constexpr lambda and macro magic:

#define FMT_STRING(s) [] { \

You can overload on constexprness of a format string with the help of the is_compile_string trait.
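
A minimal sketch of the general idea, not {fmt}'s actual implementation: the macro wraps the literal in a local type, so the string travels as part of a type and stays usable in constant expressions inside the callee. The names MY_COMPILE_STRING, count_replacements and num_text_pieces are hypothetical, and C++17 is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts "{}" pairs; usable both at compile time and at run time.
constexpr std::size_t count_replacements(std::string_view s) {
  std::size_t n = 0;
  for (std::size_t i = 0; i + 1 < s.size(); ++i) {
    if (s[i] == '{' && s[i + 1] == '}') {
      ++n;
      ++i;  // skip the closing brace
    }
  }
  return n;
}

// Hypothetical macro (not FMT_STRING itself): wraps a literal in a local
// type whose static member function returns it.
#define MY_COMPILE_STRING(s)                                  \
  [] {                                                        \
    struct compile_string {                                   \
      static constexpr std::string_view value() { return s; } \
    };                                                        \
    return compile_string{};                                  \
  }()

// The string is reachable through the type S, so it can size an array
// even though the function takes an ordinary (empty) argument.
template <typename S>
constexpr std::size_t num_text_pieces(S) {
  constexpr std::size_t fields = count_replacements(S::value());
  std::array<std::string_view, fields + 1> text{};
  return text.size();
}

static_assert(num_text_pieces(MY_COMPILE_STRING("{} + {} = {}")) == 4);

// The same function also works on a run-time value.
inline void run_time_use(std::string_view s) {
  assert(count_replacements(s) <= s.size() / 2);
}
```

This addresses the std::array problem raised earlier in the thread: the array bound comes from the type, not from a function parameter.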

@bebuch

bebuch commented Dec 13, 2018

A better solution will be enabled by C++20:

https://wg21.link/P0732

@vitaut
Contributor Author

vitaut commented Mar 14, 2019

A prototype of generating optimal formatters at compile time for a subset of std::format/{fmt}'s syntax by Hana: hanickadot/compile-time-regular-expressions@13afb2c

@vitaut
Contributor Author

vitaut commented Apr 14, 2019

In C++20 we will be able to do this:

#include <cstddef>
#include <string>

template <typename Char, std::size_t N>
class basic_fixed_string {
 public:
  // Public so that the type is structural, which C++20 as standardized
  // requires of class types used as template parameters.
  Char data[N] = {};

  constexpr basic_fixed_string(const Char (&s)[N]) {
    for (std::size_t i = 0; i < N; ++i)
      data[i] = s[i];
  }
};

template <typename Char, std::size_t N>
basic_fixed_string(const Char (&s)[N]) ->
  basic_fixed_string<Char, N>;

template <basic_fixed_string, typename... Args>
std::string format(const Args&... args) {
  return {};
}

auto s = format<"{}">(42);

which is a pretty horrible API. Need to find a way to pass a compile-time string as an argument to make it usable.

@vitaut
Contributor Author

vitaut commented Apr 14, 2019

So far the most usable API is

#include <string>

template <typename Char, Char...>
struct format_string {};

template <typename Char, Char... CHARS>
constexpr format_string<Char, CHARS...> operator""_f() {
  return {};
}

template <typename S, typename... Args>
std::string format(const S&, const Args&... args) {
  return {};
}

auto s = format("{}"_f, 42);

but it requires a non-standard UDL extension and passing each character as a separate template argument.
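
A sketch of how the character pack could be processed once it exists. To stay within standard C++17 the pack is spelled out by hand here; with the extension, "{} {}"_f would produce the same type. The members value and size and the function count_fields are hypothetical additions to format_string.

```cpp
#include <cassert>
#include <cstddef>

template <typename Char, Char... Cs>
struct format_string {
  // Gather the pack into a null-terminated static array.
  static constexpr Char value[sizeof...(Cs) + 1] = {Cs..., Char()};
  static constexpr std::size_t size = sizeof...(Cs);
};

// Counts "{}" pairs in the characters carried by the type.
template <typename Char, Char... Cs>
constexpr std::size_t count_fields(format_string<Char, Cs...>) {
  using S = format_string<Char, Cs...>;
  std::size_t n = 0;
  for (std::size_t i = 0; i + 1 < S::size; ++i) {
    if (S::value[i] == '{' && S::value[i + 1] == '}') {
      ++n;
      ++i;  // skip the closing brace
    }
  }
  return n;
}

// Equivalent to the type the extension would create for "{} {}"_f.
using two_fields = format_string<char, '{', '}', ' ', '{', '}'>;
static_assert(count_fields(two_fields{}) == 2);
```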

@vitaut
Contributor Author

vitaut commented Apr 14, 2019

With P1221 Parametric Expressions we could have the perfect API:

#include <fmt/format.h>

template <typename... Args>
constexpr bool check_format_string(std::string_view format_str) {
  return fmt::internal::do_check_format_string<
    char, fmt::internal::error_handler, Args...>(format_str);
}

using format(constexpr auto format_str, auto... args) {
  constexpr bool b =
    check_format_string<decltype(args)...>(format_str);
  return fmt::format(format_str, args...);
}

auto s = format("{}", 42); // format string checked at compile time

https://godbolt.org/z/O7rgfF

@vitaut
Contributor Author

vitaut commented Apr 14, 2019

The UDL version doesn't need to process character by character, it can create a static constexpr char array which can then be processed like the non-type template parameter version.

Yes, but you also need to know the types of formatting arguments. UDL doesn't have access to that and I wasn't able to return a compile-time string from the UDL without splitting it into chars. This is as far as I got: https://godbolt.org/z/pMQUfP

@foonathan
Contributor

Yeah, I deleted my comment when I realized my mistake.

@deni64k

deni64k commented Apr 30, 2019

@vitaut Hi Victor,

but it requires non-standard UDL extension and passing each character as a separate template argument.

This syntax is a part of the C++ standard
http://eel.is/c++draft/over.literal#def:literal,operator,template_string
and
https://en.cppreference.com/w/cpp/language/user_literal#Literal_operators

// numeric literal operator template
template <char...> auto operator "" _f();

Good news, I even see it in C++11:

The declaration of a literal operator template shall have an empty parameter-declaration-clause and its template-parameter-list shall have a single template-parameter that is a non-type template parameter pack (14.5.3) with element type char.

That is a valid point, although it requires the element type to be char. C++20 (thankfully) becomes more permissive with the string literal operator template, whose single template parameter is of class type.

@foonathan
Contributor

No, it isn't. It is only valid for numeric literals, as the comment above the example states.

@deni64k

deni64k commented Apr 30, 2019

@foonathan True, sorry for misleading. Either way, hopefully, the committee will do something about it. :)

@vittorioromeo

In C++20 we will be able to do this:

// ...
auto s = format<"{}">(42);

which is a pretty horrible API. Need to find a way to pass a compile-time string as an argument to make it usable.

I am not sure why you think that, and intuitively I strongly disagree with you. It makes it clear that the format string is a compile-time parameter, and is a logical counterpart to the version that takes a run-time parameter.

@vitaut
Contributor Author

vitaut commented May 8, 2019

I am not sure why you think that,

From my experience with users. They expect the intuitive API format("{}", 42) to do the right thing, as it does in Rust, for example. It's very hard to explain that due to language peculiarities we have to move the format string to template parameters.

Something like format("{}"_s, 42) is better but still not as clean as what we could get if we had parametric expressions or similar functionality.

@vittorioromeo

Unless something changed, Rust uses a macro for that:

format!(...)

So the syntax is different there, as well. I think that the template parameter syntax is fine and intuitive, TBH. Even more so than a literal, which would require a using namespace anyway.

@vitaut
Contributor Author

vitaut commented May 8, 2019

Rust uses a macro

Sure and it is great because even a person not familiar with Rust can immediately understand that it is some sort of a call.

format<"{}">(42) on the other hand is template parameterized (!) on the format string and the rest of the parameters are passed separately regardless of whether they are known at compile time or run time. We can document such an API of course and make everyone deal with it, but it will be yet another "WTF C++ is complex" moment.

What's particularly sad is that C++ now has all this nice constexpr machinery that makes all format string processing look like normal code. You can have a parse_format_string("{}") function that will do everything at compile time and that takes the string as a normal argument rather than being template parameterized on it. But once we want to pass additional parameters it all falls apart.
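
A minimal sketch of the limitation described above, assuming C++17. The checker braces_balanced is a hypothetical stand-in for parse_format_string: it works on a literal passed as a normal argument, but once the string arrives as a function parameter next to the other arguments it is no longer a constant expression.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical checker: true if every '{' has a matching '}'.
constexpr bool braces_balanced(std::string_view s) {
  int depth = 0;
  for (char c : s) {
    if (c == '{') {
      ++depth;
    } else if (c == '}' && --depth < 0) {
      return false;  // closing brace without an opening one
    }
  }
  return depth == 0;
}

// Works at compile time with the literal as an ordinary argument.
static_assert(braces_balanced("{}"));
static_assert(!braces_balanced("{"));

template <typename... Args>
bool format_checked_at_run_time(std::string_view format_str, const Args&...) {
  // static_assert(braces_balanced(format_str));
  // ^ would not compile: format_str is a function parameter and therefore
  //   not a constant expression, even when the caller passes a literal.
  return braces_balanced(format_str);  // the check can only run at run time
}
```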

Hopefully C++23 will have some way to express this in a way that normal users can intuitively understand, be it something like Rust macros or some other facility.

@vitaut
Contributor Author

vitaut commented May 8, 2019

@SuperV1234, also keep in mind that you are an expert in C++ so things that are intuitive to you may not be intuitive to the vast majority of users.

@vittorioromeo

vittorioromeo commented May 8, 2019

even a person not familiar with Rust can immediately understand that it is some sort of a call

Anyone with basic C++ knowledge understands that foo<...>(...) is some sort of call as well. In fact, simple Standard utilities like std::get have that syntax.

Mentioning Rust is contradictory - there is a syntactic marker to show that this is not a regular run-time call, which is exactly what <...> would do:

format("{}", foo); // Run-time call
format!("{}", foo); // Macro - operates on AST

format<"{}">(42) on the other hand is template parameterized (!) on the format string and the rest of the parameters are passed separately regardless of whether they are known at compile time or run time. We can document such an API of course and make everyone deal with it, but it will be yet another "WTF C++ is complex" moment.

This honestly makes no sense to me. Since the dawn of time, any C++ user that has basic knowledge of templates (i.e. everyone that knows how to use vector) knows the following:

  • Template arguments are provided in angle brackets, and are compile-time

  • Function arguments are provided in round parentheses, and are run-time

The real WTF would be having a call that looks like it's run-time, but it's actually compile-time. People were mesmerized and confused by boost::hana when it came out!


that makes all format string processing look like normal code

Why would you want that? Rust got this right: provide syntactical markers to show what things are. If I have a macro, it should be visible on the call site. Similarly, if I have a compile-time parameter, it should be visible on the call site. The C++ way of passing compile-time parameters is and has always been <...>.

I see no advantage in changing that.


But once we want to pass additional parameters it all falls apart.

Can you elaborate on this? Your example is pretty intuitive to me, following the rules I just mentioned:

//           v in round parentheses, so this is a run-time parameter
format<"{}">(42)
//     ^ in angle brackets, so this is a compile-time parameter

It is clear for a user that, with the above syntax:

int i = 42;
format<"{}">(i); // This works, because `i` is run-time and I am passing it in round parentheses

std::string s{"{}"};
format<s>(42); // This can't work, because I cannot pass `s` to a compile-time parameter

Hopefully C++23 will have some way to express this in a way that normal users can intuitively understand

Users intuitively understand that templates & template parameters mean compile-time. Since C++98, if you have

template <int I> void foo(int j);

it is obvious to everyone that I needs to be something known at compile-time, while j can be known at run-time (without getting into any technicality about constant expressions). I don't understand why you want to change that - I must be missing something.

@daltonhildreth

@vitaut

Hopefully C++23 will have some way to express this in a way that normal users can intuitively understand, be it something like Rust macros or some other facility.

What would be more intuitive than either suggested API?

I think using templates for non-type parameters is a little unintuitive, but so are most things in any programming language. Certainly, it would seem reasonable if I didn't know as much C++ as I do. Also, I don't understand how this is less intuitive than Rust's macros in this case.

Using everything in parentheses has some implicit information to it, which may be more intuitive, but could also seem misleading in some situations I imagine.

@vitaut
Contributor Author

vitaut commented May 8, 2019

Anyone with basic C++ knowledge understands that foo<...>(...) is some sort of call as well.

One has to learn templates to understand how to do such a trivial thing as formatting an integer, and that still doesn't explain why a string is passed as a template argument. This is exactly what I mean by "WTF C++ is complex". Pretty much any other language has an obvious API for this, even C. And printf not only has an intuitive API but also compile-time checks in most compilers, even though they are hardcoded. This is what we should strive for - safety without compromising simplicity - and I believe C++ can do better than C (and Rust for that matter - {fmt} is already more expressive than Rust's formatting facility).
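
For context, a sketch of how those hardcoded printf checks can be attached to a user function with the format attribute of GCC and Clang. The macro PRINTF_LIKE and the function formatted_length are hypothetical names; on other compilers the macro expands to nothing and no checking takes place.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

#if defined(__GNUC__) || defined(__clang__)
// Argument 1 is the format string; variadic arguments start at position 2.
#  define PRINTF_LIKE(fmt_index, first_arg) \
     __attribute__((format(printf, fmt_index, first_arg)))
#else
#  define PRINTF_LIKE(fmt_index, first_arg)
#endif

// Returns the number of characters the formatted output would contain.
PRINTF_LIKE(1, 2)
int formatted_length(const char* fmt, ...) {
  std::va_list args;
  va_start(args, fmt);
  const int n = std::vsnprintf(nullptr, 0, fmt, args);
  va_end(args);
  return n;
}

// With format warnings enabled (for example -Wall), GCC and Clang diagnose
// a call such as formatted_length("%s", 42) because 42 is not a string.
```

The checks understand only the printf mini-language built into the compiler, which is the "hardcoded" part: they cannot be extended to user-defined types or to {fmt}'s syntax.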

utilities like std::get have that syntax

In the case of get it actually makes at least some sense because it operates in TMP territory. In the case of format it makes zero sense because all format string processing is constexpr, no templates are involved. So putting a string into template parameters is just a workaround for the inability to handle a constexpr argument, which is unfortunate.

there is a syntactic marker to show that this is not a regular run-time call

The marker is a red herring - the API is still obvious for almost anyone without any knowledge of macros or Rust. In the end there is still some formatting code executed at runtime which returns a string and that's all that matters.

Since the dawn of time, any C++ user that has basic knowledge of templates (i.e. everyone that knows how to use vector) knows the following

I don't think this is how templates are understood. Templates are not synonymous with compile-time, they define a family of functions or classes. constexpr is how you do all compile-time processing of format strings in {fmt} and you don't create a family of functions for that - you just write normal code which is the same regardless of whether it's run at compile time or runtime.

Why would you want that?

Because I've been answering user questions for years and as I wrote people very much expect format("{}", 42) to do the correct thing. I'm not looking forward to explaining that because they happen to have a literal string they should pass it as a template parameter. It's already obvious that it is a literal string - you don't need any marker for that. Even printf can figure that out.

I see no advantage in changing that.

No one is changing that because that's not how things work. constexpr doesn't work like that.

Can you elaborate on this?

parse_format_string("{}") already works at compile time. There are no syntactic markers or anything.

Users intuitively understand that templates & template parameters mean compile-time.

Again, no. Those are completely orthogonal concepts and hopefully no one teaches templates like that. Fortunately TMP is dying and we'll see less of this in the future.

@vitaut
Contributor Author

vitaut commented May 8, 2019

What would be more intuitive than either suggested API?

format("{}", 42) should Just Work because both the user and the compiler already know that "{}" is a string literal.

I think using templates for non-type parameters is a little unintuitive

The problem is not with non-type template parameters per se. Those are great and there are some reasonable uses of them such as get as Vittorio pointed out. format is just not one of those for reasons explained above.

@daltonhildreth

format("{}", 42) should Just Work because both the user and the compiler already know that "{}" is a string literal.

That's a fair argument for that API, but my comment was about your C++23 remark, which implied that the language could introduce something even better.

@vitaut
Contributor Author

vitaut commented May 8, 2019 via email

@mwinterb
Contributor

mwinterb commented May 8, 2019

Something that has been lost in this recent discussion (and vitaut's Twitter poll): format is the primary API, but format_to and format_to_n both exist in fmtlib and P0645 (other interesting functions in that style are also present only in fmtlib). And format_to_n<"{} {}">(buffer, 15, "hello", "henry"); would be absolutely atrocious.

But I'm also of the opinion that get is tolerable given language limitations, but not at all ideal.

@vitaut
Contributor Author

vitaut commented May 8, 2019

@mwinterb, that's a great point.

@bebuch

bebuch commented May 13, 2019

If I understand P0732 correctly, then this should be valid C++20, although GCC and Clang still implement it incompletely:

#include <cstddef>
#include <string>

// constexpr string type
template <std::size_t N>
class fixed_string {
 public:
  // Public so that the type is structural, which C++20 as standardized
  // requires of class types used as template parameters.
  char data[N] = {};

  constexpr fixed_string(const char (&s)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
      data[i] = s[i];
    }
  }
};

template <std::size_t N>
fixed_string(const char (&s)[N]) ->
  fixed_string<N>;

// compile time string type
template <fixed_string>
struct ct_string{};

// user defined literal (defined, not just declared, so that the usage links)
template <fixed_string Str>
constexpr ct_string<Str> operator""_cts() {
  return {};
}

// format overload
template <fixed_string Str, typename ... Args>
std::string format(ct_string<Str>, Args&& ... args) {
  return {};
}

// usage
auto s = format("{}"_cts, 42);

@vittorioromeo

vittorioromeo commented May 13, 2019

I think that the literal overload is non-Standard, but I am not 100% sure. Also, to be fair, the "usage" example needs to include a using namespace directive. E.g.

using namespace fmt::format::literals;
fmt::format("{}"_cts, 42);

// vs

fmt::format<"{}">(42);

@vitaut
Contributor Author

vitaut commented May 14, 2019

If I understand P0732 correctly, then this should be valid C++20

That is my understanding too.

@bebuch

bebuch commented May 15, 2019

I think that the literal overload is non-Standard, but I am not 100% sure. Also, to be fair, the "usage" example needs to include a using namespace directive.

Had forgotten the using, thanks for the addition. ;-)

I checked the last working draft of C++20 (n4810), [lex.ext] (5.13.8)p5 was adopted. The UDL is thus valid C++20.

@vitaut
Contributor Author

vitaut commented May 15, 2019

The UDL is thus valid C++20.

Thanks for checking, that's great news.

@vitaut
Contributor Author

vitaut commented May 15, 2020

Proof-of-concept implementation is there but needs more work to make it production-ready. Some of the issues are tracked in #1324.
