May 03, 2021

On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:

>

I plan to add an extension to std.format, namely a new format character with the meaning of producing a source code literal. Or more formally, the following snippet should work for every type this extension will support:

enum a = <something>;
enum b = mixin(format!"%S"(a));

static assert(a == b && is(typeof(a) == typeof(b)));

(Please note that even for floats, a == b should hold for all values except NaNs; I plan to use Ryu for this.)

The big question now is which character to use. I thought of %S, as in source code literal. Andrei suggested %D, as in D literal. Both ideas have the disadvantage of using uppercase letters, which would break the convention that an uppercase letter means the output uses uppercase instead of lowercase (i.e. 1E10 instead of 1e10).

A first idea for a lowercase letter might be %l, but this could easily be confused with %I and %1 (neither of which exists); also, l is used in C's printf for long, which we luckily don't need here. Anyway, I fear confusion.

What do you think? Which letter would be best?

Please don't do this. Format characters can be customized. Any character you'd introduce for it either wouldn't work for some types or break those types' formatting. Why not introduce a new function like dlangLiteral that takes the value and returns a string? It can be used in format quite easily (like format("pre %s post", dlangLiteral(<something>));) and is explicit and not a special case at all.
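To make the suggestion concrete, here is a minimal sketch of what such a function could look like. The name dlangLiteral comes from the post above; the function does not exist in Phobos, and this version only handles a few types, so treat it as an illustration under those assumptions.

```d
import std.format : format;

// Hypothetical helper (not part of Phobos); covers only a handful of types.
string dlangLiteral(T)(T value)
{
    static if (is(T : const(char)[]))
        // Formatting a one-element array with %(%s%) quotes and escapes the string.
        return format("%(%s%)", [value]);
    else static if (is(T == long))
        // A long literal needs the L suffix to keep its type.
        return format("%sL", value);
    else static if (is(T == int) || is(T == bool))
        return format("%s", value);
    else
        static assert(0, "dlangLiteral sketch does not handle " ~ T.stringof);
}

void main()
{
    assert(dlangLiteral(42) == "42");
    assert(dlangLiteral(42L) == "42L");
    assert(dlangLiteral(true) == "true");
    assert(dlangLiteral("a\nbc") == `"a\nbc"`);
    // Used inside format, as proposed:
    assert(format("pre %s post", dlangLiteral(42L)) == "pre 42L post");
}
```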

May 04, 2021

On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:

>

Please don't do this. Format characters can be customized. Any character you'd introduce for it either wouldn't work for some types or break those types' formatting.

I think, this doesn't hurt: The call of a toString has precedence for compound types like structs and classes (and in most of these cases it won't be possible to add a generic literal at all, see my post above). So, if you use some of the predefined qualifiers, the customized version will always be used, even if it has a completely different meaning. (Admittedly it might cause some confusion, if the customized versions are not well documented.)

>

Why not introduce a new function like dlangLiteral that takes the value and returns a string? It can be used in format quite easily (like format("pre %s post", dlangLiteral(<something>));) and is explicit and not a special case at all.

In my opinion, the main idea behind these formatting routines is to have a simple and short way of formatting output. We could use your idea for every other format character too, like: format("%s = %s", character('𝜋'), scientificFloatingPoint(3.14)). We don't do that, because it's more convenient to write format("%c = %e", '𝜋', 3.14).

Furthermore, there are all the parameters that can be applied: format("%-3c = %+.4e", '𝜋', 3.14) is a simple way to change the formatting. Without format characters, it would become something like format("%s = %s", character!(true, false, false, false, false, false, 3)('𝜋'), scientificFloatingPoint!(false, true, false, false, false, false, FormatSpec.UNSPECIFIED, 4)(3.14)).

Another problem arises with arrays, ranges and the like. For example, you can write format("val = [%(%D,\n %)];", my_array); to get an output with each value on a separate line. Without this format character, you would at least need to map my_array using dlangLiteral, and in generic code this might cause even more trouble.
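A small sketch of the two variants, using only what std.format offers today. Since %D does not exist, %s stands in for it in the first assertion, and a lambda that appends the L suffix stands in for the hypothetical dlangLiteral in the second.

```d
import std.algorithm.iteration : map;
import std.format : format;

void main()
{
    long[] myArray = [1, 2, 3];

    // Compound specifier: the text after the element spec is used as separator.
    assert(format("val = [%(%s,\n %)];", myArray) == "val = [1,\n 2,\n 3];");

    // Function-based workaround: map every element to its literal text first.
    // %-( ... %) is needed so that the mapped strings are not quoted again.
    auto literals = myArray.map!(v => format("%sL", v));
    assert(format("val = [%-(%s,\n %)];", literals) == "val = [1L,\n 2L,\n 3L];");
}
```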

May 04, 2021

On Tuesday, 4 May 2021 at 07:54:19 UTC, Berni44 wrote:

>

On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:

>

[...]

I think, this doesn't hurt: The call of a toString has precedence for compound types like structs and classes (and in most of these cases it won't be possible to add a generic literal at all, see my post above). So, if you use some of the predefined qualifiers, the customized version will always be used, even if it has a completely different meaning. (Admittedly it might cause some confusion, if the customized versions are not well documented.)

[...]

Just go with %m if you want lowercase, otherwise you would have to use uppercase specifiers.

May 04, 2021

On Tuesday, 4 May 2021 at 07:54:19 UTC, Berni44 wrote:

>

On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:

>

Please don't do this. Format characters can be customized. Any character you'd introduce for it either wouldn't work for some types or break those types' formatting.

I think, this doesn't hurt: The call of a toString has precedence for compound types like structs and classes (and in most of these cases it won't be possible to add a generic literal at all, see my post above). So, if you use some of the predefined qualifiers, the customized version will always be used, even if it has a completely different meaning. (Admittedly it might cause some confusion, if the customized versions are not well documented.)

What you wrote in parentheses is exactly the problem I have with this. Generic code cannot use it because user-defined types regularly hook the format. If you use the %D format, but the type does not support it (say std.typecons.Tuple), it will throw a FormatException.
So you're stuck between a rock and a hard place: either give %D preference over custom format specifiers, rendering those that use %D invalid, or let %D do its custom stuff where potentially supported, rendering %D useless in generic code, where most of its use cases would lie.

> >

Why not introduce a new function like dlangLiteral that takes the value and returns a string? It can be used in format quite easily (like format("pre %s post", dlangLiteral(<something>));) and is explicit and not a special case at all.

In my opinion, the main idea behind these formatting routines is to have a simple and short way of formatting output. We could use your idea for every other format character too, like: format("%s = %s", character('𝜋'), scientificFloatingPoint(3.14)). We don't do that, because it's more convenient to write format("%c = %e", '𝜋', 3.14).

Yes, you could. But you could use format specifiers like %-3.8f without losses to get to the same result. And that's the difference between introducing a format specifier character that should have generic meaning and introducing, well, anything else. There was no problem introducing separators like %,3d and neither would there be a problem introducing %y for int or double (whatever it does), or, for a concrete example, %S for bool to return TRUE instead of true.

The problem is introducing generic format specifier characters.

>

Another problem arises with arrays, ranges and the like. For example, you can write format("val = [%(%D,\n %)];", my_array); to get an output with each value on a separate line. Without this format character, you would at least need to map my_array using dlangLiteral, and in generic code this might cause even more trouble.

If you want that, you need to allow something that's currently illegal. As a comparison, %.*f could be introduced if * precision weren't already a thing (compare with %,3d) because in any reasonable implementation, * instead of precision would be an error. What we could do is special casing %$ to mean what you want. Currently, no matter what type you're formatting, %$ is an error in FormatSpec. You can give it semantics, no problem, including one that ignores custom formatting. Even better, %$ looks like it's a special case and not some odd-but-legal custom specifier.

Changing the meaning of %D begs for trouble.

May 05, 2021

On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:

>

So you're stuck between a rock and a hard place: either give %D preference over custom format specifiers, rendering those that use %D invalid, or let %D do its custom stuff where potentially supported, rendering %D useless in generic code, where most of its use cases would lie.

I fear, I can't follow you. Seems like I don't get your point. Maybe you can give an example?

> >

In my opinion, the main idea behind these formatting routines is to have a simple and short way of formatting output. We could use your idea for every other format character too, like: format("%s = %s", character('𝜋'), scientificFloatingPoint(3.14)). We don't do that, because it's more convenient to write format("%c = %e", '𝜋', 3.14).

Yes, you could. But you could use format specifiers like %-3.8f without losses to get to the same result.

??? Again I'm stuck. What does %-3.8f have to do with what I wrote above?

>

And that's the difference between introducing a format specifier character that should have generic meaning and introducing, well, anything else. There was no problem introducing separators like %,3d and neither would there be a problem introducing %y for int or double (whatever it does), or, for a concrete example, %S for bool to return TRUE instead of true.

The problem is introducing generic format specifier characters.

What is the difference between "generic" (which, as far as I understand, you oppose) and adding %D for bool, integers, floats, characters, strings, arrays and AAs (which you seem to be OK with, and which is what I plan to do)?

>

What we could do is special casing %$ to mean what you want. Currently, no matter what type you're formatting, %$ is an error in FormatSpec. You can give it semantics, no problem, including one that ignores custom formatting. Even better, %$ looks like it's a special case and not some odd-but-legal custom specifier.

Using $ would cause real trouble, because it's already used for positional arguments. What would format("%1$d", 'a'); be supposed to produce? 'a'd or 97?
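For reference, a short sketch of how positional arguments and %d on a character behave in std.format today, which is the background for the question above:

```d
import std.format : format;

void main()
{
    // N$ selects the N-th argument, counting from 1.
    assert(format("%2$s %1$s", "world", "hello") == "hello world");

    // A character formatted with %d yields its numeric code point.
    assert(format("%d", 'a') == "97");
    assert(format("%1$d", 'a') == "97");
}
```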

>

Changing the meaning of %D begs for trouble.

%D has currently no meaning, so we cannot change it; we can just add it.

I hope, we can figure this out somehow - I sense, that you've got an important point, but I don't understand it. Seems like we are talking past each other.

May 05, 2021

On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:

>

What is the difference between "generic" (which, as far as I understand, you oppose) and adding %D for bool, integers, floats, characters, strings, arrays and AAs (which you seem to be OK with, and which is what I plan to do)?

[...]

>

%D has currently no meaning, so we cannot change it; we can just add it.

%D does currently have a meaning, though. It means "custom format specifier."

Here's the scenario that could potentially lead to trouble:

  1. Some existing library uses %D as a custom format specifier in their toString methods, with a meaning other than "format as D source code."

  2. %D is added to std.format with the meaning "format as D source code," and a default implementation for types that do not have custom toString methods.

  3. A new library is written that takes advantage of (2) and uses %D in generic code to format arbitrary values for the purpose of code generation.

  4. Someone uses the library from (1) and the library from (3) in the same project, and library (3) ends up producing garbage, because library (1)'s %D doesn't work the way library (3) expects it to.

The "correct" place to fix this is in library (1), but doing so would require a breaking change. In practice, this means that libraries like the one in (3) will never be able to completely rely on the new standard for %D, and will always have to include some kind of workaround in case they are used with types like the ones in library (1).
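The scenario can be sketched with two stand-ins. Span plays the type from library (1), where %D already means "number of days", and emitDeclaration plays the generic code from library (3). Both names are made up for this illustration, and std.format does not currently define %D for built-in types.

```d
import std.format : format, formattedWrite, FormatSpec;

// Stand-in for a type from library (1): %D is a custom specifier here.
struct Span
{
    int days;

    void toString(W)(ref W writer, scope const ref FormatSpec!char fmt) const
    {
        if (fmt.spec == 'D')
            formattedWrite(writer, "%d days", days);
        else
            formattedWrite(writer, "Span(%d)", days);
    }
}

// Stand-in for library (3): generic code that expects %D to yield D source.
string emitDeclaration(T)(string name, T value)
{
    return format("enum %s = %D;", name, value);
}

void main()
{
    assert(format("%s", Span(3)) == "Span(3)");
    // format hands the specifier to Span.toString, so the result is not valid D.
    assert(emitDeclaration("x", Span(3)) == "enum x = 3 days;");
}
```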

May 05, 2021

Discussion

On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:

>

On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:

>

So you're stuck between a rock and a hard place: either give %D preference over custom format specifiers, rendering those that use %D invalid, or let %D do its custom stuff where potentially supported, rendering %D useless in generic code, where most of its use cases would lie.

I fear, I can't follow you. Seems like I don't get your point. Maybe you can give an example?

I'm speaking of aggregate types (structs, classes, etc.) that implement toString that takes a FormatSpec parameter alongside the sink to describe the format according to which it should be formatted. An example is std.typecons.Tuple which apart from %s accepts %(...%) and %(...%|...%). If you try to format it with %D, it throws a FormatException. But like any aggregate type, it could start accepting %D tomorrow.

The new format implementation could do three things when encountering %D for formatting an object of a type with custom formatting:

  1. Because it accepts custom formatting, use it, even if it fails (throws FormatException).
  2. Because it accepts custom formatting try it. If it fails (i.e. throws FormatException), fall back to non-custom %D behavior. (If it succeeds, use the successful result.)
  3. Ignore the custom formatting because %D is special.

None of these solutions is great.

  1. %D cannot be relied upon in generic code, i.e. where the type of what you're formatting isn't up to you but to someone else. "Relied upon" means in the way you intend %D to be used: as a compiler-readable representation of the object.
  2. It could still fail in other ways. (Still the best option.)
  3. This breaks code, at least theoretically. Also, even if no one actually uses %D today, it might be the perfect match for a future aggregate type, and you would have blocked it.
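A rough sketch of option 2, written as a wrapper function rather than as a change to std.format. The name formatLiteralOrFallback is hypothetical, and because a built-in %D does not exist, %s stands in for the fallback behaviour.

```d
import std.format : format, FormatException;
import std.typecons : tuple;

// Hypothetical wrapper: try the type's own %D first, fall back if it is rejected.
string formatLiteralOrFallback(T)(T value)
{
    try
    {
        return format("%D", value);
    }
    catch (FormatException)
    {
        // Placeholder for the non-custom %D behaviour; %s is used as a stand-in.
        return format("%s", value);
    }
}

void main()
{
    auto t = tuple(1, "a");
    // Tuple's custom formatting rejects %D, so the fallback path runs.
    assert(formatLiteralOrFallback(t) == format("%s", t));
}
```

Note that a type whose %D succeeds but means something else still slips through, which is the "could fail in other ways" caveat.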

> > >

In my opinion, the main idea behind these formatting routines is to have a simple and short way of formatting output. We could use your idea for every other format character too, like: format("%s = %s", character('𝜋'), scientificFloatingPoint(3.14)). We don't do that, because it's more convenient to write format("%c = %e", '𝜋', 3.14).

Yes, you could. But you could use format specifiers like %-3.8f without losses to get to the same result.

??? Again I'm stuck. What does %-3.8f have to do with what I wrote above?

Er, you started with the scientific notation stuff. My point is that introducing new constructs in the format specification, such as width and precision, would not be an issue if they weren't there already, but introducing a format specification character with special meaning is.

> >

And that's the difference between introducing a format specifier character that should have generic meaning and introducing, well, anything else. There was no problem introducing separators like %,3d and neither would there be a problem introducing %y for int or double (whatever it does), or, for a concrete example, %S for bool to return TRUE instead of true.

The problem is introducing generic format specifier characters.

What is the difference between "generic" (which, as far as I understand, you oppose) and adding %D for bool, integers, floats, characters, strings, arrays and AAs (which you seem to be OK with, and which is what I plan to do)?

Because %D for bool, integers (note that according to Walter, bool is an integer type), floats, arrays, and AAs is nothing different from %s. The only part where you'd need something different than %s is characters, strings. That would be handy to have, I must admit. You can mimic it using arrays tho:

import std.format : format;
auto str = format("prefix %s %(%s%) %s postfix", "before", [ "a\nbc" ], "after");
assert(str == `prefix before "a\nbc" after postfix`);

And it's almost perfect! It works for character types, numeric types, arrays, and AAs, too. Only for user-defined types, you have no control, because it does what the user-defined toString implementation defines %s to do. In fact, %s might not even work with a user-defined type! It could throw an exception (a FormatException if it's reasonable).

The only thing it doesn't do is respecting wstring and dstring literals. I cannot really estimate if that would be a problem, but I guess for the most part, it wouldn't.

> >

What we could do is special casing %$ to mean what you want. Currently, no matter what type you're formatting, %$ is an error in FormatSpec. You can give it semantics, no problem, including one that ignores custom formatting. Even better, %$ looks like it's a special case and not some odd-but-legal custom specifier.

Using $ would cause real trouble, because it's already used for positional arguments. What would format("%1$d", 'a'); be supposed to produce? 'a'd or 97?

The $ only has that meaning if it's preceded by a number. %N$…c has a meaning for N a number and c a character possibly preceded by other formatting stuff. But %$ is undefined in the sense that it is an error to use it.

> >

Changing the meaning of %D begs for trouble.

%D has currently no meaning, so we cannot change it; we can just add it.

%D potentially has a meaning for existing (or future) user-defined types. On the other hand, %$ has not, because it's not up to a user-defined type to define its meaning but to format (FormatSpec to be precise) because currently, FormatSpec does not support %$ to begin with.

>

I hope, we can figure this out somehow - I sense, that you've got an important point, but I don't understand it. Seems like we are talking past each other.

I guess you thought primarily about the built-in types while I primarily thought about user-defined types. I'm happy to clarify.

Implementation

Now, let's talk about the implementation. It's far easier to talk about that in terms of a function. Let's call it unMixin because the goal is that mixin(unMixin(obj)) results in obj or a copy of obj. On the other hand, we cannot expect unMixin(mixin(str)) to return str because str could contain unnecessary information and even if it doesn't, it can contain context-dependent information that unMixin cannot generally retrieve.

Simplest example: If unMixin(1) returns "1", we're good for 1. If it returns "cast(int) 1", we're also good.
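Both candidate outputs can be checked directly with mixin. The sketch below does not implement unMixin; it only verifies that either string would satisfy the round-trip requirement for 1, and that the reverse direction is lossy.

```d
// Either spelling evaluates to the int value 1 when mixed in.
enum fromPlain = mixin("1");
enum fromCast = mixin("cast(int) 1");

static assert(fromPlain == 1 && is(typeof(fromPlain) == int));
static assert(fromCast == 1 && is(typeof(fromCast) == int));

// Distinct source strings can denote the same value, so
// unMixin(mixin(str)) cannot recover which spelling was used.
static assert(mixin("1") == mixin("0x1"));

void main() {}
```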

May 05, 2021

On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:

>

Implementation

Now, let's talk about the implementation. It's far easier to talk about that in terms of a function. Let's call it unMixin because the goal is that mixin(unMixin(obj)) results in obj or a copy of obj. On the other hand, we cannot expect unMixin(mixin(str)) to return str because str could contain unnecessary information and even if it doesn't, it can contain context-dependent information that unMixin cannot generally retrieve.

Simplest example: If unMixin(1) returns "1", we're good for 1. If it returns "cast(int) 1", we're also good.

I've done some experiments and the results are mixed.

The easiest by far is typeof(null). For scalar types and strings, the aforementioned %(%s%) can be used.

Pointers and slices aren't too hard either.

For structs without a constructor, unMixin is actually easy; if it has a constructor, the object cannot be described by a constructor call since who knows what the constructor does and maybe there isn't even a simple constructor call that will result in the given object. It can be done, but it's ugly and hacky.
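For the easy case, a minimal sketch might look as follows. It is not taken from the linked gist; structLiteral is a hypothetical name, and the sketch assumes a constructor-less struct whose fields are all integers, so that to!string already yields a valid literal for each field.

```d
import std.conv : to;

// Hypothetical sketch: builds "Name(field1, field2, ...)" from the field values.
string structLiteral(S)(S s)
if (is(S == struct))
{
    string result = S.stringof ~ "(";
    foreach (i, field; s.tupleof)
    {
        if (i > 0)
            result ~= ", ";
        result ~= to!string(field); // only correct for int fields
    }
    return result ~ ")";
}

struct Point { int x; int y; }

void main()
{
    enum p = Point(3, -4);
    assert(structLiteral(p) == "Point(3, -4)");

    // Round trip at compile time: the literal text evaluates to an equal value.
    enum q = mixin(structLiteral(p));
    static assert(p == q);
}
```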

Because unions can have sub-structs and stuff, I gave up on them.

I have not too much experience with D's classes, but from my estimation, it cannot be done. It looks like you need typeid at compile-time (at CTFE to be precise) which isn't available.

My take on it so far: https://run.dlang.io/gist/c98ef765cb8921595d5e41fc11c89ca7?args=-unittest%20-main

May 06, 2021

On Wednesday, 5 May 2021 at 17:02:42 UTC, Paul Backus wrote:

>

Here's the scenario that could potentially lead to trouble:

  1. Some existing library uses %D as a custom format specifier in their toString methods, with a meaning other than "format as D source code."

  2. %D is added to std.format with the meaning "format as D source code," and a default implementation for types that do not have custom toString methods.

  3. A new library is written that takes advantage of (2) and uses %D in generic code to format arbitrary values for the purpose of code generation.

  4. Someone uses the library from (1) and the library from (3) in the same project, and library (3) ends up producing garbage, because library (1)'s %D doesn't work the way library (3) expects it to.

First of all: Thanks for clarifying. I think, I understand the problem now.

>

The "correct" place to fix this is in library (1), but doing so would require a breaking change. In practice, this means that libraries like the one in (3) will never be able to completely rely on the new standard for %D, and will always have to include some kind of workaround in case they are used with types like the ones in library (1).

In my opinion, the error is in (3): the new library assumes that %D can be used with every type (and will always have the meaning "D literal"), which in my opinion is wrong:

It does not even hold for established characters. Take %b, for example: for bools, integers, characters, and enums whose base type is one of the first three, it currently means "format as unsigned binary number". It currently cannot be used for anything else that std.format is responsible for.
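A quick illustration of the built-in meaning of %b for the types mentioned:

```d
import std.format : format;

void main()
{
    assert(format("%b", 5) == "101");
    assert(format("%b", 'a') == "1100001"); // 97 in binary
    assert(format("%b", true) == "1");
}
```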

But of course it can be used in any custom type (be it one from Phobos, an external library, or whatever). And no one will stop anyone from using it in a completely different way, e.g. as a bitmap of the type or whatever.

So in my opinion, in the above scenario the library in (3) should clearly state in its docs that it can only be used with code that uses %D in the sense of a "D literal". And the library in (1) should clearly state in its docs what %D means, if it has a meaning. With that, it should be clear that you cannot use (1) and (3) together in one project, at least not without adding some glue.

Now I think, I can go back to this:

> >

%D has currently no meaning, so we cannot change it; we can just add it.

%D does currently have a meaning, though. It means "custom format specifier."

But doesn't that apply to every format specifier?

May 06, 2021

On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:

>

I guess you thought primarily about the built-in types while I primarily thought about user-defined types. I'm happy to clarify.

Yes, thank you. That already helped a lot, although I fear, we still don't agree on most of the points with regard to the content... :-s

>

The new format implementation could do three things when encountering %D for formatting an object of a type with custom formatting:

For me, this seems to be the wrong way to think about it. format doesn't encounter specifiers, but objects (in the wider sense). And in case of structs, classes and so on it delegates the handling of formatting to them, without even looking at the specifier (with the exception of %s which sometimes plays a special role). It's then up to that struct or class to define the meaning of %D for that specific struct or class.

>

note that according to Walter, bool is an integer type

Yeah, but std.format handles them in a special formatValueImpl, that's why I treat them separately.

>

Because %D for bool, integers ([...]), floats, arrays, and AAs is nothing different from %s.

That's not true: bytes need a cast; longs need a trailing 'L', as do reals; floating-point numbers are truncated by %s and don't reproduce the correct value; and so on. There are a lot of subtle differences, and that's why I think it would be a good thing to have this new format character.
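Two of these differences can be demonstrated with the round-trip check from the original proposal, using %s in place of the proposed character:

```d
import std.format : format;

void main()
{
    // int round-trips through %s: same value, same type.
    enum int i = 42;
    enum iBack = mixin(format("%s", i));
    static assert(i == iBack && is(typeof(iBack) == int));

    // long: the text "42" is an int literal, so the type is lost.
    enum long l = 42;
    enum lBack = mixin(format("%s", l));
    static assert(l == lBack && !is(typeof(lBack) == long));

    // double: %s prints only a few significant digits, so the value is lost.
    double third = 1.0 / 3;
    assert(format("%s", third) == "0.333333");
    assert(0.333333 != third);
}
```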

>

The only part where you'd need something different than %s is characters, strings. That would be handy to have, I must admit. You can mimic it using arrays tho

That was actually the starting point that led me to want %D: %s for arrays tries to mimic the intended result of %D (but fails to do so correctly in several places) and therefore treats characters and strings specially. This led to the abuse of the - flag (in "%-(...%)"), which now causes a lot of problems. I thought long about how this could be fixed: with %D available, a smoother transition would be possible, because people using %s inside of %(...%) could just replace it with %D to get the current result, and that would eventually make it possible to give %s (and the - flag) its correct meaning back. (Of course this still needs deprecation cycles and maybe a preview switch or the like; it's still not easy.)
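The special treatment described here is visible in the current behaviour of std.format: the same %s produces different text depending on whether it appears at top level, inside an array, or inside %-(...%).

```d
import std.format : format;

void main()
{
    string[] words = ["a", "b\n"];

    // Top level: the string is written as-is.
    assert(format("%s", words[0]) == "a");

    // Inside an array or %( ... %): elements are quoted and escaped.
    assert(format("%s", words) == `["a", "b\n"]`);
    assert(format("%(%s, %)", words) == `"a", "b\n"`);

    // With the - flag: quoting and escaping are switched off again.
    assert(format("%-(%s, %)", words) == "a, b\n");
}
```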

>

And it's almost perfect! It works for character types, numeric types, arrays, and AAs, too.

As I wrote above: That might look so at first sight, but it isn't the case.

>

The $ only has that meaning if it's preceded by a number. %N$…c has a meaning for N a number and c a character possibly preceded by other formatting stuff. But %$ is undefined in the sense that it is an error to use it.

But people will start to use it with width and other parameters and will report issues. Let alone that it will complicate the format spec parser significantly and thus might even introduce more bugs. I'm sorry, but with %$ you'd be opening Pandora's box.

>

Now, let's talk about the implementation.

Sorry, but as long as we do not even agree on the goal, this is not useful.