Wanted: Format character for source code literal (page 3)

May 07, 2021

Re: Wanted: Format character for source code literal

Posted by Q. Schroll
in reply to Berni44

On Thursday, 6 May 2021 at 08:49:16 UTC, Berni44 wrote:

On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:

The new format implementation could do three things when encountering %D for formatting an object of a type with custom formatting:

For me, this seems to be the wrong way to think about it. format doesn't encounter specifiers, but objects (in the wider sense). And in case of structs, classes and so on it delegates the handling of formatting to them, without even looking at the specifier (with the exception of %s which sometimes plays a special role).

The role of %s is special, but not too special either. It just gives a best effort result where other formats would just fail. The task to return a string representation that can be interpreted back is nothing to be delegated to a user-defined routine.

It's then up to that struct or class to define the meaning of %D for that specific struct or class.

This makes %D unreliable for meta-programming. And this is the problem I have with this, because creating a compiler-readable string from an object is a meta-programming tool. I have no idea what else you'd even do with it.

Here's the showstopper: Adding a toString that accepts format specifiers becomes a potentially breaking change as it will change the meaning of %D silently.
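
To make the concern concrete, here is a minimal sketch of how a type takes over the interpretation of a specifier today. The type `Celsius` and both output formats are made up for illustration; only `format`, `formattedWrite` and `FormatSpec` come from std.format. Once a `toString` overload accepts a `FormatSpec`, it receives every specifier character, so what `%D` means for this type is decided entirely by the type's author.

```d
import std.format : format, formattedWrite, FormatSpec;

// Hypothetical user type; the name and the output formats are illustrative only.
struct Celsius
{
    double degrees;

    // This overload receives the FormatSpec, so std.format passes every
    // specifier character through, including 'D', without interpreting it.
    void toString(scope void delegate(const(char)[]) sink,
                  scope const ref FormatSpec!char spec) const
    {
        if (spec.spec == 'D')
            formattedWrite(sink, "Celsius(%s)", degrees);
        else
            formattedWrite(sink, "%s C", degrees);
    }
}

void main()
{
    auto t = Celsius(21.5);
    assert(format("%s", t) == "21.5 C");
    assert(format("%D", t) == "Celsius(21.5)");
}
```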

Because %D for bool, integers ([...]), floats, arrays, and AAs is nothing different from %s.

That's not true: bytes need a cast, longs a trailing 'L',

It depends on what you want to do with it. If you want the immediate type of the literal to be what you plugged in, then yes. If being equal suffices, "1" and "true" are the same.
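
A small sketch of that distinction, assuming nothing beyond `std.format.format` and a string mixin: `%s` writes the value only, so the text read back by the compiler has the type of a plain literal, while the value still compares equal.

```d
import std.format : format;

void main()
{
    // %s prints the value only; the static type is not part of the output.
    assert(format("%s", cast(byte) 1) == "1");
    assert(format("%s", 1L) == "1");
    assert(format("%s", true) == "true");

    // Fed back to the compiler, the text "1" is an int literal, not a long.
    enum src = format("%s", 1L);
    static assert(is(typeof(mixin(src)) == int));

    // The value survives the round trip; the type does not.
    static assert(mixin(src) == 1L);
}
```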

like reals, floating point numbers are truncated with %s and don't provide the correct value

That, on the other hand, is a problem. I don't know how big that problem practically is because real cannot even be formatted at CTFE and double and float aren't that common of things at compile-time. I guess the only sane result for floating point values is %a with sufficient digits anyways and that is largely apart from %s even if you add a gigantic precision.

It's a breaking change fixing %s for floating point values in the sense that the representation consists of enough decimals to accurately represent the number.
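
The truncation can be seen with a short sketch. It only relies on `%s` using six significant digits for floating point values; the `%a` output is printed rather than asserted, since its exact text is not something this example claims.

```d
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

void main()
{
    double x = 1.0 / 3.0;

    // %s keeps six significant digits for floating point values.
    string shortText = format("%s", x);
    assert(shortText == "0.333333");

    // Reading that text back yields a different double.
    assert(shortText.to!double != x);

    // %a writes a hexadecimal floating point representation instead.
    writeln(format("%a", x));
}
```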

and so on. There are a lot of subtle differences

The problem of strings and chars is obvious, the case for exact types is, too. Floating point types didn't cross my mind, but please elaborate, what else is it? I'm honestly interested.

If %(%s%) does not give you proper char or string, I'd consider it a bug.

and that's why I think it would be a good thing to have this new format character.

I agree with you that a new format is necessary to achieve this if done with a format character to begin with. I do question whether format characters are the right approach. To me, this looks more like a code generation tool than value formatting.

The only part where you'd need something different than %s is characters, strings. That would be handy to have, I must admit. You can mimic it using arrays tho

That was actually the starting point that led me to want %D: %s for arrays tries to mimic the intended result of %D (but fails to do so correctly in several places) and therefore treats characters and strings specially. This led to the abuse of the - flag (in "%-(...%)"), which now causes a lot of problems. I thought long about how this could be fixed: with %D available, a smoother transition would be possible, because people using %s inside of %(...%) could just replace it with %D to get the current result, and that would eventually make it possible to give %s (and the - flag) its correct meaning back. (Of course this still needs deprecation cycles and maybe a preview switch or whatever else - it's still not easy.)
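
For readers who have not run into this, a brief sketch of the current behaviour being discussed, using only `std.format.format`: strings are written verbatim on their own, quoted and escaped as array elements, and the `-` flag on the compound specifier turns the quoting off again.

```d
import std.format : format;

void main()
{
    // A string on its own is written verbatim.
    assert(format("%s", "hi") == "hi");

    // Inside an array, %s quotes and escapes string elements.
    assert(format("%s", ["a", "b"]) == `["a", "b"]`);
    assert(format("%(%s, %)", ["a", "b"]) == `"a", "b"`);

    // The '-' flag on the compound specifier switches the quoting off.
    assert(format("%-(%s, %)", ["a", "b"]) == "a, b");

    // Wrapping a single string in an array mimics a string literal.
    assert(format("%(%s%)", ["say \"hi\""]) == `"say \"hi\""`);
}
```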

The %-(...%) is a hack, but it can be questioned whether removing it is even worth the trouble. It just breaks things. The minus has otherwise no meaning for arrays. It's just weird.

And it's almost perfect! It works for character types, numeric types, arrays, and AAs, too.

As I wrote above: That might look so at first sight, but it isn't the case.

Right. I was a little enthusiastic about it.

The $ only has that meaning if it's preceded by a number. %N$…c has a meaning for N a number and c a character possibly preceded by other formatting stuff. But %$ is undefined in the sense that it is an error to use it.
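
As a reminder of what the existing `%N$` form does, here is a small sketch with `std.format.format`; the argument values are arbitrary.

```d
import std.format : format;

void main()
{
    // N selects the argument by position (1-based).
    assert(format("%2$s %1$s", "world", "hello") == "hello world");

    // The same argument can be referenced more than once.
    assert(format("%1$s %1$s", "a") == "a a");

    // Other formatting, here a width, may follow the '$'.
    assert(format("%1$5s|", "ab") == "   ab|");
}
```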

But people will start to use it with width and other parameters and will report issues. Let alone that it will complicate the format spec parser significantly and thus might even introduce more bugs. I'm sorry, but with %$ you'll be opening Pandora's box.

It requires a single check: Is the % character followed by $? The whole point of %$ would be that it is not customizable. You cannot add any specification. If something comes before $, it isn't %$, and if something comes behind, it's not part of the format specifier, but just text.

I've been thinking about this a little. What is your goal? Maybe we're talking at cross purposes. I guess you want a format specifier that formats any built-in type in a way that represents the object precisely. In a sense, you want a good %s and not a not-really-the-best-effort %s. My understanding was you want to represent objects as strings in a way that can be used by the compiler to reconstruct the object, and for what else than meta-programming would one do that? It's in a sense trivial for built-in types because it's a finite set of types.

Thinking about it, you can easily wrap objects in a struct and make it do The Right Thing™. It doesn't complicate the format implementation.
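
A rough sketch of what such a wrapper might look like. Everything here is hypothetical: the names `Literal` and `literal` are invented for this example, and only three types are given special treatment, so it is nowhere near a complete mapping from values to D literals.

```d
import std.format : format;

// Hypothetical wrapper: formats the wrapped value as a D source literal.
// Only long, byte and string are special-cased in this sketch.
struct Literal(T)
{
    T value;

    string toString() const
    {
        static if (is(T == long))
            return format("%sL", value);            // trailing 'L'
        else static if (is(T == byte))
            return format("cast(byte) %s", value);  // bytes need a cast
        else static if (is(T == string))
            return format("%(%s%)", [value]);       // quoted and escaped
        else
            return format("%s", value);
    }
}

// Hypothetical helper that infers T.
auto literal(T)(T value) { return Literal!T(value); }

void main()
{
    assert(format("%s", literal(1L)) == "1L");
    assert(format("%s", literal(cast(byte) 1)) == "cast(byte) 1");
    assert(format("%s", literal("hi")) == `"hi"`);

    // The generated text keeps the type when compiled again.
    enum src = literal(42L).toString();
    static assert(is(typeof(mixin(src)) == long));
}
```

Since the wrapper only has a plain `toString`, it is used with `%s`, and std.format itself needs no changes.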

On Friday, 7 May 2021 at 23:51:13 UTC, Q. Schroll wrote:

looking at the specifier (with the exception of %s which sometimes plays a special role).

The role of %s is special, but not too special either. It just gives a best effort result where other formats would just fail.

That's not what I meant. What I meant was that some custom toString versions are only called when %s is used (all those that do not take the format string or a FormatSpec as a parameter).
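
A minimal sketch of that behaviour, with a made-up type `Plain`: its `toString` takes no format information, so it is used for `%s`, while any other specifier is rejected with a `FormatException`.

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

// Hypothetical type whose toString knows nothing about format specifiers.
struct Plain
{
    string toString() const { return "plain"; }
}

void main()
{
    // %s calls the plain toString.
    assert(format("%s", Plain()) == "plain");

    // Any other specifier is rejected for such a type.
    assertThrown!FormatException(format("%d", Plain()));
}
```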

The task to return a string representation that can be interpreted back is nothing to be delegated to a user-defined routine.

On the contrary, it's the only place where this can be done. The routines in std.format cannot know what these objects need in order to be constructed. Maybe it is not even possible at all.

That, on the other hand, is a problem. I don't know how big that problem practically is because real cannot even be formatted at CTFE

It can. I wrote that code. It has been part of master for about two weeks.

I guess the only sane result for floating point values is %a with sufficient digits anyways

That's one reason why I want to add %D: for floating point values I'd like to implement Ryu (or something similar), which guarantees to emit a value that produces exactly the same result when read back in.

It's a breaking change fixing %s for floating point values in the sense that the representation consists of enough decimals to accurately represent the number.

And that's the reason why I want to add %D rather than change %s.

If %(%s%) does not give you proper char or string, I'd consider it a bug.

Please define "proper char or string" first.

Sounds like you are thinking about a serialization tool or something like this. That's not what I plan. I think just about formatting values.

The %-(...%) is a hack, but it can be questioned whether removing it is even worth the trouble.

If I were the only one to decide, I would remove it immediately, because in my eyes it is a bug. And yes, it's worth the trouble.

It just breaks things.

That's why I haven't filed a PR yet. But I'm looking forward to a possibility to change this. And again here %D will help.

The minus has otherwise no meaning for arrays.

It has: left justification instead of right justification. It is just not implemented, or implemented with bugs.
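
For scalars and strings the flag already behaves this way, as this short sketch with `std.format.format` shows; the array case is left out because that is the part described above as not working properly.

```d
import std.format : format;

void main()
{
    // Without '-', the value is right-justified within the width.
    assert(format("%5s|", "ab") == "   ab|");

    // With '-', it is left-justified.
    assert(format("%-5s|", "ab") == "ab   |");
    assert(format("%-5d|", 42) == "42   |");
}
```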

It requires a single check: Is the % character followed by $? The whole point of %$ would be that it is not customizable.

I want to have it customizable. For example I'd like to have output, that vertically aligns.

Yes, you could say so. I don't know the whole history of %s, but I think its first meaning was "string", and this was later misunderstood as producing something similar to a literal. This makes %s a mix. %D would take one of these meanings away from %s, giving it back its original meaning.

My understanding was you want to represent objects as strings in a way that can be used by the compiler to reconstruct the object,

Well, that's what a source code literal is supposed to be, isn't it? But of course it is not limited to this use. People might use it to automatically generate asserts for unittests, or to compare the output of different runs, where you can be sure that the differences are not due to rounding effects or the like but are real differences. Or whatever they want to do with it.

Thinking about it, you can easily wrap objects in a struct and make it do The Right Thing™. It doesn't complicate the format implementation.

Of course there are always workarounds. With that argument you can question every function in phobos...
