Thread overview
Escaping control in formatting
Apr 23, 2012
Denis Shelomovskij
Apr 23, 2012
Dmitry Olshansky
Apr 23, 2012
kenji hara
Apr 23, 2012
Denis Shelomovskij
Apr 23, 2012
kenji hara
Apr 23, 2012
Denis Shelomovskij
Apr 23, 2012
Denis Shelomovskij
Apr 24, 2012
kenji hara
Apr 24, 2012
Denis Shelomovskij
April 23, 2012
Until now I had never used Kenji Hara's excellent new range formatting syntax, and I ran into difficulties: "%(%(%c%), %)" is the format I need most often for a string array, and it is neither obvious nor elegant. It turns out that "%c" disables character escaping. What the hell? Why? It is not obvious at all.

So I think it would be good to add an 'Escaping' part after 'Precision' in the format specification:

Escaping:
  empty
  !-
  !+
  !'
  !"
  !?'
  !?"
  !?!

Escaping affects formatting depending on the specifier as follows.

Escaping    Semantics
  !-      disable escaping, for a range it also disables [,]
  !+      enable escaping using single quotes for chars and double quotes for strings
  !'      enable escaping using single quotes
  !"      enable escaping using double quotes
  !?'     like !' but without adding the quotes and [,] for a range
  !?"     like !" but without adding the quotes and [,] for a range
  !?!     enable escaping, both single and double quotes will be escaped without adding any quotes and [,] for a range

Escaping is enabled by default only for associative arrays, ranges (but not strings), user-defined types, and all of their sub-elements.
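To make the table above concrete, here is a minimal sketch of what the proposed flags would do to a single string. `EscapeMode` and `escapeAs` are hypothetical names invented for illustration, not a Phobos API. The escape set is deliberately reduced: only `\t`, `\n` and `\r` get named escapes and other control characters become `\xNN`, which is simpler than std.format's real escaping. `!+` is not listed separately because for a string it means the same as `!"`.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical names for the proposed flags; not part of Phobos.
enum EscapeMode
{
    none,       // !-   no escaping at all
    single,     // !'   escape, wrap in '...'
    double_,    // !"   escape, wrap in "..." (what !+ means for a string)
    singleBare, // !?'  like !' but without the quotes
    doubleBare, // !?"  like !" but without the quotes
    bothBare,   // !?!  escape both quote kinds, add no quotes
}

// Applies one proposed escaping mode to a single string (sketch only).
string escapeAs(string s, EscapeMode mode)
{
    if (mode == EscapeMode.none)
        return s;

    // Which quote characters get a backslash.
    immutable bool escSingle = mode == EscapeMode.single
        || mode == EscapeMode.singleBare || mode == EscapeMode.bothBare;
    immutable bool escDouble = mode == EscapeMode.double_
        || mode == EscapeMode.doubleBare || mode == EscapeMode.bothBare;
    // '\0' means "add no surrounding quotes".
    immutable char wrap = mode == EscapeMode.single ? '\''
        : mode == EscapeMode.double_ ? '"' : '\0';

    auto w = appender!string();
    if (wrap != '\0')
        w.put(wrap);
    foreach (dchar c; s)
    {
        switch (c)
        {
        case '\\': w.put(`\\`); break;
        case '\'': w.put(escSingle ? `\'` : `'`); break;
        case '"':  w.put(escDouble ? `\"` : `"`); break;
        case '\n': w.put(`\n`); break;
        case '\r': w.put(`\r`); break;
        case '\t': w.put(`\t`); break;
        default:
            if (c < 0x20)
                formattedWrite(w, `\x%02X`, cast(uint) c); // other control chars
            else
                w.put(c);
            break;
        }
    }
    if (wrap != '\0')
        w.put(wrap);
    return w.data;
}

unittest
{
    assert(escapeAs("a\tb", EscapeMode.none) == "a\tb");
    assert(escapeAs("it's", EscapeMode.double_) == `"it's"`);
    assert(escapeAs("it's", EscapeMode.single) == `'it\'s'`);
    assert(escapeAs(`say "hi"`, EscapeMode.doubleBare) == `say \"hi\"`);
    assert(escapeAs(`'"`, EscapeMode.bothBare) == `\'\"`);
}
```

Under this sketch, the C/C++ use case discussed later in the thread corresponds to `escapeAs(str, EscapeMode.doubleBare)`. The unittest can be run with e.g. `dmd -unittest -main -run file.d`.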

I'd like to remove "%c"'s ability to magically disable escaping, and that looks possible as long as this behaviour is still undocumented.

Look at this example:
---
import std.stdio;

void main() {
    writeln("    char");
    char c = '\'';
    writefln("unescaped: %s."  ,   c  );
    writefln(`escaped+': %(%).`, [ c ]); // proposal: %!+s or %!'s
    writefln(`escaped+": %(%).`, [[c]]); // proposal: %!"s
    writeln (`  escaped: \t.`);          // proposal: %!?'s
    writeln();
    writeln("    string");
    string s = "a\tb";
    writefln("unescaped: %s."  ,  s );
    writefln(`escaped+": %(%).`, [s]); // proposal: %!+s or %!"s
    writeln (`  escaped: a\tb.`);      // proposal: %!?"s
    writeln();
    writeln("    strings");
    string[] ss = ["a\tb", "cd"];
    writefln("unescaped: %(%(%c%)%).", ss); // proposal: %!-s
    writefln(`escaped+": %(%).`      , ss);
    writeln (`  escaped: a\tbcd.`);        // proposal: %!?"s
}
---

If this is accepted, I volunteer to try to implement it. If not, escaping should at least be documented (and don't forget about "%c"'s magic!).

Any thoughts?

P.S.
If it has already been discussed, please give me a link.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 23, 2012
On 23.04.2012 16:36, Denis Shelomovskij wrote:
> I've never used new excellent range formatting syntax by Kenji Hara
> until now. And I've met with difficulties, because "%(%(%c%), %)" is the
> most common format for string array for me and it neither obvious nor
> elegant. It occurs that "%c" disables character escaping. What the hell?
> Why? Not obvious at all.

Does %(%s, %) not work?
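Dmitry's question comes down to how quoting interacts with the nested specifier. The unittest below records the behaviour discussed in this thread as current Phobos `std.format.format` behaves; it is a check written for this write-up, not code from the thread. `%(%s, %)` does work, but it quotes and escapes each string, while only the `%c` form produces raw text.

```d
import std.format : format;

unittest
{
    string[] ss = ["a\tb", "cd"];

    // Whole array with %s: brackets, and each element quoted and escaped.
    assert(format("%s", ss) == `["a\tb", "cd"]`);

    // Dmitry's suggestion: no brackets, but elements are still quoted and escaped.
    assert(format("%(%s, %)", ss) == `"a\tb", "cd"`);

    // Formatting each element's characters with %c switches escaping off (raw tab).
    assert(format("%(%(%c%), %)", ss) == "a\tb, cd");

    // A single string formatted directly is never escaped.
    assert(format("%s", ss[0]) == "a\tb");
}
```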

[snip]

-- 
Dmitry Olshansky
April 23, 2012
On April 23, 2012 at 21:36, Denis Shelomovskij <verylonglogin.reg@gmail.com> wrote:
> [snip]
>
> Any thoughts?

Please give us use cases.
I cannot imagine why you would want to change or remove the quotes but keep
the contents escaped.

> P.S.
> If it has already been discussed, please give me a link.

As far as I know, it has not been discussed yet.

Kenji Hara
April 23, 2012
On 23.04.2012 18:54, kenji hara wrote:
> Please give us use cases. I cannot imagine why you want to
> change/remove quotations but keep escaped contents.

Sorry, I should have mentioned that !' and !" are optional and not commonly
used, and all the !?* forms are very optional and are included just for
completeness (IMHO).

An example is generating a complicated string for C/C++:
---
myCppFile.writefln(`tmp = "%!?"s, and %!?"s, and even %!?"s";`,
                   str1, str2, str3);
---
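Without new syntax, something close to the proposed `%!?"s` can be approximated today. The sketch below relies on the fact that a one-element string array formatted with `%(%s%)` yields the string in double quotes with std.format's escaping applied, and then strips the two quotes. `escapedBody` is a hypothetical helper name. D's escape syntax overlaps with C/C++ for common cases such as `\t`, `\"` and `\\`, but it is an assumption that every escape it emits is valid in C (for example, a C `\x` escape greedily consumes any following hex digits).

```d
import std.format : format;

// Hypothetical helper: escape s as std.format escapes a string element,
// then drop the surrounding double quotes (roughly the proposed %!?"s).
string escapedBody(string s)
{
    // A one-element array formatted with %(%s%) yields exactly `"<escaped s>"`.
    auto quoted = format("%(%s%)", [s]);
    return quoted[1 .. $ - 1];
}

unittest
{
    auto line = format(`tmp = "%s, and %s";`,
                       escapedBody("a\tb"), escapedBody(`say "hi"`));
    assert(line == `tmp = "a\tb, and say \"hi\"";`);
}
```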

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 23, 2012
On April 24, 2012 at 1:14, Denis Shelomovskij <verylonglogin.reg@gmail.com> wrote:
> On 23.04.2012 18:54, kenji hara wrote:
>
>> Please give us use cases. I cannot imagine why you want to change/remove quotations but keep escaped contents.
>
>
> Sorry, I should mention that !' and !" are optional and aren't commonly used, and all !?* are very optional and are here just for completeness (IMHO).
>
> An example is generating a complicated string for C/C++:
> ---
> myCppFile.writefln(`tmp = "%!?"s, and %!?"s, and even %!?"s";`,
>                   str1, str2, str3)
> ---

While improving the std.format module, I settled on a design: if you format some values with a format specifier, you should be able to unformat the output with the same format specifier.

Example:
    import std.array, std.format;

    void main()
    {
        auto aa = [1:"hello", 2:"world"];
        auto writer = appender!string();
        formattedWrite(writer, "%s", aa);   // writes the AA with quoted, escaped values

        aa = null;

        auto output = writer.data;
        formattedRead(output, "%s", &aa);   // same format specifier
        assert(aa == [1:"hello", 2:"world"]);
    }

More details:
    https://github.com/D-Programming-Language/phobos/blob/master/std/format.d#L3264

I call this "reflective formatting"; it supports simple text-based
serialization and deserialization.
Automatic quoting/escaping of nested elements is necessary for this feature.

But your proposal would break this design very easily, making it impossible to unformat the output reflectively.

For these reasons, your suggestion is hard to accept.
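Kenji's associative-array example extends to other types std.format can read back. The unittest below is a sketch written for this summary (not from the thread), assuming current Phobos `formattedWrite`/`formattedRead`. It shows why automatic quoting matters for reflective formatting: the elements contain a comma and double quotes, so an unquoted, unescaped rendering could not be split back into the original array.

```d
import std.array : appender;
import std.format : formattedRead, formattedWrite;

unittest
{
    // Elements that would be ambiguous without quoting: a comma and quotes.
    string[] names = ["a, b", `say "hi"`];

    auto w = appender!string();
    formattedWrite(w, "%s", names);
    string text = w.data;
    assert(text == `["a, b", "say \"hi\""]`);

    // Reading back with the same specifier restores the original array.
    string[] back;
    formattedRead(text, "%s", &back);
    assert(back == names);
}
```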

Kenji Hara
April 23, 2012
On 23.04.2012 21:15, kenji hara wrote:
> [snip]
>
> But your proposal will break this design very easy, and it is
> impossible to unformat the outputs reflectively.
>
> For these reasons, your suggestion is hard to accept.
>
> Kenji Hara

Is there some misunderstanding?

Reflective formatting is good! But it isn't always what you want; it is needed mostly for debugging. Debugging is only one of the two uses of formatting, though; the other is simply writing something somewhere.

There are already some non-reflective constructs (like "%(%(%c%), %)" for a range and "X%sY%sZ" for strings), and I'm just proposing more convenient ones, because half the time I use formatting, I use it for writing output (I mean not for debugging).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 23, 2012
On 23.04.2012 21:49, Denis Shelomovskij wrote:
> [snip]
>
> There are already some non-reflective constructs (like "%(%(%c%), %)"
> for a range and "X%sY%sZ" for strings) and I just propose adding more
> comfortable ones because every second time I use formatting I use it for
> writing (I mean not for debugging).
>

I completely forgot: %!+s in my proposal is exactly for reflective formatting (e.g. "X%!+sY%!+sZ" is reflective for strings).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 24, 2012
On April 24, 2012 at 2:49, Denis Shelomovskij <verylonglogin.reg@gmail.com> wrote:
> [snip]
>
> Is there sum misunderstanding?
>
> Reflective formatting is good! But it isn't what you always want. It is needed mostly for debug purposes. But debugging is one of two usings of formatting, the second one is just writing something somewhere.
>
> There are already some non-reflective constructs (like "%(%(%c%), %)" for a
> range and "X%sY%sZ" for strings) and I just propose adding more comfortable
> ones because every second time I use formatting I use it for writing (I mean
> not for debugging).

My concern is that the proposal is too complicated and not very useful
for general use cases.
You can emulate such formatting as follows:

import std.array, std.format, std.stdio;
import std.range, std.uni;

void main()
{
    auto strs = ["It's", "\"world\""];
    {
        // emulation of !?" : escape for double quotes, add no surrounding quotes
        auto w = appender!string();
        foreach (s; strs)
            formatStrWithEscape(w, s, '"', false);
        writeln(w.data);
    }
    {
        // emulation of !?' : escape for single quotes, add no surrounding quotes
        auto w = appender!string();
        foreach (s; strs)
            formatStrWithEscape(w, s, '\'', false);
        writeln(w.data);
    }
}

// Writes str escaped for the given quote character; addQuotes = true
// also wraps it in that quote (i.e. emulates !" and !').
void formatStrWithEscape(W)(W writer, string str, char quote, bool addQuotes)
{
    if (addQuotes)
        writer.put(quote);
    foreach (dchar c; str)
        formatChar(writer, c, quote);
    if (addQuotes)
        writer.put(quote);
}

// adapted from std.format
void formatChar(Writer)(Writer w, in dchar c, in char quote)
{
    if (std.uni.isGraphical(c))
    {
        // printable: escape only the active quote and the backslash
        if (c == quote || c == '\\')
        {
            put(w, '\\');
            put(w, c);
        }
        else
            put(w, c);
    }
    else if (c <= 0xFF)
    {
        // control characters: named escapes, otherwise \xNN
        put(w, '\\');
        switch (c)
        {
        case '\a':  put(w, 'a');  break;
        case '\b':  put(w, 'b');  break;
        case '\f':  put(w, 'f');  break;
        case '\n':  put(w, 'n');  break;
        case '\r':  put(w, 'r');  break;
        case '\t':  put(w, 't');  break;
        case '\v':  put(w, 'v');  break;
        default:
            formattedWrite(w, "x%02X", cast(uint)c);
            break;
        }
    }
    else if (c <= 0xFFFF)
        formattedWrite(w, "\\u%04X", cast(uint)c);
    else
        formattedWrite(w, "\\U%08X", cast(uint)c);
}

I could agree to making private functions in std.format, e.g. formatChar, public but undocumented, but I cannot agree to adding such a complicated rule to the supported format specifiers.

Kenji Hara
April 24, 2012
On Tuesday, 24 April 2012 at 04:55:34 UTC, kenji hara wrote:
> My concern is that the proposal is much complicated and less useful
> for general use cases.
> You can emulate such formatting like follows:

IMHO, adding only %!+s and %!-s and removing %c's magic would simply make formatting easier for the user. It was hard (for me) to understand the current escaping rules because they are undocumented, and they feel inconsistent (to me) because escaping is part of formatting, yet the user cannot control it except through the magical %c.

I agree that !', !", and !?* are of course not commonly used, as I already wrote. Personally, I don't need them at all.

But this is a common pattern for me: `xformat("My pets: %(%!-s, %)", petsAsStrings)`. And "My pets: %(%(%c%), %)" is so complicated, inconsistent and non-general (it won't work if I give it pets as an int[], for example) that I never use it. I use `.joiner(", ")` instead, and every time I do, I think that something is really wrong with array formatting in Phobos.
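For comparison, the current std.format documentation describes a '-' flag on `%(` that turns off the automatic escaping of elements, which covers the "My pets" case. The unittest below is a sketch assuming a current Phobos release (the flag is not part of the 2012 proposal). It also shows that the same specifier works for an `int[]`, and that the `joiner` workaround mentioned above produces the same text.

```d
import std.algorithm.iteration : joiner;
import std.format : format;

unittest
{
    string[] pets = ["cat", "dog"];

    // Inside %( ... %) elements are quoted and escaped by default.
    assert(format("My pets: %(%s, %)", pets) == `My pets: "cat", "dog"`);

    // The '-' flag on %( turns element escaping off.
    assert(format("My pets: %-(%s, %)", pets) == "My pets: cat, dog");

    // Unlike %(%(%c%), %), the same specifier also works for non-strings.
    assert(format("My pets: %-(%s, %)", [1, 2]) == "My pets: 1, 2");

    // The joiner workaround: a range of characters formats like a string.
    assert(format("My pets: %s", pets.joiner(", ")) == "My pets: cat, dog");
}
```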

--
Денис В. Шеломовский
Denis V. Shelomovskij