Thread overview
[Issue 5354] New: formatValue: range templates introduce 3 bugs related to class & struct cases
Dec 15, 2010 - Denis Derman
Dec 15, 2010 - Denis Derman
Dec 15, 2010 - Nick Voronin
Dec 15, 2010 - Nick Voronin
Dec 15, 2010 - Nick Voronin
Dec 15, 2010 - Denis Derman
Dec 15, 2010 - Nick Voronin
Dec 15, 2010 - Denis Derman
Dec 16, 2010 - Denis Derman
Dec 30, 2010 - Rob Jacques
Jan 23, 2011 - Denis Derman
Jan 24, 2011 - Rob Jacques
Jan 24, 2011 - Max Samukha
Jan 24, 2011 - Denis Derman
Jan 24, 2011 - Max Samukha
Oct 20, 2011 - Kenji Hara
Jun 12, 2012 - Kenji Hara
December 15, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5354

           Summary: formatValue: range templates introduce 3 bugs related
                    to class & struct cases
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: denis.spir@gmail.com


--- Comment #0 from Denis Derman <denis.spir@gmail.com> 2010-12-15 01:36:05 PST ---
formatValue: range templates introduce 3 bugs related to class & struct cases

This issue concerns the class case, the struct case, and the 3 range cases of the set of formatValue templates in std.format. As this set is currently written and commented (1), it seems intended to handle the following cases (class/struct/range only):

* An input range is formatted like an array.
* A class object is formatted using toString.
* A struct is formatted:
    ~ using an input range interface, if it implements one,
    ~ using toString, if it defines it,
    ~ as a last resort, using the type's 'stringof' property.

In short: I think the right thing to do is to remove the range cases. Explanation, details and reasoning below.

Given how the set of templates is presently implemented, and because of how template selection works (as opposed to, e.g., inheritance), the following 3 bugs arise:

1. When a class implements an input range, compilation fails because both the
class case and the input range case match:
    /usr/include/d/dmd/phobos/std/format.d(1404): Error: template std.format.formatValue(Writer,T,Char) if (is(const(T) == const(void[]))) formatValue(Writer,T,Char) if (is(const(T) == const(void[]))) matches more than one template declaration,
    /usr/include/d/dmd/phobos/std/format.d(1187):formatValue(Writer,T,Char) if (isInputRange!(T) && !isSomeString!(T) && isSomeChar!(ElementType!(T))) and
    /usr/include/d/dmd/phobos/std/format.d(1260):formatValue(Writer,T,Char) if (is(T == class))
This is due to inheritance from Object: it happens even if no toString is
_explicitly_ defined.

2. For a struct, a programmer-defined output format in toString is bypassed whenever the struct also implements a range interface!

3. If a range's element type (the result type of front) is identical to the range's own type, writing it out runs into an infinite loop... This can easily happen, for instance with a textual type that works like strings in high-level/dynamic languages (where a character is a one-character string).
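The third case can be sketched as follows, assuming a hypothetical `Str` type (not part of Phobos, ASCII only for brevity) whose `front` returns a one-character `Str`. It is an input range whose element type is the range type itself, so a formatter that expands each element as another range never reaches a leaf:

```d
import std.range : ElementType, isInputRange;

// Hypothetical string-like type (not from Phobos): as in many dynamic
// languages, a "character" is itself a one-character Str.
struct Str
{
    string data;

    bool empty() const { return data.length == 0; }
    Str front() const { return Str(data[0 .. 1]); } // ASCII only
    void popFront() { data = data[1 .. $]; }
}

// Str is an input range whose element type is Str itself.
static assert(isInputRange!Str);
static assert(is(ElementType!Str == Str));

void main()
{
    auto s = Str("abc");
    // Descending into front never bottoms out: a one-character Str is
    // still a non-empty range whose front is the same one-character Str.
    assert(s.front.data == "a");
    assert(!s.front.empty);
    assert(s.front.front.data == "a");
}
```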

To fix these bugs, I guess the following changes would be needed:
* The 3 range cases need 2 additional _negative_ constraints:
    ~ no toString defined on the type
    ~ the element type is not the range type itself: !is(ElementType!T == T)
* The struct case must be split into sub-cases:
    ~ use toString if defined
    ~ [else use the range interface if implemented, as given above]
    ~ if neither toString nor range, use T.stringof
I tried to implement and test this modification, but ran into build errors (seemingly unrelated, about isTuple) that I could not solve.

Now, I wonder whether all these complications, just to have _default_
formatValue's for input ranges, are worth it at all. On one hand, by analogy,
it looks like a nice idea to have them expressed like arrays. On the other,
when can this feature be useful?
A first issue comes up because there is no way, AFAIK, to tell apart inherited
and explicit toString methods of classes: is(typeof(val.toString()) == string)
is always true for a class. So, with the "no toString" constraint proposed
above, the range case would never be triggered for classes, only for structs.
So, to use this feature, the type must (1) be a struct, (2) define no
toString, (3) implement a range interface, and (4) have an element type that
is not the range type itself. In addition, the most sensible output form for
it must be precisely that of an array.
Note that, unlike for structs, programmers cannot define custom forms of array
output ;-) This is the reason why a default array format is so helpful, but
this reason does not exist for structs, thanks to toString (and later writeTo).
If no default form exists for ranges, then in the rare cases where a programmer
implements a range interface on a struct _and_ needs to re-create an
array-like format for it, this takes a few lines in toString, for instance:
    // needs to (std.conv), format (std.format) and join (std.array) in current Phobos
    string toString () {
        string[] contents = new string[this.elements.length];
        foreach (i, e; this.elements)
            contents[i] = to!string(e);
        return format("[%s]", join(contents, ", "));
    }

In conclusion, I recommend getting rid of the 3 range cases in the set of formatValue templates. (This would directly restore correctness, I guess, which suggests that the range cases were added later.)

(1) There is at least one doc/comment error, namely for the struct case (commented as AA instead). Also, the online doc does not show template constraints, so it is not possible to determine which template is selected in a given situation.


Denis

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 15, 2010
--- Comment #1 from Denis Derman <denis.spir@gmail.com> 2010-12-15 01:45:30 PST ---
Started a thread: http://lists.puremagic.com/pipermail/digitalmars-d/2010-December/090043.html

I marked the bug(s) with the keyword 'spec', as the fix depends on how we want struct/class/range formatting semantics to be.

December 15, 2010
Nick Voronin <elfy.nv@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |elfy.nv@gmail.com


--- Comment #2 from Nick Voronin <elfy.nv@gmail.com> 2010-12-15 09:59:49 PST ---
My thoughts:

1. Using Range interface for formatting classes and structs is a good thing and should stay.

2. There is a conflict of priority between using toString and iterating through a Range. It's worse for classes, where toString is always present and can't be used to deduce the programmer's intent. IMHO it's more important to keep things uniform than to make the best guess in every case, so iterating through a range must have priority over using toString, at least unless there is a more direct way to tell the programmer's intent about the default formatting of a struct or class.

3. A Range with (ElementType!T == T) must either be detected throughout the whole library as a special case or not be detected as a Range at all. I'm under the impression that algorithms (not just formatting routines) expect front() to yield some value. This value /may/ be another Range, and there may be hierarchical structures containing Ranges of Ranges, yet this hierarchy is expected to be finite, so that full traversal of it is possible. I expect more trouble is waiting to happen with Ranges like that if they go generally undetected. I may be wrong here; it would be great to have someone with knowledge of both current practice and original intent clarify this matter.

4.
> Also, the online doc does not hold template constraints, so that it is not possible to determine which one is selected in given situations.
+1!

5. Attached are a testcase covering various combinations (class|struct, normal range|recursive range|no range, has override for toString|no override toString) and a patch which makes all cases compile and print uniform output for struct and class. The changes are really simple, the constraints still look manageable, and one can still enjoy specific formatting for ranges.

December 15, 2010
--- Comment #3 from Nick Voronin <elfy.nv@gmail.com> 2010-12-15 10:02:09 PST ---
Created an attachment (id=849)
proposed patch

December 15, 2010
--- Comment #4 from Nick Voronin <elfy.nv@gmail.com> 2010-12-15 10:03:24 PST ---
Created an attachment (id=850)
testcase for various types

December 15, 2010
--- Comment #5 from Denis Derman <denis.spir@gmail.com> 2010-12-15 10:45:23 PST ---
(In reply to comment #2)
> My thoughts:
> 
> 1. Using Range interface for formatting classes and structs is a good thing and should stay.

Why? Please criticise my arguments above, especially:
* Formatting a type exactly according to the builtin default format of an array
has no reason to be a common case. Note that a range interface is only _one
aspect_ of a type.
* Even in this case, writing a 3-4 line toString is not a big deal.
* Introducing default array-like formatting for ranges also introduces semantic
and implementation issues & complication of the code base.

> 2. There is a conflict of priority between using toString and iterating through a Range. It's worse for classes where toString is always present and can't be used to deduce programmer's intent. IMHO it's more important to keep things uniform, than to make best guess in every case, so iterating through range must have priority over using toString. At least unless there is more direct way to tell what's programmer's intent about default formatting of struct or class.

No! _Default_ range interface formatting cannot have priority over
_explicitly_ defined formatting by the programmer. This is a serious
conceptual bug. A programmer who defines toString *wants* it to be used, else
why would one define it at all? You have the precedence upside down here.
[See also (*) below.]

> 3. Range with (ElementType!T == T) must be either detected throughout all library as a special case or not detected as a Range at all. I'm under impression that algorithms (not just formatting routines) expect that front() yields some value. This value /may/ be another Range, there may be hierarchical structures containing Ranges of Ranges, yet this hierarchy is expected to be finite, so full traversal of it is possible. I expect there are more trouble waiting to happen with Ranges like that if they go generally undetected. I may be wrong here, it would be great to have someone with knowledge of both current practice and original intent clarify this matter.

Agreed. In addition to my example above (a string type behaving as in most
high-level languages): common forms of linked list, tree and graph hold nodes
which are themselves lists, trees, graphs.
They must be properly considered as ranges. This special case need not be
detected, I guess. The bug is not due to their recursive nature (else we could
never write out a tree ;-), but lies somewhere in D's current writing algorithm
for ranges (*). Indeed, the recursive call should end some day, namely on
terminal nodes...
Actually, for such recursive ranges, I would simply recommend defining
toString [because leaf nodes must end the formatting recursion, again see
(*)], and default range formatting should never be used.

> 4.
> > Also, the online doc does not hold template constraints, so that it is not possible to determine which one is selected in given situations.
> +1!
> 
> 5. attached a testcase of various combination (class|struct, normal range|recursive range|no range, has override for toString|no override toString) and patch which makes all cases compile and print uniform output for struct and class. For this case changes are really very simple, constraints still look manageable, and one can still enjoy specific formatting for ranges.

(*) The bug seems to be similar to left-recursive PEG parsing: when I try to write out a struct object implementing the input range interface, I get an "infinite" series of '[', then a segfault. The error seems to be writing out the opening character '[' for each nesting level before having computed the whole string at this level, which could be empty or otherwise end the recursion. More fundamentally, the error is caused precisely by ignoring the user-defined toString, which would end the recursion with a special, non-recursive form for terminal elements (leaves). One more reason to respect a programmer-defined toString instead of bypassing it.

Denis

December 15, 2010
--- Comment #6 from Nick Voronin <elfy.nv@gmail.com> 2010-12-15 11:46:24 PST ---
> > 1. Using Range interface for formatting classes and structs is a good thing and should stay.
> 
> Why? Please criticise my arguments above, especially:
> * Formatting a type exactly according to the builtin default format of an array
> has no reason to be a common case. Note that a range interface is only _one
> aspect_ of a type.

It looks to me like the foremost property of a Range is that it can be iterated and something can be accessed through it. It makes perfect sense that default formatting tries exactly this: iterate and format what can be accessed. Now if we bundle data and the Range interface together, all kinds of funny things happen. If we separate data and the Range object, everything makes sense. Data is stored in a container which may or may not define toString, while the Range only gives generic access to the underlying data. Of course one may define toString for a Range object, but if you think of a Range this way, as a separate concept with a limited purpose, there is no need for it.

In a sense I disagree with the notion that "range interface is only _one aspect_ of a type." I think Range should be considered the foremost aspect of a type... Well, just my opinion, of course. For me, mixing the Range interface with other things is not good practice.

> * Even in this case, writing a 3-4 line toString is not a big deal.

True. But 3-4 lines for every Range? Of course one may just provide a template for the current default formatting of Ranges and let the user decide what to use. Actually, I think this is what the issue boils down to: we need a proper way to define custom formatting that would be preferred over library generics if provided. Something higher-level than toString.
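One way to read this suggestion, sketched under assumptions: a hypothetical `rangeToString` helper (not a Phobos function) exposes the default array-like form as a template, so a type's own toString can reuse it in one line instead of three or four. `Window` is an illustrative type, not from the thread:

```d
import std.conv : to;
import std.range : isInputRange;

// Hypothetical helper (not part of Phobos): the array-like default form,
// exposed as a reusable template with a configurable separator.
string rangeToString(R)(R range, string sep = ", ") if (isInputRange!R)
{
    string result = "[";
    bool first = true;
    foreach (e; range)
    {
        if (!first) result ~= sep;
        result ~= to!string(e);
        first = false;
    }
    return result ~ "]";
}

// A range that also carries other state; its author *chooses* to reuse
// the default form, wrapped in a custom prefix.
struct Window
{
    int[] items;
    size_t pos;

    bool empty() const { return pos >= items.length; }
    int front() const { return items[pos]; }
    void popFront() { ++pos; }

    // `this` is copied into the by-value parameter, so printing
    // does not consume the range.
    string toString() { return "Window" ~ rangeToString(this); }
}

void main()
{
    auto w = Window([1, 2, 3]);
    assert(w.toString() == "Window[1, 2, 3]");
    assert(w.pos == 0); // the original range is untouched
}
```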

> * Introducing default array-like formatting for ranges also introduces semantic and implementation issues & complication of the code base.

I don't see it. The inability to override default formatting is an issue, yet default formatting in itself is a good thing.

> No! _Default_ range interface formatting cannot have priority over _explicitely_ defined formatting by the programmer.

I would totally agree with you if there were any way to distinguish an overridden toString for classes from the original one. I don't know of one, so I place priority on uniformity, simplicity and predictability. Structs and classes behaving the same way is a good thing.

> This is a serious conceptual bug.

I would say it's just "conceptual". It's not pretty, and it may be somewhat limiting ATM, but it's better than increasing complexity, generating more special cases, and placing a burden on programmers for what should be provided by the library automagically... (*) I mean it's way easier to cope with clearly stated limits than to deal with a mess of complex conditions and special cases. The alternative would be a cleaner design for the whole system of object-to-string conversion.

(*) Note that default formatting is widely used inside the library for debugging purposes; it must deal with all sorts of objects in a uniform way and not place any requirements on code. When the _programmer_ wants to format an object, he's free to call toString directly or even use a custom conversion method. One choice of defaults or another does not really limit the programmer, other than in how he sees some debug messages.

December 15, 2010
--- Comment #7 from Denis Derman <denis.spir@gmail.com> 2010-12-15 14:36:42 PST ---
(In reply to comment #6)
> > > 1. Using Range interface for formatting classes and structs is a good thing and should stay.
> > 
> > Why? Please criticise my arguments above, especially:
> > * Formatting a type exactly according to the builtin default format of an array
> > has no reason to be a common case. Note that a range interface is only _one
> > aspect_ of a type.
> 
> It looks for me that foremost property of Range is that it can be iterated and something can be accessed through it. It makes perfect sense that default formatting tries exactly this -- iterate and format what can be accessed. Now if we bundle data and Range interface together all kind of funny things happen. If we separate data and Range object -- everything makes sense. Data is stored in container which may or may not define toString, while Range only gives generic access to underlying data. Of course one may define toString for Range object, but if you think of a Range this way -- as a separate concept with limited purpose -- there is no need for it.
> 
> In a sense I disagree with the notion of "range interface is only _one aspect_ of a type." I think Range should be considered foremost aspect of a type... Well, just my opinion, of course. for me mixing Range interface with other things is not a good practice.
> 
> > * Even in this case, writing a 3-4 line toString is not a big deal.
> 
> True. But 3-4 line for every Range? Of course one may just provide template for currently default formatting of Ranges and let user decide what to use. Actually I think this is what the issue boils down to: we need proper way to define custom formatting which would be preferred over library generics if provided. Something of higher level than toString.
> 
> > * Introducing default array-like formatting for ranges also introduces semantic and implementation issues & complication of the code base.
> 
> I don't see it. Unability to override default formatting is an issue, yet default formatting in itself is a good thing.
> 
> > No! _Default_ range interface formatting cannot have priority over _explicitely_ defined formatting by the programmer.
> 
> I would totally agree with you if there was any way to distinguish overridden toString for classes from original one. I don't know one, so I place priority on uniformity, simplicity and predictability. Structs and classes behaving same way is a good thing.
> 
> > This is a serious conceptual bug.
> 
> I would say it just "conceptual". It's not pretty, it may be somewhat limiting ATM, but it's better than increasing complexity, generating more special cases, placing a burden on programmers for what should be provided by library automagically... (*) I mean it's way easier to cope with clearly stated limits that deal with mess of complex condition and special cases. Alternative would be cleaner design for whole system of object to string conversion.
> 
> (*) Note, default formatting is widely used inside of library for debugging purposes, it must deal with all sort of objects in uniform way and not place any requirements on code. When _programmer_ wants to format object he's free to call toString directly or even use custom method for converting. One or another way for defaults does not really limit programmer other than how he sees some debug messages.

Well, our views clearly point in opposite directions and cannot be
reconciled.
First, you seem to consider ranges as types, while for me they are aspects of
types, implemented as parts of type interfaces. For me, they just play a role,
possibly among others.
I agree it's nice to have a default (array-like) output form for types that
happen to implement a range interface if, and only if, the programmer does not
specify any custom form. I also agree uniformity may be a nice _option_ in some
particular cases, as long as it is chosen by the programmer, not imposed. In
what proportion of cases will the default range format happily fit the
programmer's needs for a type that (also) implements the range interface? Say
you wrap a custom string type in a struct to provide specific functionality,
or a set of filenames and dirnames representing a dir structure, or a symbol
table; will it fit?
The case of ranges is completely different from that of arrays. First,
because array types are types; second, because array types can only be
that: there is no "array aspect" of a type that would also be something else;
third, because one cannot specify any output form for an array. For all these
reasons, D's default format for arrays is a great feature (and languages that
do not provide any such feature are painful). But none of these reasons apply
to range interfaces.

I agree the impossibility of distinguishing explicit and inherited toString for
classes is an issue. But for this reason, your choice is to ignore the
programmer's explicit intent in all other cases. I find this totally
unacceptable.
Especially for debugging, as you say, programmers want feedback output to be
exactly the way they state it to be, not in a default form that may by chance
express half of what they need in a form that more or less fits their wishes.

I don't understand your point about "Now if we bundle data and Range interface
together all kind of funny things happen." A type that implements a range
always holds data, usually provides many other features than just
range/iteration, and sometimes provides several ranges: for instance, a tree
can hold different kinds of data fields, expose various operations like
inserting a subtree, and have several ranges to iterate depth-first or
breadth-first, or only on leaves, etc.
Maybe a different point of view would be to find a way for the user to express
"use the range interface for output formatting".

(Now, basta.)
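Such an explicit opt-in could look like the following sketch, assuming a hypothetical convention (not a Phobos API) where a struct declares `enum formatAsRange = true;`. The made-up `show` function gives that opt-in priority, then a user-defined toString, then `T.stringof`. It is meant for structs, since every class inherits toString from Object:

```d
import std.conv : to;
import std.range : isInputRange;

// Hypothetical opt-in convention (not a Phobos API): a range type declares
// `enum formatAsRange = true;` to ask for element-by-element output.
template wantsRangeFormat(T)
{
    static if (isInputRange!T && is(typeof(T.formatAsRange) == bool))
        enum wantsRangeFormat = T.formatAsRange;
    else
        enum wantsRangeFormat = false;
}

enum hasOwnToString(T) = is(typeof(T.init.toString()) == string);

// Explicit opt-in wins, then a user-defined toString, then T.stringof.
string show(T)(T value) if (wantsRangeFormat!T)
{
    string result = "[";
    bool first = true;
    foreach (e; value)
    {
        if (!first) result ~= ", ";
        result ~= to!string(e);
        first = false;
    }
    return result ~ "]";
}

string show(T)(T value) if (!wantsRangeFormat!T && hasOwnToString!T)
{
    return value.toString();
}

string show(T)(T value) if (!wantsRangeFormat!T && !hasOwnToString!T)
{
    return T.stringof;
}

struct Countdown // opts in: printed element by element
{
    enum formatAsRange = true;
    int n;
    bool empty() const { return n <= 0; }
    int front() const { return n; }
    void popFront() { --n; }
}

struct Ticker // also a range, but keeps its own toString
{
    int n;
    bool empty() const { return n <= 0; }
    int front() const { return n; }
    void popFront() { --n; }
    string toString() const { return "Ticker(" ~ to!string(n) ~ ")"; }
}

void main()
{
    assert(show(Countdown(3)) == "[3, 2, 1]");
    assert(show(Ticker(3)) == "Ticker(3)");
}
```

With this design nothing is guessed: the range form is used only when the type asks for it, and an existing toString is never bypassed.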

Denis

December 16, 2010
bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #8 from bearophile_hugs@eml.cc 2010-12-15 16:06:00 PST ---
For a different but related thing, see the Comment 8 of bug 3813:

http://d.puremagic.com/issues/show_bug.cgi?id=3813#c8

It says that I prefer lazy sequences to be printed in a way different from arrays, for example:

[0; 1; 2; 3; 4]
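A sketch of that separator for lazy sequences, assuming a hypothetical `lazyToString` helper; arrays keep the usual ", " form. Later Phobos releases can also produce this output directly with std.format's compound `%( %)` specifier, shown in the last assertion:

```d
import std.conv : to;
import std.format : format;
import std.range : iota, isInputRange;

// Hypothetical helper: print a lazy range as "[a; b; c]" so that it
// cannot be mistaken for an array, which prints as "[a, b, c]".
string lazyToString(R)(R range) if (isInputRange!R)
{
    string result = "[";
    bool first = true;
    foreach (e; range)
    {
        if (!first) result ~= "; ";
        result ~= to!string(e);
        first = false;
    }
    return result ~ "]";
}

void main()
{
    assert(lazyToString(iota(5)) == "[0; 1; 2; 3; 4]");
    assert(to!string([0, 1, 2]) == "[0, 1, 2]"); // arrays keep ", "
    // Compound specifier: "%s" per element, "; " as the separator.
    assert(format("[%(%s; %)]", iota(5)) == "[0; 1; 2; 3; 4]");
}
```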

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 16, 2010
--- Comment #9 from Denis Derman <denis.spir@gmail.com> 2010-12-15 22:47:49 PST ---
(In reply to comment #8)
> For a different but related thing, see the Comment 8 of bug 3813:
> 
> http://d.puremagic.com/issues/show_bug.cgi?id=3813#c8
> 
> It says that I prefer lazy sequences to be printed in a way different from arrays, for example:
> 
> [0; 1; 2; 3; 4]

+++

I would also find it better that ranges do not _exactly_ look like arrays.

Denis
