Thread overview
[Design] return char[] or string?
Jul 29, 2007  Stewart Gordon
Jul 29, 2007  Kirk McDonald
Jul 30, 2007  Regan Heath
Aug 21, 2007  Stewart Gordon
Aug 22, 2007  Regan Heath
Aug 22, 2007  Manfred Nowak
Aug 23, 2007  Regan Heath
Aug 23, 2007  Manfred Nowak
Aug 23, 2007  Regan Heath
Aug 23, 2007  Manfred Nowak
Aug 23, 2007  Stewart Gordon
Aug 23, 2007  Derek Parnell
July 29, 2007
While I haven't got into using D 2.x, I've already begun thinking about making libraries compatible with it.  On this basis, a design decision to consider is whether functions that return a string should return it as a char[] or a const(char)[].  (I use "string" with its general meaning, and "const(char)[]" to refer to that specific type.  Obviously for 1.0 compatibility, I'd have to use the "string" alias wherever I want const(char)[].)

Obviously, a function that takes a string as a parameter has to take in a const(char)[], to be able to accept a string literal or otherwise a constant string.  But what about the return type?

Looking through the 2.x version of std.string, they all return const(char)[] rather than char[].  (Except for those that return something else such as a number.)  This is necessary in most cases because of the copy-on-write policy.

But otherwise, it seems that both have their pros and cons.

There seem to be two cases to consider: libraries targeted specifically at D 2.x, and libraries that (attempt to) support both 1.x and 2.x.  At the moment, it's the latter that really matters.

Let's see.  The string-returning functions in my library more or less fall into these categories:
(a) functions that build a string in a local variable, which is then returned
(b) functions that return a copy of a member variable
(c) property setters and the like that simply pass the argument through
(d) functions that call a function in Phobos and return the result

In the case of (a), there is no obvious benefit to returning a const(char)[] rather than a char[].

Many of the cases of (b) are property getters.  If we have such things returning a const(char)[], then the getter no longer needs to copy the member variable.  Though versioning would be needed to implement this behaviour without causing havoc under 1.x.  The alternative, leaving them returning char[], leads to inconsistency with (c), which would have to return const(char)[].

That leaves (d), to which the obvious answer is to return whatever type the Phobos function returns.

On one hand, if the string is generated on the fly, and so altering it would not cause a problem, it seems wasteful to return a const(char)[] only for the caller to have to .dup it if it wants to modify it.

On the other hand, from the library user's point of view, it can be seen as a confusing inconsistency if some functions return char[] and others const(char)[], when no difference in the semantics of what's returned accounts for this.  It also borders on breaking the encapsulation principle, whereby internal implementation details should not be exposed in my library's API.

What do you people think?

Stewart. 

July 29, 2007
Stewart Gordon wrote:
> While I haven't got into using D 2.x, I've already begun thinking about making libraries compatible with it.  On this basis, a design decision to consider is whether functions that return a string should return it as a char[] or a const(char)[].  (I use "string" with its general meaning, and "const(char)[]" to refer to that specific type.  Obviously for 1.0 compatibility, I'd have to use the "string" alias wherever I want const(char)[].)
> 
> Obviously, a function that takes a string as a parameter has to take in a const(char)[], to be able to accept a string literal or otherwise a constant string.  But what about the return type?
> 
> Looking through the 2.x version of std.string, they all return const(char)[] rather than char[].  (Except for those that return something else such as a number.)  This is necessary in most cases because of the copy-on-write policy.
> 
> But otherwise, it seems that both have their pros and cons.
> 
> There seem to be two cases to consider: libraries targeted specifically at D 2.x, and libraries that (attempt to) support both 1.x and 2.x.  At the moment, it's the latter that really matters.
> 
> Let's see.  The string-returning functions in my library more or less fall into these categories:
> (a) functions that build a string in a local variable, which is then returned
> (b) functions that return a copy of a member variable
> (c) property setters and the like that simply pass the argument through
> (d) functions that call a function in Phobos and return the result
> 
> In the case of (a), there is no obvious benefit to returning a const(char)[] rather than a char[].
> 
> Many of the cases of (b) are property getters.  If we have such things returning a const(char)[], then the getter no longer needs to copy the member variable.  Though versioning would be needed to implement this behaviour without causing havoc under 1.x.  The alternative, leaving them returning char[], leads to inconsistency with (c), which would have to return const(char)[].
> 
> That leaves (d), to which the obvious answer is to return whatever type the Phobos function returns.
> 
> On one hand, if the string is generated on the fly, and so altering it would not cause a problem, it seems wasteful to return a const(char)[] only for the caller to have to .dup it if it wants to modify it.
> 
> On the other hand, from the library user's point of view, it can be seen as a confusing inconsistency if some functions return char[] and others const(char)[], when no difference in the semantics of what's returned accounts for this.  It also borders on breaking the encapsulation principle, whereby internal implementation details should not be exposed in my library's API.
> 
> What do you people think?
> 
> Stewart.

It's a question of ownership. If the function is returning a new string, and giving ownership of that string to the caller, then it should return a char[]. If the function is returning a string which the caller is merely borrowing, it should return a const(char)[]. In most cases, thinking of things this way causes the return type to be obvious.

And, of course, you can always convert a char[] to a const(char)[].

In (a), the function is returning a new string to the caller; it should return char[].

(b) should usually return const(char)[], unless of course you want the caller to mutate the string. If you're going through the trouble of wrapping a member with a getter/setter, then that probably means you don't want the user messing with it directly.

The other cases are less clear, and will vary from function to function.
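
A minimal sketch of the ownership rule described above, assuming a current D2 compiler rather than the 2007 releases under discussion. The class `Person` and its members are hypothetical names chosen for illustration; only the two return types matter.

```d
import std.stdio;

class Person
{
    private char[] _name;

    this(const(char)[] name) { _name = name.dup; }

    // Case (b): the caller only borrows the stored name, so no copy is
    // made and the returned view is read-only.
    const(char)[] name() const { return _name; }

    // Case (a): a fresh string is built and handed over to the caller,
    // so a mutable char[] is returned.
    char[] greeting() const
    {
        char[] result = "Hello, ".dup;
        result ~= _name;
        return result;
    }
}

void main()
{
    auto p = new Person("stewart");

    char[] g = p.greeting();
    g[0] = 'J';                 // fine: the caller owns this buffer

    const(char)[] n = p.name;   // borrowed view of the member
    // n[0] = 'S';              // would not compile: n is read-only

    writeln(g, " / ", n);
}
```

A caller that does want a private copy of the borrowed name can still write `p.name.dup`, which keeps the cost of copying visible at the call site.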

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org
July 30, 2007
Some comments inline...

Kirk McDonald wrote:
> Stewart Gordon wrote:
>> While I haven't got into using D 2.x, I've already begun thinking about making libraries compatible with it.  On this basis, a design decision to consider is whether functions that return a string should return it as a char[] or a const(char)[].  (I use "string" with its general meaning, and "const(char)[]" to refer to that specific type.  Obviously for 1.0 compatibility, I'd have to use the "string" alias wherever I want const(char)[].)
>>
>> Obviously, a function that takes a string as a parameter has to take in a const(char)[], to be able to accept a string literal or otherwise a constant string.  But what about the return type?

It's a pity D cannot differentiate string literals and place those passed as char[] (mutable) parameters in RAM.  Obviously it would have to create a separate one for each and every use.

>> Looking through the 2.x version of std.string, they all return const(char)[] rather than char[].  (Except for those that return something else such as a number.)  This is necessary in most cases because of the copy-on-write policy.

True, however when you perform 'copy on write' you get a copy of the original and that copy is unique and owned by the copier and therefore can be mutable, or in other words char[] not const(char)[].

>> But otherwise, it seems that both have their pros and cons.
>>
>> There seem to be two cases to consider: libraries targeted specifically at D 2.x, and libraries that (attempt to) support both 1.x and 2.x.  At the moment, it's the latter that really matters.
>>
>> Let's see.  The string-returning functions in my library more or less fall into these categories:
>> (a) functions that build a string in a local variable, which is then returned
>> (b) functions that return a copy of a member variable
>> (c) property setters and the like that simply pass the argument through
>> (d) functions that call a function in Phobos and return the result
>>
>> In the case of (a), there is no obvious benefit to returning a const(char)[] rather than a char[].
>>
>> Many of the cases of (b) are property getters.  If we have such things returning a const(char)[], then the getter no longer needs to copy the member variable.  Though versioning would be needed to implement this behaviour without causing havoc under 1.x.  The alternative, leaving them returning char[], leads to inconsistency with (c), which would have to return const(char)[].
>>
>> That leaves (d), to which the obvious answer is to return whatever type the Phobos function returns.
>>
>> On one hand, if the string is generated on the fly, and so altering it would not cause a problem, it seems wasteful to return a const(char)[] only for the caller to have to .dup it if it wants to modify it.

Indeed, and some Phobos functions are doing this; it has been a source of irritation for me since the inception of 'const'.

>> On the other hand, from the library user's point of view, it can be seen as a confusing inconsistency if some functions return char[] and others const(char)[], when no difference in the semantics of what's returned accounts for this.  It also borders on breaking the encapsulation principle, whereby internal implementation details should not be exposed in my library's API.

I think perhaps providing more than one overload could help lessen confusion, things like having:

char[] tolowerInplace(char [] s)

in addition to the standard tolower.

> It's a question of ownership. If the function is returning a new string, and giving ownership of that string to the caller, then it should return a char[]. If the function is returning a string which the caller is merely borrowing, it should return a const(char)[]. In most cases, thinking of things this way causes the return type to be obvious.
> 
> And, of course, you can always convert a char[] to a const(char)[].

This is how I tend to think about it also.

> In (a), the function is returning a new string to the caller; it should return char[].
> 
> (b) should usually return const(char)[], unless of course you want the caller to mutate the string. If you're going through the trouble of wrapping a member with a getter/setter, then that probably means you don't want the user messing with it directly.
> 
> The other cases are less clear, and will vary from function to function.

As I mentioned above I have been repeatedly annoyed by a number of Phobos string functions since the introduction of 'const'.

I think in some cases we need to rethink some of the functions and how they work in order to provide a more 'const' aware/friendly library.

Example "string[] split(in string s)" in std.string.

If the input is char[] then this function essentially casts the input to const and if I want to perform further modification of the input I now have to dup the results.

In a sense this function 'takes ownership' of the input and does not give it back again.

I think in this case split should be templated.  If the input is char[] the result should be char[][], if the input is string the result should be string[].
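
A rough sketch of what such a templated split might look like, assuming a current D2 compiler. `splitOn` is a hypothetical helper that splits on a single delimiter character; it is not the Phobos `split` and makes no claim about how Phobos is implemented. The element type of the result follows the argument, because only slices of the input are returned.

```d
import std.stdio;

// Hypothetical helper: returns slices of `s`, so char[] in gives char[][]
// out and string in gives string[] out. Nothing is copied.
T[][] splitOn(T)(T[] s, char delim)
{
    T[][] parts;
    size_t start = 0;
    foreach (i, c; s)
    {
        if (c == delim)
        {
            parts ~= s[start .. i];
            start = i + 1;
        }
    }
    parts ~= s[start .. $];   // trailing piece after the last delimiter
    return parts;
}

void main()
{
    char[] buf = "a,b,c".dup;
    char[][] m = splitOn(buf, ',');     // T is char
    m[0][0] = 'A';                      // writes through to buf

    string[] r = splitOn("x,y", ',');   // T is immutable(char)
    writeln(buf, " ", r);
}
```

Because the pieces alias the input, a caller who passes a mutable buffer keeps the right to modify the results without a dup.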

This works fine for cases where the input is never copied, but not for cases where it is conditionally copied, "string tolower(string s)" in std.string for example.

It cannot know ahead of time whether it's going to need to 'copy on write' so simply templating it doesn't help, however I suggested a possible templated solution which dups only in the case where the input is 'string':

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55337

I figure if you want a copy of the input you can manually dup the parameter you pass.
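
The linked article is not reproduced here, so the following is only one possible reading of "dup only when the input is string", written for a current D2 compiler. `lowered` and `lowerAscii` are hypothetical names, the code handles ASCII only, and it is not the Phobos `tolower`.

```d
import std.stdio;

// ASCII-only lowering of a single character.
char lowerAscii(char c)
{
    return (c >= 'A' && c <= 'Z') ? cast(char)(c + ('a' - 'A')) : c;
}

// Mutable input is changed in place and never copied.
// Const or immutable input is copied at most once, on the first write.
T[] lowered(T)(T[] s) if (is(immutable T == immutable char))
{
    static if (is(T == char))
    {
        foreach (ref c; s)
            c = lowerAscii(c);
        return s;
    }
    else
    {
        char[] copy = null;
        foreach (i, c; s)
        {
            immutable lc = lowerAscii(c);
            if (lc != c)
            {
                if (copy is null)
                    copy = s.dup;   // copy on first write only
                copy[i] = lc;
            }
        }
        // The cast is safe here because `copy` is a fresh, unshared buffer.
        return copy is null ? s : cast(T[]) copy;
    }
}

void main()
{
    char[] buf = "MiXed".dup;
    lowered(buf);                  // in place, no allocation

    string t = "abc";
    assert(lowered(t) is t);       // nothing to change, no copy

    writeln(buf, " ", lowered("ABC"));
}
```

A caller holding a char[] who wants the original preserved would pass `buf.dup`, which matches the suggestion of duplicating the argument manually.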

Another solution (also hinted at above) may be to provide more than one overload, you might do this where you cannot easily template a solution to efficiently handle the common case for each input type (mutable/const).

As for your cases mentioned above...

I would probably implement (c), a property setter, as code that sets the member followed by a call to the getter so it would return the same as (b).  That said I haven't written a lot of these so perhaps my experience using them isn't sufficient.

Is there some reason you'd rather return char[] from a setter?

I'm hoping in the case of (d) that phobos will change or provide more overloads to handle the different use-cases.

Regan
August 21, 2007
Regan Heath Wrote:

<snip>
> As for your cases mentioned above...
> 
> I would probably implement (c), a property setter, as code that sets the member followed by a call to the getter so it would return the same as (b).  That said I haven't written a lot of these so perhaps my experience using them isn't sufficient.

I've never really liked this idea.  In general, either it would just return the same string that was passed in, IWC there's no point calling the getter rather than simply returning the argument, or there would be a performance hit where the return value isn't used.

Much better would be if D would chain property assignments implicitly:

http://www.digitalmars.com/d/archives/digitalmars/D/10199.html

If only Walter would finally answer this request (among many others)!

Stewart.
August 22, 2007
Stewart Gordon wrote:
> Regan Heath Wrote:
> 
> <snip>
>> As for your cases mentioned above...
>> 
>> I would probably implement (c), a property setter, as code that
>> sets the member followed by a call to the getter so it would return
>> the same as (b).  That said I haven't written a lot of these so
>> perhaps my experience using them isn't sufficient.
> 
> I've never really liked this idea.  In general, either it would just
> return the same string that was passed in, IWC there's no point
> calling the getter rather than simply returning the argument

I'd hope the call to the getter would be inlined.

The point I see is consistency, the getter might return the stored value with some sort of modification, perhaps due to a change in required functionality at some point, or perhaps because more than one getter uses the same data member.

>, or
> there would be a performance hit where the return value isn't used.

Yeah, that's always going to be a problem.  It's a pity we cannot overload on return type.

> Much better would be if D would chain property assignments
> implicitly:
> 
> http://www.digitalmars.com/d/archives/digitalmars/D/10199.html
> 
> If only Walter would finally answer this request (among many others)!

Yeah, this is another of those cases where a property doesn't quite work the same as a plain old data member, p.property += x; being the more common one.

Regan
August 22, 2007
Regan Heath wrote

> Yeah, this is another of those cases where a property doesn't quite work the same as a plain old data member, p.property += x; being the more common one.

Again I do not see the deeper reason for a whole discussion. This time the discussion is about properties. Properties _are_ restricted. If one does not want these restrictions, one can use a class instead.

-manfred
August 23, 2007
Manfred Nowak wrote:
> Regan Heath wrote
> 
>> Yeah, this is another of those cases where a property doesn't
>> quite work the same as a plain old data member, p.property += x;
>> being the more common one.
> 
> Again I do not see the deeper reason for a whole discussion. This time the discussion is about properties. Properties _are_ restricted. If one does not want these restrictions, one can use a class instead.

http://www.digitalmars.com/d/property.html
"Properties are member functions that can be syntactically treated as if they were fields"

I was under the impression the main benefit to properties was being able to replace an existing field (one in use by some user code) with a property and have it work without user code changes.

eg.

---------BEFORE--------

class A
{
  public int a;
}

void foo(ref int i) {}

void main()
{
  A a = new A();
  int b;

  b = a.a = 5;
  a.a += 1;
  a.a++;
  foo(a.a);
}

---------AFTER---------

class A
{
  int _a;
  public int a() { return _a; }
  public int a(int _aa) { _a = _aa; return a(); }
}

void main()
{
  A a = new A();
  int b;

  b = a.a = 5;  //error
  a.a += 1;     //error
  a.a++;        //error
  foo(a.a);     //error
}

Sadly there are plenty of cases where a property cannot be "syntactically treated as a field" but needs a completely different syntax.

Sure, there are other benefits for properties like performing some complex calculation on the input to the setter, or error checking it, or whatever but I don't think this is the core benefit to properties as these can be achieved with plain old methods i.e. set<Propertyname>

Something that would solve 3 of the errors above is the ability to return by 'ref', eg.

class A
{
  int _a;
  public ref int a() { return _a; }
  public ref int a(int _aa) { _a = _aa; return a(); }
}

void main()
{
  A a = new A();
  int b;

  b = a.a = 5;  //error
  a.a += 1;     //ok
  a.a++;        //ok
  foo(a.a);     //ok
}

The problem with the remaining error is that a setter might take and return 2 different types, as mentioned here:
  http://www.digitalmars.com/d/archives/digitalmars/D/10199.html
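
For comparison, a small sketch of the ref-return idea as it can be written with a current D2 compiler, which does accept `ref` return types. The `@property` attribute used here is a later D2 addition and was not available when this post was written; the body given to `foo` is an assumption added so that the effect is visible.

```d
import std.stdio;

class A
{
    private int _a;

    // The getter returns a reference to the field, so the result is an
    // lvalue that can be assigned to, incremented, or bound to a ref
    // parameter.
    @property ref int a() { return _a; }
}

// Illustrative body (an assumption): shows that the field itself is passed.
void foo(ref int i) { i += 10; }

void main()
{
    auto obj = new A;
    int b;

    b = obj.a = 5;   // assigns through the returned reference
    obj.a += 1;
    obj.a++;
    foo(obj.a);

    writeln(b, " ", obj.a);
}
```

The trade-off is that a ref getter exposes the field directly, so there is no setter left to validate or intercept writes.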

Regan
August 23, 2007
Regan Heath wrote

> "Properties are member functions that can be syntactically treated as if they were fields"
> 
> I was under the impression the main benefit to properties was being able to replace an existing field (one in use by some user code) with a property and have it work without user code changes.

For me the definition implies that fields can replace properties, but no property can replace a field.

I read the definition above like this:
: if a member function can be syntactically treated as a field, then
: it is a property
i.e.,  properties have less syntactical power than fields.

Because member functions with at most one formal parameter can be treated as fields in assignments, i.e. without the parentheses, D has properties according to the definition in the docs.

All those recognizable errors in your and Stewart's examples are based on the wrong assumption, that properties are more powerful than fields.

-manfred



August 23, 2007
Manfred Nowak wrote:
> Regan Heath wrote
> 
>> "Properties are member functions that can be syntactically treated
>> as if they were fields"
>>
>> I was under the impression the main benefit to properties was
>> being able to replace an existing field (one in use by some user
>> code) with a property and have it work without user code changes.
> 
> For me the definition implies that fields can replace properties, but no property can replace a field.
> 
> I read the definition above like this:
> : if a member function can be syntactically treated as a field, then
> : it is a property

Sure, you can reverse the definition if you like.  The end result is that as there are no member functions in D which can be syntactically treated as if they were fields (in all cases) then D does not have properties by this definition.

Note I use "(in all cases)" above, that is the assumption I am making, if any.  I believe this assumption is implied by the definition, just as I believe your assumption below is not.

> i.e.,  properties have less syntactical power than fields.

This is certainly true, but how do you draw that conclusion from the definition? (this is not the assumption to which I refer above)

> Because member functions with at most one formal parameter can be treated as fields in assignments, i.e. without the parentheses, D has properties according to the definition in the docs.

It seems you are giving us a new definition for properties in D which reads something like:

"properties are member functions with at most one formal parameter (which) can be treated as fields in assignments only"


Let's take a look at the full text of the paragraph in the docs containing the definition of a property:

"Properties are member functions that can be syntactically treated as if they were fields. Properties can be read from or written to. A property is read by calling a method with no arguments; a property is written by calling a method with its argument being the value it is set to."

Where does the definition mention only supporting assignments? (this is your assumption)

Where does it mention "at most one formal parameter"?

> All those recognizable errors in your and Stewart's examples are based on the wrong assumption, that properties are more powerful than fields.

It's possible my assumption "that properties should be treated syntactically like fields (in all cases)" is incorrect but given the core purpose of properties (to replace fields seamlessly) I don't think it's an outrageous assumption to make.

I am simply expecting to be able to treat properties as fields (syntactically speaking) which is (almost to the letter/word) exactly what the definition states.

We can quibble over the definition all we like and frankly I don't really care to.  We all know the docs are seldom precisely defined nor do they necessarily reflect reality.

The simple fact remains that just about every new D user writes:

char[] p;
p.length += 5;

and expects the 'length' "property" to be incremented by 5.

Unless the error cases mentioned are supported by properties then the core reason for having properties (being able to seamlessly refactor replacing a field with a property) is null and void as it will regularly result in changes being required to user code.

Properties are not as useful as many of us wish they were.

I assume you agree that it would be quite nice to be able to use properties in the error cases listed?

Regan
August 23, 2007
Regan Heath wrote
> I assume you agree that it would be quite nice to be able to use properties in the error cases listed?

Not exactly properties, but one should be able to iron out those errors. As I stated before, one can use an inner class:

class A{
  private:
   int _a;
  public:
   Property a;
   this(){ a= new Property;}
   class Property{
     int opCall() { return _a; }
     int opAssign(int assgn) { _a= assgn; return opCall(); }
     int opAddAssign( int add){ _a+= add; return opCall();}
     int opPostInc(){ int tmp= _a++; return tmp;}
   }
}
void foo(inout A.Property i) {} // wart!!
void main(){
   A a = new A();
   int b;
   b = a.a = 5;
   a.a += 1;
   a.a++;
   foo(a.a);
}

In the examples given here two warts remain:
- classes cannot derive from basic types, therefore the type of the
formal parameter of `foo' has to be changed
- Stewart's point stays unhandled.

In fact the first wart may be closed by allowing something like `class Property: int' or `alias int Property'.

Stewart's point does not need overloading by return type; a conditional or lazy return `return? <expr>' would be enough, where `return?' has the semantics that `<expr>' is not evaluated if its value is not needed, or if it is fed as an actual parameter to a function at a position in the formal parameter list where a `lazy' parameter is declared.
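
The `return?' form is a proposal and does not exist in D. The `lazy' parameter storage class it refers to does exist, and a short sketch of its semantics may help, assuming a current D2 compiler. The names `expensive`, `pick` and `evaluations` are hypothetical.

```d
import std.stdio;

int evaluations;   // counts how often the argument expression actually runs

int expensive()
{
    ++evaluations;
    return 42;
}

// The argument bound to `value` is evaluated only if and when `value`
// is read inside the function.
int pick(bool needed, lazy int value)
{
    return needed ? value : 0;
}

void main()
{
    pick(false, expensive());
    assert(evaluations == 0);   // the expression was never evaluated

    pick(true, expensive());
    assert(evaluations == 1);   // evaluated once, on use

    writeln("evaluations: ", evaluations);
}
```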

-manfred